2023-03-02 22:22:15

by Ian Rogers

Subject: [PATCH v2 00/10] Better fixes for grouping of events

The rules for grouping events have grown more complex. Topdown events
must be grouped, but flags like --metric-no-group and flags on metrics
don't respect this. Uncore events may be expanded using wildcards for
PMU names, but then the events need reordering so that the group
members are adjacent. Rather than fixing metrics, this change fixes the
main event parsing code to first sort and then regroup evsels.

As this is shared functionality, changes to it deserve careful
review. The change is made with the intent of simplifying the grouping
logic and making it more robust; examples are given. If additional
changes are necessary, they will most likely be to the
evsel__pmu_name logic, as the code avoids breaking groups that are on
the same PMU. The pmu_name is tweaked in the case of software and aux
events, which use groups in a slightly different manner from
conventional events.
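
A minimal sketch of that leader fallback (illustrative only; the real
change is spread across patches 5/10 and 6/10, and it assumes the
const-friendly accessors those patches introduce):

```
/* Sketch: software and aux events sort/group under their leader's PMU. */
const char *evsel__pmu_name(const struct evsel *evsel)
{
	/*
	 * Software and aux events use groups differently; using the
	 * leader's PMU name keeps them next to the events they were
	 * grouped with when sorting.
	 */
	if (evsel->core.attr.type == PERF_TYPE_SOFTWARE || evsel__is_aux_event(evsel))
		evsel = evsel__leader(evsel);

	return evsel->pmu_name ?: "cpu";
}
```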

The code was manually tested, as well as passing perf test, on an
Intel Tigerlake CPU with intel-pt.

v2. Fix up the commit message on 4/10 (thanks Arnaldo). Drop the
unnecessary v1 5/10 (thanks Kan). evlist->core.nr_groups wasn't
being correctly maintained after the sort/regrouping, so the
new patch 10/10 removes that variable and computes it from the
evlist when necessary (generally just in tests).
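
Computing it on demand could look roughly like the sketch below
(hypothetical helper; the actual 10/10 patch may differ):

```
/* Sketch: count groups as leaders that have more than one member. */
static int evlist__nr_groups(struct evlist *evlist)
{
	struct evsel *evsel;
	int nr_groups = 0;

	evlist__for_each_entry(evlist, evsel) {
		if (evsel__is_group_leader(evsel) && evsel->core.nr_members > 1)
			nr_groups++;
	}
	return nr_groups;
}
```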

Ian Rogers (10):
libperf evlist: Avoid a use of evsel idx
perf stat: Don't remove all grouped events when CPU maps disagree
perf record: Early auxtrace initialization before event parsing
perf stat: Modify the group test
perf evsel: Allow const evsel for certain accesses
perf evsel: Add function to compute pmu_name
perf parse-events: Pass ownership of the group name
perf parse-events: Sort and group parsed events
perf evsel: Remove use_uncore_alias
perf evlist: Remove nr_groups

tools/lib/perf/evlist.c | 31 ++-
tools/lib/perf/include/internal/evlist.h | 1 -
tools/lib/perf/include/perf/evlist.h | 1 +
tools/perf/arch/x86/util/auxtrace.c | 17 +-
tools/perf/arch/x86/util/evlist.c | 39 ++--
tools/perf/builtin-record.c | 8 +-
tools/perf/builtin-report.c | 2 +-
tools/perf/builtin-stat.c | 24 ++-
tools/perf/tests/bpf.c | 1 -
tools/perf/tests/parse-events.c | 22 +-
tools/perf/tests/pfm.c | 12 +-
tools/perf/util/auxtrace.h | 2 +
tools/perf/util/evlist.c | 2 +-
tools/perf/util/evlist.h | 8 +-
tools/perf/util/evsel.c | 27 ++-
tools/perf/util/evsel.h | 8 +-
tools/perf/util/header.c | 3 +-
tools/perf/util/parse-events.c | 254 +++++++++++------------
tools/perf/util/parse-events.h | 7 +-
tools/perf/util/parse-events.y | 27 +--
tools/perf/util/pfm.c | 1 -
tools/perf/util/pmu.c | 6 +-
tools/perf/util/python.c | 2 +-
tools/perf/util/stat-shadow.c | 2 +-
24 files changed, 270 insertions(+), 237 deletions(-)

--
2.40.0.rc0.216.gc4246ad0f0-goog



2023-03-02 22:25:52

by Ian Rogers

Subject: [PATCH v2 02/10] perf stat: Don't remove all grouped events when CPU maps disagree

If the CPU maps of the events within a group differ then the entire
group is removed. For example:

```
$ perf stat -e '{imc_free_running/data_read/,imc_free_running/data_write/,cs}' -a sleep 1
WARNING: grouped events cpus do not match, disabling group:
anon group { imc_free_running/data_read/, imc_free_running/data_write/, cs }
```

Change the behavior so that just the events not matching the leader
are removed. So in the example above, just 'cs' will be removed.

Modify the warning so that it is produced once for each group, rather
than once for the entire evlist. Shrink the scope and size of the
warning text buffer.
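
With this patch the warning reads along these lines, based on the new
pr_warning() strings in the diff (output shown is illustrative):

```
WARNING: grouped events cpus do not match.
Events with CPUs not matching the leader will be removed from the group.
 anon group { imc_free_running/data_read/, imc_free_running/data_write/, cs }
```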

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/builtin-stat.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index d70b1ec88594..5c12ae5efce5 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -181,14 +181,13 @@ static bool cpus_map_matched(struct evsel *a, struct evsel *b)

static void evlist__check_cpu_maps(struct evlist *evlist)
{
- struct evsel *evsel, *pos, *leader;
- char buf[1024];
+ struct evsel *evsel, *warned_leader = NULL;

if (evlist__has_hybrid(evlist))
evlist__warn_hybrid_group(evlist);

evlist__for_each_entry(evlist, evsel) {
- leader = evsel__leader(evsel);
+ struct evsel *leader = evsel__leader(evsel);

/* Check that leader matches cpus with each member. */
if (leader == evsel)
@@ -197,19 +196,26 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
continue;

/* If there's mismatch disable the group and warn user. */
- WARN_ONCE(1, "WARNING: grouped events cpus do not match, disabling group:\n");
- evsel__group_desc(leader, buf, sizeof(buf));
- pr_warning(" %s\n", buf);
-
+ if (warned_leader != leader) {
+ char buf[200];
+
+ pr_warning("WARNING: grouped events cpus do not match.\n"
+ "Events with CPUs not matching the leader will "
+ "be removed from the group.\n");
+ evsel__group_desc(leader, buf, sizeof(buf));
+ pr_warning(" %s\n", buf);
+ warned_leader = leader;
+ }
if (verbose > 0) {
+ char buf[200];
+
cpu_map__snprint(leader->core.cpus, buf, sizeof(buf));
pr_warning(" %s: %s\n", leader->name, buf);
cpu_map__snprint(evsel->core.cpus, buf, sizeof(buf));
pr_warning(" %s: %s\n", evsel->name, buf);
}

- for_each_group_evsel(pos, leader)
- evsel__remove_from_group(pos, leader);
+ evsel__remove_from_group(evsel, leader);
}
}

--
2.40.0.rc0.216.gc4246ad0f0-goog


2023-03-02 22:30:05

by Ian Rogers

Subject: [PATCH v2 08/10] perf parse-events: Sort and group parsed events

This change is intended to be a no-op for most current cases; the
default sort order is the order in which the events were parsed. Where
it varies is in how groups are handled. Previously, grouping an uncore
and a core event would most often cause the group to be removed:

```
$ perf stat -e '{instructions,uncore_imc_free_running_0/data_total/}' -a sleep 1
WARNING: grouped events cpus do not match, disabling group:
anon group { instructions, uncore_imc_free_running_0/data_total/ }
...
```

However, when wildcards are used the events should be re-sorted and
re-grouped in parse_events__set_leader, but this currently fails for
simple examples:

```
$ perf stat -e '{uncore_imc_free_running/data_read/,uncore_imc_free_running/data_write/}' -a sleep 1

Performance counter stats for 'system wide':

<not counted> MiB uncore_imc_free_running/data_read/
<not counted> MiB uncore_imc_free_running/data_write/

1.000996992 seconds time elapsed
```

A further failure mode, fixed in this patch, is failing to force
topdown events into a group.

This change moves the sorting of evsels in the evlist to after parsing;
parsing is required to set up the groups first. The evsels are sorted
respecting the existing groupings and parse order, but reordered so
that evsels of the same PMU and group appear together. So that
software and aux events respect groups, their pmu_name is taken from
the group leader. The sorting is done with list_sort, removing a memory
allocation.

After sorting, a pass is done to correct the group leaders and, for
topdown events, to ensure they have a group leader.
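
Schematically, for a hypothetical parsed group mixing core, software
and uncore events, the two steps behave as follows (illustrative only):

```
parsed:    {instructions, data_read, cs}  one group spanning cpu and uncore
sorted:    instructions, cs, data_read    by leader idx, then PMU name
                                          (cs sorts under its leader's "cpu")
regrouped: {instructions, cs}             group on the cpu PMU
           {data_read}                    split off onto the uncore PMU
```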

This fixes the problems seen before:

```
$ perf stat -e '{uncore_imc_free_running/data_read/,uncore_imc_free_running/data_write/}' -a sleep 1

Performance counter stats for 'system wide':

727.42 MiB uncore_imc_free_running/data_read/
81.84 MiB uncore_imc_free_running/data_write/

1.000948615 seconds time elapsed
```

As well as making groups not fail for cases like:

```
$ perf stat -e '{imc_free_running_0/data_total/,imc_free_running_1/data_total/}' -a sleep 1

Performance counter stats for 'system wide':

256.47 MiB imc_free_running_0/data_total/
256.48 MiB imc_free_running_1/data_total/

1.001165442 seconds time elapsed
```

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/arch/x86/util/evlist.c | 39 ++---
tools/perf/util/evlist.h | 2 +-
tools/perf/util/parse-events.c | 240 +++++++++++++++---------------
tools/perf/util/parse-events.h | 3 +-
tools/perf/util/parse-events.y | 4 +-
5 files changed, 136 insertions(+), 152 deletions(-)

diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
index 8a7ae4162563..d4193479a364 100644
--- a/tools/perf/arch/x86/util/evlist.c
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -65,29 +65,22 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
}

-struct evsel *arch_evlist__leader(struct list_head *list)
+int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
{
- struct evsel *evsel, *first, *slots = NULL;
- bool has_topdown = false;
-
- first = list_first_entry(list, struct evsel, core.node);
-
- if (!topdown_sys_has_perf_metrics())
- return first;
-
- /* If there is a slots event and a topdown event then the slots event comes first. */
- __evlist__for_each_entry(list, evsel) {
- if (evsel->pmu_name && !strncmp(evsel->pmu_name, "cpu", 3) && evsel->name) {
- if (strcasestr(evsel->name, "slots")) {
- slots = evsel;
- if (slots == first)
- return first;
- }
- if (strcasestr(evsel->name, "topdown"))
- has_topdown = true;
- if (slots && has_topdown)
- return slots;
- }
+ if (topdown_sys_has_perf_metrics() &&
+ (!lhs->pmu_name || !strncmp(lhs->pmu_name, "cpu", 3))) {
+ /* Ensure the topdown slots comes first. */
+ if (strcasestr(lhs->name, "slots"))
+ return -1;
+ if (strcasestr(rhs->name, "slots"))
+ return 1;
+ /* Followed by topdown events. */
+ if (strcasestr(lhs->name, "topdown") && !strcasestr(rhs->name, "topdown"))
+ return -1;
+ if (!strcasestr(lhs->name, "topdown") && strcasestr(rhs->name, "topdown"))
+ return 1;
}
- return first;
+
+ /* Default ordering by insertion index. */
+ return lhs->core.idx - rhs->core.idx;
}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 01fa9d592c5a..d89d8f92802b 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -119,7 +119,7 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
#define evlist__add_default_attrs(evlist, array) \
arch_evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array))

-struct evsel *arch_evlist__leader(struct list_head *list);
+int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs);

int evlist__add_dummy(struct evlist *evlist);
struct evsel *evlist__add_aux_dummy(struct evlist *evlist, bool system_wide);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 1be454697d57..d6eb0a54dd2d 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/hw_breakpoint.h>
#include <linux/err.h>
+#include <linux/list_sort.h>
#include <linux/zalloc.h>
#include <dirent.h>
#include <errno.h>
@@ -1655,125 +1656,7 @@ int parse_events__modifier_group(struct list_head *list,
return parse_events__modifier_event(list, event_mod, true);
}

-/*
- * Check if the two uncore PMUs are from the same uncore block
- * The format of the uncore PMU name is uncore_#blockname_#pmuidx
- */
-static bool is_same_uncore_block(const char *pmu_name_a, const char *pmu_name_b)
-{
- char *end_a, *end_b;
-
- end_a = strrchr(pmu_name_a, '_');
- end_b = strrchr(pmu_name_b, '_');
-
- if (!end_a || !end_b)
- return false;
-
- if ((end_a - pmu_name_a) != (end_b - pmu_name_b))
- return false;
-
- return (strncmp(pmu_name_a, pmu_name_b, end_a - pmu_name_a) == 0);
-}
-
-static int
-parse_events__set_leader_for_uncore_aliase(char *name, struct list_head *list,
- struct parse_events_state *parse_state)
-{
- struct evsel *evsel, *leader;
- uintptr_t *leaders;
- bool is_leader = true;
- int i, nr_pmu = 0, total_members, ret = 0;
-
- leader = list_first_entry(list, struct evsel, core.node);
- evsel = list_last_entry(list, struct evsel, core.node);
- total_members = evsel->core.idx - leader->core.idx + 1;
-
- leaders = calloc(total_members, sizeof(uintptr_t));
- if (WARN_ON(!leaders))
- return 0;
-
- /*
- * Going through the whole group and doing sanity check.
- * All members must use alias, and be from the same uncore block.
- * Also, storing the leader events in an array.
- */
- __evlist__for_each_entry(list, evsel) {
-
- /* Only split the uncore group which members use alias */
- if (!evsel->use_uncore_alias)
- goto out;
-
- /* The events must be from the same uncore block */
- if (!is_same_uncore_block(leader->pmu_name, evsel->pmu_name))
- goto out;
-
- if (!is_leader)
- continue;
- /*
- * If the event's PMU name starts to repeat, it must be a new
- * event. That can be used to distinguish the leader from
- * other members, even they have the same event name.
- */
- if ((leader != evsel) &&
- !strcmp(leader->pmu_name, evsel->pmu_name)) {
- is_leader = false;
- continue;
- }
-
- /* Store the leader event for each PMU */
- leaders[nr_pmu++] = (uintptr_t) evsel;
- }
-
- /* only one event alias */
- if (nr_pmu == total_members) {
- parse_state->nr_groups--;
- goto handled;
- }
-
- /*
- * An uncore event alias is a joint name which means the same event
- * runs on all PMUs of a block.
- * Perf doesn't support mixed events from different PMUs in the same
- * group. The big group has to be split into multiple small groups
- * which only include the events from the same PMU.
- *
- * Here the uncore event aliases must be from the same uncore block.
- * The number of PMUs must be same for each alias. The number of new
- * small groups equals to the number of PMUs.
- * Setting the leader event for corresponding members in each group.
- */
- i = 0;
- __evlist__for_each_entry(list, evsel) {
- if (i >= nr_pmu)
- i = 0;
- evsel__set_leader(evsel, (struct evsel *) leaders[i++]);
- }
-
- /* The number of members and group name are same for each group */
- for (i = 0; i < nr_pmu; i++) {
- evsel = (struct evsel *) leaders[i];
- evsel->core.nr_members = total_members / nr_pmu;
- evsel->group_name = name ? strdup(name) : NULL;
- }
-
- /* Take the new small groups into account */
- parse_state->nr_groups += nr_pmu - 1;
-
-handled:
- ret = 1;
- free(name);
-out:
- free(leaders);
- return ret;
-}
-
-__weak struct evsel *arch_evlist__leader(struct list_head *list)
-{
- return list_first_entry(list, struct evsel, core.node);
-}
-
-void parse_events__set_leader(char *name, struct list_head *list,
- struct parse_events_state *parse_state)
+void parse_events__set_leader(char *name, struct list_head *list)
{
struct evsel *leader;

@@ -1782,13 +1665,9 @@ void parse_events__set_leader(char *name, struct list_head *list,
return;
}

- if (parse_events__set_leader_for_uncore_aliase(name, list, parse_state))
- return;
-
- leader = arch_evlist__leader(list);
+ leader = list_first_entry(list, struct evsel, core.node);
__perf_evlist__set_leader(list, &leader->core);
leader->group_name = name;
- list_move(&leader->core.node, list);
}

/* list_event is assumed to point to malloc'ed memory */
@@ -2245,6 +2124,117 @@ static int parse_events__with_hybrid_pmu(struct parse_events_state *parse_state,
return ret;
}

+__weak int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
+{
+ /* Order by insertion index. */
+ return lhs->core.idx - rhs->core.idx;
+}
+
+static int evlist__cmp(void *state, const struct list_head *l, const struct list_head *r)
+{
+ const struct perf_evsel *lhs_core = container_of(l, struct perf_evsel, node);
+ const struct evsel *lhs = container_of(lhs_core, struct evsel, core);
+ const struct perf_evsel *rhs_core = container_of(r, struct perf_evsel, node);
+ const struct evsel *rhs = container_of(rhs_core, struct evsel, core);
+ int *leader_idx = state;
+ int lhs_leader_idx = *leader_idx, rhs_leader_idx = *leader_idx, ret;
+ const char *lhs_pmu_name, *rhs_pmu_name;
+
+ /*
+ * First sort by grouping/leader. Read the leader idx only if the evsel
+ * is part of a group, as -1 indicates no group.
+ */
+ if (lhs_core->leader != lhs_core || lhs_core->nr_members > 1)
+ lhs_leader_idx = lhs_core->leader->idx;
+ if (rhs_core->leader != rhs_core || rhs_core->nr_members > 1)
+ rhs_leader_idx = rhs_core->leader->idx;
+
+ if (lhs_leader_idx != rhs_leader_idx)
+ return lhs_leader_idx - rhs_leader_idx;
+
+ /* Group by PMU. Groups can't span PMUs. */
+ lhs_pmu_name = evsel__pmu_name(lhs);
+ rhs_pmu_name = evsel__pmu_name(rhs);
+ ret = strcmp(lhs_pmu_name, rhs_pmu_name);
+ if (ret)
+ return ret;
+
+ /* Architecture specific sorting. */
+ return arch_evlist__cmp(lhs, rhs);
+}
+
+static void parse_events__sort_events_and_fix_groups(struct list_head *list)
+{
+ int idx = -1;
+ struct evsel *pos, *cur_leader = NULL;
+ struct perf_evsel *cur_leaders_grp = NULL;
+
+ /*
+ * Compute index to insert ungrouped events at. Place them where the
+ * first ungrouped event appears.
+ */
+ list_for_each_entry(pos, list, core.node) {
+ const struct evsel *pos_leader = evsel__leader(pos);
+
+ if (pos != pos_leader || pos->core.nr_members > 1)
+ continue;
+
+ idx = pos->core.idx;
+ break;
+ }
+
+ /* Sort events. */
+ list_sort(&idx, list, evlist__cmp);
+
+ /*
+ * Recompute groups, splitting for PMUs and adding groups for events
+ * that require them.
+ */
+ idx = 0;
+ list_for_each_entry(pos, list, core.node) {
+ const struct evsel *pos_leader = evsel__leader(pos);
+ const char *pos_pmu_name = evsel__pmu_name(pos);
+ const char *cur_leader_pmu_name, *pos_leader_pmu_name;
+ bool force_grouped = arch_evsel__must_be_in_group(pos);
+
+ /* Reset index and nr_members. */
+ pos->core.idx = idx++;
+ pos->core.nr_members = 0;
+
+ /*
+ * Set the group leader respecting the given groupings and that
+ * groups can't span PMUs.
+ */
+ if (!cur_leader)
+ cur_leader = pos;
+
+ cur_leader_pmu_name = evsel__pmu_name(cur_leader);
+ if ((cur_leaders_grp != pos->core.leader && !force_grouped) ||
+ strcmp(cur_leader_pmu_name, pos_pmu_name)) {
+ /* Event is for a different group/PMU than last. */
+ cur_leader = pos;
+ /*
+ * Remember the leader's group before it is overwritten,
+ * so that later events match as being in the same
+ * group.
+ */
+ cur_leaders_grp = pos->core.leader;
+ }
+ pos_leader_pmu_name = evsel__pmu_name(pos_leader);
+ if (strcmp(pos_leader_pmu_name, pos_pmu_name) || force_grouped) {
+ /*
+ * Event's PMU differs from its leader's. Groups can't
+ * span PMUs, so update leader from the group/PMU
+ * tracker.
+ */
+ evsel__set_leader(pos, cur_leader);
+ }
+ }
+ list_for_each_entry(pos, list, core.node) {
+ pos->core.leader->nr_members++;
+ }
+}
+
int __parse_events(struct evlist *evlist, const char *str,
struct parse_events_error *err, struct perf_pmu *fake_pmu)
{
@@ -2266,6 +2256,8 @@ int __parse_events(struct evlist *evlist, const char *str,
return -1;
}

+ parse_events__sort_events_and_fix_groups(&parse_state.list);
+
/*
* Add list to the evlist even with errors to allow callers to clean up.
*/
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 428e72eaafcc..22fc11b0bd59 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -200,8 +200,7 @@ int parse_events_copy_term_list(struct list_head *old,

enum perf_pmu_event_symbol_type
perf_pmu__parse_check(const char *name);
-void parse_events__set_leader(char *name, struct list_head *list,
- struct parse_events_state *parse_state);
+void parse_events__set_leader(char *name, struct list_head *list);
void parse_events_update_lists(struct list_head *list_event,
struct list_head *list_all);
void parse_events_evlist_error(struct parse_events_state *parse_state,
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 541b8dde2063..90d12f2bc8be 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -203,7 +203,7 @@ PE_NAME '{' events '}'

inc_group_count(list, _parse_state);
/* Takes ownership of $1. */
- parse_events__set_leader($1, list, _parse_state);
+ parse_events__set_leader($1, list);
$$ = list;
}
|
@@ -212,7 +212,7 @@ PE_NAME '{' events '}'
struct list_head *list = $2;

inc_group_count(list, _parse_state);
- parse_events__set_leader(NULL, list, _parse_state);
+ parse_events__set_leader(NULL, list);
$$ = list;
}

--
2.40.0.rc0.216.gc4246ad0f0-goog


2023-03-02 22:34:17

by Ian Rogers

Subject: [PATCH v2 03/10] perf record: Early auxtrace initialization before event parsing

This allows event parsing to use the evsel__is_aux_event function,
which is important when determining event grouping.
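
For context, evsel__is_aux_event() keys off the PMU's auxtrace flag, so
the flag has to be accurate by parse time. A sketch of that helper
(signature approximate):

```
bool evsel__is_aux_event(const struct evsel *evsel)
{
	struct perf_pmu *pmu = evsel__find_pmu(evsel);

	/* True only if the PMU's auxtrace flag was already set. */
	return pmu && pmu->auxtrace;
}
```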

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/arch/x86/util/auxtrace.c | 17 +++++++++++++----
tools/perf/builtin-record.c | 6 ++++++
tools/perf/util/auxtrace.h | 2 ++
3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
index 3da506e13f49..de1e4842ea2e 100644
--- a/tools/perf/arch/x86/util/auxtrace.c
+++ b/tools/perf/arch/x86/util/auxtrace.c
@@ -15,6 +15,19 @@
#include "../../../util/intel-bts.h"
#include "../../../util/evlist.h"

+void auxtrace__early_init(void)
+{
+ struct perf_pmu *intel_pt_pmu;
+ struct perf_pmu *intel_bts_pmu;
+
+ intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
+ if (intel_pt_pmu)
+ intel_pt_pmu->auxtrace = true;
+ intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
+ if (intel_bts_pmu)
+ intel_bts_pmu->auxtrace = true;
+}
+
static
struct auxtrace_record *auxtrace_record__init_intel(struct evlist *evlist,
int *err)
@@ -26,11 +39,7 @@ struct auxtrace_record *auxtrace_record__init_intel(struct evlist *evlist,
bool found_bts = false;

intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
- if (intel_pt_pmu)
- intel_pt_pmu->auxtrace = true;
intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
- if (intel_bts_pmu)
- intel_bts_pmu->auxtrace = true;

evlist__for_each_entry(evlist, evsel) {
if (intel_pt_pmu && evsel->core.attr.type == intel_pt_pmu->type)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8374117e66f6..a0870c076dc0 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -3940,6 +3940,10 @@ static int record__init_thread_masks(struct record *rec)
return ret;
}

+__weak void auxtrace__early_init(void)
+{
+}
+
int cmd_record(int argc, const char **argv)
{
int err;
@@ -3985,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
if (err)
return err;

+ auxtrace__early_init();
+
argc = parse_options(argc, argv, record_options, record_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
if (quiet)
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 29eb82dff574..49a86aa6ac94 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -457,6 +457,8 @@ struct addr_filters {

struct auxtrace_cache;

+void auxtrace__early_init(void);
+
#ifdef HAVE_AUXTRACE_SUPPORT

u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm);
--
2.40.0.rc0.216.gc4246ad0f0-goog


2023-03-03 00:38:29

by Namhyung Kim

Subject: Re: [PATCH v2 08/10] perf parse-events: Sort and group parsed events

On Thu, Mar 2, 2023 at 1:26 PM Ian Rogers <[email protected]> wrote:
>
> This change is intended to be a no-op for most current cases; the
> default sort order is the order in which the events were parsed. Where
> it varies is in how groups are handled. Previously, grouping an uncore
> and a core event would most often cause the group to be removed:
>
> ```
> $ perf stat -e '{instructions,uncore_imc_free_running_0/data_total/}' -a sleep 1
> WARNING: grouped events cpus do not match, disabling group:
> anon group { instructions, uncore_imc_free_running_0/data_total/ }
> ...
> ```
>
> However, when wildcards are used the events should be re-sorted and
> re-grouped in parse_events__set_leader, but this currently fails for
> simple examples:
>
> ```
> $ perf stat -e '{uncore_imc_free_running/data_read/,uncore_imc_free_running/data_write/}' -a sleep 1
>
> Performance counter stats for 'system wide':
>
> <not counted> MiB uncore_imc_free_running/data_read/
> <not counted> MiB uncore_imc_free_running/data_write/
>
> 1.000996992 seconds time elapsed
> ```
>
> A further failure mode, fixed in this patch, is failing to force
> topdown events into a group.
>
> This change moves the sorting of evsels in the evlist to after parsing;
> parsing is required to set up the groups first. The evsels are sorted
> respecting the existing groupings and parse order, but reordered so
> that evsels of the same PMU and group appear together. So that
> software and aux events respect groups, their pmu_name is taken from
> the group leader. The sorting is done with list_sort, removing a memory
> allocation.
>
> After sorting, a pass is done to correct the group leaders and, for
> topdown events, to ensure they have a group leader.
>
> This fixes the problems seen before:
>
> ```
> $ perf stat -e '{uncore_imc_free_running/data_read/,uncore_imc_free_running/data_write/}' -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 727.42 MiB uncore_imc_free_running/data_read/
> 81.84 MiB uncore_imc_free_running/data_write/
>
> 1.000948615 seconds time elapsed
> ```
>
> As well as making groups not fail for cases like:
>
> ```
> $ perf stat -e '{imc_free_running_0/data_total/,imc_free_running_1/data_total/}' -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 256.47 MiB imc_free_running_0/data_total/
> 256.48 MiB imc_free_running_1/data_total/

I didn't expect that we could group events from different PMUs.
Not sure if it can handle multiplexing properly...

Thanks,
Namhyung


>
> 1.001165442 seconds time elapsed
> ```

2023-03-03 01:39:45

by Ian Rogers

Subject: Re: [PATCH v2 08/10] perf parse-events: Sort and group parsed events

On Thu, Mar 2, 2023 at 4:37 PM Namhyung Kim <[email protected]> wrote:
>
> On Thu, Mar 2, 2023 at 1:26 PM Ian Rogers <[email protected]> wrote:
> >
> > This change is intended to be a no-op for most current cases; the
> > default sort order is the order in which the events were parsed. Where
> > it varies is in how groups are handled. Previously, grouping an uncore
> > and a core event would most often cause the group to be removed:
> >
> > ```
> > $ perf stat -e '{instructions,uncore_imc_free_running_0/data_total/}' -a sleep 1
> > WARNING: grouped events cpus do not match, disabling group:
> > anon group { instructions, uncore_imc_free_running_0/data_total/ }
> > ...
> > ```
> >
> > However, when wildcards are used the events should be re-sorted and
> > re-grouped in parse_events__set_leader, but this currently fails for
> > simple examples:
> >
> > ```
> > $ perf stat -e '{uncore_imc_free_running/data_read/,uncore_imc_free_running/data_write/}' -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > <not counted> MiB uncore_imc_free_running/data_read/
> > <not counted> MiB uncore_imc_free_running/data_write/
> >
> > 1.000996992 seconds time elapsed
> > ```
> >
> > A further failure mode, fixed in this patch, is failing to force
> > topdown events into a group.
> >
> > This change moves the sorting of evsels in the evlist to after parsing;
> > parsing is required to set up the groups first. The evsels are sorted
> > respecting the existing groupings and parse order, but reordered so
> > that evsels of the same PMU and group appear together. So that
> > software and aux events respect groups, their pmu_name is taken from
> > the group leader. The sorting is done with list_sort, removing a memory
> > allocation.
> >
> > After sorting, a pass is done to correct the group leaders and, for
> > topdown events, to ensure they have a group leader.
> >
> > This fixes the problems seen before:
> >
> > ```
> > $ perf stat -e '{uncore_imc_free_running/data_read/,uncore_imc_free_running/data_write/}' -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 727.42 MiB uncore_imc_free_running/data_read/
> > 81.84 MiB uncore_imc_free_running/data_write/
> >
> > 1.000948615 seconds time elapsed
> > ```
> >
> > As well as making groups not fail for cases like:
> >
> > ```
> > $ perf stat -e '{imc_free_running_0/data_total/,imc_free_running_1/data_total/}' -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 256.47 MiB imc_free_running_0/data_total/
> > 256.48 MiB imc_free_running_1/data_total/
>
> I didn't expect we can group events from different PMUs.
> Not sure if it can handle multiplexing properly..

You are right, this example is now working as the sorting and
regrouping breaks the events into two groups. The rules around
grouping are complex and Arnaldo mentioned that maybe cases like this
should be warned about. The problem then is that wildcard and metric
expansion may naturally produce these cases and we don't want the
warning. It is something of a shame that the grouping information in
the perf stat output isn't clearer.

Thanks,
Ian

> Thanks,
> Namhyung
>
>
> >
> > 1.001165442 seconds time elapsed
> > ```

2023-03-03 15:50:30

by Liang, Kan

Subject: Re: [PATCH v2 02/10] perf stat: Don't remove all grouped events when CPU maps disagree



On 2023-03-02 4:25 p.m., Ian Rogers wrote:
> If the CPU maps of the events within a group differ then the entire
> group is removed. For example:
>
> ```
> $ perf stat -e '{imc_free_running/data_read/,imc_free_running/data_write/,cs}' -a sleep 1
> WARNING: grouped events cpus do not match, disabling group:
> anon group { imc_free_running/data_read/, imc_free_running/data_write/, cs }
> ```
>
> Change the behavior so that just the events not matching the leader
> are removed. So in the example above, just 'cs' will be removed.
>
> Modify the warning so that it is produced once for each group, rather
> than once for the entire evlist. Shrink the scope and size of the
> warning text buffer.

For the uncore, we usually have to create a group for each uncore PMU.
The number of groups can be large. For example, on ICX, we have 40 CHA
PMUs. For SPR, there should be even more CHAs. If we have something like
{cycles,uncore_cha/event=0x1/}, is the warning shown 40 times on ICX?
If so, it would be very annoying.

Maybe it's better to keep the current behavior, which only prints a
warning once and notifies the users that perf will re-group the events.
For the details, they can get them from the -v option.
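
Something as below would keep it to a single warning per run. Not
tested, just a sketch of the idea; pr_warning() is the existing helper
from util/debug.h, and the function name is made up:

static void warn_group_cpus_mismatch_once(void)
{
	/* Only warn on the first mismatching group; -v still shows each one. */
	static bool warned;

	if (warned)
		return;
	warned = true;
	pr_warning("WARNING: grouped events cpus do not match, "
		   "re-grouping events. Use -v for details.\n");
}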

Thanks,
Kan
>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/perf/builtin-stat.c | 24 +++++++++++++++---------
> 1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index d70b1ec88594..5c12ae5efce5 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -181,14 +181,13 @@ static bool cpus_map_matched(struct evsel *a, struct evsel *b)
>
> static void evlist__check_cpu_maps(struct evlist *evlist)
> {
> - struct evsel *evsel, *pos, *leader;
> - char buf[1024];
> + struct evsel *evsel, *warned_leader = NULL;
>
> if (evlist__has_hybrid(evlist))
> evlist__warn_hybrid_group(evlist);
>
> evlist__for_each_entry(evlist, evsel) {
> - leader = evsel__leader(evsel);
> + struct evsel *leader = evsel__leader(evsel);
>
> /* Check that leader matches cpus with each member. */
> if (leader == evsel)
> @@ -197,19 +196,26 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
> continue;
>
> /* If there's mismatch disable the group and warn user. */
> - WARN_ONCE(1, "WARNING: grouped events cpus do not match, disabling group:\n");
> - evsel__group_desc(leader, buf, sizeof(buf));
> - pr_warning(" %s\n", buf);
> -
> + if (warned_leader != leader) {
> + char buf[200];
> +
> + pr_warning("WARNING: grouped events cpus do not match.\n"
> + "Events with CPUs not matching the leader will "
> + "be removed from the group.\n");
> + evsel__group_desc(leader, buf, sizeof(buf));
> + pr_warning(" %s\n", buf);
> + warned_leader = leader;
> + }
> if (verbose > 0) {
> + char buf[200];
> +
> cpu_map__snprint(leader->core.cpus, buf, sizeof(buf));
> pr_warning(" %s: %s\n", leader->name, buf);
> cpu_map__snprint(evsel->core.cpus, buf, sizeof(buf));
> pr_warning(" %s: %s\n", evsel->name, buf);
> }
>
> - for_each_group_evsel(pos, leader)
> - evsel__remove_from_group(pos, leader);
> + evsel__remove_from_group(evsel, leader);
> }
> }
>

2023-03-03 16:40:19

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH v2 03/10] perf record: Early auxtrace initialization before event parsing



On 2023-03-02 4:25 p.m., Ian Rogers wrote:
> This allows event parsing to use the evsel__is_aux_event function,
> which is important when determining event grouping.
>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/perf/arch/x86/util/auxtrace.c | 17 +++++++++++++----
> tools/perf/builtin-record.c | 6 ++++++
> tools/perf/util/auxtrace.h | 2 ++
> 3 files changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
> index 3da506e13f49..de1e4842ea2e 100644
> --- a/tools/perf/arch/x86/util/auxtrace.c
> +++ b/tools/perf/arch/x86/util/auxtrace.c
> @@ -15,6 +15,19 @@
> #include "../../../util/intel-bts.h"
> #include "../../../util/evlist.h"
>
> +void auxtrace__early_init(void)
> +{
> + struct perf_pmu *intel_pt_pmu;
> + struct perf_pmu *intel_bts_pmu;
> +
> + intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
> + if (intel_pt_pmu)
> + intel_pt_pmu->auxtrace = true;
> + intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
> + if (intel_bts_pmu)
> + intel_bts_pmu->auxtrace = true;
> +}
> +
> static
> struct auxtrace_record *auxtrace_record__init_intel(struct evlist *evlist,
> int *err)
> @@ -26,11 +39,7 @@ struct auxtrace_record *auxtrace_record__init_intel(struct evlist *evlist,
> bool found_bts = false;
>
> intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
> - if (intel_pt_pmu)
> - intel_pt_pmu->auxtrace = true;
> intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
> - if (intel_bts_pmu)
> - intel_bts_pmu->auxtrace = true;
>
> evlist__for_each_entry(evlist, evsel) {
> if (intel_pt_pmu && evsel->core.attr.type == intel_pt_pmu->type)
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 8374117e66f6..a0870c076dc0 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -3940,6 +3940,10 @@ static int record__init_thread_masks(struct record *rec)
> return ret;
> }
>
> +__weak void auxtrace__early_init(void)
> +{
> +}
> +
> int cmd_record(int argc, const char **argv)
> {
> int err;
> @@ -3985,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
> if (err)
> return err;
>
> + auxtrace__early_init();

So the auxtrace__early_init() will be unconditionally invoked even if
there are no PT or BTS events, right?

Maybe we should move the auxtrace__early_init() to evsel__is_aux_event()
and cache the value. The initialization will only be invoked when it's
required.
Something as below (not tested.)

+void auxtrace__init(void)
+{
+ struct perf_pmu *intel_pt_pmu;
+ struct perf_pmu *intel_bts_pmu;
+ static bool cached;
+
+ if (cached)
+ return;
+ intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
+ if (intel_pt_pmu)
+ intel_pt_pmu->auxtrace = true;
+ intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
+ if (intel_bts_pmu)
+ intel_bts_pmu->auxtrace = true;
+ cached = true;
+}

bool evsel__is_aux_event(struct evsel *evsel)
{
struct perf_pmu *pmu = evsel__find_pmu(evsel);
+ auxtrace__init();
return pmu && pmu->auxtrace;
}



Thanks,
Kan

> +
> argc = parse_options(argc, argv, record_options, record_usage,
> PARSE_OPT_STOP_AT_NON_OPTION);
> if (quiet)
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index 29eb82dff574..49a86aa6ac94 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -457,6 +457,8 @@ struct addr_filters {
>
> struct auxtrace_cache;
>
> +void auxtrace__early_init(void);
> +
> #ifdef HAVE_AUXTRACE_SUPPORT
>
> u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm);

2023-03-03 16:45:03

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH v2 02/10] perf stat: Don't remove all grouped events when CPU maps disagree

On Fri, Mar 3, 2023 at 7:50 AM Liang, Kan <[email protected]> wrote:
>
>
>
> On 2023-03-02 4:25 p.m., Ian Rogers wrote:
> > If the events in an evlist's CPU map differ then the entire group is
> > removed. For example:
> >
> > ```
> > $ perf stat -e '{imc_free_running/data_read/,imc_free_running/data_write/,cs}' -a sleep 1
> > WARNING: grouped events cpus do not match, disabling group:
> > anon group { imc_free_running/data_read/, imc_free_running/data_write/, cs }
> > ```
> >
> > Change the behavior so that just the events not matching the leader
> > are removed. So in the example above, just 'cs' will be removed.
> >
> > Modify the warning so that it is produced once for each group, rather
> > than once for the entire evlist. Shrink the scope and size of the
> > warning text buffer.
>
> For the uncore, we usually have to create a group for each uncore PMU.
> The number of groups can be large. For example, on ICX, we have 40 CHA
> PMUs. For SPR, there should be even more CHAs. If we have something like
> {cycles,uncore_cha/event=0x1/}, is the warning shown 40 times on ICX?
> If so, it would be very annoying.
>
> Maybe it's better to keep the current behavior, which only prints a
> warning once and notifies the users that perf will re-group the events.
> For the details, they can get them from the -v option.

Thanks Kan, I could imagine that, but I was also worried about cases
where there are multiple groups like:

```
$ perf stat -e '{imc_free_running/data_read/,cs},{uncore_clock/clockticks/,cs}' -a sleep 1
WARNING: grouped events cpus do not match.
Events with CPUs not matching the leader will be removed from the group.
anon group { imc_free_running/data_read/, cs }
WARNING: grouped events cpus do not match.
Events with CPUs not matching the leader will be removed from the group.
anon group { uncore_clock/clockticks/, cs }

Performance counter stats for 'system wide':

1,255.75 MiB imc_free_running/data_read/
7,571 cs
1,327,285,527 uncore_clock/clockticks/
7,571 cs

1.002772882 seconds time elapsed
```

Knowing that both groups were broken there feels like a value add.
Given that this is a warning, and it can be fixed by moving the event
out of the group or forcing the CPUs, I lean toward being
informative/spammy as the spam is somewhat straightforwardly fixed on
the command line.

Thanks,
Ian

> Thanks,
> Kan
> >
> > Signed-off-by: Ian Rogers <[email protected]>
> > ---
> > tools/perf/builtin-stat.c | 24 +++++++++++++++---------
> > 1 file changed, 15 insertions(+), 9 deletions(-)
> >
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index d70b1ec88594..5c12ae5efce5 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -181,14 +181,13 @@ static bool cpus_map_matched(struct evsel *a, struct evsel *b)
> >
> > static void evlist__check_cpu_maps(struct evlist *evlist)
> > {
> > - struct evsel *evsel, *pos, *leader;
> > - char buf[1024];
> > + struct evsel *evsel, *warned_leader = NULL;
> >
> > if (evlist__has_hybrid(evlist))
> > evlist__warn_hybrid_group(evlist);
> >
> > evlist__for_each_entry(evlist, evsel) {
> > - leader = evsel__leader(evsel);
> > + struct evsel *leader = evsel__leader(evsel);
> >
> > /* Check that leader matches cpus with each member. */
> > if (leader == evsel)
> > @@ -197,19 +196,26 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
> > continue;
> >
> > /* If there's mismatch disable the group and warn user. */
> > - WARN_ONCE(1, "WARNING: grouped events cpus do not match, disabling group:\n");
> > - evsel__group_desc(leader, buf, sizeof(buf));
> > - pr_warning(" %s\n", buf);
> > -
> > + if (warned_leader != leader) {
> > + char buf[200];
> > +
> > + pr_warning("WARNING: grouped events cpus do not match.\n"
> > + "Events with CPUs not matching the leader will "
> > + "be removed from the group.\n");
> > + evsel__group_desc(leader, buf, sizeof(buf));
> > + pr_warning(" %s\n", buf);
> > + warned_leader = leader;
> > + }
> > if (verbose > 0) {
> > + char buf[200];
> > +
> > cpu_map__snprint(leader->core.cpus, buf, sizeof(buf));
> > pr_warning(" %s: %s\n", leader->name, buf);
> > cpu_map__snprint(evsel->core.cpus, buf, sizeof(buf));
> > pr_warning(" %s: %s\n", evsel->name, buf);
> > }
> >
> > - for_each_group_evsel(pos, leader)
> > - evsel__remove_from_group(pos, leader);
> > + evsel__remove_from_group(evsel, leader);
> > }
> > }
> >

2023-03-03 17:36:58

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH v2 02/10] perf stat: Don't remove all grouped events when CPU maps disagree



On 2023-03-03 11:44 a.m., Ian Rogers wrote:
> On Fri, Mar 3, 2023 at 7:50 AM Liang, Kan <[email protected]> wrote:
>>
>>
>>
>> On 2023-03-02 4:25 p.m., Ian Rogers wrote:
>>> If the events in an evlist's CPU map differ then the entire group is
>>> removed. For example:
>>>
>>> ```
>>> $ perf stat -e '{imc_free_running/data_read/,imc_free_running/data_write/,cs}' -a sleep 1
>>> WARNING: grouped events cpus do not match, disabling group:
>>> anon group { imc_free_running/data_read/, imc_free_running/data_write/, cs }
>>> ```
>>>
>>> Change the behavior so that just the events not matching the leader
>>> are removed. So in the example above, just 'cs' will be removed.
>>>
>>> Modify the warning so that it is produced once for each group, rather
>>> than once for the entire evlist. Shrink the scope and size of the
>>> warning text buffer.
>>
>> For the uncore, we usually have to create a group for each uncore PMU.
>> The number of groups can be large. For example, on ICX, we have 40 CHA
>> PMUs. For SPR, there should be even more CHAs. If we have something like
>> {cycles,uncore_cha/event=0x1/}, is the warning shown 40 times on ICX?
>> If so, it would be very annoying.
>>
>> Maybe it's better to keep the current behavior, which only prints a
>> warning once and notifies the users that perf will re-group the events.
>> For the details, they can get them from the -v option.
>
> Thanks Kan, I could imagine that, but I was also worried about cases
> where there are multiple groups like:
>
> ```
> $ perf stat -e '{imc_free_running/data_read/,cs},{uncore_clock/clockticks/,cs}' -a sleep 1
> WARNING: grouped events cpus do not match.
> Events with CPUs not matching the leader will be removed from the group.
> anon group { imc_free_running/data_read/, cs }
> WARNING: grouped events cpus do not match.
> Events with CPUs not matching the leader will be removed from the group.
> anon group { uncore_clock/clockticks/, cs }
>
> Performance counter stats for 'system wide':
>
> 1,255.75 MiB imc_free_running/data_read/
> 7,571 cs
> 1,327,285,527 uncore_clock/clockticks/
> 7,571 cs
>
> 1.002772882 seconds time elapsed
> ```
>
> Knowing that both groups were broken there feels like a value add.
> Given that this is a warning, and it can be fixed by moving the event
> out of the group or forcing the CPUs, I lean toward being
> informative/spammy as the spam is somewhat straightforwardly fixed on
> the command line.


I did some tests with the patch. The issue I was worried about didn't
occur. The change looks good to me.

But I found another issue. If I specify a CPU set, the group removal
fails. It's not an issue with this patch. It looks like the current perf
compares the user-defined CPU set, rather than the PMU's cpumask.

./perf stat -e '{cycles,uncore_cha/event=0x1/}' -C0 sleep 1

Performance counter stats for 'CPU(s) 0':

<not counted> cycles
<not supported> uncore_cha/event=0x1/

1.001783936 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
The events in group usually have to be from the same PMU. Try
reorganizing the group.
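
Something as below is roughly the direction I'd expect for a fix. Not
tested; evsel__find_pmu() and pmu->cpus exist, but maps_equal() is a
made-up placeholder for the CPU map comparison:

static bool pmu_cpus_matched(struct evsel *a, struct evsel *b)
{
	struct perf_pmu *pmu_a = evsel__find_pmu(a);
	struct perf_pmu *pmu_b = evsel__find_pmu(b);
	/* Prefer the PMU's own cpumask over the user-narrowed evsel map. */
	struct perf_cpu_map *ca = (pmu_a && pmu_a->cpus) ? pmu_a->cpus : a->core.cpus;
	struct perf_cpu_map *cb = (pmu_b && pmu_b->cpus) ? pmu_b->cpus : b->core.cpus;

	return maps_equal(ca, cb); /* hypothetical map comparison helper */
}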

Thanks,
Kan

>
> Thanks,
> Ian
>
>> Thanks,
>> Kan
>>>
>>> Signed-off-by: Ian Rogers <[email protected]>
>>> ---
>>> tools/perf/builtin-stat.c | 24 +++++++++++++++---------
>>> 1 file changed, 15 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>>> index d70b1ec88594..5c12ae5efce5 100644
>>> --- a/tools/perf/builtin-stat.c
>>> +++ b/tools/perf/builtin-stat.c
>>> @@ -181,14 +181,13 @@ static bool cpus_map_matched(struct evsel *a, struct evsel *b)
>>>
>>> static void evlist__check_cpu_maps(struct evlist *evlist)
>>> {
>>> - struct evsel *evsel, *pos, *leader;
>>> - char buf[1024];
>>> + struct evsel *evsel, *warned_leader = NULL;
>>>
>>> if (evlist__has_hybrid(evlist))
>>> evlist__warn_hybrid_group(evlist);
>>>
>>> evlist__for_each_entry(evlist, evsel) {
>>> - leader = evsel__leader(evsel);
>>> + struct evsel *leader = evsel__leader(evsel);
>>>
>>> /* Check that leader matches cpus with each member. */
>>> if (leader == evsel)
>>> @@ -197,19 +196,26 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
>>> continue;
>>>
>>> /* If there's mismatch disable the group and warn user. */
>>> - WARN_ONCE(1, "WARNING: grouped events cpus do not match, disabling group:\n");
>>> - evsel__group_desc(leader, buf, sizeof(buf));
>>> - pr_warning(" %s\n", buf);
>>> -
>>> + if (warned_leader != leader) {
>>> + char buf[200];
>>> +
>>> + pr_warning("WARNING: grouped events cpus do not match.\n"
>>> + "Events with CPUs not matching the leader will "
>>> + "be removed from the group.\n");
>>> + evsel__group_desc(leader, buf, sizeof(buf));
>>> + pr_warning(" %s\n", buf);
>>> + warned_leader = leader;
>>> + }
>>> if (verbose > 0) {
>>> + char buf[200];
>>> +
>>> cpu_map__snprint(leader->core.cpus, buf, sizeof(buf));
>>> pr_warning(" %s: %s\n", leader->name, buf);
>>> cpu_map__snprint(evsel->core.cpus, buf, sizeof(buf));
>>> pr_warning(" %s: %s\n", evsel->name, buf);
>>> }
>>>
>>> - for_each_group_evsel(pos, leader)
>>> - evsel__remove_from_group(pos, leader);
>>> + evsel__remove_from_group(evsel, leader);
>>> }
>>> }
>>>

2023-03-04 02:22:33

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v2 08/10] perf parse-events: Sort and group parsed events

On Thu, Mar 2, 2023 at 5:39 PM Ian Rogers <[email protected]> wrote:
>
> On Thu, Mar 2, 2023 at 4:37 PM Namhyung Kim <[email protected]> wrote:
> >
> > On Thu, Mar 2, 2023 at 1:26 PM Ian Rogers <[email protected]> wrote:
> > >
> > > This change is intended to be a no-op for most current cases; the
> > > default sort order is the order the events were parsed. Where it
> > > varies is in how groups are handled. Previously, an uncore and a core
> > > event that were grouped would most often cause the group to be removed:
> > >
> > > ```
> > > $ perf stat -e '{instructions,uncore_imc_free_running_0/data_total/}' -a sleep 1
> > > WARNING: grouped events cpus do not match, disabling group:
> > > anon group { instructions, uncore_imc_free_running_0/data_total/ }
> > > ...
> > > ```
> > >
> > > However, when wildcards are used, the events should be re-sorted and
> > > re-grouped in parse_events__set_leader, but this currently fails for
> > > simple examples:
> > >
> > > ```
> > > $ perf stat -e '{uncore_imc_free_running/data_read/,uncore_imc_free_running/data_write/}' -a sleep 1
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > <not counted> MiB uncore_imc_free_running/data_read/
> > > <not counted> MiB uncore_imc_free_running/data_write/
> > >
> > > 1.000996992 seconds time elapsed
> > > ```
> > >
> > > A further failure mode, fixed in this patch, is the need to force
> > > topdown events into a group.
> > >
> > > This change moves sorting the evsels in the evlist to after parsing. It
> > > requires parsing to set up groups. First the evsels are sorted,
> > > respecting the existing groupings and parse order, but also reordering
> > > to ensure evsels of the same PMU and group appear together. So that
> > > software and aux events respect groups, their pmu_name is taken from
> > > the group leader. The sorting is done with list_sort, removing a memory
> > > allocation.
> > >
> > > After sorting, a pass is done to correct the group leaders and, for
> > > topdown events, to ensure they have a group leader.
> > >
> > > This fixes the problems seen before:
> > >
> > > ```
> > > $ perf stat -e '{uncore_imc_free_running/data_read/,uncore_imc_free_running/data_write/}' -a sleep 1
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 727.42 MiB uncore_imc_free_running/data_read/
> > > 81.84 MiB uncore_imc_free_running/data_write/
> > >
> > > 1.000948615 seconds time elapsed
> > > ```
> > >
> > > As well as making groups not fail for cases like:
> > >
> > > ```
> > > $ perf stat -e '{imc_free_running_0/data_total/,imc_free_running_1/data_total/}' -a sleep 1
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 256.47 MiB imc_free_running_0/data_total/
> > > 256.48 MiB imc_free_running_1/data_total/
> >
> > I didn't expect we could group events from different PMUs.
> > Not sure if it can handle multiplexing properly.
>
> You are right, this example is now working as the sorting and
> regrouping breaks the events into two groups. The rules around
> grouping are complex and Arnaldo mentioned that maybe cases like this
> should be warned about. The problem then is that wildcard and metric
> expansion may naturally produce these cases and we don't want the
> warning. It is something of a shame that the grouping information in
> the perf stat output isn't clearer.

Oh, that means the events are not in a group in this case.
Yeah, it can be somewhat confusing. It seems the wildcard
is a kind of exception. Then could we warn only when there's no wildcard?
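
As an aside, the sort described in the commit message presumably boils
down to a single list_sort() call with a comparator keyed on group
leader, then PMU name, then parse order. A stripped-down sketch, with
field and helper names assumed rather than copied from the patch, where
'list' is the parsed event list:

static int evsel_cmp(void *priv __maybe_unused, const struct list_head *l,
		     const struct list_head *r)
{
	const struct evsel *lhs = container_of(l, struct evsel, core.node);
	const struct evsel *rhs = container_of(r, struct evsel, core.node);
	int cmp;

	/* Keep members of the same group adjacent... */
	cmp = lhs->core.leader->idx - rhs->core.leader->idx;
	if (cmp)
		return cmp;
	/* ...cluster by PMU, since groups can't span PMUs... */
	cmp = strcmp(evsel__pmu_name(lhs), evsel__pmu_name(rhs));
	if (cmp)
		return cmp;
	/* ...and otherwise preserve parse order. */
	return lhs->core.idx - rhs->core.idx;
}

list_sort(NULL, list, evsel_cmp);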

Thanks,
Namhyung

2023-03-05 08:32:54

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH v2 03/10] perf record: Early auxtrace initialization before event parsing

On 3/03/23 18:40, Liang, Kan wrote:
>
>
> On 2023-03-02 4:25 p.m., Ian Rogers wrote:
>> This allows event parsing to use the evsel__is_aux_event function,
>> which is important when determining event grouping.
>>
>> Signed-off-by: Ian Rogers <[email protected]>
>> ---
>> tools/perf/arch/x86/util/auxtrace.c | 17 +++++++++++++----
>> tools/perf/builtin-record.c | 6 ++++++
>> tools/perf/util/auxtrace.h | 2 ++
>> 3 files changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
>> index 3da506e13f49..de1e4842ea2e 100644
>> --- a/tools/perf/arch/x86/util/auxtrace.c
>> +++ b/tools/perf/arch/x86/util/auxtrace.c
>> @@ -15,6 +15,19 @@
>> #include "../../../util/intel-bts.h"
>> #include "../../../util/evlist.h"
>>
>> +void auxtrace__early_init(void)
>> +{
>> + struct perf_pmu *intel_pt_pmu;
>> + struct perf_pmu *intel_bts_pmu;
>> +
>> + intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
>> + if (intel_pt_pmu)
>> + intel_pt_pmu->auxtrace = true;
>> + intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
>> + if (intel_bts_pmu)
>> + intel_bts_pmu->auxtrace = true;
>> +}
>> +
>> static
>> struct auxtrace_record *auxtrace_record__init_intel(struct evlist *evlist,
>> int *err)
>> @@ -26,11 +39,7 @@ struct auxtrace_record *auxtrace_record__init_intel(struct evlist *evlist,
>> bool found_bts = false;
>>
>> intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
>> - if (intel_pt_pmu)
>> - intel_pt_pmu->auxtrace = true;
>> intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
>> - if (intel_bts_pmu)
>> - intel_bts_pmu->auxtrace = true;
>>
>> evlist__for_each_entry(evlist, evsel) {
>> if (intel_pt_pmu && evsel->core.attr.type == intel_pt_pmu->type)
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index 8374117e66f6..a0870c076dc0 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -3940,6 +3940,10 @@ static int record__init_thread_masks(struct record *rec)
>> return ret;
>> }
>>
>> +__weak void auxtrace__early_init(void)
>> +{
>> +}
>> +
>> int cmd_record(int argc, const char **argv)
>> {
>> int err;
>> @@ -3985,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
>> if (err)
>> return err;
>>
>> + auxtrace__early_init();
>
> So the auxtrace__early_init() will be unconditionally invoked even if
> there are no PT or BTS events, right?
>
> Maybe we should move the auxtrace__early_init() to evsel__is_aux_event()
> and cache the value. The initialization will only be invoked when it's
> required.

Although perf_pmu__find() will be called unconditionally via
record__auxtrace_init() anyway.

> Something as below (not tested.)
>
> +void auxtrace__init(void)
> +{
> + struct perf_pmu *intel_pt_pmu;
> + struct perf_pmu *intel_bts_pmu;
> + static bool cached;
> +
> + if (cached)
> + return;
> + intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
> + if (intel_pt_pmu)
> + intel_pt_pmu->auxtrace = true;
> + intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
> + if (intel_bts_pmu)
> + intel_bts_pmu->auxtrace = true;
> + cached = true;
> +}
>
> bool evsel__is_aux_event(struct evsel *evsel)
> {
> struct perf_pmu *pmu = evsel__find_pmu(evsel);
> + auxtrace__init();
> return pmu && pmu->auxtrace;
> }
>
>
>
> Thanks,
> Kan
>
>> +
>> argc = parse_options(argc, argv, record_options, record_usage,
>> PARSE_OPT_STOP_AT_NON_OPTION);
>> if (quiet)
>> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
>> index 29eb82dff574..49a86aa6ac94 100644
>> --- a/tools/perf/util/auxtrace.h
>> +++ b/tools/perf/util/auxtrace.h
>> @@ -457,6 +457,8 @@ struct addr_filters {
>>
>> struct auxtrace_cache;
>>
>> +void auxtrace__early_init(void);
>> +
>> #ifdef HAVE_AUXTRACE_SUPPORT
>>
>> u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm);


2023-03-06 09:32:11

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH v2 03/10] perf record: Early auxtrace initialization before event parsing

On 5/03/23 10:32, Adrian Hunter wrote:
> On 3/03/23 18:40, Liang, Kan wrote:
>>
>>
>> On 2023-03-02 4:25 p.m., Ian Rogers wrote:
>>> This allows event parsing to use the evsel__is_aux_event function,
>>> which is important when determining event grouping.
>>>
>>> Signed-off-by: Ian Rogers <[email protected]>
>>> ---
>>> tools/perf/arch/x86/util/auxtrace.c | 17 +++++++++++++----
>>> tools/perf/builtin-record.c | 6 ++++++
>>> tools/perf/util/auxtrace.h | 2 ++
>>> 3 files changed, 21 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
>>> index 3da506e13f49..de1e4842ea2e 100644
>>> --- a/tools/perf/arch/x86/util/auxtrace.c
>>> +++ b/tools/perf/arch/x86/util/auxtrace.c
>>> @@ -15,6 +15,19 @@
>>> #include "../../../util/intel-bts.h"
>>> #include "../../../util/evlist.h"
>>>
>>> +void auxtrace__early_init(void)
>>> +{
>>> + struct perf_pmu *intel_pt_pmu;
>>> + struct perf_pmu *intel_bts_pmu;
>>> +
>>> + intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
>>> + if (intel_pt_pmu)
>>> + intel_pt_pmu->auxtrace = true;
>>> + intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
>>> + if (intel_bts_pmu)
>>> + intel_bts_pmu->auxtrace = true;
>>> +}
>>> +
>>> static
>>> struct auxtrace_record *auxtrace_record__init_intel(struct evlist *evlist,
>>> int *err)
>>> @@ -26,11 +39,7 @@ struct auxtrace_record *auxtrace_record__init_intel(struct evlist *evlist,
>>> bool found_bts = false;
>>>
>>> intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
>>> - if (intel_pt_pmu)
>>> - intel_pt_pmu->auxtrace = true;
>>> intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
>>> - if (intel_bts_pmu)
>>> - intel_bts_pmu->auxtrace = true;
>>>
>>> evlist__for_each_entry(evlist, evsel) {
>>> if (intel_pt_pmu && evsel->core.attr.type == intel_pt_pmu->type)
>>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>>> index 8374117e66f6..a0870c076dc0 100644
>>> --- a/tools/perf/builtin-record.c
>>> +++ b/tools/perf/builtin-record.c
>>> @@ -3940,6 +3940,10 @@ static int record__init_thread_masks(struct record *rec)
>>> return ret;
>>> }
>>>
>>> +__weak void auxtrace__early_init(void)
>>> +{
>>> +}
>>> +
>>> int cmd_record(int argc, const char **argv)
>>> {
>>> int err;
>>> @@ -3985,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
>>> if (err)
>>> return err;
>>>
>>> + auxtrace__early_init();
>>
>> So the auxtrace__early_init() will be unconditionally invoked even if
>> there are no PT or BTS events, right?
>>
>> Maybe we should move the auxtrace__early_init() to evsel__is_aux_event()
>> and cache the value. The initialization will only be invoked when it's
>> required.
>
> Although perf_pmu__find() will be called unconditionally via
> record__auxtrace_init() anyway.

However, auxtrace__early_init() runs before 'verbose' is parsed, so
debug prints don't work there anymore.

How about this instead:

diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
index 3da506e13f49d..330d03216b0e6 100644
--- a/tools/perf/arch/x86/util/auxtrace.c
+++ b/tools/perf/arch/x86/util/auxtrace.c
@@ -26,11 +26,7 @@ struct auxtrace_record *auxtrace_record__init_intel(struct evlist *evlist,
bool found_bts = false;

intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
- if (intel_pt_pmu)
- intel_pt_pmu->auxtrace = true;
intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
- if (intel_bts_pmu)
- intel_bts_pmu->auxtrace = true;

evlist__for_each_entry(evlist, evsel) {
if (intel_pt_pmu && evsel->core.attr.type == intel_pt_pmu->type)
diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
index 358340b342431..f73b80dcd8bdc 100644
--- a/tools/perf/arch/x86/util/pmu.c
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -27,10 +27,14 @@ static bool cached_list;
struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
{
#ifdef HAVE_AUXTRACE_SUPPORT
- if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
+ if (!strcmp(pmu->name, INTEL_PT_PMU_NAME)) {
+ pmu->auxtrace = true;
return intel_pt_pmu_default_config(pmu);
- if (!strcmp(pmu->name, INTEL_BTS_PMU_NAME))
+ }
+ if (!strcmp(pmu->name, INTEL_BTS_PMU_NAME)) {
+ pmu->auxtrace = true;
pmu->selectable = true;
+ }
#endif
return NULL;
}



>
>> Something as below (not tested.)
>>
>> +void auxtrace__init(void)
>> +{
>> + struct perf_pmu *intel_pt_pmu;
>> + struct perf_pmu *intel_bts_pmu;
>> + static bool cached;
>> +
>> + if (cached)
>> + return;
>> + intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
>> + if (intel_pt_pmu)
>> + intel_pt_pmu->auxtrace = true;
>> + intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
>> + if (intel_bts_pmu)
>> + intel_bts_pmu->auxtrace = true;
>> + cached = true;
>> +}
>>
>> bool evsel__is_aux_event(struct evsel *evsel)
>> {
>> struct perf_pmu *pmu = evsel__find_pmu(evsel);
>> + auxtrace__init();
>> return pmu && pmu->auxtrace;
>> }
>>
>>
>>
>> Thanks,
>> Kan
>>
>>> +
>>> argc = parse_options(argc, argv, record_options, record_usage,
>>> PARSE_OPT_STOP_AT_NON_OPTION);
>>> if (quiet)
>>> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
>>> index 29eb82dff574..49a86aa6ac94 100644
>>> --- a/tools/perf/util/auxtrace.h
>>> +++ b/tools/perf/util/auxtrace.h
>>> @@ -457,6 +457,8 @@ struct addr_filters {
>>>
>>> struct auxtrace_cache;
>>>
>>> +void auxtrace__early_init(void);
>>> +
>>> #ifdef HAVE_AUXTRACE_SUPPORT
>>>
>>> u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm);
>


2023-03-06 14:14:06

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH v2 03/10] perf record: Early auxtrace initialization before event parsing



On 2023-03-06 4:31 a.m., Adrian Hunter wrote:
> On 5/03/23 10:32, Adrian Hunter wrote:
>> On 3/03/23 18:40, Liang, Kan wrote:
>>>
>>>
>>> On 2023-03-02 4:25 p.m., Ian Rogers wrote:
>>>> This allows event parsing to use the evsel__is_aux_event function,
>>>> which is important when determining event grouping.
>>>>
>>>> Signed-off-by: Ian Rogers <[email protected]>
>>>> ---
>>>> tools/perf/arch/x86/util/auxtrace.c | 17 +++++++++++++----
>>>> tools/perf/builtin-record.c | 6 ++++++
>>>> tools/perf/util/auxtrace.h | 2 ++
>>>> 3 files changed, 21 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
>>>> index 3da506e13f49..de1e4842ea2e 100644
>>>> --- a/tools/perf/arch/x86/util/auxtrace.c
>>>> +++ b/tools/perf/arch/x86/util/auxtrace.c
>>>> @@ -15,6 +15,19 @@
>>>> #include "../../../util/intel-bts.h"
>>>> #include "../../../util/evlist.h"
>>>>
>>>> +void auxtrace__early_init(void)
>>>> +{
>>>> + struct perf_pmu *intel_pt_pmu;
>>>> + struct perf_pmu *intel_bts_pmu;
>>>> +
>>>> + intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
>>>> + if (intel_pt_pmu)
>>>> + intel_pt_pmu->auxtrace = true;
>>>> + intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
>>>> + if (intel_bts_pmu)
>>>> + intel_bts_pmu->auxtrace = true;
>>>> +}
>>>> +
>>>> static
>>>> struct auxtrace_record *auxtrace_record__init_intel(struct evlist *evlist,
>>>> int *err)
>>>> @@ -26,11 +39,7 @@ struct auxtrace_record *auxtrace_record__init_intel(struct evlist *evlist,
>>>> bool found_bts = false;
>>>>
>>>> intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
>>>> - if (intel_pt_pmu)
>>>> - intel_pt_pmu->auxtrace = true;
>>>> intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
>>>> - if (intel_bts_pmu)
>>>> - intel_bts_pmu->auxtrace = true;
>>>>
>>>> evlist__for_each_entry(evlist, evsel) {
>>>> if (intel_pt_pmu && evsel->core.attr.type == intel_pt_pmu->type)
>>>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>>>> index 8374117e66f6..a0870c076dc0 100644
>>>> --- a/tools/perf/builtin-record.c
>>>> +++ b/tools/perf/builtin-record.c
>>>> @@ -3940,6 +3940,10 @@ static int record__init_thread_masks(struct record *rec)
>>>> return ret;
>>>> }
>>>>
>>>> +__weak void auxtrace__early_init(void)
>>>> +{
>>>> +}
>>>> +
>>>> int cmd_record(int argc, const char **argv)
>>>> {
>>>> int err;
>>>> @@ -3985,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
>>>> if (err)
>>>> return err;
>>>>
>>>> + auxtrace__early_init();
>>>
>>> So the auxtrace__early_init() will be unconditionally invoked even if
>>> there are no PT or BTS events, right?
>>>
>>> Maybe we should move the auxtrace__early_init() to evsel__is_aux_event()
>>> and cache the value. The initialization will only be invoked when it's
>>> required.
>>
>> Although perf_pmu__find() will be called unconditionally via
>> record__auxtrace_init() anyway.
>
> However, auxtrace__early_init() runs before 'verbose' is parsed, so
> debug prints don't work there anymore.
>
> How about this instead:

Yes, I think that's a better place to initialize them.

Thanks,
Kan
>
> diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
> index 3da506e13f49d..330d03216b0e6 100644
> --- a/tools/perf/arch/x86/util/auxtrace.c
> +++ b/tools/perf/arch/x86/util/auxtrace.c
> @@ -26,11 +26,7 @@ struct auxtrace_record *auxtrace_record__init_intel(struct evlist *evlist,
> bool found_bts = false;
>
> intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
> - if (intel_pt_pmu)
> - intel_pt_pmu->auxtrace = true;
> intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
> - if (intel_bts_pmu)
> - intel_bts_pmu->auxtrace = true;
>
> evlist__for_each_entry(evlist, evsel) {
> if (intel_pt_pmu && evsel->core.attr.type == intel_pt_pmu->type)
> diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
> index 358340b342431..f73b80dcd8bdc 100644
> --- a/tools/perf/arch/x86/util/pmu.c
> +++ b/tools/perf/arch/x86/util/pmu.c
> @@ -27,10 +27,14 @@ static bool cached_list;
> struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
> {
> #ifdef HAVE_AUXTRACE_SUPPORT
> - if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
> + if (!strcmp(pmu->name, INTEL_PT_PMU_NAME)) {
> + pmu->auxtrace = true;
> return intel_pt_pmu_default_config(pmu);
> - if (!strcmp(pmu->name, INTEL_BTS_PMU_NAME))
> + }
> + if (!strcmp(pmu->name, INTEL_BTS_PMU_NAME)) {
> + pmu->auxtrace = true;
> pmu->selectable = true;
> + }
> #endif
> return NULL;
> }
>
>
>
>>
>>> Something as below (not tested.)
>>>
>>> +void auxtrace__init(void)
>>> +{
>>> + struct perf_pmu *intel_pt_pmu;
>>> + struct perf_pmu *intel_bts_pmu;
>>> + static bool cached;
>>> +
>>> + if (cached)
>>> + return;
>>> + intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
>>> + if (intel_pt_pmu)
>>> + intel_pt_pmu->auxtrace = true;
>>> + intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
>>> + if (intel_bts_pmu)
>>> + intel_bts_pmu->auxtrace = true;
>>> + cached = true;
>>> +}
>>>
>>> bool evsel__is_aux_event(struct evsel *evsel)
>>> {
>>> struct perf_pmu *pmu = evsel__find_pmu(evsel);
>>> + auxtrace__init();
>>> return pmu && pmu->auxtrace;
>>> }
>>>
>>>
>>>
>>> Thanks,
>>> Kan
>>>
>>>> +
>>>> argc = parse_options(argc, argv, record_options, record_usage,
>>>> PARSE_OPT_STOP_AT_NON_OPTION);
>>>> if (quiet)
>>>> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
>>>> index 29eb82dff574..49a86aa6ac94 100644
>>>> --- a/tools/perf/util/auxtrace.h
>>>> +++ b/tools/perf/util/auxtrace.h
>>>> @@ -457,6 +457,8 @@ struct addr_filters {
>>>>
>>>> struct auxtrace_cache;
>>>>
>>>> +void auxtrace__early_init(void);
>>>> +
>>>> #ifdef HAVE_AUXTRACE_SUPPORT
>>>>
>>>> u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm);
>>
>