2018-03-03 03:48:52

by Agustin Vega-Frias

Subject: [RFC V2 0/3] perf stat: improvements for handling of multiple PMUs

This series of patches adds some simple improvements to the way perf stat
handles PMUs that have multiple instances by:

1. Adding glob-like matching in addition to the prefix-based matching
introduced previously (patch 1).
2. Adding the ability to recover the PMU names when printing the events
separately with the --no-merge option (patch 2).
3. Restoring auto-merge for events created by prefix or glob-like match
(patch 3). Note that this still keeps the behavior that disables
auto-merging of legacy symbolic events (e.g. cycles).

V2:

- Updated the documentation to explain prefix and glob matching of PMU
names, and event auto-merging.
- Added sample output to the third patch.

Agustin Vega-Frias (3):
perf, tools: Support wildcards on pmu name in dynamic pmu events
perf, tools: Display pmu name when printing unmerged events in stat
perf pmu: Auto-merge PMU events created by prefix or glob match

tools/perf/Documentation/perf-list.txt | 8 +++++++-
tools/perf/Documentation/perf-stat.txt | 16 ++++++++++++++++
tools/perf/builtin-stat.c | 29 ++++++++++++++++++++++++++++-
tools/perf/util/evsel.c | 1 +
tools/perf/util/evsel.h | 1 +
tools/perf/util/parse-events.c | 21 ++++++++++-----------
tools/perf/util/parse-events.h | 2 +-
tools/perf/util/parse-events.l | 2 +-
tools/perf/util/parse-events.y | 7 ++++---
9 files changed, 69 insertions(+), 18 deletions(-)

--
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.



2018-03-03 03:49:27

by Agustin Vega-Frias

Subject: [RFC V2 1/3] perf, tools: Support wildcards on pmu name in dynamic pmu events

Starting with v4.12, the event parsing code for dynamic pmu events
supports prefix-based matching of multiple pmus when creating dynamic
events. E.g., in a system with the following dynamic pmus:

mypmu_0
mypmu_1
mypmu_2
mypmu_4

passing mypmu/<config>/ as an event spec will result in the creation
of the event in all of the pmus. This change expands this matching
through the use of fnmatch so glob-like expressions can be used to
create events in multiple pmus. E.g., in the system described above
if a user only wants to create the event in mypmu_0 and mypmu_1,
mypmu_[01]/<config>/ can be passed.
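
For readers who want to try the matching rule outside of the parser, here
is a minimal stand-alone sketch. The helper name pmu_name_matches() is
hypothetical and not part of perf; the body only mirrors the condition
used in parse-events.y in this patch (skip a leading "uncore_" unless the
user typed it, then accept either a prefix match or an fnmatch() match).

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper; mirrors the check added to parse-events.y. */
bool pmu_name_matches(const char *pattern, const char *name)
{
	/* Ignore a leading "uncore_" unless the pattern spells it out. */
	if (!strncmp(name, "uncore_", 7) && strncmp(pattern, "uncore_", 7))
		name += 7;

	/* Prefix match (existing behavior) or glob match (new). */
	return !strncmp(pattern, name, strlen(pattern)) ||
	       !fnmatch(pattern, name, 0);
}
```

With the pmus listed above, pmu_name_matches("mypmu", "mypmu_4") is true
through the prefix path, while pmu_name_matches("mypmu_[01]", "mypmu_2")
is false because neither the prefix nor the glob applies.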

Signed-off-by: Agustin Vega-Frias <[email protected]>
---
tools/perf/Documentation/perf-list.txt | 8 +++++++-
tools/perf/Documentation/perf-stat.txt | 12 ++++++++++++
tools/perf/util/parse-events.l | 2 +-
tools/perf/util/parse-events.y | 3 ++-
4 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index e2a897a..2549c34 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -141,7 +141,13 @@ on the first memory controller on socket 0 of a Intel Xeon system

Each memory controller has its own PMU. Measuring the complete system
bandwidth would require specifying all imc PMUs (see perf list output),
-and adding the values together.
+and adding the values together. To simplify creation of multiple events,
+prefix and glob matching is supported in the PMU name, and the prefix
+'uncore_' is also ignored when performing the match. So the command above
+can be expanded to all memory controllers by using the syntaxes:
+
+ perf stat -C 0 -a imc/cas_count_read/,imc/cas_count_write/ -I 1000 ...
+ perf stat -C 0 -a *imc*/cas_count_read/,*imc*/cas_count_write/ -I 1000 ...

This example measures the combined core power every second

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 823fce7..49983a7 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -49,6 +49,12 @@ report::
parameters are defined by corresponding entries in
/sys/bus/event_source/devices/<pmu>/format/*

+ Note that the last two syntaxes support prefix and glob matching in
+ the PMU name to simplify creation of events across multiple instances
+ of the same type of PMU (e.g. memory controller PMU) in large systems.
+ Multiple PMU instances are typical for uncore PMUs, so the prefix
+ 'uncore_' is also ignored when performing this match.
+
-i::
--no-inherit::
child tasks do not inherit counters
@@ -246,6 +252,12 @@ taskset.
--no-merge::
Do not merge results from same PMUs.

+When multiple events are created from a single event alias, stat will,
+by default, aggregate the event counts and show the result in a single
+row. This option disables that behavior and shows the individual events
+and counts. Aliases are listed immediately after the Kernel PMU events
+by perf list.
+
--smi-cost::
Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.

diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 655ecff..a1a01b1 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -175,7 +175,7 @@ bpf_source [^,{}]+\.c[a-zA-Z0-9._]*
num_dec [0-9]+
num_hex 0x[a-fA-F0-9]+
num_raw_hex [a-fA-F0-9]+
-name [a-zA-Z_*?][a-zA-Z0-9_*?.]*
+name [a-zA-Z_*?\[\]][a-zA-Z0-9_*?.\[\]]*
name_minus [a-zA-Z_*?][a-zA-Z0-9\-_*?.:]*
drv_cfg_term [a-zA-Z0-9_\.]+(=[a-zA-Z0-9_*?\.:]+)?
/* If you add a modifier you need to update check_modifier() */
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index e81a20e..c528469 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -8,6 +8,7 @@

#define YYDEBUG 1

+#include <fnmatch.h>
#include <linux/compiler.h>
#include <linux/list.h>
#include <linux/types.h>
@@ -241,7 +242,7 @@ PE_NAME opt_event_config
if (!strncmp(name, "uncore_", 7) &&
strncmp($1, "uncore_", 7))
name += 7;
- if (!strncmp($1, name, strlen($1))) {
+ if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
if (parse_events_copy_term_list(orig_terms, &terms))
YYABORT;
if (!parse_events_add_pmu(_parse_state, list, pmu->name, terms))


2018-03-03 03:50:10

by Agustin Vega-Frias

Subject: [RFC V2 2/3] perf, tools: Display pmu name when printing unmerged events in stat

To simplify creation of events across multiple instances of the same type
of PMU, stat supports two methods for creating multiple events from a single
event specification:
1. A prefix or glob can be used in the PMU name.
2. Aliases, which are listed immediately after the Kernel PMU events
by perf list, are used.

When the --no-merge option is passed and these events are displayed
individually the PMU name is lost and it's not possible to see which
count corresponds to which pmu:

$ ./perf stat -a -e l3cache/read-miss/ --no-merge ls > /dev/null

Performance counter stats for 'system wide':

67 l3cache/read-miss/
67 l3cache/read-miss/
63 l3cache/read-miss/
60 l3cache/read-miss/

0.001675706 seconds time elapsed

$ ./perf stat -a -e l3cache_read_miss --no-merge ls > /dev/null

Performance counter stats for 'system wide':

12 l3cache_read_miss
17 l3cache_read_miss
10 l3cache_read_miss
8 l3cache_read_miss

0.001661305 seconds time elapsed

This change adds the original pmu name to the event. For dynamic pmu
events the pmu name is restored in the event name:

$ ./perf stat -a -e l3cache/read-miss/ --no-merge ls > /dev/null

Performance counter stats for 'system wide':

63 l3cache_0_3/read-miss/
74 l3cache_0_1/read-miss/
64 l3cache_0_2/read-miss/
74 l3cache_0_0/read-miss/

0.001675706 seconds time elapsed

For alias events the name is added after the event name:

$ ./perf stat -a -e l3cache_read_miss --no-merge ls > /dev/null

Performance counter stats for 'system wide':

10 l3cache_read_miss [l3cache_0_3]
12 l3cache_read_miss [l3cache_0_1]
10 l3cache_read_miss [l3cache_0_2]
17 l3cache_read_miss [l3cache_0_0]

0.001661305 seconds time elapsed
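
The renaming rule can be tried in isolation with the sketch below. It is
not the code from this patch: unique_event_name() is a hypothetical
stand-alone helper that works on plain strings instead of struct
perf_evsel and uses snprintf() rather than asprintf(), but it applies the
same two cases (replace the PMU part before the first '/', or append the
PMU name in brackets when there is no '/').

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-alone version of the renaming rule.
 * Returns a newly allocated string (caller frees), or NULL if
 * the allocation fails.
 */
char *unique_event_name(const char *event_name, const char *pmu_name)
{
	/* Room for the worst case: "<event> [<pmu>]" plus the NUL. */
	size_t len = strlen(event_name) + strlen(pmu_name) + 4;
	const char *config = strchr(event_name, '/');
	char *out = malloc(len);

	if (!out)
		return NULL;

	if (!strncmp(event_name, pmu_name, strlen(pmu_name)))
		/* Already starts with the PMU name: keep it as is. */
		snprintf(out, len, "%s", event_name);
	else if (config)
		/* Dynamic pmu event: put the real PMU name before the config. */
		snprintf(out, len, "%s%s", pmu_name, config);
	else
		/* Alias event: append the PMU name in brackets. */
		snprintf(out, len, "%s [%s]", event_name, pmu_name);

	return out;
}
```

For example, unique_event_name("l3cache/read-miss/", "l3cache_0_3") gives
"l3cache_0_3/read-miss/", matching the first sample above.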

Signed-off-by: Agustin Vega-Frias <[email protected]>
---
tools/perf/builtin-stat.c | 29 ++++++++++++++++++++++++++++-
tools/perf/util/evsel.c | 1 +
tools/perf/util/evsel.h | 1 +
tools/perf/util/parse-events.c | 8 +++++++-
4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 98bf9d3..d196972 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1225,6 +1225,31 @@ static void aggr_update_shadow(void)
}
}

+static void uniquify_event_name(struct perf_evsel *counter)
+{
+ char *new_name;
+ char *config;
+
+ if (!counter->pmu_name || !strncmp(counter->name, counter->pmu_name,
+ strlen(counter->pmu_name)))
+ return;
+
+ config = strchr(counter->name, '/');
+ if (config) {
+ if (asprintf(&new_name,
+ "%s%s", counter->pmu_name, config) > 0) {
+ free(counter->name);
+ counter->name = new_name;
+ }
+ } else {
+ if (asprintf(&new_name,
+ "%s [%s]", counter->name, counter->pmu_name) > 0) {
+ free(counter->name);
+ counter->name = new_name;
+ }
+ }
+}
+
static void collect_all_aliases(struct perf_evsel *counter,
void (*cb)(struct perf_evsel *counter, void *data,
bool first),
@@ -1253,7 +1278,9 @@ static bool collect_data(struct perf_evsel *counter,
if (counter->merged_stat)
return false;
cb(counter, data, true);
- if (!no_merge && counter->auto_merge_stats)
+ if (no_merge)
+ uniquify_event_name(counter);
+ else if (counter->auto_merge_stats)
collect_all_aliases(counter, cb, data);
return true;
}
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ef35168..4841000 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -244,6 +244,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
evsel->metric_name = NULL;
evsel->metric_events = NULL;
evsel->collect_stat = false;
+ evsel->pmu_name = NULL;
}

struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index a7487c6..c2ac16a 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -142,6 +142,7 @@ struct perf_evsel {
struct perf_evsel **metric_events;
bool collect_stat;
bool weak_group;
+ const char *pmu_name;
};

union u64_swap {
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 34589c4..bafc91e 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1247,7 +1247,12 @@ static int __parse_events_add_pmu(struct parse_events_state *parse_state,
if (!head_config) {
attr.type = pmu->type;
evsel = __add_event(list, &parse_state->idx, &attr, NULL, pmu, NULL, auto_merge_stats);
- return evsel ? 0 : -ENOMEM;
+ if (evsel) {
+ evsel->pmu_name = name;
+ return 0;
+ } else {
+ return -ENOMEM;
+ }
}

if (perf_pmu__check_alias(pmu, head_config, &info))
@@ -1276,6 +1281,7 @@ static int __parse_events_add_pmu(struct parse_events_state *parse_state,
evsel->snapshot = info.snapshot;
evsel->metric_expr = info.metric_expr;
evsel->metric_name = info.metric_name;
+ evsel->pmu_name = name;
}

return evsel ? 0 : -ENOMEM;


2018-03-03 03:50:19

by Agustin Vega-Frias

Subject: [RFC V2 3/3] perf pmu: Auto-merge PMU events created by prefix or glob match

Auto-merge for these events was disabled when auto-merging of non-alias
events was disabled in commit 63ce844 (perf stat: Only auto-merge events
that are PMU aliases).

Non-merging of legacy events is preserved:

$ ./perf stat -ag -e cache-misses,cache-misses sleep 1

Performance counter stats for 'system wide':

86,323 cache-misses
86,323 cache-misses

1.002623307 seconds time elapsed

But prefix or glob matching auto-merges the events created:

$ ./perf stat -a -e l3cache/read-miss/ sleep 1

Performance counter stats for 'system wide':

328 l3cache/read-miss/

1.002627008 seconds time elapsed

$ ./perf stat -a -e l3cache_0_[01]/read-miss/ sleep 1

Performance counter stats for 'system wide':

172 l3cache/read-miss/

1.002627008 seconds time elapsed

As with events created with aliases, auto-merging can be suppressed with
the --no-merge option:

$ ./perf stat -a -e l3cache/read-miss/ --no-merge sleep 1

Performance counter stats for 'system wide':

67 l3cache/read-miss/
67 l3cache/read-miss/
63 l3cache/read-miss/
60 l3cache/read-miss/

1.002622192 seconds time elapsed
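
As a rough illustration of what merging means for the printed rows, the
sketch below sums per-PMU counts into one row unless no_merge is set. The
function stat_rows() and its array-based interface are hypothetical and
far simpler than what builtin-stat.c does; it only shows the arithmetic
behind the two outputs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the output rows: fill out[] with the values
 * that would be shown and return how many rows there are.
 */
size_t stat_rows(const uint64_t *counts, size_t n, bool no_merge,
		 uint64_t *out)
{
	size_t i;

	if (n == 0)
		return 0;

	if (no_merge) {
		/* One row per PMU instance. */
		for (i = 0; i < n; i++)
			out[i] = counts[i];
		return n;
	}

	/* Auto-merge: a single row holding the sum. */
	out[0] = 0;
	for (i = 0; i < n; i++)
		out[0] += counts[i];
	return 1;
}
```

Feeding it the four --no-merge counts above (67, 67, 63, 60) with
no_merge false yields a single row with 257.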

Signed-off-by: Agustin Vega-Frias <[email protected]>
---
tools/perf/Documentation/perf-stat.txt | 14 +++++++++-----
tools/perf/util/parse-events.c | 13 +++----------
tools/perf/util/parse-events.h | 2 +-
tools/perf/util/parse-events.y | 4 ++--
4 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 49983a7..ae406f7 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -252,11 +252,15 @@ taskset.
--no-merge::
Do not merge results from same PMUs.

-When multiple events are created from a single event alias, stat will,
-by default, aggregate the event counts and show the result in a single
-row. This option disables that behavior and shows the individual events
-and counts. Aliases are listed immediately after the Kernel PMU events
-by perf list.
+When multiple events are created from a single event specification,
+stat will, by default, aggregate the event counts and show the result
+in a single row. This option disables that behavior and shows
+the individual events and counts.
+
+Multiple events are created from a single event specification when:
+1. Prefix or glob matching is used for the PMU name.
+2. Aliases, which are listed immediately after the Kernel PMU events
+ by perf list, are used.

--smi-cost::
Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index bafc91e..4e80ca3 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1217,7 +1217,7 @@ int parse_events_add_numeric(struct parse_events_state *parse_state,
get_config_name(head_config), &config_terms);
}

-static int __parse_events_add_pmu(struct parse_events_state *parse_state,
+int parse_events_add_pmu(struct parse_events_state *parse_state,
struct list_head *list, char *name,
struct list_head *head_config, bool auto_merge_stats)
{
@@ -1287,13 +1287,6 @@ static int __parse_events_add_pmu(struct parse_events_state *parse_state,
return evsel ? 0 : -ENOMEM;
}

-int parse_events_add_pmu(struct parse_events_state *parse_state,
- struct list_head *list, char *name,
- struct list_head *head_config)
-{
- return __parse_events_add_pmu(parse_state, list, name, head_config, false);
-}
-
int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
char *str, struct list_head **listp)
{
@@ -1323,8 +1316,8 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
return -1;
list_add_tail(&term->list, head);

- if (!__parse_events_add_pmu(parse_state, list,
- pmu->name, head, true)) {
+ if (!parse_events_add_pmu(parse_state, list,
+ pmu->name, head, true)) {
pr_debug("%s -> %s/%s/\n", str,
pmu->name, alias->str);
ok++;
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 88108cd..5015cfd 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -167,7 +167,7 @@ int parse_events_add_breakpoint(struct list_head *list, int *idx,
void *ptr, char *type, u64 len);
int parse_events_add_pmu(struct parse_events_state *parse_state,
struct list_head *list, char *name,
- struct list_head *head_config);
+ struct list_head *head_config, bool auto_merge_stats);

int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
char *str,
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index c528469..b51278f 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -232,7 +232,7 @@ PE_NAME opt_event_config
YYABORT;

ALLOC_LIST(list);
- if (parse_events_add_pmu(_parse_state, list, $1, $2)) {
+ if (parse_events_add_pmu(_parse_state, list, $1, $2, false)) {
struct perf_pmu *pmu = NULL;
int ok = 0;

@@ -245,7 +245,7 @@ PE_NAME opt_event_config
if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
if (parse_events_copy_term_list(orig_terms, &terms))
YYABORT;
- if (!parse_events_add_pmu(_parse_state, list, pmu->name, terms))
+ if (!parse_events_add_pmu(_parse_state, list, pmu->name, terms, true))
ok++;
parse_events_terms__delete(terms);
}


2018-03-03 14:35:32

by Jiri Olsa

Subject: Re: [RFC V2 1/3] perf, tools: Support wildcards on pmu name in dynamic pmu events

On Fri, Mar 02, 2018 at 06:41:30PM -0500, Agustin Vega-Frias wrote:

SNIP

>
> diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> index 655ecff..a1a01b1 100644
> --- a/tools/perf/util/parse-events.l
> +++ b/tools/perf/util/parse-events.l
> @@ -175,7 +175,7 @@ bpf_source [^,{}]+\.c[a-zA-Z0-9._]*
> num_dec [0-9]+
> num_hex 0x[a-fA-F0-9]+
> num_raw_hex [a-fA-F0-9]+
> -name [a-zA-Z_*?][a-zA-Z0-9_*?.]*
> +name [a-zA-Z_*?\[\]][a-zA-Z0-9_*?.\[\]]*
> name_minus [a-zA-Z_*?][a-zA-Z0-9\-_*?.:]*
> drv_cfg_term [a-zA-Z0-9_\.]+(=[a-zA-Z0-9_*?\.:]+)?
> /* If you add a modifier you need to update check_modifier() */
> diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
> index e81a20e..c528469 100644
> --- a/tools/perf/util/parse-events.y
> +++ b/tools/perf/util/parse-events.y
> @@ -8,6 +8,7 @@
>
> #define YYDEBUG 1
>
> +#include <fnmatch.h>
> #include <linux/compiler.h>
> #include <linux/list.h>
> #include <linux/types.h>
> @@ -241,7 +242,7 @@ PE_NAME opt_event_config
> if (!strncmp(name, "uncore_", 7) &&
> strncmp($1, "uncore_", 7))
> name += 7;
> - if (!strncmp($1, name, strlen($1))) {
> + if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {

could we now get rid of the strncmp in here and keep the
glob matching only? I find it confusing now that following
commands give me same results:

- [root@krava perf]# ./perf stat -e 'cbox/clockticks/' --no-merge -a sleep 1

Performance counter stats for 'system wide':

<not supported> uncore_cbox_1/clockticks/
281,474,957,674,239 uncore_cbox_0/clockticks/

1.000958335 seconds time elapsed

- [root@krava perf]# ./perf stat -e '*cbox*/clockticks/' --no-merge -a sleep 1

Performance counter stats for 'system wide':

<not supported> uncore_cbox_1/clockticks/
5,427,337 uncore_cbox_0/clockticks/

1.000962724 seconds time elapsed

- [root@krava perf]# ./perf stat -e 'cbox*/clockticks/' --no-merge -a sleep 1

Performance counter stats for 'system wide':

<not supported> uncore_cbox_1/clockticks/
281,474,969,621,374 uncore_cbox_0/clockticks/

1.001026179 seconds time elapsed

and this one fails:

- [root@krava perf]# ./perf stat -e '*cbox/clockticks/' --no-merge -a sleep 1
event syntax error: '*cbox/clockticks/'
\___ Cannot find PMU `*cbox'. Missing kernel support?
Run 'perf list' for a list of valid events

Usage: perf stat [<options>] [<command>]

-e, --event <event> event selector. use 'perf list' to list available events


despite the fact that it makes as much sense as the previous one: perf stat -e 'cbox*/clockticks/'


I'd think let's keep just the glob matching, so it's clear
what you use wildcards for.. thoughts?

thanks,
jirka

2018-03-04 17:36:51

by Andi Kleen

Subject: Re: [RFC V2 1/3] perf, tools: Support wildcards on pmu name in dynamic pmu events

> > +#include <fnmatch.h>
> > #include <linux/compiler.h>
> > #include <linux/list.h>
> > #include <linux/types.h>
> > @@ -241,7 +242,7 @@ PE_NAME opt_event_config
> > if (!strncmp(name, "uncore_", 7) &&
> > strncmp($1, "uncore_", 7))
> > name += 7;
> > - if (!strncmp($1, name, strlen($1))) {
> > + if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
>
> could we now get rid of the strncmp in here and keep the
> glob matching only?

That would break existing command lines. Not a good idea.

-Andi


2018-03-04 18:32:34

by Jiri Olsa

Subject: Re: [RFC V2 1/3] perf, tools: Support wildcards on pmu name in dynamic pmu events

On Sun, Mar 04, 2018 at 09:12:45AM -0800, Andi Kleen wrote:
> > > +#include <fnmatch.h>
> > > #include <linux/compiler.h>
> > > #include <linux/list.h>
> > > #include <linux/types.h>
> > > @@ -241,7 +242,7 @@ PE_NAME opt_event_config
> > > if (!strncmp(name, "uncore_", 7) &&
> > > strncmp($1, "uncore_", 7))
> > > name += 7;
> > > - if (!strncmp($1, name, strlen($1))) {
> > > + if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
> >
> > could we now get rid of the strncmp in here and keep the
> > glob matching only?
>
> That would break existing command lines. Not a good idea.

I hoped that only you guys are using this and would rewrite your scripts ;-)

I had no idea there's fnmatch func before.. too bad, ok

jirka

2018-03-05 15:09:34

by Agustin Vega-Frias

Subject: Re: [RFC V2 1/3] perf, tools: Support wildcards on pmu name in dynamic pmu events

On 2018-03-04 13:10, Jiri Olsa wrote:
> On Sun, Mar 04, 2018 at 09:12:45AM -0800, Andi Kleen wrote:
>> > > +#include <fnmatch.h>
>> > > #include <linux/compiler.h>
>> > > #include <linux/list.h>
>> > > #include <linux/types.h>
>> > > @@ -241,7 +242,7 @@ PE_NAME opt_event_config
>> > > if (!strncmp(name, "uncore_", 7) &&
>> > > strncmp($1, "uncore_", 7))
>> > > name += 7;
>> > > - if (!strncmp($1, name, strlen($1))) {
>> > > + if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
>> >
>> > could we now get rid of the strncmp in here and keep the
>> > glob matching only?
>>
>> That would break existing command lines. Not a good idea.
>
> I hoped that only you guys are using this and would rewrite your
> scripts ;-)
>
> I had no idea there's fnmatch func before.. too bad, ok
>
> jirka

An option to keep backward compatibility and consistency would be
to wrap the pattern/string passed in *'s, that way we can just use
fnmatch and have all the examples Jiri brought up work the same.
With that in place we can actually also drop the explicit ignoring
of the uncore_ prefix since the globbing would take care of that.
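
A minimal sketch of that idea, assuming a fixed-size pattern buffer and a
hypothetical helper name wrapped_match() (neither is from the patches):

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: match "*<user_pattern>*" against a PMU name. */
bool wrapped_match(const char *user_pattern, const char *pmu_name)
{
	char pattern[256];
	int n = snprintf(pattern, sizeof(pattern), "*%s*", user_pattern);

	/* Refuse patterns that did not fit in the buffer. */
	if (n < 0 || (size_t)n >= sizeof(pattern))
		return false;

	/* No special handling of "uncore_": the leading '*' absorbs it. */
	return fnmatch(pattern, pmu_name, 0) == 0;
}
```

With this, cbox, *cbox*, cbox* and *cbox all match uncore_cbox_0,
including the *cbox case that currently fails.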

Thoughts?

Agustín


2018-03-05 17:57:55

by Andi Kleen

Subject: Re: [RFC V2 1/3] perf, tools: Support wildcards on pmu name in dynamic pmu events

Agustin Vega-Frias <[email protected]> writes:
>
> An option to keep backward compatibility and consistency would be
> to wrap the pattern/string passed in *'s, that way we can just use
> fnmatch and have all the examples Jiri brought up work the same.
> With that in place we can actually also drop the explicit ignoring
> of the uncore_ prefix since the globbing would take care of that.

Prepending with * would seem dangerous, could result in false
matches. But adding it at the end should be ok.
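
To make the concern concrete, here is a small sketch comparing the two
variants. glob_match() and the PMU name fake_imc_0 are made up for
illustration; only fnmatch() itself is a real interface.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: always append '*' to the user's pattern, and
 * prepend one as well only when leading_star is set.
 */
bool glob_match(const char *user_pattern, const char *pmu_name,
		bool leading_star)
{
	char pattern[256];
	int n = snprintf(pattern, sizeof(pattern), "%s%s*",
			 leading_star ? "*" : "", user_pattern);

	/* Refuse patterns that did not fit in the buffer. */
	if (n < 0 || (size_t)n >= sizeof(pattern))
		return false;

	return fnmatch(pattern, pmu_name, 0) == 0;
}
```

With a leading '*', asking for imc would also pick up an unrelated PMU
named fake_imc_0; with only the trailing '*' it matches imc_0 but not
fake_imc_0. Note that the trailing-only form would still need the
existing uncore_ skipping to reach uncore_imc_0.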

-Andi

2018-03-05 19:11:24

by Jiri Olsa

Subject: Re: [RFC V2 1/3] perf, tools: Support wildcards on pmu name in dynamic pmu events

On Mon, Mar 05, 2018 at 10:08:18AM -0500, Agustin Vega-Frias wrote:
> On 2018-03-04 13:10, Jiri Olsa wrote:
> > On Sun, Mar 04, 2018 at 09:12:45AM -0800, Andi Kleen wrote:
> > > > > +#include <fnmatch.h>
> > > > > #include <linux/compiler.h>
> > > > > #include <linux/list.h>
> > > > > #include <linux/types.h>
> > > > > @@ -241,7 +242,7 @@ PE_NAME opt_event_config
> > > > > if (!strncmp(name, "uncore_", 7) &&
> > > > > strncmp($1, "uncore_", 7))
> > > > > name += 7;
> > > > > - if (!strncmp($1, name, strlen($1))) {
> > > > > + if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
> > > >
> > > > could we now get rid of the strncmp in here and keep the
> > > > glob matching only?
> > >
> > > That would break existing command lines. Not a good idea.
> >
> > I hoped that only you guys are using this and would rewrite your scripts
> > ;-)
> >
> > I had no idea there's fnmatch func before.. too bad, ok
> >
> > jirka
>
> An option to keep backward compatibility and consistency would be
> to wrap the pattern/string passed in *'s, that way we can just use
> fnmatch and have all the examples Jiri brought up work the same.
> With that in place we can actually also drop the explicit ignoring
> of the uncore_ prefix since the globbing would take care of that.

I don't mind the strcmp as such, I wanted to get rid of the wildcard
matching without using '*' ... but as Andi said it's been out
there and it's been a while, so let's keep it

but if there's a way to make it simpler, let's go for it

thanks,
jirka

2018-03-05 20:12:13

by Agustin Vega-Frias

Subject: Re: [RFC V2 1/3] perf, tools: Support wildcards on pmu name in dynamic pmu events

On 2018-03-05 14:09, Jiri Olsa wrote:
> On Mon, Mar 05, 2018 at 10:08:18AM -0500, Agustin Vega-Frias wrote:
>> On 2018-03-04 13:10, Jiri Olsa wrote:
>> > On Sun, Mar 04, 2018 at 09:12:45AM -0800, Andi Kleen wrote:
>> > > > > +#include <fnmatch.h>
>> > > > > #include <linux/compiler.h>
>> > > > > #include <linux/list.h>
>> > > > > #include <linux/types.h>
>> > > > > @@ -241,7 +242,7 @@ PE_NAME opt_event_config
>> > > > > if (!strncmp(name, "uncore_", 7) &&
>> > > > > strncmp($1, "uncore_", 7))
>> > > > > name += 7;
>> > > > > - if (!strncmp($1, name, strlen($1))) {
>> > > > > + if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
>> > > >
>> > > > could we now get rid of the strncmp in here and keep the
>> > > > glob matching only?
>> > >
>> > > That would break existing command lines. Not a good idea.
>> >
>> > I hoped that only you guys are using this and would rewrite your scripts
>> > ;-)
>> >
>> > I had no idea there's fnmatch func before.. too bad, ok
>> >
>> > jirka
>>
>> An option to keep backward compatibility and consistency would be
>> to wrap the pattern/string passed in *'s, that way we can just use
>> fnmatch and have all the examples Jiri brought up work the same.
>> With that in place we can actually also drop the explicit ignoring
>> of the uncore_ prefix since the globbing would take care of that.
>
> I don't mind the strcmp as such, I wanted to get rid of the wildcard
> matching without using '*' ... but as Andi said it's been out
> there and it's been a while, so let's keep it
>
> but if there's a way to make it simpler, let's go for it
>
> thanks,
> jirka

Sounds good. I have a new version ready (see sample output below).
But I wanted to ping about the other two patches before submitting.
Any feedback on those?

Thanks,
Agustín

PS:
Sample output:

$ ./perf stat -a -e imc/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null

Performance counter stats for 'system wide':

2,613 uncore_imc_0/umask=0x3,event=0x4/
2,736 uncore_imc_1/umask=0x3,event=0x4/
2,671 uncore_imc_2/umask=0x3,event=0x4/
2,508 uncore_imc_3/umask=0x3,event=0x4/
2,439 uncore_imc_4/umask=0x3,event=0x4/
2,465 uncore_imc_5/umask=0x3,event=0x4/

0.004159243 seconds time elapsed

$ ./perf stat -a -e *imc/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null

Performance counter stats for 'system wide':

2,704 uncore_imc_0/umask=0x3,event=0x4/
2,601 uncore_imc_1/umask=0x3,event=0x4/
2,625 uncore_imc_2/umask=0x3,event=0x4/
2,370 uncore_imc_3/umask=0x3,event=0x4/
2,485 uncore_imc_4/umask=0x3,event=0x4/
2,431 uncore_imc_5/umask=0x3,event=0x4/

0.002716763 seconds time elapsed

$ ./perf stat -a -e imc*/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null

Performance counter stats for 'system wide':

1,294 uncore_imc_0/umask=0x3,event=0x4/
1,303 uncore_imc_1/umask=0x3,event=0x4/
1,242 uncore_imc_2/umask=0x3,event=0x4/
1,125 uncore_imc_3/umask=0x3,event=0x4/
1,137 uncore_imc_4/umask=0x3,event=0x4/
1,159 uncore_imc_5/umask=0x3,event=0x4/

0.002790441 seconds time elapsed

$ ./perf stat -a -e *imc*/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null

Performance counter stats for 'system wide':

1,524 uncore_imc_0/umask=0x3,event=0x4/
1,508 uncore_imc_1/umask=0x3,event=0x4/
1,501 uncore_imc_2/umask=0x3,event=0x4/
1,405 uncore_imc_3/umask=0x3,event=0x4/
1,427 uncore_imc_4/umask=0x3,event=0x4/
1,450 uncore_imc_5/umask=0x3,event=0x4/

0.002720907 seconds time elapsed


2018-03-05 21:54:03

by Jiri Olsa

Subject: Re: [RFC V2 1/3] perf, tools: Support wildcards on pmu name in dynamic pmu events

On Mon, Mar 05, 2018 at 03:10:43PM -0500, Agustin Vega-Frias wrote:
> On 2018-03-05 14:09, Jiri Olsa wrote:
> > On Mon, Mar 05, 2018 at 10:08:18AM -0500, Agustin Vega-Frias wrote:
> > > On 2018-03-04 13:10, Jiri Olsa wrote:
> > > > On Sun, Mar 04, 2018 at 09:12:45AM -0800, Andi Kleen wrote:
> > > > > > > +#include <fnmatch.h>
> > > > > > > #include <linux/compiler.h>
> > > > > > > #include <linux/list.h>
> > > > > > > #include <linux/types.h>
> > > > > > > @@ -241,7 +242,7 @@ PE_NAME opt_event_config
> > > > > > > if (!strncmp(name, "uncore_", 7) &&
> > > > > > > strncmp($1, "uncore_", 7))
> > > > > > > name += 7;
> > > > > > > - if (!strncmp($1, name, strlen($1))) {
> > > > > > > + if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
> > > > > >
> > > > > > could we now get rid of the strncmp in here and keep the
> > > > > > glob matching only?
> > > > >
> > > > > That would break existing command lines. Not a good idea.
> > > >
> > > > I hoped that only you guys are using this and would rewrite your scripts
> > > > ;-)
> > > >
> > > > I had no idea there's fnmatch func before.. too bad, ok
> > > >
> > > > jirka
> > >
> > > An option to keep backward compatibility and consistency would be
> > > to wrap the pattern/string passed in *'s, that way we can just use
> > > fnmatch and have all the examples Jiri brought up work the same.
> > > With that in place we can actually also drop the explicit ignoring
> > > of the uncore_ prefix since the globbing would take care of that.
> >
> > I don't mind the strcmp as such, I wanted to get rid of the wildcard
> > matching without using '*' ... but as Andi said it's been out
> > there and it's been a while, so let's keep it
> >
> > but if there's a way to make it simpler, let's go for it
> >
> > thanks,
> > jirka
>
> Sounds good. I have a new version ready (see sample output below).
> But I wanted to ping about the other two patches before submitting.
> Any feedback on those?

the rest looks ok to me, so does the output below

thanks,
jirka

>
> Thanks,
> Agustín
>
> PS:
> Sample output:
>
> $ ./perf stat -a -e imc/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null
>
> Performance counter stats for 'system wide':
>
> 2,613 uncore_imc_0/umask=0x3,event=0x4/
> 2,736 uncore_imc_1/umask=0x3,event=0x4/
> 2,671 uncore_imc_2/umask=0x3,event=0x4/
> 2,508 uncore_imc_3/umask=0x3,event=0x4/
> 2,439 uncore_imc_4/umask=0x3,event=0x4/
> 2,465 uncore_imc_5/umask=0x3,event=0x4/
>
> 0.004159243 seconds time elapsed
>
> $ ./perf stat -a -e *imc/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null
>
> Performance counter stats for 'system wide':
>
> 2,704 uncore_imc_0/umask=0x3,event=0x4/
> 2,601 uncore_imc_1/umask=0x3,event=0x4/
> 2,625 uncore_imc_2/umask=0x3,event=0x4/
> 2,370 uncore_imc_3/umask=0x3,event=0x4/
> 2,485 uncore_imc_4/umask=0x3,event=0x4/
> 2,431 uncore_imc_5/umask=0x3,event=0x4/
>
> 0.002716763 seconds time elapsed
>
> $ ./perf stat -a -e imc*/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null
>
> Performance counter stats for 'system wide':
>
> 1,294 uncore_imc_0/umask=0x3,event=0x4/
> 1,303 uncore_imc_1/umask=0x3,event=0x4/
> 1,242 uncore_imc_2/umask=0x3,event=0x4/
> 1,125 uncore_imc_3/umask=0x3,event=0x4/
> 1,137 uncore_imc_4/umask=0x3,event=0x4/
> 1,159 uncore_imc_5/umask=0x3,event=0x4/
>
> 0.002790441 seconds time elapsed
>
> $ ./perf stat -a -e *imc*/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null
>
> Performance counter stats for 'system wide':
>
> 1,524 uncore_imc_0/umask=0x3,event=0x4/
> 1,508 uncore_imc_1/umask=0x3,event=0x4/
> 1,501 uncore_imc_2/umask=0x3,event=0x4/
> 1,405 uncore_imc_3/umask=0x3,event=0x4/
> 1,427 uncore_imc_4/umask=0x3,event=0x4/
> 1,450 uncore_imc_5/umask=0x3,event=0x4/
>
> 0.002720907 seconds time elapsed
>