2024-06-11 05:55:57

by Wang, Weilin

[permalink] [raw]
Subject: [RFC PATCH v12 0/8] TPEBS counting mode support

From: Weilin Wang <[email protected]>

Changes in v12:
- Update MTL metric JSON file to include E-Core TMA3.6 changes.
- Update comments and code for better code quality. Keep tpebs_start() and
tpebs_delete() at evsel level for now and add comments on these functions with
more details about potential future changes. Remove tpebs_stop() call from
tpebs_set_evsel(). Simplify the tpebs_start() and tpebs_stop()/tpebs_delete()
interfaces. Also fixed the bugs on not freed "new" pointer and the incorrect
variable value assignment to val instead of counter->val.

Changes in v11:
- Make retire_latency evsels not opened for counting. This works correctly now
with the code Namhyung suggested that adjusting group read size when
retire_latency evsels included in the group.
- Update retire_latency value assignment using rint() to reduce precision loss
while keeping code generic.
- Fix the build error caused by not used variable in the test.

Other changes in v10:
- Change perf record fork from perf stat to evsel. All the major operations
like tpebs start, stop, read_evsel should directly work through evsel.
- Make intel-tpebs x86_64 only. This change is cross-compiled to arm64.
- Put tpebs code to intel-tepbs and simplify intel-tpebs APIs to minimum number
of functions and variables. Update funtion name and variable names to use
consistent prefix. Also improve error handling.
- Integrate code patch from Ian for the :R parser.
- Update MTL metrics to TMA 4.8.

V9: https://lore.kernel.org/all/[email protected]/

Changes in v9:
- Update the retire_latency result print and metric calculation method. Plugin
the value to evsel so that no special code is required.
- Update --control:fifo to use pipe instead of named pipe.
- Add test for TPEBS counting mode.
- Update Document with more details.

Changes in v8:
- In this revision, the code is updated to base on Ian's patch on R modifier
parser https://lore.kernel.org/lkml/[email protected]/
After this change, there is no special code required for R modifier in
metricgroup.c and metricgroup.h files.

Caveat of this change:
Ideally, we will need to add special handling to skip counting events with R
modifier in evsel. Currently, this is not implemented so the event with :R will
be both counted and sampled. Usually, in a metric formula that uses retire_latency,
it would already require to count the event. As a result, we will endup count the
same event twice. This should be able to be handled properly when we finalize our
design on evsel R modifier support.

- Move TPEBS specific code out from main perf stat code to separate files in
util/intel-tpebs.c and util/intel-tpebs.h. [Namhyung]
- Use --control:fifo to ack perf stat from forked perf record instead of sleep(2) [Namhyung]
- Add introductions about TPEBS and R modifier in Documents. [Namhyung]


Changes in v7:
- Update code and comments for better code quality [Namhyung]
- Add a separate commit for perf data [Namhyung]
- Update retire latency print function to improve alignment [Namhyung]

Changes in v6:
- Update code and add comments for better code quality [Namhyung]
- Remove the added fd var and directly pass the opened fd to data.file.fd [Namhyung]
- Add kill() to stop perf record when perf stat exists early [Namhyung]
- Add command opt check to ensure only start perf record when -a/-C given [Namhyung]
- Squash commits [Namhyung]

Changes in v5:
- Update code and add comments for better code quality [Ian]

Changes in v4:
- Remove uncessary debug print and update code and comments for better
readability and quality [Namhyung]
- Update mtl metric json file with consistent TmaL1 and TopdownL1 metricgroup

Changes in v3:
- Remove ':' when event name has '@' [Ian]
- Use 'R' as the modifier instead of "retire_latency" [Ian]

Changes in v2:
- Add MTL metric file
- Add more descriptions and example to the patch [Arnaldo]

Here is an example of running perf stat to collect a metric that uses
retire_latency value of event MEM_INST_RETIRED.STLB_HIT_STORES on a MTL system.

In this simple example, there is no MEM_INST_RETIRED.STLB_HIT_STORES sample.
Therefore, the MEM_INST_RETIRED.STLB_HIT_STORES:p count and retire_latency value
are all 0.

./perf stat -M tma_dtlb_store -a -- sleep 1

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]

Performance counter stats for 'system wide':

181,047,168 cpu_core/TOPDOWN.SLOTS/ # 0.6 % tma_dtlb_store
3,195,608 cpu_core/topdown-retiring/
40,156,649 cpu_core/topdown-mem-bound/
3,550,925 cpu_core/topdown-bad-spec/
117,571,818 cpu_core/topdown-fe-bound/
57,118,087 cpu_core/topdown-be-bound/
69,179 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/
4,582 cpu_core/MEM_INST_RETIRED.STLB_HIT_STORES/
30,183,104 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/
30,556,790 cpu_core/CPU_CLK_UNHALTED.THREAD/
168,486 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/
0.00 MEM_INST_RETIRED.STLB_HIT_STORES:p 0 0

1.003105924 seconds time elapsed

v1:
TPEBS is one of the features provided by the next generation of Intel PMU.
Please refer to Section 8.4.1 of "Intel® Architecture Instruction Set Extensions
Programming Reference" [1] for more details about this feature.

This set of patches supports TPEBS in counting mode. The code works in the
following way: it forks a perf record process from perf stat when retire_latency
of one or more events are used in a metric formula. Perf stat would send a
SIGTERM signal to perf record before it needs the retire latency value for
metric calculation. Perf stat will then process sample data to extract the
retire latency data for metric calculations. Currently, the code uses the
arithmetic average of retire latency values.

[1] https://www.intel.com/content/www/us/en/content-details/812218/intel-architecture-instruction-set-extensions-programming-reference.html?wapkw=future%20features



Ian Rogers (1):
perf parse-events: Add a retirement latency modifier

Weilin Wang (7):
perf data: Allow to use given fd in data->file.fd
perf stat: Fork and launch perf record when perf stat needs to get
retire latency value for a metric.
perf stat: Plugin retire_lat value from sampled data to evsel
perf vendor events intel: Add MTL metric json files
perf stat: Add command line option for enabling tpebs recording
perf Document: Add TPEBS to Documents
perf test: Add test for Intel TPEBS counting mode

tools/perf/Documentation/perf-list.txt | 1 +
tools/perf/Documentation/topdown.txt | 30 +
tools/perf/arch/x86/util/evlist.c | 6 +
tools/perf/builtin-stat.c | 6 +
.../arch/x86/meteorlake/metricgroups.json | 142 +
.../arch/x86/meteorlake/mtl-metrics.json | 2535 +++++++++++++++++
.../perf/tests/shell/test_stat_intel_tpebs.sh | 18 +
tools/perf/util/Build | 1 +
tools/perf/util/data.c | 7 +-
tools/perf/util/evlist.c | 2 +
tools/perf/util/evsel.c | 81 +-
tools/perf/util/evsel.h | 6 +
tools/perf/util/intel-tpebs.c | 400 +++
tools/perf/util/intel-tpebs.h | 35 +
tools/perf/util/parse-events.c | 2 +
tools/perf/util/parse-events.h | 1 +
tools/perf/util/parse-events.l | 3 +-
17 files changed, 3272 insertions(+), 4 deletions(-)
create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/metricgroups.json
create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/mtl-metrics.json
create mode 100755 tools/perf/tests/shell/test_stat_intel_tpebs.sh
create mode 100644 tools/perf/util/intel-tpebs.c
create mode 100644 tools/perf/util/intel-tpebs.h

--
2.43.0



2024-06-11 05:56:00

by Wang, Weilin

[permalink] [raw]
Subject: [RFC PATCH v12 1/8] perf parse-events: Add a retirement latency modifier

From: Ian Rogers <[email protected]>

Retirement latency is a separate sampled count used on newer Intel
CPUs.

Signed-off-by: Ian Rogers <[email protected]>
Signed-off-by: Weilin Wang <[email protected]>
---
tools/perf/util/evsel.h | 1 +
tools/perf/util/parse-events.c | 2 ++
tools/perf/util/parse-events.h | 1 +
tools/perf/util/parse-events.l | 3 ++-
4 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 43f6fd1dcb4d..bd8e84954e34 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -98,6 +98,7 @@ struct evsel {
bool bpf_counter;
bool use_config_name;
bool skippable;
+ bool retire_lat;
int bpf_fd;
struct bpf_object *bpf_obj;
struct list_head config_terms;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 0f308b4db2b9..9c2a76ec8c99 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1818,6 +1818,8 @@ static int parse_events__modifier_list(struct parse_events_state *parse_state,
evsel->weak_group = true;
if (mod.bpf)
evsel->bpf_counter = true;
+ if (mod.retire_lat)
+ evsel->retire_lat = true;
}
return 0;
}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 5695308efab9..eb94d1247dae 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -201,6 +201,7 @@ struct parse_events_modifier {
bool hypervisor : 1; /* 'h' */
bool guest : 1; /* 'G' */
bool host : 1; /* 'H' */
+ bool retire_lat : 1; /* 'R' */
};

int parse_events__modifier_event(struct parse_events_state *parse_state, void *loc,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 08ea2d845dc3..85015f080240 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -209,6 +209,7 @@ static int modifiers(struct parse_events_state *parse_state, yyscan_t scanner)
CASE('W', weak);
CASE('e', exclusive);
CASE('b', bpf);
+ CASE('R', retire_lat);
default:
return PE_ERROR;
}
@@ -250,7 +251,7 @@ drv_cfg_term [a-zA-Z0-9_\.]+(=[a-zA-Z0-9_*?\.:]+)?
* If you add a modifier you need to update check_modifier().
* Also, the letters in modifier_event must not be in modifier_bp.
*/
-modifier_event [ukhpPGHSDIWeb]{1,15}
+modifier_event [ukhpPGHSDIWebR]{1,16}
modifier_bp [rwx]{1,3}
lc_type (L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node)
lc_op_result (load|loads|read|store|stores|write|prefetch|prefetches|speculative-read|speculative-load|refs|Reference|ops|access|misses|miss)
--
2.43.0


2024-06-11 06:03:58

by Wang, Weilin

[permalink] [raw]
Subject: [RFC PATCH v12 6/8] perf stat: Add command line option for enabling tpebs recording

From: Weilin Wang <[email protected]>

With this command line option, tpebs recording is turned off in perf stat on
default. It will only be turned on when this option is given in perf stat
command.

Exampe with --enable-tpebs-recording:

perf stat -M tma_dtlb_store -a --enable-tpebs-recording sleep 1

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]

Performance counter stats for 'system wide':

181,047,168 cpu_core/TOPDOWN.SLOTS/ # 0.6 % tma_dtlb_store
3,195,608 cpu_core/topdown-retiring/
40,156,649 cpu_core/topdown-mem-bound/
3,550,925 cpu_core/topdown-bad-spec/
117,571,818 cpu_core/topdown-fe-bound/
57,118,087 cpu_core/topdown-be-bound/
69,179 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/
4,582 cpu_core/MEM_INST_RETIRED.STLB_HIT_STORES/
30,183,104 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/
30,556,790 cpu_core/CPU_CLK_UNHALTED.THREAD/
168,486 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/
0 MEM_INST_RETIRED.STLB_HIT_STORES:R

1.003105924 seconds time elapsed

Exampe without --enable-tpebs-recording:

perf stat -M tma_dtlb_store -a sleep 1

Performance counter stats for 'system wide':

181,047,168 cpu_core/TOPDOWN.SLOTS/ # 0.6 % tma_dtlb_store
3,195,608 cpu_core/topdown-retiring/
40,156,649 cpu_core/topdown-mem-bound/
3,550,925 cpu_core/topdown-bad-spec/
117,571,818 cpu_core/topdown-fe-bound/
57,118,087 cpu_core/topdown-be-bound/
69,179 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/
4,582 cpu_core/MEM_INST_RETIRED.STLB_HIT_STORES/
30,183,104 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/
30,556,790 cpu_core/CPU_CLK_UNHALTED.THREAD/
168,486 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/
0 MEM_INST_RETIRED.STLB_HIT_STORES:R

1.003105924 seconds time elapsed

Signed-off-by: Weilin Wang <[email protected]>
---
tools/perf/builtin-stat.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 14be132f7b6f..055139e44763 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1235,6 +1235,8 @@ static struct option stat_options[] = {
"disable adding events for the metric threshold calculation"),
OPT_BOOLEAN(0, "topdown", &topdown_run,
"measure top-down statistics"),
+ OPT_BOOLEAN(0, "enable-tpebs-recording", &tpebs_recording,
+ "enable recording for tpebs when retire_latency required"),
OPT_UINTEGER(0, "td-level", &stat_config.topdown_level,
"Set the metrics level for the top-down statistics (0: max level)"),
OPT_BOOLEAN(0, "smi-cost", &smi_cost,
--
2.43.0


2024-06-11 16:59:50

by Namhyung Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v12 6/8] perf stat: Add command line option for enabling tpebs recording

Hello,

On Mon, Jun 10, 2024 at 10:24 PM <[email protected]> wrote:
>
> From: Weilin Wang <[email protected]>
>
> With this command line option, tpebs recording is turned off in perf stat on
> default. It will only be turned on when this option is given in perf stat
> command.
>
> Exampe with --enable-tpebs-recording:
>
> perf stat -M tma_dtlb_store -a --enable-tpebs-recording sleep 1
>
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.000 MB - ]
>
> Performance counter stats for 'system wide':
>
> 181,047,168 cpu_core/TOPDOWN.SLOTS/ # 0.6 % tma_dtlb_store
> 3,195,608 cpu_core/topdown-retiring/
> 40,156,649 cpu_core/topdown-mem-bound/
> 3,550,925 cpu_core/topdown-bad-spec/
> 117,571,818 cpu_core/topdown-fe-bound/
> 57,118,087 cpu_core/topdown-be-bound/
> 69,179 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/
> 4,582 cpu_core/MEM_INST_RETIRED.STLB_HIT_STORES/
> 30,183,104 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/
> 30,556,790 cpu_core/CPU_CLK_UNHALTED.THREAD/
> 168,486 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/
> 0 MEM_INST_RETIRED.STLB_HIT_STORES:R

Here I guess we can expect a non-zero value, right?

>
> 1.003105924 seconds time elapsed
>
> Exampe without --enable-tpebs-recording:
>
> perf stat -M tma_dtlb_store -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 181,047,168 cpu_core/TOPDOWN.SLOTS/ # 0.6 % tma_dtlb_store
> 3,195,608 cpu_core/topdown-retiring/
> 40,156,649 cpu_core/topdown-mem-bound/
> 3,550,925 cpu_core/topdown-bad-spec/
> 117,571,818 cpu_core/topdown-fe-bound/
> 57,118,087 cpu_core/topdown-be-bound/
> 69,179 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/
> 4,582 cpu_core/MEM_INST_RETIRED.STLB_HIT_STORES/
> 30,183,104 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/
> 30,556,790 cpu_core/CPU_CLK_UNHALTED.THREAD/
> 168,486 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/
> 0 MEM_INST_RETIRED.STLB_HIT_STORES:R
>
> 1.003105924 seconds time elapsed
>
> Signed-off-by: Weilin Wang <[email protected]>
> ---
> tools/perf/builtin-stat.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 14be132f7b6f..055139e44763 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1235,6 +1235,8 @@ static struct option stat_options[] = {

What's the base of your patchset? I think we moved this to cmd_stat().
Please make sure to rebase on the perf-tools-next.


> "disable adding events for the metric threshold calculation"),
> OPT_BOOLEAN(0, "topdown", &topdown_run,
> "measure top-down statistics"),
> + OPT_BOOLEAN(0, "enable-tpebs-recording", &tpebs_recording,
> + "enable recording for tpebs when retire_latency required"),

Please update the documentation (man page) for this option.

Also I'm afraid it'd fail to build on non-x86 because tpebes_recording
is defined in intel-tpebs.c.

Thanks,
Namhyung


> OPT_UINTEGER(0, "td-level", &stat_config.topdown_level,
> "Set the metrics level for the top-down statistics (0: max level)"),
> OPT_BOOLEAN(0, "smi-cost", &smi_cost,
> --
> 2.43.0
>

2024-06-11 17:57:21

by Wang, Weilin

[permalink] [raw]
Subject: RE: [RFC PATCH v12 6/8] perf stat: Add command line option for enabling tpebs recording



> -----Original Message-----
> From: Namhyung Kim <[email protected]>
> Sent: Tuesday, June 11, 2024 9:50 AM
> To: Wang, Weilin <[email protected]>
> Cc: Ian Rogers <[email protected]>; Arnaldo Carvalho de Melo
> <[email protected]>; Peter Zijlstra <[email protected]>; Ingo Molnar
> <[email protected]>; Alexander Shishkin
> <[email protected]>; Jiri Olsa <[email protected]>; Hunter,
> Adrian <[email protected]>; Kan Liang <[email protected]>;
> [email protected]; [email protected]; Taylor, Perry
> <[email protected]>; Alt, Samantha <[email protected]>; Biggers,
> Caleb <[email protected]>
> Subject: Re: [RFC PATCH v12 6/8] perf stat: Add command line option for
> enabling tpebs recording
>
> Hello,
>
> On Mon, Jun 10, 2024 at 10:24 PM <[email protected]> wrote:
> >
> > From: Weilin Wang <[email protected]>
> >
> > With this command line option, tpebs recording is turned off in perf stat on
> > default. It will only be turned on when this option is given in perf stat
> > command.
> >
> > Exampe with --enable-tpebs-recording:
> >
> > perf stat -M tma_dtlb_store -a --enable-tpebs-recording sleep 1
> >
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.000 MB - ]
> >
> > Performance counter stats for 'system wide':
> >
> > 181,047,168 cpu_core/TOPDOWN.SLOTS/ # 0.6 %
> tma_dtlb_store
> > 3,195,608 cpu_core/topdown-retiring/
> > 40,156,649 cpu_core/topdown-mem-bound/
> > 3,550,925 cpu_core/topdown-bad-spec/
> > 117,571,818 cpu_core/topdown-fe-bound/
> > 57,118,087 cpu_core/topdown-be-bound/
> > 69,179 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/
> > 4,582 cpu_core/MEM_INST_RETIRED.STLB_HIT_STORES/
> > 30,183,104 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/
> > 30,556,790 cpu_core/CPU_CLK_UNHALTED.THREAD/
> > 168,486 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/
> > 0 MEM_INST_RETIRED.STLB_HIT_STORES:R
>
> Here I guess we can expect a non-zero value, right?

Hi Namhyung,

Do you mean when we have the option, we should expect non-zero value?
During the execution, it's possible that we don't capture any sample on this
event. In this case, we would have a zero value. In the future, I think we will
give it the default value instead of zero.

>
> >
> > 1.003105924 seconds time elapsed
> >
> > Exampe without --enable-tpebs-recording:
> >
> > perf stat -M tma_dtlb_store -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 181,047,168 cpu_core/TOPDOWN.SLOTS/ # 0.6 %
> tma_dtlb_store
> > 3,195,608 cpu_core/topdown-retiring/
> > 40,156,649 cpu_core/topdown-mem-bound/
> > 3,550,925 cpu_core/topdown-bad-spec/
> > 117,571,818 cpu_core/topdown-fe-bound/
> > 57,118,087 cpu_core/topdown-be-bound/
> > 69,179 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/
> > 4,582 cpu_core/MEM_INST_RETIRED.STLB_HIT_STORES/
> > 30,183,104 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/
> > 30,556,790 cpu_core/CPU_CLK_UNHALTED.THREAD/
> > 168,486 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/
> > 0 MEM_INST_RETIRED.STLB_HIT_STORES:R
> >
> > 1.003105924 seconds time elapsed
> >
> > Signed-off-by: Weilin Wang <[email protected]>
> > ---
> > tools/perf/builtin-stat.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index 14be132f7b6f..055139e44763 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -1235,6 +1235,8 @@ static struct option stat_options[] = {
>
> What's the base of your patchset? I think we moved this to cmd_stat().
> Please make sure to rebase on the perf-tools-next.

This was rebased on top of Ian's change for the tool event and retire_latency parser
patches. I think that was on tmp.perf-tools.next. I will rebase on perf-tools-next.

>
>
> > "disable adding events for the metric threshold calculation"),
> > OPT_BOOLEAN(0, "topdown", &topdown_run,
> > "measure top-down statistics"),
> > + OPT_BOOLEAN(0, "enable-tpebs-recording", &tpebs_recording,
> > + "enable recording for tpebs when retire_latency required"),
>
> Please update the documentation (man page) for this option.
>
> Also I'm afraid it'd fail to build on non-x86 because tpebes_recording
> is defined in intel-tpebs.c.
>

Yes, you are right. I forgot about non-x86. I will update this and also the
Documentation.

Thanks,
Weilin

> Thanks,
> Namhyung
>
>
> > OPT_UINTEGER(0, "td-level", &stat_config.topdown_level,
> > "Set the metrics level for the top-down statistics (0: max level)"),
> > OPT_BOOLEAN(0, "smi-cost", &smi_cost,
> > --
> > 2.43.0
> >

2024-06-11 19:00:31

by Namhyung Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v12 6/8] perf stat: Add command line option for enabling tpebs recording

On Tue, Jun 11, 2024 at 10:49 AM Wang, Weilin <[email protected]> wrote:
>
>
>
> > -----Original Message-----
> > From: Namhyung Kim <[email protected]>
> > Sent: Tuesday, June 11, 2024 9:50 AM
> > To: Wang, Weilin <[email protected]>
> > Cc: Ian Rogers <[email protected]>; Arnaldo Carvalho de Melo
> > <[email protected]>; Peter Zijlstra <[email protected]>; Ingo Molnar
> > <[email protected]>; Alexander Shishkin
> > <[email protected]>; Jiri Olsa <[email protected]>; Hunter,
> > Adrian <[email protected]>; Kan Liang <[email protected]>;
> > [email protected]; [email protected]; Taylor, Perry
> > <[email protected]>; Alt, Samantha <[email protected]>; Biggers,
> > Caleb <[email protected]>
> > Subject: Re: [RFC PATCH v12 6/8] perf stat: Add command line option for
> > enabling tpebs recording
> >
> > Hello,
> >
> > On Mon, Jun 10, 2024 at 10:24 PM <[email protected]> wrote:
> > >
> > > From: Weilin Wang <[email protected]>
> > >
> > > With this command line option, tpebs recording is turned off in perf stat on
> > > default. It will only be turned on when this option is given in perf stat
> > > command.
> > >
> > > Exampe with --enable-tpebs-recording:
> > >
> > > perf stat -M tma_dtlb_store -a --enable-tpebs-recording sleep 1
> > >
> > > [ perf record: Woken up 1 times to write data ]
> > > [ perf record: Captured and wrote 0.000 MB - ]
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 181,047,168 cpu_core/TOPDOWN.SLOTS/ # 0.6 %
> > tma_dtlb_store
> > > 3,195,608 cpu_core/topdown-retiring/
> > > 40,156,649 cpu_core/topdown-mem-bound/
> > > 3,550,925 cpu_core/topdown-bad-spec/
> > > 117,571,818 cpu_core/topdown-fe-bound/
> > > 57,118,087 cpu_core/topdown-be-bound/
> > > 69,179 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/
> > > 4,582 cpu_core/MEM_INST_RETIRED.STLB_HIT_STORES/
> > > 30,183,104 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/
> > > 30,556,790 cpu_core/CPU_CLK_UNHALTED.THREAD/
> > > 168,486 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/
> > > 0 MEM_INST_RETIRED.STLB_HIT_STORES:R
> >
> > Here I guess we can expect a non-zero value, right?
>
> Hi Namhyung,
>
> Do you mean when we have the option, we should expect non-zero value?
> During the execution, it's possible that we don't capture any sample on this
> event. In this case, we would have a zero value. In the future, I think we will
> give it the default value instead of zero.

I mean there's a chance you get non-zero, but it can be zero sometimes, right?
Then I think it's better to choose non-zero in this example otherwise it's not
clear what's the difference of using this option or not when you look at the
output in this commit message (they are both 0).

Thanks,
Namhyung

>
> >
> > >
> > > 1.003105924 seconds time elapsed
> > >
> > > Exampe without --enable-tpebs-recording:
> > >
> > > perf stat -M tma_dtlb_store -a sleep 1
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 181,047,168 cpu_core/TOPDOWN.SLOTS/ # 0.6 %
> > tma_dtlb_store
> > > 3,195,608 cpu_core/topdown-retiring/
> > > 40,156,649 cpu_core/topdown-mem-bound/
> > > 3,550,925 cpu_core/topdown-bad-spec/
> > > 117,571,818 cpu_core/topdown-fe-bound/
> > > 57,118,087 cpu_core/topdown-be-bound/
> > > 69,179 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/
> > > 4,582 cpu_core/MEM_INST_RETIRED.STLB_HIT_STORES/
> > > 30,183,104 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/
> > > 30,556,790 cpu_core/CPU_CLK_UNHALTED.THREAD/
> > > 168,486 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/
> > > 0 MEM_INST_RETIRED.STLB_HIT_STORES:R
> > >
> > > 1.003105924 seconds time elapsed
> > >
> > > Signed-off-by: Weilin Wang <[email protected]>
> > > ---
> > > tools/perf/builtin-stat.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > > index 14be132f7b6f..055139e44763 100644
> > > --- a/tools/perf/builtin-stat.c
> > > +++ b/tools/perf/builtin-stat.c
> > > @@ -1235,6 +1235,8 @@ static struct option stat_options[] = {
> >
> > What's the base of your patchset? I think we moved this to cmd_stat().
> > Please make sure to rebase on the perf-tools-next.
>
> This was rebased on top of Ian's change for the tool event and retire_latency parser
> patches. I think that was on tmp.perf-tools.next. I will rebase on perf-tools-next.
>
> >
> >
> > > "disable adding events for the metric threshold calculation"),
> > > OPT_BOOLEAN(0, "topdown", &topdown_run,
> > > "measure top-down statistics"),
> > > + OPT_BOOLEAN(0, "enable-tpebs-recording", &tpebs_recording,
> > > + "enable recording for tpebs when retire_latency required"),
> >
> > Please update the documentation (man page) for this option.
> >
> > Also I'm afraid it'd fail to build on non-x86 because tpebes_recording
> > is defined in intel-tpebs.c.
> >
>
> Yes, you are right. I forgot about non-x86. I will update this and also the
> Documentation.
>
> Thanks,
> Weilin
>
> > Thanks,
> > Namhyung
> >
> >
> > > OPT_UINTEGER(0, "td-level", &stat_config.topdown_level,
> > > "Set the metrics level for the top-down statistics (0: max level)"),
> > > OPT_BOOLEAN(0, "smi-cost", &smi_cost,
> > > --
> > > 2.43.0
> > >

2024-06-11 20:08:36

by Wang, Weilin

[permalink] [raw]
Subject: RE: [RFC PATCH v12 6/8] perf stat: Add command line option for enabling tpebs recording



> -----Original Message-----
> From: Namhyung Kim <[email protected]>
> Sent: Tuesday, June 11, 2024 11:49 AM
> To: Wang, Weilin <[email protected]>
> Cc: Ian Rogers <[email protected]>; Arnaldo Carvalho de Melo
> <[email protected]>; Peter Zijlstra <[email protected]>; Ingo Molnar
> <[email protected]>; Alexander Shishkin
> <[email protected]>; Jiri Olsa <[email protected]>; Hunter,
> Adrian <[email protected]>; Kan Liang <[email protected]>;
> [email protected]; [email protected]; Taylor, Perry
> <[email protected]>; Alt, Samantha <[email protected]>; Biggers,
> Caleb <[email protected]>
> Subject: Re: [RFC PATCH v12 6/8] perf stat: Add command line option for
> enabling tpebs recording
>
> On Tue, Jun 11, 2024 at 10:49 AM Wang, Weilin <[email protected]>
> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Namhyung Kim <[email protected]>
> > > Sent: Tuesday, June 11, 2024 9:50 AM
> > > To: Wang, Weilin <[email protected]>
> > > Cc: Ian Rogers <[email protected]>; Arnaldo Carvalho de Melo
> > > <[email protected]>; Peter Zijlstra <[email protected]>; Ingo Molnar
> > > <[email protected]>; Alexander Shishkin
> > > <[email protected]>; Jiri Olsa <[email protected]>; Hunter,
> > > Adrian <[email protected]>; Kan Liang <[email protected]>;
> > > [email protected]; [email protected]; Taylor,
> Perry
> > > <[email protected]>; Alt, Samantha <[email protected]>;
> Biggers,
> > > Caleb <[email protected]>
> > > Subject: Re: [RFC PATCH v12 6/8] perf stat: Add command line option for
> > > enabling tpebs recording
> > >
> > > Hello,
> > >
> > > On Mon, Jun 10, 2024 at 10:24 PM <[email protected]> wrote:
> > > >
> > > > From: Weilin Wang <[email protected]>
> > > >
> > > > With this command line option, tpebs recording is turned off in perf stat
> on
> > > > default. It will only be turned on when this option is given in perf stat
> > > > command.
> > > >
> > > > Exampe with --enable-tpebs-recording:
> > > >
> > > > perf stat -M tma_dtlb_store -a --enable-tpebs-recording sleep 1
> > > >
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 0.000 MB - ]
> > > >
> > > > Performance counter stats for 'system wide':
> > > >
> > > > 181,047,168 cpu_core/TOPDOWN.SLOTS/ # 0.6 %
> > > tma_dtlb_store
> > > > 3,195,608 cpu_core/topdown-retiring/
> > > > 40,156,649 cpu_core/topdown-mem-bound/
> > > > 3,550,925 cpu_core/topdown-bad-spec/
> > > > 117,571,818 cpu_core/topdown-fe-bound/
> > > > 57,118,087 cpu_core/topdown-be-bound/
> > > > 69,179 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/
> > > > 4,582 cpu_core/MEM_INST_RETIRED.STLB_HIT_STORES/
> > > > 30,183,104 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/
> > > > 30,556,790 cpu_core/CPU_CLK_UNHALTED.THREAD/
> > > > 168,486 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/
> > > > 0 MEM_INST_RETIRED.STLB_HIT_STORES:R
> > >
> > > Here I guess we can expect a non-zero value, right?
> >
> > Hi Namhyung,
> >
> > Do you mean when we have the option, we should expect non-zero value?
> > During the execution, it's possible that we don't capture any sample on this
> > event. In this case, we would have a zero value. In the future, I think we will
> > give it the default value instead of zero.
>
> I mean there's a chance you get non-zero, but it can be zero sometimes, right?
> Then I think it's better to choose non-zero in this example otherwise it's not
> clear what's the difference of using this option or not when you look at the
> output in this commit message (they are both 0).

I see. I will create a non-zero example and update.

Thanks,
Weilin
>
> Thanks,
> Namhyung
>
> >
> > >
> > > >
> > > > 1.003105924 seconds time elapsed
> > > >
> > > > Exampe without --enable-tpebs-recording:
> > > >
> > > > perf stat -M tma_dtlb_store -a sleep 1
> > > >
> > > > Performance counter stats for 'system wide':
> > > >
> > > > 181,047,168 cpu_core/TOPDOWN.SLOTS/ # 0.6 %
> > > tma_dtlb_store
> > > > 3,195,608 cpu_core/topdown-retiring/
> > > > 40,156,649 cpu_core/topdown-mem-bound/
> > > > 3,550,925 cpu_core/topdown-bad-spec/
> > > > 117,571,818 cpu_core/topdown-fe-bound/
> > > > 57,118,087 cpu_core/topdown-be-bound/
> > > > 69,179 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/
> > > > 4,582 cpu_core/MEM_INST_RETIRED.STLB_HIT_STORES/
> > > > 30,183,104 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/
> > > > 30,556,790 cpu_core/CPU_CLK_UNHALTED.THREAD/
> > > > 168,486 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/
> > > > 0 MEM_INST_RETIRED.STLB_HIT_STORES:R
> > > >
> > > > 1.003105924 seconds time elapsed
> > > >
> > > > Signed-off-by: Weilin Wang <[email protected]>
> > > > ---
> > > > tools/perf/builtin-stat.c | 2 ++
> > > > 1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > > > index 14be132f7b6f..055139e44763 100644
> > > > --- a/tools/perf/builtin-stat.c
> > > > +++ b/tools/perf/builtin-stat.c
> > > > @@ -1235,6 +1235,8 @@ static struct option stat_options[] = {
> > >
> > > What's the base of your patchset? I think we moved this to cmd_stat().
> > > Please make sure to rebase on the perf-tools-next.
> >
> > This was rebased on top of Ian's change for the tool event and retire_latency
> parser
> > patches. I think that was on tmp.perf-tools.next. I will rebase on perf-tools-
> next.
> >
> > >
> > >
> > > > "disable adding events for the metric threshold calculation"),
> > > > OPT_BOOLEAN(0, "topdown", &topdown_run,
> > > > "measure top-down statistics"),
> > > > + OPT_BOOLEAN(0, "enable-tpebs-recording", &tpebs_recording,
> > > > + "enable recording for tpebs when retire_latency required"),
> > >
> > > Please update the documentation (man page) for this option.
> > >
> > > Also I'm afraid it'd fail to build on non-x86 because tpebes_recording
> > > is defined in intel-tpebs.c.
> > >
> >
> > Yes, you are right. I forgot about non-x86. I will update this and also the
> > Documentation.
> >
> > Thanks,
> > Weilin
> >
> > > Thanks,
> > > Namhyung
> > >
> > >
> > > > OPT_UINTEGER(0, "td-level", &stat_config.topdown_level,
> > > > "Set the metrics level for the top-down statistics (0: max
> level)"),
> > > > OPT_BOOLEAN(0, "smi-cost", &smi_cost,
> > > > --
> > > > 2.43.0
> > > >