2022-08-31 18:15:38

by Ian Rogers

Subject: [PATCH v2 0/7] Add core wide metric literal

It is possible to optimize metrics when all SMT threads (CPUs) on a
core are measuring events in system wide mode. For example, the TMA
metrics [1] define CORE_CLKS for Sandybridge as:

if SMT is disabled:
CPU_CLK_UNHALTED.THREAD
if SMT is enabled and recording on all SMT threads (for all processes):
CPU_CLK_UNHALTED.THREAD_ANY / 2
if SMT is enabled and not recording on all SMT threads:
(CPU_CLK_UNHALTED.THREAD/2)*
(1+CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE/CPU_CLK_UNHALTED.REF_XCLK )
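To make the three cases concrete, here is a minimal C sketch that
evaluates CORE_CLKS from raw counts. The struct and function names
(clk_counts, core_clks) are hypothetical illustrations, not part of perf;
the smt_on and core_wide flags are assumed to be computed elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the raw counts named in the TMA definition. */
struct clk_counts {
	double thread;            /* CPU_CLK_UNHALTED.THREAD */
	double thread_any;        /* CPU_CLK_UNHALTED.THREAD_ANY */
	double one_thread_active; /* CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE */
	double ref_xclk;          /* CPU_CLK_UNHALTED.REF_XCLK */
};

/* Hypothetical helper: CORE_CLKS following the three Sandybridge cases. */
static double core_clks(bool smt_on, bool core_wide, const struct clk_counts *c)
{
	if (!smt_on)
		return c->thread;               /* SMT disabled */
	if (core_wide)
		return c->thread_any / 2;       /* all SMT threads, all processes */
	/* Not recording on all SMT threads: needs two extra events. */
	assert(c->ref_xclk > 0);
	return (c->thread / 2) * (1 + c->one_thread_active / c->ref_xclk);
}
```

With THREAD=1000, THREAD_ANY=1800, ONE_THREAD_ACTIVE=50 and
REF_XCLK=100 the three cases give 1000, 900 and 750 respectively.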

That is, two more events are necessary when not gathering counts on all
SMT threads. To distinguish recording on all SMT threads of a core from
system wide (all CPUs), the new property is called core wide.

As this literal requires the user-requested CPUs and the system wide
setting to be known, the parsing of metrics is delayed until after
command line option processing. Because events are used to compute the
evlist maps, and metrics create events, the data for core wide must come
from the target.
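Why the delay matters can be seen with an argument order such as
"-M Backend_Bound_SMT -a": at the point -M is read, -a has not been seen
yet. The C sketch below is a hypothetical, cut-down option loop (struct
opts and parse_args are illustrative names, not perf code) that only
saves the metric string and decides system wide after the loop; it also
records what an eager parse at -M time would have observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, cut-down option state; not perf's struct target. */
struct opts {
	bool system_wide;        /* set by -a */
	const char *metrics;     /* argument of -M, kept unparsed */
	bool wide_seen_by_eager; /* what parsing at -M time would have seen */
};

/*
 * Walk argv like an option parser. Metric strings are only saved;
 * anything depending on the target (such as #core_wide) is decided after
 * the loop, once every option, including a later -a, has been seen.
 * Returns the system wide value the deferred metric parse would use.
 */
static bool parse_args(int argc, const char **argv, struct opts *o)
{
	assert(o != NULL);
	*o = (struct opts){0};
	for (int i = 1; i < argc; i++) {
		if (strcmp(argv[i], "-a") == 0) {
			o->system_wide = true;
		} else if (strcmp(argv[i], "-M") == 0 && i + 1 < argc) {
			o->metrics = argv[++i];
			/* An eager parse here would miss a later -a. */
			o->wide_seen_by_eager = o->system_wide;
		}
	}
	/* Deferred step: the target is complete, metrics can be parsed now. */
	return o->system_wide;
}
```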

This patch series doesn't correct the Intel metrics to use #core_wide;
that will be done in follow-up work. To see the two behaviors
currently, you need an Intel CPU from Sandybridge up to, but not
including, Icelake. Then compare the events for
tma_backend_bound_percent and Backend_Bound_SMT: the former assumes
recording on all SMT threads and the latter assumes not recording on
all SMT threads. The future work will have a single backend bound
metric for both cases, determined using #core_wide.
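A metric selected by #core_wide only needs the events of the branch
taken. The hypothetical C helper below (core_clks_events is an
illustrative name, not a perf function) lists the events CORE_CLKS
would need in each case, showing the two extra events required when
not recording on all SMT threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: events needed by CORE_CLKS in each TMA case.
 * Writes the event names into out and returns how many were written.
 */
static size_t core_clks_events(bool smt_on, bool core_wide, const char *out[3])
{
	assert(out != NULL);
	if (!smt_on) {
		out[0] = "CPU_CLK_UNHALTED.THREAD";
		return 1;
	}
	if (core_wide) {
		out[0] = "CPU_CLK_UNHALTED.THREAD_ANY";
		return 1;
	}
	/* Not core wide: THREAD plus two more events. */
	out[0] = "CPU_CLK_UNHALTED.THREAD";
	out[1] = "CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE";
	out[2] = "CPU_CLK_UNHALTED.REF_XCLK";
	return 3;
}
```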

[1] https://download.01.org/perfmon/TMA_Metrics.xlsx
Note: #EBS_Mode is false when recording on all SMT threads and all
processes, which corresponds to #core_wide being true in this change.

v2. Add error handling for ENOMEM to more strdup cases, as suggested by
Arnaldo. Init the shadow stats when running "perf stat report", an
issue caught by the stat shell test.

Ian Rogers (7):
perf metric: Return early if no CPU PMU table exists
perf expr: Move the scanner_ctx into the parse_ctx
perf smt: Compute SMT from topology
perf topology: Add core_wide
perf stat: Delay metric parsing
perf metrics: Wire up core_wide
perf test: Add basic core_wide expression test

tools/perf/builtin-stat.c | 66 +++++++++++++----
tools/perf/tests/expr.c | 37 +++++++---
tools/perf/util/cputopo.c | 61 +++++++++++++++
tools/perf/util/cputopo.h | 5 ++
tools/perf/util/expr.c | 29 +++++---
tools/perf/util/expr.h | 14 ++--
tools/perf/util/expr.l | 6 +-
tools/perf/util/metricgroup.c | 135 ++++++++++++++++++++++++----------
tools/perf/util/metricgroup.h | 4 +-
tools/perf/util/smt.c | 109 ++++++---------------------
tools/perf/util/smt.h | 12 ++-
tools/perf/util/stat-shadow.c | 13 +++-
tools/perf/util/stat.h | 2 +
13 files changed, 318 insertions(+), 175 deletions(-)

--
2.37.2.672.g94769d06f0-goog


2022-08-31 18:16:42

by Ian Rogers

Subject: [PATCH v2 7/7] perf test: Add basic core_wide expression test

Add a basic test for coverage, similar to the existing #smt_on test.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/expr.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
index db736ed49556..8bd719766814 100644
--- a/tools/perf/tests/expr.c
+++ b/tools/perf/tests/expr.c
@@ -158,6 +158,9 @@ static int test__expr(struct test_suite *t __maybe_unused, int subtest __maybe_u
{
struct cpu_topology *topology = cpu_topology__new();
bool smton = smt_on(topology);
+ bool corewide = core_wide(/*system_wide=*/false,
+ /*user_requested_cpus=*/false,
+ topology);

cpu_topology__delete(topology);
expr__ctx_clear(ctx);
@@ -168,6 +171,16 @@ static int test__expr(struct test_suite *t __maybe_unused, int subtest __maybe_u
TEST_ASSERT_VAL("find ids", hashmap__find(ctx->ids,
smton ? "EVENT1" : "EVENT2",
(void **)&val_ptr));
+
+ expr__ctx_clear(ctx);
+ TEST_ASSERT_VAL("find ids",
+ expr__find_ids("EVENT1 if #core_wide else EVENT2",
+ NULL, ctx) == 0);
+ TEST_ASSERT_VAL("find ids", hashmap__size(ctx->ids) == 1);
+ TEST_ASSERT_VAL("find ids", hashmap__find(ctx->ids,
+ corewide ? "EVENT1" : "EVENT2",
+ (void **)&val_ptr));
+
}
/* The expression is a constant 1.0 without needing to evaluate EVENT1. */
expr__ctx_clear(ctx);