This is a completely new approach from V3 [1], although the metrics and
event descriptions are autogenerated, the topdown metrics have been
manually edited to use #no_stall_errata (now directly comparing on the
CPUID in v5). This removes the need to duplicate the whole set of JSONs
when only the topdown metrics are different between N2 and V2.
The CPU ID comparison function still needs to change so that the new
literal can compare on versions, but now no change is needed to mapfile
or the PMU event generation code because we still only support one
set of JSONs per CPU.
[1]: https://lore.kernel.org/lkml/[email protected]/
------
Changes since v5:
* Split patch 5 into one to add the new expression builtin and one to
update the Arm metric formulas
* Make _get_cpuid() still return an error if reading the file fails
* Improve the return value comment on strcmp_cpuid_str()
* Remove the "The events json file with the highest matching version
is used." comment because that was only true in v2
* Drop patch 1 because it was applied (I kept patch 4 even though it
was applied so that there is some context)
Changes since v4:
* Replace the #no_stall_errata literal with a more generic function
for comparing CPU IDs. This will hopefully keep configuration out
of the code and inside the JSONs
Changes since v3:
* Instead of duplicating all the metrics, add a new expression
literal that can be used to share the same metrics between N2 and V2
* Move tests to arch/arm64/tests
* Remove changes from jevents.py and mapfile.csv
Changes since v2:
* version -> variant in second commit message
* Add a bit more detail about version matching in the second commit
message
* Update the comments in pmu-events/arch/arm64/mapfile.csv to say that
variant and revision fields are now used
* Increase the CC list
Changes since v1:
* Split last change into two so it doesn't hit the mailing list size
limit
James Clark (6):
perf arm64: Allow version comparisons of CPU IDs
perf test: Add a test for the new Arm CPU ID comparison behavior
perf vendor events arm64: Update scale units and descriptions of
common topdown metrics
perf jevents: Add a new expression builtin strcmp_cpuid_str()
perf vendor events arm64: Update stall_slot workaround for N2 r0p3
perf vendor events arm64: Update N2 and V2 metrics and events using
Arm telemetry repo
tools/perf/arch/arm64/include/arch-tests.h | 3 +
tools/perf/arch/arm64/tests/Build | 1 +
tools/perf/arch/arm64/tests/arch-tests.c | 4 +
tools/perf/arch/arm64/tests/cpuid-match.c | 38 ++
tools/perf/arch/arm64/util/header.c | 67 +++-
tools/perf/arch/arm64/util/pmu.c | 18 +-
.../arch/arm64/arm/neoverse-n2-v2/branch.json | 8 -
.../arch/arm64/arm/neoverse-n2-v2/bus.json | 18 +-
.../arch/arm64/arm/neoverse-n2-v2/cache.json | 155 --------
.../arm64/arm/neoverse-n2-v2/exception.json | 45 ++-
.../arm/neoverse-n2-v2/fp_operation.json | 22 ++
.../arm64/arm/neoverse-n2-v2/general.json | 10 +
.../arm64/arm/neoverse-n2-v2/instruction.json | 143 -------
.../arm64/arm/neoverse-n2-v2/l1d_cache.json | 54 +++
.../arm64/arm/neoverse-n2-v2/l1i_cache.json | 14 +
.../arm64/arm/neoverse-n2-v2/l2_cache.json | 50 +++
.../arm64/arm/neoverse-n2-v2/l3_cache.json | 22 ++
.../arm64/arm/neoverse-n2-v2/ll_cache.json | 10 +
.../arch/arm64/arm/neoverse-n2-v2/memory.json | 39 +-
.../arm64/arm/neoverse-n2-v2/metrics.json | 365 ++++++++++--------
.../arm64/arm/neoverse-n2-v2/pipeline.json | 23 --
.../arm64/arm/neoverse-n2-v2/retired.json | 30 ++
.../arch/arm64/arm/neoverse-n2-v2/spe.json | 12 +-
.../arm/neoverse-n2-v2/spec_operation.json | 110 ++++++
.../arch/arm64/arm/neoverse-n2-v2/stall.json | 30 ++
.../arch/arm64/arm/neoverse-n2-v2/sve.json | 50 +++
.../arch/arm64/arm/neoverse-n2-v2/tlb.json | 66 ++++
.../arch/arm64/arm/neoverse-n2-v2/trace.json | 27 +-
tools/perf/pmu-events/arch/arm64/sbsa.json | 24 +-
tools/perf/pmu-events/metric.py | 17 +-
tools/perf/util/expr.c | 18 +
tools/perf/util/expr.h | 1 +
tools/perf/util/expr.l | 1 +
tools/perf/util/expr.y | 8 +-
tools/perf/util/pmu.c | 17 +
tools/perf/util/pmu.h | 1 +
36 files changed, 923 insertions(+), 598 deletions(-)
create mode 100644 tools/perf/arch/arm64/tests/cpuid-match.c
delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/branch.json
delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/fp_operation.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/general.json
delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/instruction.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/l1d_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/l1i_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/l2_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/l3_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/ll_cache.json
delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/retired.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/spec_operation.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/stall.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/sve.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/tlb.json
--
2.34.1
Metrics will be published here [1] going forwards, but they have
slightly different scale units. To allow autogenerated metrics to be
added more easily, update the scale units to match.
The more detailed descriptions have also been taken and added to the
common file.
[1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/tree/main/data/pmu/cpu/
Acked-by: Ian Rogers <[email protected]>
Reviewed-by: John Garry <[email protected]>
Signed-off-by: James Clark <[email protected]>
---
tools/perf/pmu-events/arch/arm64/sbsa.json | 24 +++++++++++-----------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/tools/perf/pmu-events/arch/arm64/sbsa.json b/tools/perf/pmu-events/arch/arm64/sbsa.json
index f90b338261ac..4eed79a28f6e 100644
--- a/tools/perf/pmu-events/arch/arm64/sbsa.json
+++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
@@ -1,34 +1,34 @@
[
{
- "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
- "BriefDescription": "Frontend bound L1 topdown metric",
+ "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))",
+ "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",
"DefaultMetricgroupName": "TopdownL1",
"MetricGroup": "Default;TopdownL1",
"MetricName": "frontend_bound",
- "ScaleUnit": "100%"
+ "ScaleUnit": "1percent of slots"
},
{
- "MetricExpr": "(1 - op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
- "BriefDescription": "Bad speculation L1 topdown metric",
+ "MetricExpr": "100 * ((1 - op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles)))",
+ "BriefDescription": "This metric is the percentage of total slots that executed operations and didn't retire due to a pipeline flush.\nThis indicates cycles that were utilized but inefficiently.",
"DefaultMetricgroupName": "TopdownL1",
"MetricGroup": "Default;TopdownL1",
"MetricName": "bad_speculation",
- "ScaleUnit": "100%"
+ "ScaleUnit": "1percent of slots"
},
{
- "MetricExpr": "(op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
- "BriefDescription": "Retiring L1 topdown metric",
+ "MetricExpr": "100 * ((op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles)))",
+ "BriefDescription": "This metric is the percentage of total slots that retired operations, which indicates cycles that were utilized efficiently.",
"DefaultMetricgroupName": "TopdownL1",
"MetricGroup": "Default;TopdownL1",
"MetricName": "retiring",
- "ScaleUnit": "100%"
+ "ScaleUnit": "1percent of slots"
},
{
- "MetricExpr": "stall_slot_backend / (#slots * cpu_cycles)",
- "BriefDescription": "Backend Bound L1 topdown metric",
+ "MetricExpr": "100 * (stall_slot_backend / (#slots * cpu_cycles))",
+ "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the backend of the processor.",
"DefaultMetricgroupName": "TopdownL1",
"MetricGroup": "Default;TopdownL1",
"MetricName": "backend_bound",
- "ScaleUnit": "100%"
+ "ScaleUnit": "1percent of slots"
}
]
--
2.34.1
Now that variant and revision fields are taken into account the behavior
is slightly more complicated so add a test to ensure that this behaves
as expected.
Reviewed-by: John Garry <[email protected]>
Signed-off-by: James Clark <[email protected]>
---
tools/perf/arch/arm64/include/arch-tests.h | 3 ++
tools/perf/arch/arm64/tests/Build | 1 +
tools/perf/arch/arm64/tests/arch-tests.c | 4 +++
tools/perf/arch/arm64/tests/cpuid-match.c | 38 ++++++++++++++++++++++
4 files changed, 46 insertions(+)
create mode 100644 tools/perf/arch/arm64/tests/cpuid-match.c
diff --git a/tools/perf/arch/arm64/include/arch-tests.h b/tools/perf/arch/arm64/include/arch-tests.h
index 452b3d904521..474d7cf5afbd 100644
--- a/tools/perf/arch/arm64/include/arch-tests.h
+++ b/tools/perf/arch/arm64/include/arch-tests.h
@@ -2,6 +2,9 @@
#ifndef ARCH_TESTS_H
#define ARCH_TESTS_H
+struct test_suite;
+
+int test__cpuid_match(struct test_suite *test, int subtest);
extern struct test_suite *arch_tests[];
#endif
diff --git a/tools/perf/arch/arm64/tests/Build b/tools/perf/arch/arm64/tests/Build
index a61c06bdb757..e337c09e7f56 100644
--- a/tools/perf/arch/arm64/tests/Build
+++ b/tools/perf/arch/arm64/tests/Build
@@ -2,3 +2,4 @@ perf-y += regs_load.o
perf-$(CONFIG_DWARF_UNWIND) += dwarf-unwind.o
perf-y += arch-tests.o
+perf-y += cpuid-match.o
diff --git a/tools/perf/arch/arm64/tests/arch-tests.c b/tools/perf/arch/arm64/tests/arch-tests.c
index ad16b4f8f63e..74932e72c727 100644
--- a/tools/perf/arch/arm64/tests/arch-tests.c
+++ b/tools/perf/arch/arm64/tests/arch-tests.c
@@ -3,9 +3,13 @@
#include "tests/tests.h"
#include "arch-tests.h"
+
+DEFINE_SUITE("arm64 CPUID matching", cpuid_match);
+
struct test_suite *arch_tests[] = {
#ifdef HAVE_DWARF_UNWIND_SUPPORT
&suite__dwarf_unwind,
#endif
+ &suite__cpuid_match,
NULL,
};
diff --git a/tools/perf/arch/arm64/tests/cpuid-match.c b/tools/perf/arch/arm64/tests/cpuid-match.c
new file mode 100644
index 000000000000..af0871b54ae7
--- /dev/null
+++ b/tools/perf/arch/arm64/tests/cpuid-match.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/compiler.h>
+
+#include "arch-tests.h"
+#include "tests/tests.h"
+#include "util/header.h"
+
+int test__cpuid_match(struct test_suite *test __maybe_unused,
+ int subtest __maybe_unused)
+{
+ /* midr with no leading zeros matches */
+ if (strcmp_cpuid_str("0x410fd0c0", "0x00000000410fd0c0"))
+ return -1;
+ /* Upper case matches */
+ if (strcmp_cpuid_str("0x410fd0c0", "0x00000000410FD0C0"))
+ return -1;
+ /* r0p0 = r0p0 matches */
+ if (strcmp_cpuid_str("0x00000000410fd480", "0x00000000410fd480"))
+ return -1;
+ /* r0p1 > r0p0 matches */
+ if (strcmp_cpuid_str("0x00000000410fd480", "0x00000000410fd481"))
+ return -1;
+ /* r1p0 > r0p0 matches*/
+ if (strcmp_cpuid_str("0x00000000410fd480", "0x00000000411fd480"))
+ return -1;
+ /* r0p0 < r0p1 doesn't match */
+ if (!strcmp_cpuid_str("0x00000000410fd481", "0x00000000410fd480"))
+ return -1;
+ /* r0p0 < r1p0 doesn't match */
+ if (!strcmp_cpuid_str("0x00000000411fd480", "0x00000000410fd480"))
+ return -1;
+ /* Different CPU doesn't match */
+ if (!strcmp_cpuid_str("0x00000000410fd4c0", "0x00000000430f0af0"))
+ return -1;
+
+ return 0;
+}
+
--
2.34.1
On 16/08/2023 17:11, Arnaldo Carvalho de Melo wrote:
> Em Wed, Aug 16, 2023 at 12:47:44PM +0100, James Clark escreveu:
>> +++ b/tools/perf/arch/arm64/tests/cpuid-match.c
>> @@ -0,0 +1,38 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#include <linux/compiler.h>
>> +
>> +#include "arch-tests.h"
>> +#include "tests/tests.h"
>> +#include "util/header.h"
>> +
>> +int test__cpuid_match(struct test_suite *test __maybe_unused,
>> + int subtest __maybe_unused)
>> +{
>> + /* midr with no leading zeros matches */
>> + if (strcmp_cpuid_str("0x410fd0c0", "0x00000000410fd0c0"))
>> + return -1;
>> + /* Upper case matches */
>> + if (strcmp_cpuid_str("0x410fd0c0", "0x00000000410FD0C0"))
>> + return -1;
>> + /* r0p0 = r0p0 matches */
>> + if (strcmp_cpuid_str("0x00000000410fd480", "0x00000000410fd480"))
>> + return -1;
>> + /* r0p1 > r0p0 matches */
>> + if (strcmp_cpuid_str("0x00000000410fd480", "0x00000000410fd481"))
>> + return -1;
>> + /* r1p0 > r0p0 matches*/
>> + if (strcmp_cpuid_str("0x00000000410fd480", "0x00000000411fd480"))
>> + return -1;
>> + /* r0p0 < r0p1 doesn't match */
>> + if (!strcmp_cpuid_str("0x00000000410fd481", "0x00000000410fd480"))
>> + return -1;
>> + /* r0p0 < r1p0 doesn't match */
>> + if (!strcmp_cpuid_str("0x00000000411fd480", "0x00000000410fd480"))
>> + return -1;
>> + /* Different CPU doesn't match */
>> + if (!strcmp_cpuid_str("0x00000000410fd4c0", "0x00000000430f0af0"))
>> + return -1;
>> +
>> + return 0;
>> +}
>> +
>> --
>> 2.34.1
>>
> ⬢[acme@toolbox perf-tools-next]$ git am ./v6_20230816_james_clark_perf_vendor_events_arm64_update_n2_and_v2_metrics_and_events_using_arm_telem.mbx
> Applying: perf test: Add a test for the new Arm CPU ID comparison behavior
> .git/rebase-apply/patch:93: new blank line at EOF.
> +
> warning: 1 line adds whitespace errors.
> ⬢[acme@toolbox perf-tools-next]$
>
> I'm removing it
Interesting that checkpatch.pl doesn't see that. Thanks for the fix.
Em Wed, Aug 16, 2023 at 12:47:44PM +0100, James Clark escreveu:
> +++ b/tools/perf/arch/arm64/tests/cpuid-match.c
> @@ -0,0 +1,38 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/compiler.h>
> +
> +#include "arch-tests.h"
> +#include "tests/tests.h"
> +#include "util/header.h"
> +
> +int test__cpuid_match(struct test_suite *test __maybe_unused,
> + int subtest __maybe_unused)
> +{
> + /* midr with no leading zeros matches */
> + if (strcmp_cpuid_str("0x410fd0c0", "0x00000000410fd0c0"))
> + return -1;
> + /* Upper case matches */
> + if (strcmp_cpuid_str("0x410fd0c0", "0x00000000410FD0C0"))
> + return -1;
> + /* r0p0 = r0p0 matches */
> + if (strcmp_cpuid_str("0x00000000410fd480", "0x00000000410fd480"))
> + return -1;
> + /* r0p1 > r0p0 matches */
> + if (strcmp_cpuid_str("0x00000000410fd480", "0x00000000410fd481"))
> + return -1;
> + /* r1p0 > r0p0 matches*/
> + if (strcmp_cpuid_str("0x00000000410fd480", "0x00000000411fd480"))
> + return -1;
> + /* r0p0 < r0p1 doesn't match */
> + if (!strcmp_cpuid_str("0x00000000410fd481", "0x00000000410fd480"))
> + return -1;
> + /* r0p0 < r1p0 doesn't match */
> + if (!strcmp_cpuid_str("0x00000000411fd480", "0x00000000410fd480"))
> + return -1;
> + /* Different CPU doesn't match */
> + if (!strcmp_cpuid_str("0x00000000410fd4c0", "0x00000000430f0af0"))
> + return -1;
> +
> + return 0;
> +}
> +
> --
> 2.34.1
>
⬢[acme@toolbox perf-tools-next]$ git am ./v6_20230816_james_clark_perf_vendor_events_arm64_update_n2_and_v2_metrics_and_events_using_arm_telem.mbx
Applying: perf test: Add a test for the new Arm CPU ID comparison behavior
.git/rebase-apply/patch:93: new blank line at EOF.
+
warning: 1 line adds whitespace errors.
⬢[acme@toolbox perf-tools-next]$
I'm removing it
Currently variant and revision fields are masked out of the MIDR so
it's not possible to compare different versions of the same CPU.
In a later commit a workaround will be removed just for N2 r0p3, so
enable comparisons on version.
This has the side effect of changing the MIDR stored in the header of
the perf.data file to no longer have masked version fields. It also
affects the lookups in mapfile.csv, but as that currently only has
zeroed version fields, it has no actual effect. The mapfile.csv
documentation also states to zero the version fields, so unless this
isn't done it will continue to have no effect.
There is an existing weak default strcmp_cpuid_str() function, and an
x86 version. This adds another version for arm64.
Signed-off-by: James Clark <[email protected]>
---
tools/perf/arch/arm64/util/header.c | 67 ++++++++++++++++++++++-------
1 file changed, 52 insertions(+), 15 deletions(-)
diff --git a/tools/perf/arch/arm64/util/header.c b/tools/perf/arch/arm64/util/header.c
index 80b9f6287fe2..a2eef9ec5491 100644
--- a/tools/perf/arch/arm64/util/header.c
+++ b/tools/perf/arch/arm64/util/header.c
@@ -1,3 +1,6 @@
+#include <linux/kernel.h>
+#include <linux/bits.h>
+#include <linux/bitfield.h>
#include <stdio.h>
#include <stdlib.h>
#include <perf/cpumap.h>
@@ -10,15 +13,14 @@
#define MIDR "/regs/identification/midr_el1"
#define MIDR_SIZE 19
-#define MIDR_REVISION_MASK 0xf
-#define MIDR_VARIANT_SHIFT 20
-#define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_REVISION_MASK GENMASK(3, 0)
+#define MIDR_VARIANT_MASK GENMASK(23, 20)
static int _get_cpuid(char *buf, size_t sz, struct perf_cpu_map *cpus)
{
const char *sysfs = sysfs__mountpoint();
- u64 midr = 0;
int cpu;
+ int ret = EINVAL;
if (!sysfs || sz < MIDR_SIZE)
return EINVAL;
@@ -44,22 +46,13 @@ static int _get_cpuid(char *buf, size_t sz, struct perf_cpu_map *cpus)
}
fclose(file);
- /* Ignore/clear Variant[23:20] and
- * Revision[3:0] of MIDR
- */
- midr = strtoul(buf, NULL, 16);
- midr &= (~(MIDR_VARIANT_MASK | MIDR_REVISION_MASK));
- scnprintf(buf, MIDR_SIZE, "0x%016lx", midr);
/* got midr break loop */
+ ret = 0;
break;
}
perf_cpu_map__put(cpus);
-
- if (!midr)
- return EINVAL;
-
- return 0;
+ return ret;
}
int get_cpuid(char *buf, size_t sz)
@@ -99,3 +92,47 @@ char *get_cpuid_str(struct perf_pmu *pmu)
return buf;
}
+
+/*
+ * Return 0 if idstr is a higher or equal to version of the same part as
+ * mapcpuid. Therefore, if mapcpuid has 0 for revision and variant then any
+ * version of idstr will match as long as it's the same CPU type.
+ *
+ * Return 1 if the CPU type is different or the version of idstr is lower.
+ */
+int strcmp_cpuid_str(const char *mapcpuid, const char *idstr)
+{
+ u64 map_id = strtoull(mapcpuid, NULL, 16);
+ char map_id_variant = FIELD_GET(MIDR_VARIANT_MASK, map_id);
+ char map_id_revision = FIELD_GET(MIDR_REVISION_MASK, map_id);
+ u64 id = strtoull(idstr, NULL, 16);
+ char id_variant = FIELD_GET(MIDR_VARIANT_MASK, id);
+ char id_revision = FIELD_GET(MIDR_REVISION_MASK, id);
+ u64 id_fields = ~(MIDR_VARIANT_MASK | MIDR_REVISION_MASK);
+
+ /* Compare without version first */
+ if ((map_id & id_fields) != (id & id_fields))
+ return 1;
+
+ /*
+ * ID matches, now compare version.
+ *
+ * Arm revisions (like r0p0) are compared here like two digit semver
+ * values eg. 1.3 < 2.0 < 2.1 < 2.2.
+ *
+ * r = high value = 'Variant' field in MIDR
+ * p = low value = 'Revision' field in MIDR
+ *
+ */
+ if (id_variant > map_id_variant)
+ return 0;
+
+ if (id_variant == map_id_variant && id_revision >= map_id_revision)
+ return 0;
+
+ /*
+ * variant is less than mapfile variant or variants are the same but
+ * the revision doesn't match. Return no match.
+ */
+ return 1;
+}
--
2.34.1