2022-09-28 10:34:56

by Ravi Bangoria

Subject: [PATCH v3 00/15] perf mem/c2c: Add support for AMD

The perf mem and perf c2c tools are wrappers around perf record with
memory load/store events. IBS-tagged load/store samples provide most
of the information these tools need. Enable support for them on AMD
Zen processors using the IBS Op PMU.

There are some limitations, though. Only load/store micro-ops provide
mem/c2c information, but IBS has no way to restrict tagging to a
particular type of micro-op. As a result, many non-load/store
micro-ops are tagged, and they appear as N/A in the perf report.
Also, IBS is an uncore PMU from the kernel's point of view[1], so it
does not support per-process monitoring. Thus, perf mem/c2c on AMD is
currently supported in per-CPU mode only.
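Because IBS Op is treated as an uncore-style PMU, an event has to be opened system-wide on one CPU (pid = -1, cpu >= 0) instead of being attached to a task. A minimal sketch of that shape using perf_event_open(2), assuming the dynamic PMU type is read from /sys/bus/event_source/devices/ibs_op/type. The helper names (read_pmu_type, ibs_op_mem_attr, open_ibs_op_on_cpu) and the exact sample_type bits are illustrative choices, not code from this series.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Read a PMU's dynamic type id from sysfs; returns -1 on failure. */
static int read_pmu_type(const char *path)
{
	FILE *f = fopen(path, "r");
	int type = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	return type;
}

/* Fill attr for sampling IBS Op with mem-related sample fields. */
static void ibs_op_mem_attr(struct perf_event_attr *attr, int pmu_type,
			    __u64 period)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = (__u32)pmu_type;
	attr->sample_period = period;
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
			    PERF_SAMPLE_ADDR | PERF_SAMPLE_DATA_SRC |
			    PERF_SAMPLE_WEIGHT | PERF_SAMPLE_PHYS_ADDR;
	attr->disabled = 1;
}

/*
 * IBS has no per-task context, so pid must be -1 and cpu a real CPU.
 * Returns an event fd, or -1 on error.
 */
int open_ibs_op_on_cpu(int cpu, __u64 period)
{
	struct perf_event_attr attr;
	int type = read_pmu_type("/sys/bus/event_source/devices/ibs_op/type");

	if (type < 0)
		return -1;
	ibs_op_mem_attr(&attr, type, period);
	return (int)syscall(SYS_perf_event_open, &attr, -1 /* pid: any task */,
			    cpu, -1 /* group_fd */, 0UL /* flags */);
}
```

A caller would loop over the online CPUs and call open_ibs_op_on_cpu(cpu, 10000) once per CPU, matching the -c 10000 period in the example below. Passing a real pid instead of -1 is the per-process mode that is expected to be rejected.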

Example:
$ sudo ./perf mem record -- -c 10000
^C[ perf record: Woken up 227 times to write data ]
[ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

$ sudo ./perf mem report -F mem,sample,snoop
Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
Memory access              Samples  Snoop
N/A                         700620  N/A
L1 hit                      126675  N/A
L2 hit                         424  N/A
L3 hit                         664  HitM
L3 hit                          10  N/A
Local RAM hit                    2  N/A
Remote RAM (1 hop) hit        8558  N/A
Remote Cache (1 hop) hit         3  N/A
Remote Cache (1 hop) hit         2  HitM
Remote Cache (2 hops) hit       10  HitM
Remote Cache (2 hops) hit        6  N/A
Uncached hit                     4  N/A
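The report quantifies the first limitation. Summing the Samples column gives exactly the 836978 samples perf record reported, and the N/A row alone, which the text attributes to tagged micro-ops without mem information, is about 83.7% of them. A small sketch of that arithmetic; the array simply copies the numbers from the output above, and total_samples() and na_percent() are hypothetical helper names.

```c
#include <stddef.h>

/* Samples column copied from the perf mem report output, in row order. */
static const unsigned long report_samples[] = {
	700620,	/* N/A */
	126675,	/* L1 hit */
	424,	/* L2 hit */
	664,	/* L3 hit, HitM */
	10,	/* L3 hit */
	2,	/* Local RAM hit */
	8558,	/* Remote RAM (1 hop) hit */
	3,	/* Remote Cache (1 hop) hit */
	2,	/* Remote Cache (1 hop) hit, HitM */
	10,	/* Remote Cache (2 hops) hit, HitM */
	6,	/* Remote Cache (2 hops) hit */
	4,	/* Uncached hit */
};

#define NR_ROWS (sizeof(report_samples) / sizeof(report_samples[0]))

/* Sum of all rows; should equal the sample count perf record printed. */
unsigned long total_samples(void)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < NR_ROWS; i++)
		sum += report_samples[i];
	return sum;
}

/* Share of samples without memory-access info (the N/A row), in percent. */
double na_percent(void)
{
	return 100.0 * (double)report_samples[0] / (double)total_samples();
}
```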

Prepared on queue/perf/core (cce6a2d7e0e49).

v2: https://lore.kernel.org/all/[email protected]
v2->v3:
- Use sample_flags instead of __PERF_SAMPLE_*_EARLY variants
- Make PERF_SAMPLE_WEIGHT independent of PERF_SAMPLE_DATA_SRC
- Add a patch to reverse sync PERF_MEM_SNOOPX_PEER from tools
to kernel uapi header
- Add Acked-by: Jiri Olsa for tool side unchanged patches
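The first two v2->v3 items concern how the IBS driver hands mem fields to the perf core. The model below is plain userspace C, not kernel code, and reflects the sample_flags idea as understood here: the driver records in sample_flags which requested fields it already filled, the core supplies defaults only for the rest, and the weight is filled whenever PERF_SAMPLE_WEIGHT is requested, whether or not PERF_SAMPLE_DATA_SRC is. struct model_sample, model_pmu_fill(), model_core_prepare() and MODEL_DATA_SRC_NA are hypothetical; only the two bit positions mirror the uapi header.

```c
#include <stdint.h>

/* Same bit positions as PERF_SAMPLE_WEIGHT and PERF_SAMPLE_DATA_SRC. */
#define SAMPLE_WEIGHT     (1ULL << 14)
#define SAMPLE_DATA_SRC   (1ULL << 15)

/* Hypothetical "not available" marker, standing in for PERF_MEM_NA. */
#define MODEL_DATA_SRC_NA 0xffffULL

struct model_sample {
	uint64_t sample_flags;	/* requested fields already filled by the PMU */
	uint64_t data_src;
	uint64_t weight;
};

/* PMU side: fill what the hardware provided and mark it in sample_flags. */
void model_pmu_fill(struct model_sample *s, uint64_t sample_type,
		    int have_data_src, uint64_t data_src, uint64_t latency)
{
	/* data_src only exists for load/store ops (have_data_src). */
	if ((sample_type & SAMPLE_DATA_SRC) && have_data_src) {
		s->data_src = data_src;
		s->sample_flags |= SAMPLE_DATA_SRC;
	}
	/* Weight is handled independently of DATA_SRC. */
	if (sample_type & SAMPLE_WEIGHT) {
		s->weight = latency;
		s->sample_flags |= SAMPLE_WEIGHT;
	}
}

/* Core side: default only the requested fields the PMU did not fill. */
void model_core_prepare(struct model_sample *s, uint64_t sample_type)
{
	uint64_t missing = sample_type & ~s->sample_flags;

	if (missing & SAMPLE_DATA_SRC)
		s->data_src = MODEL_DATA_SRC_NA;
	if (missing & SAMPLE_WEIGHT)
		s->weight = 0;
}
```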

Also, a recent patch[2] that tests perf mem fails on AMD because of
the aforementioned limitations.

[1]: https://lore.kernel.org/lkml/[email protected]
[2]: https://lore.kernel.org/lkml/20220924133408.1125903-1-leo.yan%40linaro.org


Ravi Bangoria (15):
perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
perf/x86/amd: Support PERF_SAMPLE_ADDR
perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
perf tool: Sync include/uapi/linux/perf_event.h header
perf tool: Sync arch/x86/include/asm/amd-ibs.h header
perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
perf mem/c2c: Add load store event mappings for AMD
perf mem/c2c: Avoid printing empty lines for unsupported events
perf mem: Use more generic term for LFB
perf script: Add missing fields in usage hint

arch/x86/events/amd/ibs.c                | 345 ++++++++++++++++++++++-
arch/x86/include/asm/amd-ibs.h           |  16 ++
include/uapi/linux/perf_event.h          |   6 +-
kernel/events/core.c                     |   3 +-
tools/arch/x86/include/asm/amd-ibs.h     |  16 ++
tools/include/uapi/linux/perf_event.h    |   4 +-
tools/perf/Documentation/perf-c2c.txt    |  14 +-
tools/perf/Documentation/perf-mem.txt    |   3 +-
tools/perf/Documentation/perf-record.txt |   1 +
tools/perf/arch/x86/util/mem-events.c    |  31 +-
tools/perf/builtin-c2c.c                 |   1 +
tools/perf/builtin-mem.c                 |   1 +
tools/perf/builtin-script.c              |   7 +-
tools/perf/util/mem-events.c             |  17 +-
14 files changed, 438 insertions(+), 27 deletions(-)

--
2.31.1


2022-09-28 10:35:16

by Ravi Bangoria

Subject: [PATCH v3 08/15] perf tool: Sync include/uapi/linux/perf_event.h header

Two new mem_lvl_num values have been introduced, PERF_MEM_LVLNUM_IO
and PERF_MEM_LVLNUM_EXTN_MEM, which are required to support perf
mem/c2c on AMD platforms.

Signed-off-by: Ravi Bangoria <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 581ed4bdc062..9b65fc7d2377 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1295,7 +1295,9 @@ union perf_mem_data_src {
#define PERF_MEM_LVLNUM_L2 0x02 /* L2 */
#define PERF_MEM_LVLNUM_L3 0x03 /* L3 */
#define PERF_MEM_LVLNUM_L4 0x04 /* L4 */
-/* 5-0xa available */
+/* 5-0x8 available */
+#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */
+#define PERF_MEM_LVLNUM_IO 0x0a /* I/O */
#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
#define PERF_MEM_LVLNUM_LFB 0x0c /* LFB */
#define PERF_MEM_LVLNUM_RAM 0x0d /* RAM */
--
2.31.1
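In the data_src word, mem_lvl_num is a 4-bit field starting at bit 33 (PERF_MEM_LVLNUM_SHIFT in the uapi header), so the two new values 0x09 and 0x0a fit in the previously reserved range. A minimal decoder sketch; the constants are copied into the block so it builds against older headers, the strings for EXTN_MEM through NA follow the mem_lvlnum[] context shown in patch 14, and the "L1".."L4" strings and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Copied from include/uapi/linux/perf_event.h as of this patch. */
#define PERF_MEM_LVLNUM_SHIFT     33
#define PERF_MEM_LVLNUM_L1        0x01
#define PERF_MEM_LVLNUM_L2        0x02
#define PERF_MEM_LVLNUM_L3        0x03
#define PERF_MEM_LVLNUM_L4        0x04
#define PERF_MEM_LVLNUM_EXTN_MEM  0x09
#define PERF_MEM_LVLNUM_IO        0x0a
#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b
#define PERF_MEM_LVLNUM_LFB       0x0c
#define PERF_MEM_LVLNUM_RAM       0x0d
#define PERF_MEM_LVLNUM_PMEM      0x0e
#define PERF_MEM_LVLNUM_NA        0x0f

/* Name for each 4-bit mem_lvl_num value; NULL marks a reserved value. */
static const char *const lvlnum_name[16] = {
	[PERF_MEM_LVLNUM_L1]        = "L1",
	[PERF_MEM_LVLNUM_L2]        = "L2",
	[PERF_MEM_LVLNUM_L3]        = "L3",
	[PERF_MEM_LVLNUM_L4]        = "L4",
	[PERF_MEM_LVLNUM_EXTN_MEM]  = "Ext Mem",
	[PERF_MEM_LVLNUM_IO]        = "I/O",
	[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
	[PERF_MEM_LVLNUM_LFB]       = "LFB",
	[PERF_MEM_LVLNUM_RAM]       = "RAM",
	[PERF_MEM_LVLNUM_PMEM]      = "PMEM",
	[PERF_MEM_LVLNUM_NA]        = "N/A",
};

/* Extract the 4-bit mem_lvl_num field from a raw data_src value. */
unsigned int data_src_lvlnum(uint64_t data_src)
{
	return (unsigned int)((data_src >> PERF_MEM_LVLNUM_SHIFT) & 0xf);
}

/* Level name, or "reserved" for 0x00 and 0x05-0x08. */
const char *data_src_lvlnum_str(uint64_t data_src)
{
	const char *name = lvlnum_name[data_src_lvlnum(data_src)];

	return name ? name : "reserved";
}
```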

2022-09-28 10:49:06

by Ravi Bangoria

Subject: [PATCH v3 09/15] perf tool: Sync arch/x86/include/asm/amd-ibs.h header

Although the new definitions added to this header are currently used
only by the kernel, the tools copy needs to be kept in sync with the
kernel file.

Signed-off-by: Ravi Bangoria <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
---
tools/arch/x86/include/asm/amd-ibs.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/tools/arch/x86/include/asm/amd-ibs.h b/tools/arch/x86/include/asm/amd-ibs.h
index 9a3312e12e2e..93807b437e4d 100644
--- a/tools/arch/x86/include/asm/amd-ibs.h
+++ b/tools/arch/x86/include/asm/amd-ibs.h
@@ -6,6 +6,22 @@

#include "msr-index.h"

+/* IBS_OP_DATA2 DataSrc */
+#define IBS_DATA_SRC_LOC_CACHE 2
+#define IBS_DATA_SRC_DRAM 3
+#define IBS_DATA_SRC_REM_CACHE 4
+#define IBS_DATA_SRC_IO 7
+
+/* IBS_OP_DATA2 DataSrc Extension */
+#define IBS_DATA_SRC_EXT_LOC_CACHE 1
+#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE 2
+#define IBS_DATA_SRC_EXT_DRAM 3
+#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE 5
+#define IBS_DATA_SRC_EXT_PMEM 6
+#define IBS_DATA_SRC_EXT_IO 7
+#define IBS_DATA_SRC_EXT_EXT_MEM 8
+#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM 12
+
/*
* IBS Hardware MSRs
*/
--
2.31.1
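The extension values above go up to 12, which does not fit in a 3-bit DataSrc field, so an extended encoding is needed. The sketch below assumes, based on the kernel-side IBS_OP_DATA2 definitions in this series rather than anything shown in this hunk, that the low three DataSrc bits sit in bits 2:0 and two extra high bits sit in bits 7:6 on processors that support the extension. ibs_data_src() and ibs_data_src_ext_str() are hypothetical helpers, and the description strings are paraphrases of the macro names.

```c
#include <stdint.h>

/* DataSrc Extension values, copied from the amd-ibs.h hunk above. */
#define IBS_DATA_SRC_EXT_LOC_CACHE       1
#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE  2
#define IBS_DATA_SRC_EXT_DRAM            3
#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE   5
#define IBS_DATA_SRC_EXT_PMEM            6
#define IBS_DATA_SRC_EXT_IO              7
#define IBS_DATA_SRC_EXT_EXT_MEM         8
#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM  12

/*
 * Assumed IBS_OP_DATA2 layout: DataSrc low bits in 2:0, high bits in 7:6.
 * Without the extension only the low three bits are meaningful.
 */
unsigned int ibs_data_src(uint64_t op_data2, int has_ext)
{
	unsigned int lo = (unsigned int)(op_data2 & 0x7);
	unsigned int hi = (unsigned int)((op_data2 >> 6) & 0x3);

	return has_ext ? (hi << 3) | lo : lo;
}

/* Short description of an extended DataSrc value. */
const char *ibs_data_src_ext_str(unsigned int src)
{
	switch (src) {
	case IBS_DATA_SRC_EXT_LOC_CACHE:      return "local cache";
	case IBS_DATA_SRC_EXT_NEAR_CCX_CACHE: return "near CCX cache";
	case IBS_DATA_SRC_EXT_DRAM:           return "DRAM";
	case IBS_DATA_SRC_EXT_FAR_CCX_CACHE:  return "far CCX cache";
	case IBS_DATA_SRC_EXT_PMEM:           return "PMEM";
	case IBS_DATA_SRC_EXT_IO:             return "I/O";
	case IBS_DATA_SRC_EXT_EXT_MEM:        return "extension memory";
	case IBS_DATA_SRC_EXT_PEER_AGENT_MEM: return "peer agent memory";
	default:                              return "unknown";
	}
}
```

For example, an op_data2 value of 0x44 has low bits 4 and high bits 1, which decodes to 12 (peer agent memory) with the extension and to 4 without it.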

2022-09-28 10:51:54

by Ravi Bangoria

Subject: [PATCH v3 14/15] perf mem: Use more generic term for LFB

The hardware component that tracks outstanding L1 data cache misses
is called the LFB (Line Fill Buffer) on Intel and Arm. Similar
components exist on other architectures under different names; for
example, on AMD it is called the MAB (Miss Address Buffer). Use
'LFB/MAB' instead of just 'LFB'.

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/perf/util/mem-events.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 4553b4389b17..a1838a641777 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -282,7 +282,7 @@ static const char * const mem_lvl[] = {
"HIT",
"MISS",
"L1",
- "LFB",
+ "LFB/MAB",
"L2",
"L3",
"Local RAM",
@@ -298,7 +298,7 @@ static const char * const mem_lvlnum[] = {
[PERF_MEM_LVLNUM_EXTN_MEM] = "Ext Mem",
[PERF_MEM_LVLNUM_IO] = "I/O",
[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
- [PERF_MEM_LVLNUM_LFB] = "LFB",
+ [PERF_MEM_LVLNUM_LFB] = "LFB/MAB",
[PERF_MEM_LVLNUM_RAM] = "RAM",
[PERF_MEM_LVLNUM_PMEM] = "PMEM",
[PERF_MEM_LVLNUM_NA] = "N/A",
--
2.31.1
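mem_lvl[] is indexed by bit position in the mem_lvl bitmask, so the rename only changes the string printed for PERF_MEM_LVL_LFB (bit 4). A sketch that decodes an already-shifted mem_lvl bitmask with a table in the same order; it is loosely modeled on how perf prints memory levels (levels joined with " or ", then " hit" or " miss"), and lvl_str() is a hypothetical helper, not the actual perf function.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* mem_lvl bits from include/uapi/linux/perf_event.h (field already shifted down). */
#define PERF_MEM_LVL_NA   0x01
#define PERF_MEM_LVL_HIT  0x02
#define PERF_MEM_LVL_MISS 0x04
#define PERF_MEM_LVL_L1   0x08
#define PERF_MEM_LVL_LFB  0x10

/* Index i names bit (1 << i), in the order of mem_lvl[] after this patch. */
static const char *const mem_lvl[] = {
	"N/A", "HIT", "MISS", "L1", "LFB/MAB", "L2", "L3", "Local RAM",
	"Remote RAM (1 hop)", "Remote RAM (2 hops)",
	"Remote Cache (1 hop)", "Remote Cache (2 hops)",
	"I/O", "Uncached",
};

/*
 * Write "<level>[ or <level>...] hit|miss" into out (size >= 1).
 * Bits 0-2 (N/A, HIT, MISS) are qualifiers, not levels, so skip them.
 */
void lvl_str(char *out, size_t size, uint64_t lvl)
{
	size_t n = sizeof(mem_lvl) / sizeof(mem_lvl[0]);
	size_t i;
	int first = 1;

	out[0] = '\0';
	for (i = 3; i < n; i++) {
		if (!(lvl & (1ULL << i)))
			continue;
		if (!first)
			strncat(out, " or ", size - strlen(out) - 1);
		strncat(out, mem_lvl[i], size - strlen(out) - 1);
		first = 0;
	}
	if (first) {
		/* No level bit set: report "N/A". */
		snprintf(out, size, "%s", mem_lvl[0]);
		return;
	}
	if (lvl & PERF_MEM_LVL_HIT)
		strncat(out, " hit", size - strlen(out) - 1);
	else if (lvl & PERF_MEM_LVL_MISS)
		strncat(out, " miss", size - strlen(out) - 1);
}
```

With this table, a sample whose mem_lvl is PERF_MEM_LVL_LFB | PERF_MEM_LVL_HIT is printed as "LFB/MAB hit" instead of "LFB hit".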