2022-06-16 11:54:31

by Ravi Bangoria

Subject: [PATCH v2 00/14] perf mem/c2c: Add support for AMD

Perf mem and c2c tools are wrappers around perf record with mem load/
store events. IBS tagged load/store sample provides most of the
information needed for these tools. Enable support for these tools on
AMD Zen processors based on IBS Op pmu.

There are some limitations though: Only load/store instructions provide
mem/c2c information. However, IBS does not provide a way to choose a
particular type of instruction to tag. This results in many non-LS
instructions being tagged, which appear as N/A. IBS, being an uncore pmu
from the kernel's point of view[1], does not support per-process monitoring.
Thus, perf mem/c2c on AMD are currently supported in per-cpu mode only.

Example:
$ sudo ./perf mem record -- -c 10000
^C[ perf record: Woken up 227 times to write data ]
[ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

$ sudo ./perf mem report -F mem,sample,snoop
Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
Memory access                Samples  Snoop
N/A                           700620  N/A
L1 hit                        126675  N/A
L2 hit                           424  N/A
L3 hit                           664  HitM
L3 hit                            10  N/A
Local RAM hit                      2  N/A
Remote RAM (1 hop) hit          8558  N/A
Remote Cache (1 hop) hit           3  N/A
Remote Cache (1 hop) hit           2  HitM
Remote Cache (2 hops) hit         10  HitM
Remote Cache (2 hops) hit          6  N/A
Uncached hit                       4  N/A

Prepared on amd/perf/core (9886142c7a22) + IBS Zen4 enhancement patches[2]

[1]: https://lore.kernel.org/lkml/[email protected]
[2]: https://lore.kernel.org/lkml/[email protected]

v1: https://lore.kernel.org/lkml/[email protected]
v1->v2:
- Instead of defining macros to extract IBS register bits, use existing
bitfield definitions. Zen4 has introduced an additional set of bits in
IBS registers, which this series also exploits, and thus this series
now depends on the IBS Zen4 enhancement patchset.
- Add support for PERF_SAMPLE_WEIGHT_STRUCT. While opening a new event,
the perf tool starts with a set of attributes and goes on reverting some
attributes in a predefined order until it succeeds or runs out of
attempts. Here, the 1st attempt includes WEIGHT_STRUCT and exclude_guest,
which always fails because IBS does not support guest filtering. The
problem, however, is that perf reverts WEIGHT_STRUCT but keeps trying with
exclude_guest. Thus, although this series enables WEIGHT_STRUCT
support in the kernel, using it from the perf tool needs more changes.
I'll try to address this bug later.
- Introduce __PERF_SAMPLE_PHYS_ADDR_EARLY to hint to the generic perf
driver that the physical address is set by the arch pmu driver and should
not be overwritten.


Ravi Bangoria (14):
perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
perf/x86/amd: Support PERF_SAMPLE_ADDR
perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
perf tool: Sync include/uapi/linux/perf_event.h header
perf tool: Sync arch/x86/include/asm/amd-ibs.h header
perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
perf mem/c2c: Add load store event mappings for AMD
perf mem/c2c: Avoid printing empty lines for unsupported events
perf mem: Use more generic term for LFB
perf script: Add missing fields in usage hint

arch/x86/events/amd/ibs.c | 372 ++++++++++++++++++++++-
arch/x86/include/asm/amd-ibs.h | 16 +
include/uapi/linux/perf_event.h | 5 +-
kernel/events/core.c | 4 +-
tools/arch/x86/include/asm/amd-ibs.h | 16 +
tools/include/uapi/linux/perf_event.h | 5 +-
tools/perf/Documentation/perf-c2c.txt | 14 +-
tools/perf/Documentation/perf-mem.txt | 3 +-
tools/perf/Documentation/perf-record.txt | 1 +
tools/perf/arch/x86/util/mem-events.c | 31 +-
tools/perf/builtin-c2c.c | 1 +
tools/perf/builtin-mem.c | 1 +
tools/perf/builtin-script.c | 7 +-
tools/perf/util/mem-events.c | 17 +-
14 files changed, 467 insertions(+), 26 deletions(-)

--
2.31.1


2022-06-16 11:54:53

by Ravi Bangoria

Subject: [PATCH v2 01/14] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}

Introduce PERF_MEM_LVLNUM_EXTN_MEM, which can be used to indicate accesses
to extension memory such as CXL. PERF_MEM_LVL_IO can be used for IO
accesses, but it cannot distinguish between local and remote IO.
Introduce a new field PERF_MEM_LVLNUM_IO, which can be combined with
PERF_MEM_REMOTE_REMOTE to indicate Remote IO accesses.
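
As a rough illustration of the encoding described above, the sketch below
tags an access as local or remote IO. The LVLNUM values come from this
patch and PERF_MEM_REMOTE_REMOTE from the existing uapi header; struct
demo_data_src and the demo_* helpers are hypothetical stand-ins for union
perf_mem_data_src, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from this patch / the existing uapi header. */
#define PERF_MEM_LVLNUM_EXTN_MEM 0x09
#define PERF_MEM_LVLNUM_IO       0x0a
#define PERF_MEM_REMOTE_REMOTE   0x01

/* Hypothetical, simplified stand-in for union perf_mem_data_src. */
struct demo_data_src {
	uint8_t mem_lvl_num;
	uint8_t mem_remote;
};

/* Encode an IO access; set the remote bit only for remote IO. */
static struct demo_data_src demo_encode_io(bool remote)
{
	struct demo_data_src d = { .mem_lvl_num = PERF_MEM_LVLNUM_IO };

	if (remote)
		d.mem_remote = PERF_MEM_REMOTE_REMOTE;
	return d;
}

/* A consumer can now tell remote IO apart from local IO. */
static bool demo_is_remote_io(struct demo_data_src d)
{
	return d.mem_lvl_num == PERF_MEM_LVLNUM_IO &&
	       d.mem_remote == PERF_MEM_REMOTE_REMOTE;
}
```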

Signed-off-by: Ravi Bangoria <[email protected]>
---
include/uapi/linux/perf_event.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d37629dbad72..1c3157c1be9d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1292,7 +1292,9 @@ union perf_mem_data_src {
#define PERF_MEM_LVLNUM_L2 0x02 /* L2 */
#define PERF_MEM_LVLNUM_L3 0x03 /* L3 */
#define PERF_MEM_LVLNUM_L4 0x04 /* L4 */
-/* 5-0xa available */
+/* 5-0x8 available */
+#define PERF_MEM_LVLNUM_EXTN_MEM 0x9 /* Extension memory */
+#define PERF_MEM_LVLNUM_IO 0x0a /* I/O */
#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
#define PERF_MEM_LVLNUM_LFB 0x0c /* LFB */
#define PERF_MEM_LVLNUM_RAM 0x0d /* RAM */
--
2.31.1

2022-06-16 11:54:53

by Ravi Bangoria

Subject: [PATCH v2 05/14] perf/x86/amd: Support PERF_SAMPLE_ADDR

IBS_DC_LINADDR provides the linear data address for the tagged load/
store operation. Populate perf sample address using it.
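
The rule applied in the diff below can be summarized with a small
standalone sketch: the address is reported only for loads and stores whose
linear address is flagged valid, and is zero otherwise. The enum and the
demo_sample_addr() helper are hypothetical names used for illustration
only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the PERF_MEM_OP_* values. */
enum demo_mem_op { DEMO_OP_NA, DEMO_OP_LOAD, DEMO_OP_STORE };

/*
 * Return the linear address only when the op is a load or store and
 * the hardware marked the address valid; otherwise report 0.
 */
static uint64_t demo_sample_addr(enum demo_mem_op op, bool lin_addr_valid,
				 uint64_t lin_addr)
{
	if ((op != DEMO_OP_LOAD && op != DEMO_OP_STORE) || !lin_addr_valid)
		return 0;
	return lin_addr;
}
```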

Signed-off-by: Ravi Bangoria <[email protected]>
---
arch/x86/events/amd/ibs.c | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 830e527a29c3..9b3e265a9fed 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -977,12 +977,35 @@ static void perf_ibs_get_data_src(struct perf_event *event,
perf_ibs_get_mem_lock(&op_data3, data);
}

+static void perf_ibs_get_data_addr(struct perf_event *event,
+ struct perf_ibs_data *ibs_data,
+ struct perf_sample_data *data)
+{
+ union perf_mem_data_src *data_src = &data->data_src;
+ union ibs_op_data3 op_data3;
+
+ op_data3.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
+
+ if (!(event->attr.sample_type & PERF_SAMPLE_DATA_SRC))
+ perf_ibs_get_mem_op(&op_data3, data);
+
+ if ((data_src->mem_op != PERF_MEM_OP_LOAD &&
+ data_src->mem_op != PERF_MEM_OP_STORE) ||
+ !op_data3.dc_lin_addr_valid) {
+ data->addr = 0x0;
+ return;
+ }
+
+ data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
+}
+
static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
int check_rip)
{
if (sample_type & PERF_SAMPLE_RAW ||
(perf_ibs == &perf_ibs_op &&
- sample_type & PERF_SAMPLE_DATA_SRC))
+ (sample_type & PERF_SAMPLE_DATA_SRC ||
+ sample_type & PERF_SAMPLE_ADDR)))
return perf_ibs->offset_max;
else if (check_rip)
return 3;
@@ -1094,6 +1117,8 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
if (perf_ibs == &perf_ibs_op) {
if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC)
perf_ibs_get_data_src(event, &ibs_data, &data);
+ if (event->attr.sample_type & PERF_SAMPLE_ADDR)
+ perf_ibs_get_data_addr(event, &ibs_data, &data);
}

/*
--
2.31.1

2022-06-16 11:54:56

by Ravi Bangoria

Subject: [PATCH v2 03/14] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC

struct perf_mem_data_src is used to pass arch specific memory access
details in a generic form. These details get consumed by tools like
perf mem and c2c. IBS tagged load/store sample provides most of the
information needed for these tools. Add logic to convert IBS
specific raw data into perf_mem_data_src.
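
One piece of that conversion, shown here as a minimal standalone sketch,
is how the DataSrc value is assembled: with the extended (Zen4) encoding
the high bits are placed above the low three bits, otherwise only the low
bits are used. struct demo_op_data2 and demo_data_src() are hypothetical
stand-ins for union ibs_op_data2 and the helper in the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the DataSrc fields of IBS_OP_DATA2. */
struct demo_op_data2 {
	uint8_t data_src_lo;	/* low DataSrc bits, always present */
	uint8_t data_src_hi;	/* extra bits, extended encoding only */
};

/* Combine the two fields the same way the patch does. */
static uint8_t demo_data_src(struct demo_op_data2 d, bool has_ext_encoding)
{
	if (has_ext_encoding)
		return (uint8_t)((d.data_src_hi << 3) | d.data_src_lo);

	return d.data_src_lo;
}
```

For instance, hi=1 and lo=0 gives 8 with the extended encoding, which is
the value the tools header in this series names IBS_DATA_SRC_EXT_EXT_MEM.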

Signed-off-by: Ravi Bangoria <[email protected]>
---
arch/x86/events/amd/ibs.c | 302 +++++++++++++++++++++++++++++++++++++-
1 file changed, 296 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index c251bc44c088..de2632a2e44d 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -688,6 +688,294 @@ static struct perf_ibs perf_ibs_op = {
.get_count = get_ibs_op_count,
};

+static void perf_ibs_get_mem_op(union ibs_op_data3 *op_data3,
+ struct perf_sample_data *data)
+{
+ union perf_mem_data_src *data_src = &data->data_src;
+
+ data_src->mem_op = PERF_MEM_OP_NA;
+
+ if (op_data3->ld_op)
+ data_src->mem_op = PERF_MEM_OP_LOAD;
+ else if (op_data3->st_op)
+ data_src->mem_op = PERF_MEM_OP_STORE;
+}
+
+/*
+ * Processors having CPUID_Fn8000001B_EAX[11] aka IBS_CAPS_ZEN4 has
+ * more fine granular DataSrc encodings. Others have coarse.
+ */
+static u8 perf_ibs_data_src(union ibs_op_data2 *op_data2)
+{
+ if (ibs_caps & IBS_CAPS_ZEN4)
+ return (op_data2->data_src_hi << 3) | op_data2->data_src_lo;
+
+ return op_data2->data_src_lo;
+}
+
+static void perf_ibs_get_mem_lvl(struct perf_event *event,
+ union ibs_op_data2 *op_data2,
+ union ibs_op_data3 *op_data3,
+ struct perf_sample_data *data)
+{
+ union perf_mem_data_src *data_src = &data->data_src;
+ u8 ibs_data_src = perf_ibs_data_src(op_data2);
+
+ data_src->mem_lvl = 0;
+
+ /*
+ * DcMiss, L2Miss, DataSrc, DcMissLat etc. are all invalid for Uncached
+ * memory accesses. So, check DcUcMemAcc bit early.
+ */
+ if (op_data3->dc_uc_mem_acc && ibs_data_src != IBS_DATA_SRC_EXT_IO) {
+ data_src->mem_lvl = PERF_MEM_LVL_UNC | PERF_MEM_LVL_HIT;
+ return;
+ }
+
+ /* L1 Hit */
+ if (op_data3->dc_miss == 0) {
+ data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
+ return;
+ }
+
+ /* L2 Hit */
+ if (op_data3->l2_miss == 0) {
+ /* Erratum #1293 */
+ if (boot_cpu_data.x86 != 0x19 || boot_cpu_data.x86_model > 0xF ||
+ !(op_data3->sw_pf || op_data3->dc_miss_no_mab_alloc)) {
+ data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
+ return;
+ }
+ }
+
+ /* L3 Hit */
+ if (ibs_caps & IBS_CAPS_ZEN4) {
+ if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+ ibs_data_src == IBS_DATA_SRC_EXT_LOC_CACHE) {
+ data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
+ return;
+ }
+ } else {
+ if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+ ibs_data_src == IBS_DATA_SRC_LOC_CACHE) {
+ data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_REM_CCE1 |
+ PERF_MEM_LVL_HIT;
+ return;
+ }
+ }
+
+ /* A peer cache in a near CCX. */
+ if (ibs_caps & IBS_CAPS_ZEN4 && data_src->mem_op == PERF_MEM_OP_LOAD &&
+ ibs_data_src == IBS_DATA_SRC_EXT_NEAR_CCX_CACHE) {
+ data_src->mem_lvl = PERF_MEM_LVL_REM_CCE1 | PERF_MEM_LVL_HIT;
+ return;
+ }
+
+ /* A peer cache in a far CCX. */
+ if (ibs_caps & IBS_CAPS_ZEN4) {
+ if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+ ibs_data_src == IBS_DATA_SRC_EXT_FAR_CCX_CACHE) {
+ data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
+ return;
+ }
+ } else {
+ if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+ ibs_data_src == IBS_DATA_SRC_REM_CACHE) {
+ data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
+ return;
+ }
+ }
+
+ /* DRAM */
+ if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+ ibs_data_src == IBS_DATA_SRC_EXT_DRAM) {
+ if (op_data2->rmt_node == 0)
+ data_src->mem_lvl = PERF_MEM_LVL_LOC_RAM | PERF_MEM_LVL_HIT;
+ else
+ data_src->mem_lvl = PERF_MEM_LVL_REM_RAM1 | PERF_MEM_LVL_HIT;
+ return;
+ }
+
+ /* PMEM */
+ if (ibs_caps & IBS_CAPS_ZEN4 && data_src->mem_op == PERF_MEM_OP_LOAD &&
+ ibs_data_src == IBS_DATA_SRC_EXT_PMEM) {
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_PMEM;
+ if (op_data2->rmt_node) {
+ data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+ /* IBS doesn't provide Remote socket detail */
+ data_src->mem_hops = PERF_MEM_HOPS_1;
+ }
+ return;
+ }
+
+ /* Extension Memory */
+ if (ibs_caps & IBS_CAPS_ZEN4 && data_src->mem_op == PERF_MEM_OP_LOAD &&
+ ibs_data_src == IBS_DATA_SRC_EXT_EXT_MEM) {
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_EXTN_MEM;
+ if (op_data2->rmt_node) {
+ data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+ /* IBS doesn't provide Remote socket detail */
+ data_src->mem_hops = PERF_MEM_HOPS_1;
+ }
+ return;
+ }
+
+ /* IO */
+ if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+ ibs_data_src == IBS_DATA_SRC_EXT_IO) {
+ data_src->mem_lvl = PERF_MEM_LVL_IO;
+ data_src->mem_lvl_num = PERF_MEM_LVLNUM_IO;
+ if (op_data2->rmt_node) {
+ data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+ /* IBS doesn't provide Remote socket detail */
+ data_src->mem_hops = PERF_MEM_HOPS_1;
+ }
+ return;
+ }
+
+ /*
+ * MAB (Miss Address Buffer) Hit. MAB keeps track of outstanding
+ * DC misses. However such data may come from any level in mem
+ * hierarchy. IBS provides detail about both MAB as well as actual
+ * DataSrc simultaneously. Prioritize DataSrc over MAB, i.e. set
+ * MAB only when IBS fails to provide DataSrc.
+ */
+ if (op_data3->dc_miss_no_mab_alloc) {
+ data_src->mem_lvl = PERF_MEM_LVL_LFB | PERF_MEM_LVL_HIT;
+ return;
+ }
+
+ data_src->mem_lvl = PERF_MEM_LVL_NA;
+}
+
+static bool perf_ibs_cache_hit_st_valid(void)
+{
+ /* 0: Uninitialized, 1: Valid, -1: Invalid */
+ static int cache_hit_st_valid;
+
+ if (unlikely(!cache_hit_st_valid)) {
+ if (boot_cpu_data.x86 == 0x19 &&
+ (boot_cpu_data.x86_model <= 0xF ||
+ (boot_cpu_data.x86_model >= 0x20 &&
+ boot_cpu_data.x86_model <= 0x5F))) {
+ cache_hit_st_valid = -1;
+ } else {
+ cache_hit_st_valid = 1;
+ }
+ }
+
+ return cache_hit_st_valid == 1;
+}
+
+static void perf_ibs_get_mem_snoop(union ibs_op_data2 *op_data2,
+ struct perf_sample_data *data)
+{
+ union perf_mem_data_src *data_src = &data->data_src;
+ u8 ibs_data_src;
+
+ data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+
+ if (!perf_ibs_cache_hit_st_valid() ||
+ data_src->mem_op != PERF_MEM_OP_LOAD ||
+ data_src->mem_lvl & PERF_MEM_LVL_L1 ||
+ data_src->mem_lvl & PERF_MEM_LVL_L2 ||
+ op_data2->cache_hit_st)
+ return;
+
+ ibs_data_src = perf_ibs_data_src(op_data2);
+
+ if (ibs_caps & IBS_CAPS_ZEN4) {
+ if (ibs_data_src == IBS_DATA_SRC_EXT_LOC_CACHE ||
+ ibs_data_src == IBS_DATA_SRC_EXT_NEAR_CCX_CACHE ||
+ ibs_data_src == IBS_DATA_SRC_EXT_FAR_CCX_CACHE)
+ data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+ } else if (ibs_data_src == IBS_DATA_SRC_LOC_CACHE) {
+ data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+ }
+}
+
+static void perf_ibs_get_tlb_lvl(union ibs_op_data3 *op_data3,
+ struct perf_sample_data *data)
+{
+ union perf_mem_data_src *data_src = &data->data_src;
+
+ data_src->mem_dtlb = PERF_MEM_TLB_NA;
+
+ if (!op_data3->dc_lin_addr_valid)
+ return;
+
+ if (!op_data3->dc_l1tlb_miss) {
+ data_src->mem_dtlb = PERF_MEM_TLB_L1 | PERF_MEM_TLB_HIT;
+ return;
+ }
+
+ if (!op_data3->dc_l2tlb_miss) {
+ data_src->mem_dtlb = PERF_MEM_TLB_L2 | PERF_MEM_TLB_HIT;
+ return;
+ }
+
+ data_src->mem_dtlb = PERF_MEM_TLB_L2 | PERF_MEM_TLB_MISS;
+}
+
+static void perf_ibs_get_mem_lock(union ibs_op_data3 *op_data3,
+ struct perf_sample_data *data)
+{
+ union perf_mem_data_src *data_src = &data->data_src;
+
+ data_src->mem_lock = PERF_MEM_LOCK_NA;
+
+ if (op_data3->dc_locked_op)
+ data_src->mem_lock = PERF_MEM_LOCK_LOCKED;
+}
+
+#define ibs_op_msr_idx(msr) (msr - MSR_AMD64_IBSOPCTL)
+
+static void perf_ibs_get_data_src(struct perf_event *event,
+ struct perf_ibs_data *ibs_data,
+ struct perf_sample_data *data)
+{
+ union perf_mem_data_src *data_src = &data->data_src;
+ union ibs_op_data2 op_data2;
+ union ibs_op_data3 op_data3;
+
+ op_data3.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
+
+ perf_ibs_get_mem_op(&op_data3, data);
+ if (data_src->mem_op != PERF_MEM_OP_LOAD &&
+ data_src->mem_op != PERF_MEM_OP_STORE)
+ return;
+
+ op_data2.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA2)];
+
+ /* Erratum #1293 */
+ if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model <= 0xF &&
+ (op_data3.sw_pf || op_data3.dc_miss_no_mab_alloc)) {
+ /*
+ * OP_DATA2 has only two fields on Zen3: DataSrc and RmtNode.
+ * DataSrc=0 is No valid status and RmtNode is invalid when
+ * DataSrc=0.
+ */
+ op_data2.val = 0;
+ }
+
+ perf_ibs_get_mem_lvl(event, &op_data2, &op_data3, data);
+ perf_ibs_get_mem_snoop(&op_data2, data);
+ perf_ibs_get_tlb_lvl(&op_data3, data);
+ perf_ibs_get_mem_lock(&op_data3, data);
+}
+
+static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
+ int check_rip)
+{
+ if (sample_type & PERF_SAMPLE_RAW ||
+ (perf_ibs == &perf_ibs_op &&
+ sample_type & PERF_SAMPLE_DATA_SRC))
+ return perf_ibs->offset_max;
+ else if (check_rip)
+ return 3;
+ return 1;
+}
+
static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
{
struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
@@ -735,12 +1023,9 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
size = 1;
offset = 1;
check_rip = (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_RIPINVALIDCHK));
- if (event->attr.sample_type & PERF_SAMPLE_RAW)
- offset_max = perf_ibs->offset_max;
- else if (check_rip)
- offset_max = 3;
- else
- offset_max = 1;
+
+ offset_max = perf_ibs_get_offset_max(perf_ibs, event->attr.sample_type, check_rip);
+
do {
rdmsrl(msr + offset, *buf++);
size++;
@@ -793,6 +1078,11 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
data.raw = &raw;
}

+ if (perf_ibs == &perf_ibs_op) {
+ if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC)
+ perf_ibs_get_data_src(event, &ibs_data, &data);
+ }
+
/*
* rip recorded by IbsOpRip will not be consistent with rsp and rbp
* recorded as part of interrupt regs. Thus we need to use rip from
--
2.31.1

2022-06-16 11:55:14

by Ravi Bangoria

Subject: [PATCH v2 08/14] perf tool: Sync arch/x86/include/asm/amd-ibs.h header

Although the new details added to this header are currently used by the
kernel only, the tools copy needs to be in sync with the kernel file.

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/arch/x86/include/asm/amd-ibs.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/tools/arch/x86/include/asm/amd-ibs.h b/tools/arch/x86/include/asm/amd-ibs.h
index 9a3312e12e2e..93807b437e4d 100644
--- a/tools/arch/x86/include/asm/amd-ibs.h
+++ b/tools/arch/x86/include/asm/amd-ibs.h
@@ -6,6 +6,22 @@

#include "msr-index.h"

+/* IBS_OP_DATA2 DataSrc */
+#define IBS_DATA_SRC_LOC_CACHE 2
+#define IBS_DATA_SRC_DRAM 3
+#define IBS_DATA_SRC_REM_CACHE 4
+#define IBS_DATA_SRC_IO 7
+
+/* IBS_OP_DATA2 DataSrc Extension */
+#define IBS_DATA_SRC_EXT_LOC_CACHE 1
+#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE 2
+#define IBS_DATA_SRC_EXT_DRAM 3
+#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE 5
+#define IBS_DATA_SRC_EXT_PMEM 6
+#define IBS_DATA_SRC_EXT_IO 7
+#define IBS_DATA_SRC_EXT_EXT_MEM 8
+#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM 12
+
/*
* IBS Hardware MSRs
*/
--
2.31.1

2022-06-16 11:55:41

by Ravi Bangoria

Subject: [PATCH v2 04/14] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}

IbsDcMissLat indicates the number of clock cycles from when a miss is
detected in the data cache to when the data was delivered to the core.
Similarly, IbsTagToRetCtr provides the number of cycles from when the op
was tagged to when the op was retired. Consider these fields for
sample->weight. Note that sample->weight will be populated only when
PERF_SAMPLE_DATA_SRC is also set, although PERF_SAMPLE_WEIGHT_STRUCT
and PERF_SAMPLE_WEIGHT are independent of PERF_SAMPLE_DATA_SRC.
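
A minimal sketch of the weight selection, assuming a layout similar to
union perf_sample_weight: with WEIGHT_STRUCT both latencies are recorded
in separate fields, with plain WEIGHT only the data cache miss latency is
used. The DEMO_* flag values, union demo_weight and demo_set_weight() are
hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, not the real PERF_SAMPLE_* bits. */
#define DEMO_SAMPLE_WEIGHT		(1u << 0)
#define DEMO_SAMPLE_WEIGHT_STRUCT	(1u << 1)

/* Simplified stand-in for union perf_sample_weight. */
union demo_weight {
	uint64_t full;
	struct {
		uint32_t var1_dw;
		uint16_t var2_w;
		uint16_t var3_w;
	};
};

/* Fill the weight for load ops only, as the patch does. */
static union demo_weight demo_set_weight(uint32_t sample_type, bool is_load,
					 uint32_t dc_miss_lat,
					 uint16_t tag_to_ret_ctr)
{
	union demo_weight w = { .full = 0 };

	if (!is_load)
		return w;

	if (sample_type & DEMO_SAMPLE_WEIGHT_STRUCT) {
		w.var1_dw = dc_miss_lat;	/* DC miss latency */
		w.var2_w = tag_to_ret_ctr;	/* tag-to-retire cycles */
	} else if (sample_type & DEMO_SAMPLE_WEIGHT) {
		w.full = dc_miss_lat;
	}
	return w;
}
```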

Signed-off-by: Ravi Bangoria <[email protected]>
---
arch/x86/events/amd/ibs.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index de2632a2e44d..830e527a29c3 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -714,6 +714,7 @@ static u8 perf_ibs_data_src(union ibs_op_data2 *op_data2)
}

static void perf_ibs_get_mem_lvl(struct perf_event *event,
+ union ibs_op_data *op_data,
union ibs_op_data2 *op_data2,
union ibs_op_data3 *op_data3,
struct perf_sample_data *data)
@@ -738,6 +739,16 @@ static void perf_ibs_get_mem_lvl(struct perf_event *event,
return;
}

+ /* Load latency (Data cache miss latency) */
+ if (data_src->mem_op == PERF_MEM_OP_LOAD) {
+ if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
+ data->weight.var1_dw = op_data3->dc_miss_lat;
+ data->weight.var2_w = op_data->tag_to_ret_ctr;
+ } else if (event->attr.sample_type & PERF_SAMPLE_WEIGHT) {
+ data->weight.full = op_data3->dc_miss_lat;
+ }
+ }
+
/* L2 Hit */
if (op_data3->l2_miss == 0) {
/* Erratum #1293 */
@@ -935,6 +946,7 @@ static void perf_ibs_get_data_src(struct perf_event *event,
struct perf_sample_data *data)
{
union perf_mem_data_src *data_src = &data->data_src;
+ union ibs_op_data op_data;
union ibs_op_data2 op_data2;
union ibs_op_data3 op_data3;

@@ -945,6 +957,7 @@ static void perf_ibs_get_data_src(struct perf_event *event,
data_src->mem_op != PERF_MEM_OP_STORE)
return;

+ op_data.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA)];
op_data2.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA2)];

/* Erratum #1293 */
@@ -958,7 +971,7 @@ static void perf_ibs_get_data_src(struct perf_event *event,
op_data2.val = 0;
}

- perf_ibs_get_mem_lvl(event, &op_data2, &op_data3, data);
+ perf_ibs_get_mem_lvl(event, &op_data, &op_data2, &op_data3, data);
perf_ibs_get_mem_snoop(&op_data2, data);
perf_ibs_get_tlb_lvl(&op_data3, data);
perf_ibs_get_mem_lock(&op_data3, data);
--
2.31.1

2022-06-16 11:57:23

by Ravi Bangoria

Subject: [PATCH v2 10/14] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events

Currently perf sets the PERF_SAMPLE_WEIGHT flag only for mem load events.
Set it for the combined load-store event as well, which enables recording
of load latency by default on architectures that do not support an
independent mem load event.

Also document missing -W in perf-record man page.

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 1 +
tools/perf/builtin-c2c.c | 1 +
tools/perf/builtin-mem.c | 1 +
3 files changed, 3 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index cf8ad50f3de1..cf68eeb08316 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -397,6 +397,7 @@ is enabled for all the sampling events. The sampled branch type is the same for
The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
Note that this feature may not be available on all processors.

+-W::
--weight::
Enable weightened sampling. An additional weight is recorded per sample and can be
displayed with the weight and local_weight sort keys. This currently works for TSX
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 4898ee57d156..3bf3db6f889c 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -3034,6 +3034,7 @@ static int perf_c2c__record(int argc, const char **argv)
*/
if (e->tag) {
e->record = true;
+ rec_argv[i++] = "-W";
} else {
e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
e->record = true;
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 9e435fd23503..f7dd8216de72 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -122,6 +122,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
(mem->operation & MEM_OPERATION_LOAD) &&
(mem->operation & MEM_OPERATION_STORE)) {
e->record = true;
+ rec_argv[i++] = "-W";
} else {
if (mem->operation & MEM_OPERATION_LOAD) {
e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
--
2.31.1

2022-06-16 12:07:48

by Ravi Bangoria

Subject: [PATCH v2 12/14] perf mem/c2c: Avoid printing empty lines for unsupported events

Perf mem and c2c can be used with 3 different events: load, store and
combined load-store. Some architectures might support only a partial set
of events, in which case perf prints an empty line for each unsupported
event. Avoid that.

For example, AMD Zen cpus support only the combined load-store event and
do not support individual load and store events.

Before patch:
$ ./perf mem record -e list


mem-ldst : available

After patch:
$ ./perf mem record -e list
mem-ldst : available
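
The formatting trick used by the change can be sketched in isolation: a
'*' field width of 0 plus an empty string emits nothing, and moving the
newline into the "available" string means an untagged, unsupported entry
produces no output at all. struct demo_event and demo_format_event() are
hypothetical names; the real code prints to stderr and also handles the
verbose column.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, reduced version of struct perf_mem_event. */
struct demo_event {
	const char *tag;
	bool supported;
};

/*
 * Format one list entry into buf. Returns the number of characters
 * written (as snprintf does), so 0 means "nothing to print".
 */
static int demo_format_event(char *buf, size_t len,
			     const struct demo_event *e)
{
	return snprintf(buf, len, "%-*s%s",
			e->tag ? 13 : 0,
			e->tag ? e->tag : "",
			e->supported ? ": available\n" : "");
}
```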

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/perf/util/mem-events.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 4a55cdd51bba..91db7a0e2da6 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -156,11 +156,12 @@ void perf_mem_events__list(void)
for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
struct perf_mem_event *e = perf_mem_events__ptr(j);

- fprintf(stderr, "%-13s%-*s%s\n",
- e->tag ?: "",
- verbose > 0 ? 25 : 0,
- verbose > 0 ? perf_mem_events__name(j, NULL) : "",
- e->supported ? ": available" : "");
+ fprintf(stderr, "%-*s%-*s%s",
+ e->tag ? 13 : 0,
+ e->tag ? : "",
+ e->tag && verbose > 0 ? 25 : 0,
+ e->tag && verbose > 0 ? perf_mem_events__name(j, NULL) : "",
+ e->supported ? ": available\n" : "");
}
}

--
2.31.1

2022-06-16 12:11:05

by Ravi Bangoria

Subject: [PATCH v2 13/14] perf mem: Use more generic term for LFB

The hw component that tracks outstanding L1 Data Cache misses is called
LFB (Line Fill Buffer) on Intel and Arm. However, a similar component
exists on other architectures under different names; for example, it is
called MAB (Miss Address Buffer) on AMD. Replace LFB with the generic name
"Cache Fill Buffer".

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/perf/util/mem-events.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 91db7a0e2da6..eaa8efcf255b 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -282,7 +282,7 @@ static const char * const mem_lvl[] = {
"HIT",
"MISS",
"L1",
- "LFB",
+ "Cache Fill Buffer",
"L2",
"L3",
"Local RAM",
@@ -298,7 +298,7 @@ static const char * const mem_lvlnum[] = {
[PERF_MEM_LVLNUM_EXTN_MEM] = "Ext Mem",
[PERF_MEM_LVLNUM_IO] = "I/O",
[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
- [PERF_MEM_LVLNUM_LFB] = "LFB",
+ [PERF_MEM_LVLNUM_LFB] = "Cache Fill Buffer",
[PERF_MEM_LVLNUM_RAM] = "RAM",
[PERF_MEM_LVLNUM_PMEM] = "PMEM",
[PERF_MEM_LVLNUM_NA] = "N/A",
--
2.31.1

2022-06-16 12:13:00

by Ravi Bangoria

Subject: [PATCH v2 11/14] perf mem/c2c: Add load store event mappings for AMD

Perf mem and c2c tools are wrappers around perf record with mem load/
store events. IBS tagged load/store sample provides most of the
information needed for these tools. Wire in ibs_op// event as mem-ldst
event for AMD.
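
The table selection can be pictured with the following standalone sketch,
which picks a per-vendor event table based on the vendor prefix of a cpuid
string. The demo_* names and the exact cpuid string format are assumptions
made for illustration; the actual code in the diff below uses
perf_env__cpuid() and caches the result.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced version of struct perf_mem_event. */
struct demo_mem_event {
	const char *tag;
	const char *name;
};

enum { DEMO_LOAD, DEMO_STORE, DEMO_LOAD_STORE, DEMO_MAX };

static const struct demo_mem_event demo_intel[DEMO_MAX] = {
	{ "ldlat-loads",  "%s/mem-loads,ldlat=%u/P" },
	{ "ldlat-stores", "%s/mem-stores/P" },
	{ NULL, NULL },
};

static const struct demo_mem_event demo_amd[DEMO_MAX] = {
	{ NULL, NULL },
	{ NULL, NULL },
	{ "mem-ldst", "ibs_op//" },
};

/* Return the entry for event i from the table matching the vendor. */
static const struct demo_mem_event *demo_event_ptr(const char *cpuid, int i)
{
	static const char amd[] = "AuthenticAMD";

	if (i < 0 || i >= DEMO_MAX)
		return NULL;

	if (cpuid && strncmp(cpuid, amd, sizeof(amd) - 1) == 0)
		return &demo_amd[i];

	return &demo_intel[i];
}
```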

There are some limitations though: Only load/store instructions provide
mem/c2c information. However, IBS does not provide a way to choose a
particular type of instruction to tag. This results in many non-LS
instructions being tagged, which appear as N/A. IBS, being an uncore pmu
from the kernel's point of view[1], does not support per-process monitoring.
Thus, perf mem/c2c on AMD are currently supported in per-cpu mode only.

Example:
$ sudo ./perf mem record -- -c 10000
^C[ perf record: Woken up 227 times to write data ]
[ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

$ sudo ./perf mem report -F mem,sample,snoop
Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
Memory access                Samples  Snoop
N/A                           700620  N/A
L1 hit                        126675  N/A
L2 hit                           424  N/A
L3 hit                           664  HitM
L3 hit                            10  N/A
Local RAM hit                      2  N/A
Remote RAM (1 hop) hit          8558  N/A
Remote Cache (1 hop) hit           3  N/A
Remote Cache (1 hop) hit           2  HitM
Remote Cache (2 hops) hit         10  HitM
Remote Cache (2 hops) hit          6  N/A
Uncached hit                       4  N/A

[1]: https://lore.kernel.org/lkml/[email protected]

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/perf/Documentation/perf-c2c.txt | 14 ++++++++----
tools/perf/Documentation/perf-mem.txt | 3 ++-
tools/perf/arch/x86/util/mem-events.c | 31 +++++++++++++++++++++++++--
3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index 6f69173731aa..32d173fb6541 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -19,9 +19,10 @@ C2C stands for Cache To Cache.
The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
you to track down the cacheline contentions.

-On x86, the tool is based on load latency and precise store facility events
+On Intel, the tool is based on load latency and precise store facility events
provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling
-with thresholding feature.
+with thresholding feature. On AMD, the tool uses IBS op pmu (due to hardware
+limitations, perf c2c is not supported on Zen3 cpus).

These events provide:
- memory address of the access
@@ -49,7 +50,8 @@ RECORD OPTIONS

-l::
--ldlat::
- Configure mem-loads latency. (x86 only)
+ Configure mem-loads latency. Supported on Intel and Arm64 processors
+ only. Ignored on other archs.

-k::
--all-kernel::
@@ -133,11 +135,15 @@ Following perf record options are configured by default:
-W,-d,--phys-data,--sample-cpu

Unless specified otherwise with '-e' option, following events are monitored by
-default on x86:
+default on Intel:

cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P

+following on AMD:
+
+ ibs_op//
+
and following on PowerPC:

cpu/mem-loads/
diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index 66177511c5c4..005c95580b1e 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -85,7 +85,8 @@ RECORD OPTIONS
Be more verbose (show counter open errors, etc)

--ldlat <n>::
- Specify desired latency for loads event. (x86 only)
+ Specify desired latency for loads event. Supported on Intel and Arm64
+ processors only. Ignored on other archs.

In addition, for report all perf report options are valid, and for record
all perf record options.
diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
index 5214370ca4e4..f683ac702247 100644
--- a/tools/perf/arch/x86/util/mem-events.c
+++ b/tools/perf/arch/x86/util/mem-events.c
@@ -1,7 +1,9 @@
// SPDX-License-Identifier: GPL-2.0
#include "util/pmu.h"
+#include "util/env.h"
#include "map_symbol.h"
#include "mem-events.h"
+#include "linux/string.h"

static char mem_loads_name[100];
static bool mem_loads_name__init;
@@ -12,18 +14,43 @@ static char mem_stores_name[100];

#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }

-static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+static struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX] = {
E("ldlat-loads", "%s/mem-loads,ldlat=%u/P", "%s/events/mem-loads"),
E("ldlat-stores", "%s/mem-stores/P", "%s/events/mem-stores"),
E(NULL, NULL, NULL),
};

+static struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
+ E(NULL, NULL, NULL),
+ E(NULL, NULL, NULL),
+ E("mem-ldst", "ibs_op//", "ibs_op"),
+};
+
+static int perf_mem_is_amd_cpu(void)
+{
+ struct perf_env env = { .total_mem = 0, };
+
+ perf_env__cpuid(&env);
+ if (env.cpuid && strstarts(env.cpuid, "AuthenticAMD"))
+ return 1;
+ return -1;
+}
+
struct perf_mem_event *perf_mem_events__ptr(int i)
{
+ /* 0: Uninitialized, 1: Yes, -1: No */
+ static int is_amd;
+
if (i >= PERF_MEM_EVENTS__MAX)
return NULL;

- return &perf_mem_events[i];
+ if (!is_amd)
+ is_amd = perf_mem_is_amd_cpu();
+
+ if (is_amd == 1)
+ return &perf_mem_events_amd[i];
+
+ return &perf_mem_events_intel[i];
}

bool is_mem_loads_aux_event(struct evsel *leader)
--
2.31.1

2022-06-16 12:17:37

by Ravi Bangoria

Subject: [PATCH v2 09/14] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}

Add support for printing these new fields in perf mem report.

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/perf/util/mem-events.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index c3c21a9c350b..4a55cdd51bba 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -294,6 +294,8 @@ static const char * const mem_lvl[] = {
};

static const char * const mem_lvlnum[] = {
+ [PERF_MEM_LVLNUM_EXTN_MEM] = "Ext Mem",
+ [PERF_MEM_LVLNUM_IO] = "I/O",
[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
[PERF_MEM_LVLNUM_LFB] = "LFB",
[PERF_MEM_LVLNUM_RAM] = "RAM",
--
2.31.1
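
The mem_lvlnum[] table in the patch above relies on designated
initializers, so any index without an entry is left NULL. A minimal
sketch of that lookup style, assuming a hypothetical lvl_name() helper
with an "N/A" fallback; the LVL_* names and LVL_MAX are illustrative,
with the three values taken from the header hunk in this series:

```c
#include <stddef.h>

/* Illustrative names; the real constants live in perf_event.h. */
enum {
	LVL_EXTN_MEM  = 0x09,
	LVL_IO        = 0x0a,
	LVL_ANY_CACHE = 0x0b,
	LVL_MAX       = 0x10,	/* hypothetical table size */
};

/* Sparse table: slots without a designated initializer are NULL. */
static const char * const lvl_names[LVL_MAX] = {
	[LVL_EXTN_MEM]  = "Ext Mem",
	[LVL_IO]        = "I/O",
	[LVL_ANY_CACHE] = "Any cache",
};

/* Hypothetical helper: fall back to "N/A" for unknown or unnamed levels. */
static const char *lvl_name(unsigned int lvl)
{
	if (lvl >= LVL_MAX || !lvl_names[lvl])
		return "N/A";
	return lvl_names[lvl];
}
```

Adding a new level then only needs one new table line, which is what the
two-line patch does.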

2022-06-16 12:21:15

by Ravi Bangoria

Subject: [PATCH v2 06/14] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR

IBS_DC_PHYSADDR provides the physical data address for the tagged
load/store operation. Use it to populate the perf sample physical
address. Currently, the physical address is unconditionally overwritten
by the generic perf driver. Introduce an internal-only
__PERF_SAMPLE_PHYS_ADDR_EARLY type to notify the generic code that the
arch pmu driver has already set the physical address.

Signed-off-by: Ravi Bangoria <[email protected]>
---
arch/x86/events/amd/ibs.c | 34 ++++++++++++++++++++++++++++++++-
include/uapi/linux/perf_event.h | 1 +
kernel/events/core.c | 4 +++-
3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 9b3e265a9fed..d224abddc3af 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -310,6 +310,13 @@ static int perf_ibs_init(struct perf_event *event)
if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;

+ /*
+ * Setting _EARLY flag makes sure generic perf driver does not
+ * overwrite physical address set by arch specific pmu driver.
+ */
+ if (event->attr.sample_type & PERF_SAMPLE_PHYS_ADDR)
+ event->attr.sample_type |= __PERF_SAMPLE_PHYS_ADDR_EARLY;
+
return 0;
}

@@ -999,13 +1006,36 @@ static void perf_ibs_get_data_addr(struct perf_event *event,
data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
}

+static void perf_ibs_get_phy_addr(struct perf_event *event,
+ struct perf_ibs_data *ibs_data,
+ struct perf_sample_data *data)
+{
+ union perf_mem_data_src *data_src = &data->data_src;
+ union ibs_op_data3 op_data3;
+
+ op_data3.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
+
+ if (!(event->attr.sample_type & PERF_SAMPLE_DATA_SRC))
+ perf_ibs_get_mem_op(&op_data3, data);
+
+ if ((data_src->mem_op != PERF_MEM_OP_LOAD &&
+ data_src->mem_op != PERF_MEM_OP_STORE) ||
+ !op_data3.dc_phy_addr_valid) {
+ data->phys_addr = 0x0;
+ return;
+ }
+
+ data->phys_addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCPHYSAD)];
+}
+
static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
int check_rip)
{
if (sample_type & PERF_SAMPLE_RAW ||
(perf_ibs == &perf_ibs_op &&
(sample_type & PERF_SAMPLE_DATA_SRC ||
- sample_type & PERF_SAMPLE_ADDR)))
+ sample_type & PERF_SAMPLE_ADDR ||
+ sample_type & PERF_SAMPLE_PHYS_ADDR)))
return perf_ibs->offset_max;
else if (check_rip)
return 3;
@@ -1119,6 +1149,8 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
perf_ibs_get_data_src(event, &ibs_data, &data);
if (event->attr.sample_type & PERF_SAMPLE_ADDR)
perf_ibs_get_data_addr(event, &ibs_data, &data);
+ if (event->attr.sample_type & PERF_SAMPLE_PHYS_ADDR)
+ perf_ibs_get_phy_addr(event, &ibs_data, &data);
}

/*
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 1c3157c1be9d..daf7c337e53e 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -165,6 +165,7 @@ enum perf_event_sample_format {

PERF_SAMPLE_MAX = 1U << 25, /* non-ABI */

+ __PERF_SAMPLE_PHYS_ADDR_EARLY = 1ULL << 62, /* non-ABI; internal use */
__PERF_SAMPLE_CALLCHAIN_EARLY = 1ULL << 63, /* non-ABI; internal use */
};

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 80782cddb1da..f1b486410d0b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7403,8 +7403,10 @@ void perf_prepare_sample(struct perf_event_header *header,
header->size += size;
}

- if (sample_type & PERF_SAMPLE_PHYS_ADDR)
+ if (sample_type & PERF_SAMPLE_PHYS_ADDR &&
+ !(sample_type & __PERF_SAMPLE_PHYS_ADDR_EARLY)) {
data->phys_addr = perf_virt_to_phys(data->addr);
+ }

#ifdef CONFIG_CGROUP_PERF
if (sample_type & PERF_SAMPLE_CGROUP) {
--
2.31.1
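
The kernel/events/core.c hunk above skips the generic virtual-to-physical
translation when the arch driver has flagged the address as already set.
A minimal userspace sketch of that guard follows. The SAMPLE_* macros,
struct sample, fake_virt_to_phys() and prepare_sample() are hypothetical
stand-ins for illustration; the real code uses perf_virt_to_phys() and
struct perf_sample_data.

```c
#include <stdint.h>

/* Local stand-ins for the sample_type bits used in the patch. */
#define SAMPLE_PHYS_ADDR	(1ULL << 19)
#define SAMPLE_PHYS_ADDR_EARLY	(1ULL << 62)

struct sample {
	uint64_t addr;		/* virtual data address */
	uint64_t phys_addr;	/* physical data address */
};

/* Hypothetical translation; real code would walk the page tables. */
static uint64_t fake_virt_to_phys(uint64_t virt)
{
	return virt ^ 0xffff000000000000ULL;
}

/* Translate only if requested and not already filled in by the driver. */
static void prepare_sample(uint64_t sample_type, struct sample *s)
{
	if ((sample_type & SAMPLE_PHYS_ADDR) &&
	    !(sample_type & SAMPLE_PHYS_ADDR_EARLY))
		s->phys_addr = fake_virt_to_phys(s->addr);
}
```

With the early bit set, a value written by the driver survives the
generic path; without it, the generic translation runs as before.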

2022-06-16 12:25:33

by Ravi Bangoria

Subject: [PATCH v2 07/14] perf tool: Sync include/uapi/linux/perf_event.h header

Two new fields for mem_lvl_num have been introduced: PERF_MEM_LVLNUM_IO
and PERF_MEM_LVLNUM_EXTN_MEM. Also, __PERF_SAMPLE_PHYS_ADDR_EARLY is
introduced to be used internally by the kernel. The kernel header
already contains these definitions. Sync them into the tools header as
well.

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index d37629dbad72..daf7c337e53e 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -165,6 +165,7 @@ enum perf_event_sample_format {

PERF_SAMPLE_MAX = 1U << 25, /* non-ABI */

+ __PERF_SAMPLE_PHYS_ADDR_EARLY = 1ULL << 62, /* non-ABI; internal use */
__PERF_SAMPLE_CALLCHAIN_EARLY = 1ULL << 63, /* non-ABI; internal use */
};

@@ -1292,7 +1293,9 @@ union perf_mem_data_src {
#define PERF_MEM_LVLNUM_L2 0x02 /* L2 */
#define PERF_MEM_LVLNUM_L3 0x03 /* L3 */
#define PERF_MEM_LVLNUM_L4 0x04 /* L4 */
-/* 5-0xa available */
+/* 5-0x8 available */
+#define PERF_MEM_LVLNUM_EXTN_MEM 0x9 /* Extension memory */
+#define PERF_MEM_LVLNUM_IO 0x0a /* I/O */
#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
#define PERF_MEM_LVLNUM_LFB 0x0c /* LFB */
#define PERF_MEM_LVLNUM_RAM 0x0d /* RAM */
--
2.31.1

2022-07-12 09:32:08

by Ravi Bangoria

Subject: Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD


On 16-Jun-22 5:06 PM, Ravi Bangoria wrote:
> Perf mem and c2c tools are wrappers around perf record with mem load/
> store events. IBS tagged load/store sample provides most of the
> information needed for these tools. Enable support for these tools on
> AMD Zen processors based on IBS Op pmu.

Gentle ping!

Thanks,
Ravi

2022-07-12 11:50:45

by Jiri Olsa

Subject: Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD

On Thu, Jun 16, 2022 at 05:06:23PM +0530, Ravi Bangoria wrote:
> Perf mem and c2c tools are wrappers around perf record with mem load/
> store events. IBS tagged load/store sample provides most of the
> information needed for these tools. Enable support for these tools on
> AMD Zen processors based on IBS Op pmu.
>
> There are some limitations though: Only load/store instructions provide
> mem/c2c information. However, IBS does not provide a way to choose a
> particular type of instruction to tag. This results in many non-LS
> instructions being tagged which appear as N/A. IBS, being an uncore pmu
> from kernel point of view[1], does not support per process monitoring.
> Thus, perf mem/c2c on AMD are currently supported in per-cpu mode only.
>
> Example:
> $ sudo ./perf mem record -- -c 10000
> ^C[ perf record: Woken up 227 times to write data ]
> [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
>
> $ sudo ./perf mem report -F mem,sample,snoop
> Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
> Memory access Samples Snoop
> N/A 700620 N/A
> L1 hit 126675 N/A
> L2 hit 424 N/A
> L3 hit 664 HitM
> L3 hit 10 N/A
> Local RAM hit 2 N/A
> Remote RAM (1 hop) hit 8558 N/A
> Remote Cache (1 hop) hit 3 N/A
> Remote Cache (1 hop) hit 2 HitM
> Remote Cache (2 hops) hit 10 HitM
> Remote Cache (2 hops) hit 6 N/A
> Uncached hit 4 N/A
>
> Prepared on amd/perf/core (9886142c7a22) + IBS Zen4 enhancement patches[2]
>
> [1]: https://lore.kernel.org/lkml/[email protected]
> [2]: https://lore.kernel.org/lkml/[email protected]
>
> v1: https://lore.kernel.org/lkml/[email protected]
> v1->v2:
> - Instead of defining macros to extract IBS register bits, use existing
> bitfield definitions. Zen4 has introduced additional set of bits in
> IBS registers which this series also exploits and thus this series
> now depends on IBS Zen4 enhancement patchset.
> - Add support for PERF_SAMPLE_WEIGHT_STRUCT. While opening a new event,
> perf tool starts with a set of attributes and goes on reverting some
> attributes in a predefined order until it succeeds or run out or all
> attempts. Here, 1st attempt includes WEIGHT_STRUCT and exclude_guest
> which always fails because IBS does not support guest filtering. The
> problem however is, perf reverts WEIGHT_STRUCT but keeps trying with
> exclude_guest. Thus, although, this series enables WEIGHT_STRUCT
> support from kernel, using it from the perf tool need more changes.
> I'll try to address this bug later.
> - Introduce __PERF_SAMPLE_CALLCHAIN_EARLY to hint generic perf driver
> that physical address is set by arch pmu driver and should not be
> overwritten.
>
>
> Ravi Bangoria (14):
> perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
> perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
> perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
> perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
> perf/x86/amd: Support PERF_SAMPLE_ADDR
> perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
> perf tool: Sync include/uapi/linux/perf_event.h header
> perf tool: Sync arch/x86/include/asm/amd-ibs.h header
> perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
> perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
> perf mem/c2c: Add load store event mappings for AMD
> perf mem/c2c: Avoid printing empty lines for unsupported events
> perf mem: Use more generic term for LFB
> perf script: Add missing fields in usage hint

tools part looks good to me

Acked-by: Jiri Olsa <[email protected]>

thanks,
jirka

>
> arch/x86/events/amd/ibs.c | 372 ++++++++++++++++++++++-
> arch/x86/include/asm/amd-ibs.h | 16 +
> include/uapi/linux/perf_event.h | 5 +-
> kernel/events/core.c | 4 +-
> tools/arch/x86/include/asm/amd-ibs.h | 16 +
> tools/include/uapi/linux/perf_event.h | 5 +-
> tools/perf/Documentation/perf-c2c.txt | 14 +-
> tools/perf/Documentation/perf-mem.txt | 3 +-
> tools/perf/Documentation/perf-record.txt | 1 +
> tools/perf/arch/x86/util/mem-events.c | 31 +-
> tools/perf/builtin-c2c.c | 1 +
> tools/perf/builtin-mem.c | 1 +
> tools/perf/builtin-script.c | 7 +-
> tools/perf/util/mem-events.c | 17 +-
> 14 files changed, 467 insertions(+), 26 deletions(-)
>
> --
> 2.31.1
>

2022-07-18 16:10:18

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD

Em Tue, Jul 12, 2022 at 01:35:25PM +0200, Jiri Olsa escreveu:
> On Thu, Jun 16, 2022 at 05:06:23PM +0530, Ravi Bangoria wrote:
> > Perf mem and c2c tools are wrappers around perf record with mem load/
> > store events. IBS tagged load/store sample provides most of the
> > information needed for these tools. Enable support for these tools on
> > AMD Zen processors based on IBS Op pmu.
> >
> > There are some limitations though: Only load/store instructions provide
> > mem/c2c information. However, IBS does not provide a way to choose a
> > particular type of instruction to tag. This results in many non-LS
> > instructions being tagged which appear as N/A. IBS, being an uncore pmu
> > from kernel point of view[1], does not support per process monitoring.
> > Thus, perf mem/c2c on AMD are currently supported in per-cpu mode only.
> >
> > Example:
> > $ sudo ./perf mem record -- -c 10000
> > ^C[ perf record: Woken up 227 times to write data ]
> > [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
> >
> > $ sudo ./perf mem report -F mem,sample,snoop
> > Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
> > Memory access Samples Snoop
> > N/A 700620 N/A
> > L1 hit 126675 N/A
> > L2 hit 424 N/A
> > L3 hit 664 HitM
> > L3 hit 10 N/A
> > Local RAM hit 2 N/A
> > Remote RAM (1 hop) hit 8558 N/A
> > Remote Cache (1 hop) hit 3 N/A
> > Remote Cache (1 hop) hit 2 HitM
> > Remote Cache (2 hops) hit 10 HitM
> > Remote Cache (2 hops) hit 6 N/A
> > Uncached hit 4 N/A
> >
> > Prepared on amd/perf/core (9886142c7a22) + IBS Zen4 enhancement patches[2]
> >
> > [1]: https://lore.kernel.org/lkml/[email protected]
> > [2]: https://lore.kernel.org/lkml/[email protected]
> >
> > v1: https://lore.kernel.org/lkml/[email protected]
> > v1->v2:
> > - Instead of defining macros to extract IBS register bits, use existing
> > bitfield definitions. Zen4 has introduced additional set of bits in
> > IBS registers which this series also exploits and thus this series
> > now depends on IBS Zen4 enhancement patchset.
> > - Add support for PERF_SAMPLE_WEIGHT_STRUCT. While opening a new event,
> > perf tool starts with a set of attributes and goes on reverting some
> > attributes in a predefined order until it succeeds or run out or all
> > attempts. Here, 1st attempt includes WEIGHT_STRUCT and exclude_guest
> > which always fails because IBS does not support guest filtering. The
> > problem however is, perf reverts WEIGHT_STRUCT but keeps trying with
> > exclude_guest. Thus, although, this series enables WEIGHT_STRUCT
> > support from kernel, using it from the perf tool need more changes.
> > I'll try to address this bug later.
> > - Introduce __PERF_SAMPLE_CALLCHAIN_EARLY to hint generic perf driver
> > that physical address is set by arch pmu driver and should not be
> > overwritten.
> >
> >
> > Ravi Bangoria (14):
> > perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
> > perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
> > perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
> > perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
> > perf/x86/amd: Support PERF_SAMPLE_ADDR
> > perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
> > perf tool: Sync include/uapi/linux/perf_event.h header
> > perf tool: Sync arch/x86/include/asm/amd-ibs.h header
> > perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
> > perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
> > perf mem/c2c: Add load store event mappings for AMD
> > perf mem/c2c: Avoid printing empty lines for unsupported events
> > perf mem: Use more generic term for LFB
> > perf script: Add missing fields in usage hint
>
> tools part looks good to me
>
> Acked-by: Jiri Olsa <[email protected]>

What about the kernel bits? PeterZ? Is this in some tip branch?

- Arnaldo

2022-07-22 03:17:29

by Ravi Bangoria

Subject: Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD

On 21-Jul-22 10:54 PM, Arnaldo Carvalho de Melo wrote:
> Ping.
>
> On Mon, Jul 18, 2022, 12:34 PM Arnaldo Carvalho de Melo <
> [email protected]> wrote:
>
>> Em Tue, Jul 12, 2022 at 01:35:25PM +0200, Jiri Olsa escreveu:
>>> On Thu, Jun 16, 2022 at 05:06:23PM +0530, Ravi Bangoria wrote:
>>>> Perf mem and c2c tools are wrappers around perf record with mem load/
>>>> store events. IBS tagged load/store sample provides most of the
>>>> information needed for these tools. Enable support for these tools on
>>>> AMD Zen processors based on IBS Op pmu.
>>>>
>>>> There are some limitations though: Only load/store instructions provide
>>>> mem/c2c information. However, IBS does not provide a way to choose a
>>>> particular type of instruction to tag. This results in many non-LS
>>>> instructions being tagged which appear as N/A. IBS, being an uncore pmu
>>>> from kernel point of view[1], does not support per process monitoring.
>>>> Thus, perf mem/c2c on AMD are currently supported in per-cpu mode only.
>>>>
>>>> Example:
>>>> $ sudo ./perf mem record -- -c 10000
>>>> ^C[ perf record: Woken up 227 times to write data ]
>>>> [ perf record: Captured and wrote 58.760 MB perf.data (836978
>> samples) ]
>>>>
>>>> $ sudo ./perf mem report -F mem,sample,snoop
>>>> Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
>>>> Memory access Samples Snoop
>>>> N/A 700620 N/A
>>>> L1 hit 126675 N/A
>>>> L2 hit 424 N/A
>>>> L3 hit 664 HitM
>>>> L3 hit 10 N/A
>>>> Local RAM hit 2 N/A
>>>> Remote RAM (1 hop) hit 8558 N/A
>>>> Remote Cache (1 hop) hit 3 N/A
>>>> Remote Cache (1 hop) hit 2 HitM
>>>> Remote Cache (2 hops) hit 10 HitM
>>>> Remote Cache (2 hops) hit 6 N/A
>>>> Uncached hit 4 N/A
>>>>
>>>> Prepared on amd/perf/core (9886142c7a22) + IBS Zen4 enhancement
>> patches[2]
>>>>
>>>> [1]:
>> https://lore.kernel.org/lkml/[email protected]
>>>> [2]:
>> https://lore.kernel.org/lkml/[email protected]
>>>>
>>>> v1:
>> https://lore.kernel.org/lkml/[email protected]
>>>> v1->v2:
>>>> - Instead of defining macros to extract IBS register bits, use
>> existing
>>>> bitfield definitions. Zen4 has introduced additional set of bits in
>>>> IBS registers which this series also exploits and thus this series
>>>> now depends on IBS Zen4 enhancement patchset.
>>>> - Add support for PERF_SAMPLE_WEIGHT_STRUCT. While opening a new
>> event,
>>>> perf tool starts with a set of attributes and goes on reverting some
>>>> attributes in a predefined order until it succeeds or run out or all
>>>> attempts. Here, 1st attempt includes WEIGHT_STRUCT and exclude_guest
>>>> which always fails because IBS does not support guest filtering. The
>>>> problem however is, perf reverts WEIGHT_STRUCT but keeps trying with
>>>> exclude_guest. Thus, although, this series enables WEIGHT_STRUCT
>>>> support from kernel, using it from the perf tool need more changes.
>>>> I'll try to address this bug later.
>>>> - Introduce __PERF_SAMPLE_CALLCHAIN_EARLY to hint generic perf driver
>>>> that physical address is set by arch pmu driver and should not be
>>>> overwritten.
>>>>
>>>>
>>>> Ravi Bangoria (14):
>>>> perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
>>>> perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
>>>> perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
>>>> perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
>>>> perf/x86/amd: Support PERF_SAMPLE_ADDR
>>>> perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
>>>> perf tool: Sync include/uapi/linux/perf_event.h header
>>>> perf tool: Sync arch/x86/include/asm/amd-ibs.h header
>>>> perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
>>>> perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
>>>> perf mem/c2c: Add load store event mappings for AMD
>>>> perf mem/c2c: Avoid printing empty lines for unsupported events
>>>> perf mem: Use more generic term for LFB
>>>> perf script: Add missing fields in usage hint
>>>
>>> tools part looks good to me
>>>
>>> Acked-by: Jiri Olsa <[email protected]>
>>
>> What about the kernel bits? PeterZ? Is this in some tip branch?

Peter, would you be able to pick this up for the next merge window?
Please note that one dependency patch from the "IBS Zen4 enhancement"
series needs to be applied first:

[PATCH v6 6/8] perf/x86/ibs: Add new IBS register bits into header
https://lore.kernel.org/lkml/[email protected]

Please let me know if you face any issues.

Thanks,
Ravi

2022-08-10 13:35:51

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD

Em Fri, Jul 22, 2022 at 07:51:27AM +0530, Ravi Bangoria escreveu:
> On 21-Jul-22 10:54 PM, Arnaldo Carvalho de Melo wrote:
> > On Mon, Jul 18, 2022, 12:34 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> >> Em Tue, Jul 12, 2022 at 01:35:25PM +0200, Jiri Olsa escreveu:
> >>> On Thu, Jun 16, 2022 at 05:06:23PM +0530, Ravi Bangoria wrote:
> >>>> Perf mem and c2c tools are wrappers around perf record with mem load/
> >>>> store events. IBS tagged load/store sample provides most of the
> >>>> information needed for these tools. Enable support for these tools on
> >>>> AMD Zen processors based on IBS Op pmu.

> >>>> Ravi Bangoria (14):
> >>>> perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
> >>>> perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
> >>>> perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
> >>>> perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
> >>>> perf/x86/amd: Support PERF_SAMPLE_ADDR
> >>>> perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
> >>>> perf tool: Sync include/uapi/linux/perf_event.h header
> >>>> perf tool: Sync arch/x86/include/asm/amd-ibs.h header
> >>>> perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
> >>>> perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
> >>>> perf mem/c2c: Add load store event mappings for AMD
> >>>> perf mem/c2c: Avoid printing empty lines for unsupported events
> >>>> perf mem: Use more generic term for LFB
> >>>> perf script: Add missing fields in usage hint

> >>> tools part looks good to me

> >>> Acked-by: Jiri Olsa <[email protected]>

> >> What about the kernel bits? PeterZ? Is this in some tip branch?

> Peter, Would you able to pick this up for next merge window? Please
> note that, one dependency patch needs to be applied first from "IBS
> Zen4 enhancement" series:

> [PATCH v6 6/8] perf/x86/ibs: Add new IBS register bits into header
> https://lore.kernel.org/lkml/[email protected]

It is there already:

⬢[acme@toolbox perf]$ git log --oneline torvalds/master | grep -m1 "Add new IBS register bits into header"
326ecc15c61c349c perf/x86/ibs: Add new IBS register bits into header
⬢[acme@toolbox perf]$

but not the other patches in this series:

⬢[acme@toolbox perf]$ git log --oneline torvalds/master | grep -m1 "amd: Support PERF_SAMPLE_PHY_ADDR"
⬢[acme@toolbox perf]$

- Arnaldo

2022-08-25 11:59:37

by Ravi Bangoria

Subject: Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD

On 22-Jul-22 7:51 AM, Ravi Bangoria wrote:
> On 21-Jul-22 10:54 PM, Arnaldo Carvalho de Melo wrote:
>> Ping.
>>
>> On Mon, Jul 18, 2022, 12:34 PM Arnaldo Carvalho de Melo <
>> [email protected]> wrote:
>>
>>> Em Tue, Jul 12, 2022 at 01:35:25PM +0200, Jiri Olsa escreveu:
>>>> On Thu, Jun 16, 2022 at 05:06:23PM +0530, Ravi Bangoria wrote:
>>>>> Perf mem and c2c tools are wrappers around perf record with mem load/
>>>>> store events. IBS tagged load/store sample provides most of the
>>>>> information needed for these tools. Enable support for these tools on
>>>>> AMD Zen processors based on IBS Op pmu.
>>>>>
>>>>> There are some limitations though: Only load/store instructions provide
>>>>> mem/c2c information. However, IBS does not provide a way to choose a
>>>>> particular type of instruction to tag. This results in many non-LS
>>>>> instructions being tagged which appear as N/A. IBS, being an uncore pmu
>>>>> from kernel point of view[1], does not support per process monitoring.
>>>>> Thus, perf mem/c2c on AMD are currently supported in per-cpu mode only.
>>>>>
>>>>> Example:
>>>>> $ sudo ./perf mem record -- -c 10000
>>>>> ^C[ perf record: Woken up 227 times to write data ]
>>>>> [ perf record: Captured and wrote 58.760 MB perf.data (836978
>>> samples) ]
>>>>>
>>>>> $ sudo ./perf mem report -F mem,sample,snoop
>>>>> Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
>>>>> Memory access Samples Snoop
>>>>> N/A 700620 N/A
>>>>> L1 hit 126675 N/A
>>>>> L2 hit 424 N/A
>>>>> L3 hit 664 HitM
>>>>> L3 hit 10 N/A
>>>>> Local RAM hit 2 N/A
>>>>> Remote RAM (1 hop) hit 8558 N/A
>>>>> Remote Cache (1 hop) hit 3 N/A
>>>>> Remote Cache (1 hop) hit 2 HitM
>>>>> Remote Cache (2 hops) hit 10 HitM
>>>>> Remote Cache (2 hops) hit 6 N/A
>>>>> Uncached hit 4 N/A
>>>>>
>>>>> Prepared on amd/perf/core (9886142c7a22) + IBS Zen4 enhancement
>>> patches[2]
>>>>>
>>>>> [1]:
>>> https://lore.kernel.org/lkml/[email protected]
>>>>> [2]:
>>> https://lore.kernel.org/lkml/[email protected]
>>>>>
>>>>> v1:
>>> https://lore.kernel.org/lkml/[email protected]
>>>>> v1->v2:
>>>>> - Instead of defining macros to extract IBS register bits, use
>>> existing
>>>>> bitfield definitions. Zen4 has introduced additional set of bits in
>>>>> IBS registers which this series also exploits and thus this series
>>>>> now depends on IBS Zen4 enhancement patchset.
>>>>> - Add support for PERF_SAMPLE_WEIGHT_STRUCT. While opening a new
>>> event,
>>>>> perf tool starts with a set of attributes and goes on reverting some
>>>>> attributes in a predefined order until it succeeds or run out or all
>>>>> attempts. Here, 1st attempt includes WEIGHT_STRUCT and exclude_guest
>>>>> which always fails because IBS does not support guest filtering. The
>>>>> problem however is, perf reverts WEIGHT_STRUCT but keeps trying with
>>>>> exclude_guest. Thus, although, this series enables WEIGHT_STRUCT
>>>>> support from kernel, using it from the perf tool need more changes.
>>>>> I'll try to address this bug later.
>>>>> - Introduce __PERF_SAMPLE_CALLCHAIN_EARLY to hint generic perf driver
>>>>> that physical address is set by arch pmu driver and should not be
>>>>> overwritten.
>>>>>
>>>>>
>>>>> Ravi Bangoria (14):
>>>>> perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
>>>>> perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
>>>>> perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
>>>>> perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
>>>>> perf/x86/amd: Support PERF_SAMPLE_ADDR
>>>>> perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
>>>>> perf tool: Sync include/uapi/linux/perf_event.h header
>>>>> perf tool: Sync arch/x86/include/asm/amd-ibs.h header
>>>>> perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
>>>>> perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
>>>>> perf mem/c2c: Add load store event mappings for AMD
>>>>> perf mem/c2c: Avoid printing empty lines for unsupported events
>>>>> perf mem: Use more generic term for LFB
>>>>> perf script: Add missing fields in usage hint
>>>>
>>>> tools part looks good to me
>>>>
>>>> Acked-by: Jiri Olsa <[email protected]>
>>>
>>> What about the kernel bits? PeterZ? Is this in some tip branch?
>
> Peter, Would you able to pick this up for next merge window? Please
> note that, one dependency patch needs to be applied first from "IBS
> Zen4 enhancement" series:
>
> [PATCH v6 6/8] perf/x86/ibs: Add new IBS register bits into header
> https://lore.kernel.org/lkml/[email protected]

Peter, can you please pull this series? (The dependency patch has
already been picked up by Boris.)

Thanks,
Ravi