2023-04-07 11:28:45

by Ravi Bangoria

[permalink] [raw]
Subject: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements

Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
limitations. Mainly:

1. mem_lvl_num doesn't allow setting multiple sources whereas old API
allows it. Setting multiple data sources is useful because IBS on
pre-zen4 uarch doesn't provide fine granular DataSrc details (there
is only one such DataSrc(2h) though).
2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
prints mem_lvl and mem_lvl_num both if both are set, which is ugly.

Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
setting new API fields, convert all individual field assignments to
compile time wrapper macros built using PERF_MEM_S(). Also convert
DataSrc conditional code to array lookups.

Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
non-intuitive. Make it sane.

v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
v2->v3:
- IBS: Don't club RmtNode with DataSrc=7 (IO)
- Make perf_mem__lvl_scnprintf() more sane
- Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
and interpreat it in tool.
- Add PERF_MEM_LVLNUM_NA to default data_src value
- Change some of the IBS bit description according to latest PPR

Namhyung Kim (1):
perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src

Ravi Bangoria (8):
perf/mem: Introduce PERF_MEM_LVLNUM_UNC
perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
perf headers: Sync uapi/linux/perf_event.h
perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
perf mem: Refactor perf_mem__lvl_scnprintf()
perf mem: Increase HISTC_MEM_LVL column size to 39 chars
perf script ibs: Change bit description according to latest PPR

arch/x86/events/amd/ibs.c | 156 +++++++++++---------------
include/linux/perf_event.h | 3 +-
include/uapi/linux/perf_event.h | 3 +-
tools/include/uapi/linux/perf_event.h | 3 +-
tools/perf/util/amd-sample-raw.c | 14 +--
tools/perf/util/event.h | 3 +-
tools/perf/util/hist.c | 2 +-
tools/perf/util/mem-events.c | 90 ++++++++-------
8 files changed, 132 insertions(+), 142 deletions(-)

--
2.34.1


2023-04-07 11:28:54

by Ravi Bangoria

[permalink] [raw]
Subject: [PATCH v3 2/9] perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA

Add PERF_MEM_LVLNUM_NA wherever PERF_MEM_NA is used to set default values.

Signed-off-by: Ravi Bangoria <[email protected]>
---
include/linux/perf_event.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d5628a7b5eaa..064c5ad7ff11 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1183,7 +1183,8 @@ struct perf_sample_data {
PERF_MEM_S(LVL, NA) |\
PERF_MEM_S(SNOOP, NA) |\
PERF_MEM_S(LOCK, NA) |\
- PERF_MEM_S(TLB, NA))
+ PERF_MEM_S(TLB, NA) |\
+ PERF_MEM_S(LVLNUM, NA))

static inline void perf_sample_data_init(struct perf_sample_data *data,
u64 addr, u64 period)
--
2.34.1

2023-04-07 11:29:13

by Ravi Bangoria

[permalink] [raw]
Subject: [PATCH v3 5/9] perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE

Add PERF_MEM_LVLNUM_NA wherever PERF_MEM_DATA_SRC_NONE is used to set
default values.

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/perf/util/event.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 6663a676eadc..de20e01c9d72 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -89,7 +89,8 @@ enum {
PERF_MEM_S(LVL, NA) |\
PERF_MEM_S(SNOOP, NA) |\
PERF_MEM_S(LOCK, NA) |\
- PERF_MEM_S(TLB, NA))
+ PERF_MEM_S(TLB, NA) |\
+ PERF_MEM_S(LVLNUM, NA))

/* Attribute type for custom synthesized events */
#define PERF_TYPE_SYNTH (INT_MAX + 1U)
--
2.34.1

2023-04-07 11:29:22

by Ravi Bangoria

[permalink] [raw]
Subject: [PATCH v3 7/9] perf mem: Refactor perf_mem__lvl_scnprintf()

Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() is
non-intuitive. For ex, it ignores mem_lvl when mem_hops is set but
considers it otherwise. It prints both mem_lvl_num and mem_lvl when
mem_hops is not set.

Refactor this function such that it behaves more intuitively: Use
new API mem_lvl_num|mem_remote|mem_hops if mem_lvl_num contains
value other than PERF_MEM_LVLNUM_NA. Otherwise, fallback to old API
mem_lvl. Since new API has no way to indicate MISS, use it from old
api, otherwise don't club old and new APIs while parsing as well as
printing.

Before:
$ sudo ./perf mem report -F sample,mem --stdio
# Samples Memory access
# ............ ........................
#
250097 N/A
188907 L1 hit
4116 L2 hit
3496 Remote Cache (1 hop) hit
3271 Remote Cache (2 hops) hit
873 L3 hit
598 Local RAM hit
438 Remote RAM (1 hop) hit
1 Uncached hit

After:
$ sudo ./perf mem report -F sample,mem --stdio
# Samples Memory access
# ............ .......................................
#
255517 N/A
189989 L1 hit
4541 L2 hit
3363 Remote core, same node Any cache hit
3336 Remote node, same socket Any cache hit
1275 L3 hit
743 RAM hit
545 Remote node, same socket RAM hit
4 Uncached hit

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/perf/util/mem-events.c | 89 +++++++++++++++++++-----------------
1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 3a7c72be8326..ed1ee4b05356 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -344,66 +344,71 @@ static int perf_mem__op_scnprintf(char *out, size_t sz, struct mem_info *mem_inf

int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
{
- size_t i, l = 0;
- u64 m = PERF_MEM_LVL_NA;
- u64 hit, miss;
+ union perf_mem_data_src data_src;
int printed = 0;
-
- if (mem_info)
- m = mem_info->data_src.mem_lvl;
+ size_t l = 0;
+ size_t i;
+ int lvl;
+ char hit_miss[5] = {0};

sz -= 1; /* -1 for null termination */
out[0] = '\0';

- hit = m & PERF_MEM_LVL_HIT;
- miss = m & PERF_MEM_LVL_MISS;
+ if (!mem_info)
+ goto na;

- /* already taken care of */
- m &= ~(PERF_MEM_LVL_HIT|PERF_MEM_LVL_MISS);
+ data_src = mem_info->data_src;

- if (mem_info && mem_info->data_src.mem_remote) {
- strcat(out, "Remote ");
- l += 7;
- }
+ if (data_src.mem_lvl & PERF_MEM_LVL_HIT)
+ memcpy(hit_miss, "hit", 3);
+ else if (data_src.mem_lvl & PERF_MEM_LVL_MISS)
+ memcpy(hit_miss, "miss", 4);

- /*
- * Incase mem_hops field is set, we can skip printing data source via
- * PERF_MEM_LVL namespace.
- */
- if (mem_info && mem_info->data_src.mem_hops) {
- l += scnprintf(out + l, sz - l, "%s ", mem_hops[mem_info->data_src.mem_hops]);
- } else {
- for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) {
- if (!(m & 0x1))
- continue;
- if (printed++) {
- strcat(out, " or ");
- l += 4;
- }
- l += scnprintf(out + l, sz - l, mem_lvl[i]);
+ lvl = data_src.mem_lvl_num;
+ if (lvl && lvl != PERF_MEM_LVLNUM_NA) {
+ if (data_src.mem_remote) {
+ strcat(out, "Remote ");
+ l += 7;
}
+
+ if (data_src.mem_hops)
+ l += scnprintf(out + l, sz - l, "%s ", mem_hops[data_src.mem_hops]);
+
+ if (mem_lvlnum[lvl])
+ l += scnprintf(out + l, sz - l, mem_lvlnum[lvl]);
+ else
+ l += scnprintf(out + l, sz - l, "L%d", lvl);
+
+ l += scnprintf(out + l, sz - l, " %s", hit_miss);
+ return l;
}

- if (mem_info && mem_info->data_src.mem_lvl_num) {
- int lvl = mem_info->data_src.mem_lvl_num;
+ lvl = data_src.mem_lvl;
+ if (!lvl)
+ goto na;
+
+ lvl &= ~(PERF_MEM_LVL_NA | PERF_MEM_LVL_HIT | PERF_MEM_LVL_MISS);
+ if (!lvl)
+ goto na;
+
+ for (i = 0; lvl && i < ARRAY_SIZE(mem_lvl); i++, lvl >>= 1) {
+ if (!(lvl & 0x1))
+ continue;
if (printed++) {
strcat(out, " or ");
l += 4;
}
- if (mem_lvlnum[lvl])
- l += scnprintf(out + l, sz - l, mem_lvlnum[lvl]);
- else
- l += scnprintf(out + l, sz - l, "L%d", lvl);
+ l += scnprintf(out + l, sz - l, mem_lvl[i]);
}

- if (l == 0)
- l += scnprintf(out + l, sz - l, "N/A");
- if (hit)
- l += scnprintf(out + l, sz - l, " hit");
- if (miss)
- l += scnprintf(out + l, sz - l, " miss");
+ if (printed) {
+ l += scnprintf(out + l, sz - l, " %s", hit_miss);
+ return l;
+ }

- return l;
+na:
+ strcat(out, "N/A");
+ return 3;
}

static const char * const snoop_access[] = {
--
2.34.1

2023-04-07 11:29:49

by Ravi Bangoria

[permalink] [raw]
Subject: [PATCH v3 9/9] perf script ibs: Change bit description according to latest PPR

Some of the IBS_OP_DATA2 bit descriptions were stale (taken from old
version of PPR). Change it according to latest PPR.

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/perf/util/amd-sample-raw.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/amd-sample-raw.c b/tools/perf/util/amd-sample-raw.c
index b0e70ce9d87a..6a6ddba76c75 100644
--- a/tools/perf/util/amd-sample-raw.c
+++ b/tools/perf/util/amd-sample-raw.c
@@ -105,17 +105,17 @@ static void pr_ibs_op_data2_extended(union ibs_op_data2 reg)
static const char * const data_src_str[] = {
"",
" DataSrc 1=Local L3 or other L1/L2 in CCX",
- " DataSrc 2=A peer cache in a near CCX",
- " DataSrc 3=Data returned from DRAM",
+ " DataSrc 2=Another CCX cache in the same NUMA node",
+ " DataSrc 3=DRAM",
" DataSrc 4=(reserved)",
- " DataSrc 5=A peer cache in a far CCX",
- " DataSrc 6=DRAM address map with \"long latency\" bit set",
- " DataSrc 7=Data returned from MMIO/Config/PCI/APIC",
- " DataSrc 8=Extension Memory (S-Link, GenZ, etc)",
+ " DataSrc 5=Another CCX cache in a different NUMA node",
+ " DataSrc 6=Long-latency DIMM",
+ " DataSrc 7=MMIO/Config/PCI/APIC",
+ " DataSrc 8=Extension Memory",
" DataSrc 9=(reserved)",
" DataSrc 10=(reserved)",
" DataSrc 11=(reserved)",
- " DataSrc 12=Peer Agent Memory",
+ " DataSrc 12=Coherent Memory of a different processor type",
/* 13 to 31 are reserved. Avoid printing them. */
};
int data_src = (reg.data_src_hi << 3) | reg.data_src_lo;
--
2.34.1

2023-04-07 11:30:49

by Ravi Bangoria

[permalink] [raw]
Subject: [PATCH v3 4/9] perf headers: Sync uapi/linux/perf_event.h

... to bring PERF_MEM_LVLNUM_UNC definition to userspace

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 37675437b768..39c6a250dd1b 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1339,7 +1339,8 @@ union perf_mem_data_src {
#define PERF_MEM_LVLNUM_L2 0x02 /* L2 */
#define PERF_MEM_LVLNUM_L3 0x03 /* L3 */
#define PERF_MEM_LVLNUM_L4 0x04 /* L4 */
-/* 5-0x8 available */
+/* 5-0x7 available */
+#define PERF_MEM_LVLNUM_UNC 0x08 /* Uncached */
#define PERF_MEM_LVLNUM_CXL 0x09 /* CXL */
#define PERF_MEM_LVLNUM_IO 0x0a /* I/O */
#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
--
2.34.1

2023-04-07 11:31:02

by Ravi Bangoria

[permalink] [raw]
Subject: [PATCH v3 6/9] perf mem: Add support for printing PERF_MEM_LVLNUM_UNC

Add support for printing PERF_MEM_LVLNUM_UNC in perf mem report.

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/perf/util/mem-events.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index b3a91093069a..3a7c72be8326 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -295,6 +295,7 @@ static const char * const mem_lvl[] = {
};

static const char * const mem_lvlnum[] = {
+ [PERF_MEM_LVLNUM_UNC] = "Uncached",
[PERF_MEM_LVLNUM_CXL] = "CXL",
[PERF_MEM_LVLNUM_IO] = "I/O",
[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
--
2.34.1

2023-04-07 11:31:10

by Ravi Bangoria

[permalink] [raw]
Subject: [PATCH v3 8/9] perf mem: Increase HISTC_MEM_LVL column size to 39 chars

39 is taken from the length of longest printable new API string:
"Remote socket, same board Any cache hit". Although, using old API
can result into even longer strings, let's not overkill by making
it dynamic length.

Signed-off-by: Ravi Bangoria <[email protected]>
---
tools/perf/util/hist.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index b6e4b4edde43..46ce4527919a 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -207,7 +207,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
hists__new_col_len(hists, HISTC_MEM_TLB, 22);
hists__new_col_len(hists, HISTC_MEM_SNOOP, 12);
- hists__new_col_len(hists, HISTC_MEM_LVL, 21 + 3);
+ hists__new_col_len(hists, HISTC_MEM_LVL, 36 + 3);
hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12);
hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12);
hists__new_col_len(hists, HISTC_MEM_BLOCKED, 10);
--
2.34.1

2023-04-07 21:52:18

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements

Hi Ravi,

On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <[email protected]> wrote:
>
> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
> limitations. Mainly:
>
> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
> allows it. Setting multiple data sources is useful because IBS on
> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
> is only one such DataSrc(2h) though).
> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>
> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
> setting new API fields, convert all individual field assignments to
> compile time wrapper macros built using PERF_MEM_S(). Also convert
> DataSrc conditional code to array lookups.
>
> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
> non-intuitive. Make it sane.

Looks good, but I think you need to split kernel and user patches.

>
> v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
> v2->v3:
> - IBS: Don't club RmtNode with DataSrc=7 (IO)
> - Make perf_mem__lvl_scnprintf() more sane
> - Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
> and interpreat it in tool.
> - Add PERF_MEM_LVLNUM_NA to default data_src value
> - Change some of the IBS bit description according to latest PPR
>
> Namhyung Kim (1):
> perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
>
> Ravi Bangoria (8):
> perf/mem: Introduce PERF_MEM_LVLNUM_UNC
> perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
> perf headers: Sync uapi/linux/perf_event.h
> perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
> perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
> perf mem: Refactor perf_mem__lvl_scnprintf()
> perf mem: Increase HISTC_MEM_LVL column size to 39 chars
> perf script ibs: Change bit description according to latest PPR

Acked-by: Namhyung Kim <[email protected]>

Thanks,
Namhyung


>
> arch/x86/events/amd/ibs.c | 156 +++++++++++---------------
> include/linux/perf_event.h | 3 +-
> include/uapi/linux/perf_event.h | 3 +-
> tools/include/uapi/linux/perf_event.h | 3 +-
> tools/perf/util/amd-sample-raw.c | 14 +--
> tools/perf/util/event.h | 3 +-
> tools/perf/util/hist.c | 2 +-
> tools/perf/util/mem-events.c | 90 ++++++++-------
> 8 files changed, 132 insertions(+), 142 deletions(-)
>
> --
> 2.34.1
>

2023-04-10 02:26:34

by Ravi Bangoria

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements

On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
> Hi Ravi,
>
> On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <[email protected]> wrote:
>>
>> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
>> limitations. Mainly:
>>
>> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
>> allows it. Setting multiple data sources is useful because IBS on
>> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
>> is only one such DataSrc(2h) though).
>> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
>> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
>> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>>
>> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
>> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
>> setting new API fields, convert all individual field assignments to
>> compile time wrapper macros built using PERF_MEM_S(). Also convert
>> DataSrc conditional code to array lookups.
>>
>> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
>> non-intuitive. Make it sane.
>
> Looks good, but I think you need to split kernel and user patches.

Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
Arnaldo, Peter, please let me know if you wants to split the series and
resend.

>
>>
>> v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
>> v2->v3:
>> - IBS: Don't club RmtNode with DataSrc=7 (IO)
>> - Make perf_mem__lvl_scnprintf() more sane
>> - Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
>> and interpreat it in tool.
>> - Add PERF_MEM_LVLNUM_NA to default data_src value
>> - Change some of the IBS bit description according to latest PPR
>>
>> Namhyung Kim (1):
>> perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
>>
>> Ravi Bangoria (8):
>> perf/mem: Introduce PERF_MEM_LVLNUM_UNC
>> perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
>> perf headers: Sync uapi/linux/perf_event.h
>> perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
>> perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
>> perf mem: Refactor perf_mem__lvl_scnprintf()
>> perf mem: Increase HISTC_MEM_LVL column size to 39 chars
>> perf script ibs: Change bit description according to latest PPR
>
> Acked-by: Namhyung Kim <[email protected]>

Thanks!

2023-04-10 15:21:50

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements

Em Mon, Apr 10, 2023 at 07:53:57AM +0530, Ravi Bangoria escreveu:
> On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
> > Hi Ravi,
> >
> > On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <[email protected]> wrote:
> >>
> >> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
> >> limitations. Mainly:
> >>
> >> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
> >> allows it. Setting multiple data sources is useful because IBS on
> >> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
> >> is only one such DataSrc(2h) though).
> >> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
> >> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
> >> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
> >>
> >> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
> >> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
> >> setting new API fields, convert all individual field assignments to
> >> compile time wrapper macros built using PERF_MEM_S(). Also convert
> >> DataSrc conditional code to array lookups.
> >>
> >> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
> >> non-intuitive. Make it sane.
> >
> > Looks good, but I think you need to split kernel and user patches.
>
> Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
> Arnaldo, Peter, please let me know if you wants to split the series and
> resend.

I can always use b4's -P option :-) So no need to resubmit, I can pick
the tools bits,

- Arnaldo

> >
> >>
> >> v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
> >> v2->v3:
> >> - IBS: Don't club RmtNode with DataSrc=7 (IO)
> >> - Make perf_mem__lvl_scnprintf() more sane
> >> - Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
> >> and interpreat it in tool.
> >> - Add PERF_MEM_LVLNUM_NA to default data_src value
> >> - Change some of the IBS bit description according to latest PPR
> >>
> >> Namhyung Kim (1):
> >> perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
> >>
> >> Ravi Bangoria (8):
> >> perf/mem: Introduce PERF_MEM_LVLNUM_UNC
> >> perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
> >> perf headers: Sync uapi/linux/perf_event.h
> >> perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
> >> perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
> >> perf mem: Refactor perf_mem__lvl_scnprintf()
> >> perf mem: Increase HISTC_MEM_LVL column size to 39 chars
> >> perf script ibs: Change bit description according to latest PPR
> >
> > Acked-by: Namhyung Kim <[email protected]>
>
> Thanks!

--

- Arnaldo

2023-04-10 22:39:21

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements

Em Mon, Apr 10, 2023 at 12:15:48PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Apr 10, 2023 at 07:53:57AM +0530, Ravi Bangoria escreveu:
> > On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
> > > Hi Ravi,
> > >
> > > On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <[email protected]> wrote:
> > >>
> > >> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
> > >> limitations. Mainly:
> > >>
> > >> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
> > >> allows it. Setting multiple data sources is useful because IBS on
> > >> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
> > >> is only one such DataSrc(2h) though).
> > >> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
> > >> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
> > >> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
> > >>
> > >> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
> > >> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
> > >> setting new API fields, convert all individual field assignments to
> > >> compile time wrapper macros built using PERF_MEM_S(). Also convert
> > >> DataSrc conditional code to array lookups.
> > >>
> > >> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
> > >> non-intuitive. Make it sane.
> > >
> > > Looks good, but I think you need to split kernel and user patches.
> >
> > Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
> > Arnaldo, Peter, please let me know if you wants to split the series and
> > resend.
>
> I can always use b4's -P option :-) So no need to resubmit, I can pick
>> the tools bits,

Done

> - Arnaldo
>
> > >
> > >>
> > >> v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
> > >> v2->v3:
> > >> - IBS: Don't club RmtNode with DataSrc=7 (IO)
> > >> - Make perf_mem__lvl_scnprintf() more sane
> > >> - Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
> > >> and interpreat it in tool.
> > >> - Add PERF_MEM_LVLNUM_NA to default data_src value
> > >> - Change some of the IBS bit description according to latest PPR
> > >>
> > >> Namhyung Kim (1):
> > >> perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
> > >>
> > >> Ravi Bangoria (8):
> > >> perf/mem: Introduce PERF_MEM_LVLNUM_UNC
> > >> perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
> > >> perf headers: Sync uapi/linux/perf_event.h
> > >> perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
> > >> perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
> > >> perf mem: Refactor perf_mem__lvl_scnprintf()
> > >> perf mem: Increase HISTC_MEM_LVL column size to 39 chars
> > >> perf script ibs: Change bit description according to latest PPR
> > >
> > > Acked-by: Namhyung Kim <[email protected]>
> >
> > Thanks!
>
> --
>
> - Arnaldo

--

- Arnaldo

2023-05-16 03:33:04

by Ravi Bangoria

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements

On 10-Apr-23 7:53 AM, Ravi Bangoria wrote:
> On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
>> Hi Ravi,
>>
>> On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <[email protected]> wrote:
>>>
>>> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
>>> limitations. Mainly:
>>>
>>> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
>>> allows it. Setting multiple data sources is useful because IBS on
>>> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
>>> is only one such DataSrc(2h) though).
>>> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
>>> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
>>> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>>>
>>> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
>>> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
>>> setting new API fields, convert all individual field assignments to
>>> compile time wrapper macros built using PERF_MEM_S(). Also convert
>>> DataSrc conditional code to array lookups.
>>>
>>> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
>>> non-intuitive. Make it sane.
>>
>> Looks good, but I think you need to split kernel and user patches.
>
> Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
> Arnaldo, Peter, please let me know if you wants to split the series and
> resend.

Hi Peter, tools/ patches are already upstream. Can you please pick up
kernel changes.

Thanks,
Ravi

2023-05-29 04:21:09

by Ravi Bangoria

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements

On 16-May-23 8:15 AM, Ravi Bangoria wrote:
> On 10-Apr-23 7:53 AM, Ravi Bangoria wrote:
>> On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
>>> Hi Ravi,
>>>
>>> On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <[email protected]> wrote:
>>>>
>>>> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
>>>> limitations. Mainly:
>>>>
>>>> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
>>>> allows it. Setting multiple data sources is useful because IBS on
>>>> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
>>>> is only one such DataSrc(2h) though).
>>>> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
>>>> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
>>>> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>>>>
>>>> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
>>>> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
>>>> setting new API fields, convert all individual field assignments to
>>>> compile time wrapper macros built using PERF_MEM_S(). Also convert
>>>> DataSrc conditional code to array lookups.
>>>>
>>>> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
>>>> non-intuitive. Make it sane.
>>>
>>> Looks good, but I think you need to split kernel and user patches.
>>
>> Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
>> Arnaldo, Peter, please let me know if you wants to split the series and
>> resend.
>
> Hi Peter, tools/ patches are already upstream. Can you please pick up
> kernel changes.

Gentle ping, Peter!

Thanks,
Ravi

2023-06-30 07:06:35

by Ravi Bangoria

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements

On 29-May-23 9:35 AM, Ravi Bangoria wrote:
> On 16-May-23 8:15 AM, Ravi Bangoria wrote:
>> On 10-Apr-23 7:53 AM, Ravi Bangoria wrote:
>>> On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
>>>> Hi Ravi,
>>>>
>>>> On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <[email protected]> wrote:
>>>>>
>>>>> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
>>>>> limitations. Mainly:
>>>>>
>>>>> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
>>>>> allows it. Setting multiple data sources is useful because IBS on
>>>>> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
>>>>> is only one such DataSrc(2h) though).
>>>>> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
>>>>> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
>>>>> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>>>>>
>>>>> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
>>>>> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
>>>>> setting new API fields, convert all individual field assignments to
>>>>> compile time wrapper macros built using PERF_MEM_S(). Also convert
>>>>> DataSrc conditional code to array lookups.
>>>>>
>>>>> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
>>>>> non-intuitive. Make it sane.
>>>>
>>>> Looks good, but I think you need to split kernel and user patches.
>>>
>>> Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
>>> Arnaldo, Peter, please let me know if you wants to split the series and
>>> resend.
>>
>> Hi Peter, tools/ patches are already upstream. Can you please pick up
>> kernel changes.
>
> Gentle ping, Peter!

Hello Peter, this is pending from long time. Can you please consider taking
it. Please let me know if you want me to resend the patches.

Thanks,
Ravi