When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores so we can detect situations like cache line contention
and transfers on Arm platforms.
This change enables the expected behavior of perf c2c on a system with SPE, where
lines that are shared among multiple cores show up in perf c2c output.
These changes switch to using mem_lvl_num to encode the level information instead
of mem_lvl, which is being deprecated, although I haven't found other users of
mem_lvl_num.
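As a rough illustration of the encoding described above, here is a minimal sketch that fills in a data source value using only mem_lvl_num for the level. It assumes the uapi header <linux/perf_event.h> is available; encode_load_hit() is a hypothetical helper written for this example and is not code from the patches.

```c
#include <linux/perf_event.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: describe a load that was
 * satisfied at the given level, using mem_lvl_num rather than the
 * mem_lvl bitmask to carry the level.
 */
static uint64_t encode_load_hit(unsigned int lvl_num)
{
	union perf_mem_data_src src = { .val = 0 };

	src.mem_op = PERF_MEM_OP_LOAD;
	src.mem_lvl_num = lvl_num;		/* e.g. PERF_MEM_LVLNUM_L2 */
	src.mem_snoop = PERF_MEM_SNOOP_NONE;	/* no snoop information */

	return src.val;
}
```

For example, encode_load_hit(PERF_MEM_LVLNUM_L2) yields a value whose mem_lvl_num field is PERF_MEM_LVLNUM_L2 while mem_lvl stays zero.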
Changes in v3:
* Assume there are only three levels of cache hierarchy
* Split the mem_lvl_num and HITM changes in c2c into two separate patches

Ali Saidi (3):
  perf arm-spe: Use SPE data source for neoverse cores
  perf mem: Support mem_lvl_num in c2c command
  perf mem: Support HITM for when mem_lvl_num is any

 .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 ++
 tools/perf/util/arm-spe.c                     | 109 +++++++++++++++---
 tools/perf/util/mem-events.c                  |  20 +++-
 4 files changed, 124 insertions(+), 18 deletions(-)
--
2.32.0
Hi Ali, thank you for your patches.
On 18/03/2022 19:59, Ali Saidi wrote:
> When synthesizing data from SPE, augment the type with source information
> for Arm Neoverse cores so we can detect situations like cache line contention
> and transfers on Arm platforms.
>
> This change enables the expected behavior of perf c2c on a system with SPE where
> lines that are shared among multiple cores show up in perf c2c output.
>
> These changes switch to use mem_lvl_num to encode the level information instead
> of mem_lvl which is being deprecated, but I haven't found other users of
> mem_lvl_num.
>
> Changes in v3:
> * Assume there are only three levels of cache hierarchy
> * Split the mem_lvl_num and HITM changes in c2c into two separate patches
>
> Ali Saidi (3):
> perf arm-spe: Use SPE data source for neoverse cores
> perf mem: Support mem_lvl_num in c2c command
> perf mem: Support HITM for when mem_lvl_num is any
>
> .../util/arm-spe-decoder/arm-spe-decoder.c | 1 +
> .../util/arm-spe-decoder/arm-spe-decoder.h | 12 ++
> tools/perf/util/arm-spe.c | 109 +++++++++++++++---
> tools/perf/util/mem-events.c | 20 +++-
> 4 files changed, 124 insertions(+), 18 deletions(-)
>
I tested on a Neoverse N1 system using the commands below, and the output
looks either unchanged or improved compared to before. For example:
| $ perf mem record -e spe-ldst -a -- sleep 4
| $ perf mem report
|
| 1.39% 1 1263 L3 miss [k] 0xffffb9a34bda2088
| 0.58% 1 529 L1 miss [k] 0xffffb9a34bd3be7c
| 0.34% 1 310 N/A [k] 0xffffb9a34baf4d28
| 0.34% 1 309 N/A [k] 0xffffb9a34bb82844
... became:
| 1.39% 1 1263 RAM hit [k] 0xffffb9a34bda2088
| 0.58% 1 529 L2 hit [k] 0xffffb9a34bd3be7c
| 0.34% 1 310 L1 hit [k] 0xffffb9a34baf4d28
| 0.34% 1 309 L1 hit [k] 0xffffb9a34bb82844
Also, some L3 misses are now labeled as "Any cache hit" with the Snoop
bit set. For example:
| 0.37% 1 332 L3 miss [.] 0x0000aaaadf70a700 N/A
... became:
| 0.37% 1 332 Any cache hit [.] 0x0000aaaadf70a700 HitM
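To make the "Any cache hit" plus HitM case concrete, here is a minimal sketch of how such a sample could be recognized from its data source value. It assumes <linux/perf_event.h> is available; is_any_cache_hitm() is a hypothetical predicate written for this example and is not the check used in the series.

```c
#include <linux/perf_event.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate, for illustration only: true when the sample
 * reports a hit in some (unspecified) cache level and the snoop field
 * says the line was found modified in another cache (HITM).
 */
static bool is_any_cache_hitm(uint64_t data_src)
{
	union perf_mem_data_src src = { .val = data_src };

	return src.mem_lvl_num == PERF_MEM_LVLNUM_ANY_CACHE &&
	       (src.mem_snoop & PERF_MEM_SNOOP_HITM);
}
```

A value with mem_lvl_num set to PERF_MEM_LVLNUM_ANY_CACHE and PERF_MEM_SNOOP_HITM in mem_snoop satisfies the predicate; an L1 hit with no snoop information does not.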
Tested-by: German Gomez <[email protected]>
Reviewed-by: German Gomez <[email protected]>
Thanks,
German
(I didn't run on a non-Neoverse system but it doesn't look like any
behaviour is changed for those)