2019-11-04 09:21:09

by Zhangshaokun

Subject: [RFC] About perf-mem command support on arm64 platform

Hi all,

perf-mem is used to profile memory accesses and has been implemented on the
x86 platform. It needs mem-stores events and mem-loads/load-latency.
For mem-stores, the event is MEM_INST_RETIRED_ALL_STORES, whose raw number
is r82d0, and mem-loads/load-latency comes from PEBS, if I follow the code
correctly.
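For reference, perf's rNNNN syntax on Intel CPUs passes the hex value straight into the event configuration, where (per the cpu PMU's sysfs format files) bits 0-7 are the event code and bits 8-15 the unit mask. A minimal sketch, assuming that layout, that splits r82d0 into event 0xd0 and umask 0x82; the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type: fields of an Intel raw event config.
 * Layout assumed from the cpu PMU's sysfs format files:
 *   event = config:0-7, umask = config:8-15 */
struct x86_raw_event {
	uint8_t event;
	uint8_t umask;
};

/* Parse perf's raw syntax, e.g. "r82d0" (hex digits after the 'r'). */
static struct x86_raw_event decode_x86_raw(const char *spec)
{
	struct x86_raw_event ev;
	uint64_t config;

	assert(spec && spec[0] == 'r');
	config = strtoull(spec + 1, NULL, 16);
	ev.event = config & 0xff;         /* event select code */
	ev.umask = (config >> 8) & 0xff;  /* unit mask */
	return ev;
}
```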

Now some arm64 cores, like HiSilicon's tsv110 and ARM's Neoverse N1,
support SPE (the Statistical Profiling Extension), so is it possible to
support perf-mem on arm64?
https://developer.arm.com/ip-products/processors/neoverse/neoverse-n1

The arm64 PMU has an 'st_retired' event, event number 0x0007, which is
equivalent to mem-stores on x86. If we want to support perf-mem, it seems
that 'st_retired' should be replaced by 'mem-stores' in the
arch/arm64/kernel/perf_event.c file. Of course, the CPU core must support
the st_retired event. I'm not sure Will/Mark would be happy with this. ;-)
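To make the proposal concrete: ST_RETIRED is the ARMv8 common architectural event 0x0007, so 'mem-stores' would simply be a second name resolving to the same raw number. The sketch below uses a hypothetical lookup table (it is not how arch/arm64/kernel/perf_event.c is organised) to fill a struct perf_event_attr as a tool would before calling perf_event_open(2). The syscall itself is left out, since, as noted above, a given core may not implement the event.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table: both names resolve to ST_RETIRED (0x0007). */
struct event_alias {
	const char *name;
	unsigned long long code;
};

static const struct event_alias arm64_aliases[] = {
	{ "st_retired", 0x0007 },
	{ "mem-stores", 0x0007 }, /* proposed x86-compatible alias */
};

/* Fill attr for a raw counting event; returns 0 on success, -1 if unknown. */
static int arm64_event_attr(const char *name, struct perf_event_attr *attr)
{
	size_t i;

	assert(name && attr);
	for (i = 0; i < sizeof(arm64_aliases) / sizeof(arm64_aliases[0]); i++) {
		if (strcmp(arm64_aliases[i].name, name) == 0) {
			memset(attr, 0, sizeof(*attr));
			attr->size = sizeof(*attr);
			attr->type = PERF_TYPE_RAW;          /* raw PMU event number */
			attr->config = arm64_aliases[i].code;
			attr->disabled = 1;                  /* enable via ioctl later */
			attr->exclude_kernel = 1;            /* count user space only */
			return 0;
		}
	}
	return -1;
}
```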

For mem-loads/load-latency, we can derive them from SPE sampled data, which
the SPE driver supports through its load_filter and min_latency options,
and we may need to do some work on tools/perf/builtin-mem.c.
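As a rough illustration of the SPE side: the arm_spe_0 PMU takes load_filter and min_latency as fields packed into struct perf_event_attr. The bit positions below (load_filter at config:33, min_latency at config2:0-11) are assumptions based on what the driver advertised at the time; a real tool should parse /sys/bus/event_source/devices/arm_spe_0/format/ instead, which is what something like perf record -e arm_spe_0/load_filter=1,min_latency=64/ relies on. The helper and struct names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout of the arm_spe_0 PMU; verify against sysfs format/. */
#define SPE_LOAD_FILTER_BIT   33        /* config:33 */
#define SPE_MIN_LATENCY_MASK  0xfffULL  /* config2:0-11 */

struct spe_config {
	uint64_t config;
	uint64_t config2;
};

/* Hypothetical helper: record only sampled loads whose latency is at least
 * min_latency. Returns -1 if the value does not fit the 12-bit field. */
static int spe_load_latency_config(unsigned int min_latency,
				   struct spe_config *out)
{
	assert(out);
	if (min_latency > SPE_MIN_LATENCY_MASK)
		return -1;
	out->config = 1ULL << SPE_LOAD_FILTER_BIT;
	out->config2 = min_latency & SPE_MIN_LATENCY_MASK;
	return 0;
}
```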

From the above, it seems that we may have the opportunity to support the
perf-mem command on arm64. I'm not very sure about it, so I am sending this
RFC; any comments are welcome.

Thanks,
Shaokun



2019-11-04 14:28:02

by Will Deacon

Subject: Re: [RFC] About perf-mem command support on arm64 platform

On Mon, Nov 04, 2019 at 05:18:00PM +0800, Shaokun Zhang wrote:
> perf-mem is used to profile memory accesses and has been implemented on the
> x86 platform. It needs mem-stores events and mem-loads/load-latency.
> For mem-stores, the event is MEM_INST_RETIRED_ALL_STORES, whose raw number
> is r82d0, and mem-loads/load-latency comes from PEBS, if I follow the code
> correctly.
>
> Now some arm64 cores, like HiSilicon's tsv110 and ARM's Neoverse N1,
> support SPE (the Statistical Profiling Extension), so is it possible to
> support perf-mem on arm64?
> https://developer.arm.com/ip-products/processors/neoverse/neoverse-n1

I don't understand the relationship you're trying to draw between mem-stores
and SPE. How does perf-mem work and what does it actually require from the
CPU?

One thing that may be worth noting is that SPE isn't generally able to
capture information about all instructions being executed by the CPU:
instead, instructions (most likely micro-ops) are sampled based on
some user-specified period. The CPU advertises a minimum recommended
period which we expose under /sys and enforce when programming events.
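The minimum-period behaviour can be sketched as follows. This assumes the value is read from the SPE PMU's caps/min_interval file (typically /sys/bus/event_source/devices/arm_spe_0/caps/min_interval) and that a requested period below it is simply raised to it; the real driver also caps and rounds large periods, which is left out here. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse a decimal interval such as the contents of caps/min_interval.
 * Returns 0 if the file does not contain a number. */
static uint64_t read_min_interval(FILE *f)
{
	unsigned long long val;

	assert(f);
	if (fscanf(f, "%llu", &val) != 1)
		return 0;
	return val;
}

/* Raise a requested sample period (e.g. from perf record -c) to the minimum. */
static uint64_t clamp_spe_period(uint64_t requested, uint64_t min_interval)
{
	return requested < min_interval ? min_interval : requested;
}
```

With a hypothetical min_interval of 1024, asking for a period of 100 would in effect still sample every 1024 operations, so a smaller period does not buy denser samples.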

> The arm64 PMU has an 'st_retired' event, event number 0x0007, which is
> equivalent to mem-stores on x86. If we want to support perf-mem, it seems
> that 'st_retired' should be replaced by 'mem-stores' in the
> arch/arm64/kernel/perf_event.c file. Of course, the CPU core must support
> the st_retired event. I'm not sure Will/Mark would be happy with this. ;-)
>
> For mem-loads/load-latency, we can derive them from SPE sampled data, which
> the SPE driver supports through its load_filter and min_latency options,
> and we may need to do some work on tools/perf/builtin-mem.c.

I don't see how you could reconcile the sampling nature of SPE with a
CPU PMU counter, particularly as filtering in SPE happens /after/ sampling.
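The point about filtering happening after sampling shows up even in a toy model: the sampler first picks every period-th operation and only then checks whether it is a load above the latency threshold, so a picked operation that fails the filter is simply lost and the next slow load is not taken instead. The model below is a deliberately simplified, deterministic sketch (real SPE samples micro-ops and can add jitter to the interval); the names are hypothetical. It contrasts that with a counter that sees every qualifying operation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one executed operation (hypothetical). */
struct op {
	int is_load;
	unsigned int latency; /* cycles */
};

/* PMU-counter style: every matching operation is counted. */
static size_t count_slow_loads(const struct op *ops, size_t n,
			       unsigned int min_lat)
{
	size_t i, c = 0;

	for (i = 0; i < n; i++)
		if (ops[i].is_load && ops[i].latency >= min_lat)
			c++;
	return c;
}

/* SPE style: pick every period-th operation first, then apply the filter.
 * A picked operation that fails the filter is dropped, not replaced. */
static size_t spe_records(const struct op *ops, size_t n, size_t period,
			  unsigned int min_lat)
{
	size_t i, r = 0;

	assert(period > 0);
	for (i = period - 1; i < n; i += period)
		if (ops[i].is_load && ops[i].latency >= min_lat)
			r++;
	return r;
}
```

With a period of 1 the two numbers agree; with larger periods the number of records only roughly tracks the counter value divided by the period, which is why the two kinds of data cannot simply be merged.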

> From the above, it seems that we may have the opportunity to support the
> perf-mem command on arm64. I'm not very sure about it, so I am sending this
> RFC; any comments are welcome.

I don't think there's enough information here to comment meaningfully
beyond SPE != PEBS.

Will

2019-11-05 07:53:58

by Zhangshaokun

Subject: Re: [RFC] About perf-mem command support on arm64 platform

Hi Will,

Thanks for your reply.

On 2019/11/4 22:26, Will Deacon wrote:
> On Mon, Nov 04, 2019 at 05:18:00PM +0800, Shaokun Zhang wrote:
>> perf-mem is used to profile memory accesses and has been implemented on the
>> x86 platform. It needs mem-stores events and mem-loads/load-latency.
>> For mem-stores, the event is MEM_INST_RETIRED_ALL_STORES, whose raw number
>> is r82d0, and mem-loads/load-latency comes from PEBS, if I follow the code
>> correctly.
>>
>> Now some arm64 cores, like HiSilicon's tsv110 and ARM's Neoverse N1,
>> support SPE (the Statistical Profiling Extension), so is it possible to
>> support perf-mem on arm64?
>> https://developer.arm.com/ip-products/processors/neoverse/neoverse-n1
>
> I don't understand the relationship you're trying to draw between mem-stores

There may have been some misunderstanding because I didn't describe it
clearly. Judging from the implementation of perf-mem on x86, it needs:
a. mem-stores PMU events;
b. mem-loads/load-latency from PEBS.

If arm64 is to support perf-mem, we need to support both mem-stores and
mem-loads/load-latency, and we can derive the latter from SPE.

> and SPE. How does perf-mem work and what does it actually require from the
> CPU?

An excellent question; I haven't studied perf-mem carefully. From my
understanding, it needs the events mentioned above plus PEBS sampled data
filtered by the desired latency for the loads event.

>
> One thing that may be worth noting is that SPE isn't generally able to
> capture information about all instructions being executed by the CPU:

Got it; I have used SPE on the Huawei Kunpeng 920 SoC.

> instead, instructions (most likely micro-ops) are sampled based on
> some user-specified period. The CPU advertises a minimum recommended

OK, if I follow you correctly, perf record -c XXX defines the period for SPE.

> period which we expose under /sys and enforce when programming events.
>
>> The arm64 PMU has an 'st_retired' event, event number 0x0007, which is
>> equivalent to mem-stores on x86. If we want to support perf-mem, it seems
>> that 'st_retired' should be replaced by 'mem-stores' in the
>> arch/arm64/kernel/perf_event.c file. Of course, the CPU core must support
>> the st_retired event. I'm not sure Will/Mark would be happy with this. ;-)
>>
>> For mem-loads/load-latency, we can derive them from SPE sampled data, which
>> the SPE driver supports through its load_filter and min_latency options,
>> and we may need to do some work on tools/perf/builtin-mem.c.
>
> I don't see how you could reconcile the sampling nature of SPE with a
> CPU PMU counter, particularly as filtering in SPE happens /after/ sampling.
>

Jiri, could you please give some details of how perf-mem is implemented
with mem-stores and PEBS?

>> From the above, it seems that we may have the opportunity to support the
>> perf-mem command on arm64. I'm not very sure about it, so I am sending this
>> RFC; any comments are welcome.
>
> I don't think there's enough information here to comment meaningfully
> beyond SPE != PEBS.

We can already get load-latency from SPE, so I wanted to raise the question
of whether we should support perf-mem on arm64.

Thanks,
Shaokun

>
> Will