Hi Stephane,
On Thu, Feb 03, 2022 at 11:39:40PM -0800, Stephane Eranian wrote:
> With the existing code, the following command:
>
> $ perf stat -e branches sleep 0
> Performance counter stats for 'sleep 0':
> <not supported> branches
>
> on N1 core (pmuv3).
This is definitely not ideal. :(
> This is due to the fact that the mapping for the generic event is wrong.
I don't think that's quite true; more detail below. This is certainly *messy*
though.
> It is using ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED which is not implemented
> on N1 (and most likely on any PMUv3 implementations). However, there is
> another supported event ARMV8_PMUV3_PERFCTR_BR_RETIRED measuring the same
> condition.
I have a couple of concerns here:
1) Both PC_WRITE_RETIRED and BR_RETIRED are OPTIONAL (though the Arm strongly
recommends that BR_RETIRED is implemented), so CPUs may exist which only
support one of the two, or both.
So as-is, this patch may break working support for CPUs which have
PC_WRITE_RETIRED but not BR_RETIRED.
IIUC we should be able to detect whether either are implemented by looking
at PMCEID, and we could take that into account when mapping the event.
2) IIUC (even with ARMv8.6) there is a potential semantic difference between
PC_WRITE_RETIRED and BR_RETIRED, in that e.g. PC_WRITE_RETIRED must include
exception returns while this is IMPLEMENTATION DEFINED for BR_RETIRED.
I guess this might not matter all that much given the precise definition of
"Software change of the PC" is IMPLEMENTATION DEFINED, but I don't think
it's true that the two events count "the same condition", and we should be
more explicit about that.
> This patch switches the mapping to ARMV8_PMUV3_PERFCTR_BR_RETIRED so that
> the perf stat command above works.
>
> Signed-off-by: Stephane Eranian <[email protected]>
> ---
> arch/arm64/kernel/perf_event.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index cab678ed6618..ec2b98343a0b 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -45,7 +45,7 @@ static const unsigned armv8_pmuv3_perf_map[PERF_COUNT_HW_MAX] = {
> [PERF_COUNT_HW_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_INST_RETIRED,
> [PERF_COUNT_HW_CACHE_REFERENCES] = ARMV8_PMUV3_PERFCTR_L1D_CACHE,
> [PERF_COUNT_HW_CACHE_MISSES] = ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
> - [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED,
> + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_BR_RETIRED,
As above, I don't think we can unconditionally make this change, and instead
should have the mapping function take PMCEID into account to map the event (or
bail out if we don't know a suitable event is implemented).
Thanks,
Mark.
> [PERF_COUNT_HW_BRANCH_MISSES] = ARMV8_PMUV3_PERFCTR_BR_MIS_PRED,
> [PERF_COUNT_HW_BUS_CYCLES] = ARMV8_PMUV3_PERFCTR_BUS_CYCLES,
> [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = ARMV8_PMUV3_PERFCTR_STALL_FRONTEND,
> --
> 2.35.0.263.gb82422642f-goog
>
On Fri, Feb 4, 2022 at 3:48 AM Mark Rutland <[email protected]> wrote:
>
> Hi Stephane,
>
> On Thu, Feb 03, 2022 at 11:39:40PM -0800, Stephane Eranian wrote:
> > With the existing code, the following command:
> >
> > $ perf stat -e branches sleep 0
> > Performance counter stats for 'sleep 0':
> > <not supported> branches
> >
> > on N1 core (pmuv3).
>
> This is definitely not ideal. :(
>
> > This is due to the fact that the mapping for the generic event is wrong.
>
> I don't think that's quite true; more detail below. This is certainly *messy*
> though.
>
> > It is using ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED which is not implemented
> > on N1 (and most likely on any PMUv3 implementations). However, there is
> > another supported event ARMV8_PMUV3_PERFCTR_BR_RETIRED measuring the same
> > condition.
>
> I have a couple of concerns here:
>
> 1) Both PC_WRITE_RETIRED and BR_RETIRED are OPTIONAL (though the Arm strongly
> recommends that BR_RETIRED is implemented), so CPUs may exist which only
> support one of the two, or both.
>
That was my fear here.
> So as-is, this patch may break working support for CPUs which have
> PC_WRITE_RETIRED but not BR_RETIRED.
>
> IIUC we should be able to detect whether either are implemented by looking
> at PMCEID, and we could take that into account when mapping the event.
>
> 2) IIUC (even with ARMv8.6) there is a potential semantic difference between
> PC_WRITE_RETIRED and BR_RETIRED, in that e.g. PC_WRITE_RETIRED must include
> exception returns while this is IMPLEMENTATION DEFINED for BR_RETIRED.
>
> I guess this might not matter all that much given the precise definition of
> "Software change of the PC" is IMPLEMENTATION DEFINED, but I don't think
> it's true that the two events count "the same condition", and we should be
> more explicit about that.
>
It also depends on the Linux definition for "branches" which I have never seen.
So all we can do is look at the x86 variants and check on the
exception return case.
I will do that.
> > This patch switches the mapping to ARMV8_PMUV3_PERFCTR_BR_RETIRED so that
> > the perf stat command above works.
> >
> > Signed-off-by: Stephane Eranian <[email protected]>
> > ---
> > arch/arm64/kernel/perf_event.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> > index cab678ed6618..ec2b98343a0b 100644
> > --- a/arch/arm64/kernel/perf_event.c
> > +++ b/arch/arm64/kernel/perf_event.c
> > @@ -45,7 +45,7 @@ static const unsigned armv8_pmuv3_perf_map[PERF_COUNT_HW_MAX] = {
> > [PERF_COUNT_HW_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_INST_RETIRED,
> > [PERF_COUNT_HW_CACHE_REFERENCES] = ARMV8_PMUV3_PERFCTR_L1D_CACHE,
> > [PERF_COUNT_HW_CACHE_MISSES] = ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
> > - [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED,
> > + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_BR_RETIRED,
>
> As above, I don't think we can unconditionally make this change, and instead
> should have the mapping function take PMCEID into account to map the event (or
> bail out if we don't know a suitable event is implemented).
>
The PMCEID does contains what's implemented and that it why the event
disappears on some implementation.
So instead we could use it to adjust the mapping. Let me try to do
that instead at least for this event given there is
a choice of mapping. And yes, if neither of these exist then the event
does not exist.
There is always the possibility to use an actual event code and that
bypasses the PMCEID check.
> Thanks,
> Mark.
>
> > [PERF_COUNT_HW_BRANCH_MISSES] = ARMV8_PMUV3_PERFCTR_BR_MIS_PRED,
> > [PERF_COUNT_HW_BUS_CYCLES] = ARMV8_PMUV3_PERFCTR_BUS_CYCLES,
> > [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = ARMV8_PMUV3_PERFCTR_STALL_FRONTEND,
> > --
> > 2.35.0.263.gb82422642f-goog
> >