2018-10-01 10:08:30

by Ganapatrao Kulkarni

[permalink] [raw]
Subject: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.

Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to
armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
since L1D_CACHE_REFILL counts both load and store misses.
Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
and dTLB-loads are wrongly mapped. Hence Deleting all these cache events
from armv8_pmuv3 cache mapping.

Signed-off-by: Ganapatrao Kulkarni <[email protected]>
---
arch/arm64/kernel/perf_event.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 33147aacdafd..6a67ad22d1eb 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -207,17 +207,9 @@ static const unsigned armv8_pmuv3_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
PERF_CACHE_MAP_ALL_UNSUPPORTED,

- [C(L1D)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1D_CACHE,
- [C(L1D)][C(OP_READ)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
- [C(L1D)][C(OP_WRITE)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1D_CACHE,
- [C(L1D)][C(OP_WRITE)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
-
[C(L1I)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1I_CACHE,
[C(L1I)][C(OP_READ)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL,

- [C(DTLB)][C(OP_READ)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL,
- [C(DTLB)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1D_TLB,
-
[C(ITLB)][C(OP_READ)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1I_TLB_REFILL,
[C(ITLB)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1I_TLB,

--
2.18.0



2018-10-01 14:30:48

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.

Hi Ganapat,

On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote:
> Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to
> armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
> since L1D_CACHE_REFILL counts both load and store misses.
> Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
> and dTLB-loads are wrongly mapped. Hence Deleting all these cache events
> from armv8_pmuv3 cache mapping.
>
> Signed-off-by: Ganapatrao Kulkarni <[email protected]>
> ---
> arch/arm64/kernel/perf_event.c | 8 --------
> 1 file changed, 8 deletions(-)

The "generic" events are really implemented on a best-effort basis, as
they rarely tend to map exactly to what the hardware supports. I think
they originally stemmed from the x86 CPU PMU, but that doesn't really
help us.

I had a discussion with Ingo back when we originally implemented perf
because I actually preferred not to implement the generic events at all.
However, he was strongly of the opinion that a best-effort approach was
sufficient to get casual users going with the tool, so that's what we went
with.

Will

2018-10-01 16:39:34

by Ganapatrao Kulkarni

[permalink] [raw]
Subject: Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.

Hi Will,

On Mon, Oct 1, 2018 at 7:58 PM Will Deacon <[email protected]> wrote:
>
> Hi Ganapat,
>
> On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote:
> > Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to
> > armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
> > since L1D_CACHE_REFILL counts both load and store misses.
> > Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
> > and dTLB-loads are wrongly mapped. Hence Deleting all these cache events
> > from armv8_pmuv3 cache mapping.
> >
> > Signed-off-by: Ganapatrao Kulkarni <[email protected]>
> > ---
> > arch/arm64/kernel/perf_event.c | 8 --------
> > 1 file changed, 8 deletions(-)
>
> The "generic" events are really implemented on a best-effort basis, as
> they rarely tend to map exactly to what the hardware supports. I think
> they originally stemmed from the x86 CPU PMU, but that doesn't really
> help us.

This works fairly well for DT based boots, since almost all SoCs have
added remapping using custom dt object binding.
However we have concluded in the past to drop SoC specific from the
ACPI mapping and use json to add SoC/micro architecture specific
events support.
At present , When we boot with ACPI, it is misleading for these events.

In fact, this has been pointed internally from benchmark team and
reported it as hardware bug!
IMHO, it would be much simpler to delete these misleading events
mapping rather explaining to perf tool users.

We already have proper mapping for these events,
armv8_pmuv3_0/l1d_cache_refill/
armv8_pmuv3_0/l1d_cache/
[core imp def:]
l1d_cache_rd
l1d_cache_wr
l1d_cache_refill_rd
l1d_cache_refill_wr

>
> I had a discussion with Ingo back when we originally implemented perf
> because I actually preferred not to implement the generic events at all.
> However, he was strongly of the opinion that a best-effort approach was
> sufficient to get casual users going with the tool, so that's what we went
> with.

thanks for the background info, these generic mapping fairly works
except these events.

>
> Will

thanks,
Ganapat

2018-10-04 05:42:56

by Ganapatrao Kulkarni

[permalink] [raw]
Subject: Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.

Hi Will,

can you please pull this patch?

On Mon, Oct 1, 2018 at 10:09 PM Ganapatrao Kulkarni <[email protected]> wrote:
>
> Hi Will,
>
> On Mon, Oct 1, 2018 at 7:58 PM Will Deacon <[email protected]> wrote:
> >
> > Hi Ganapat,
> >
> > On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote:
> > > Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to
> > > armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
> > > since L1D_CACHE_REFILL counts both load and store misses.
> > > Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
> > > and dTLB-loads are wrongly mapped. Hence Deleting all these cache events
> > > from armv8_pmuv3 cache mapping.
> > >
> > > Signed-off-by: Ganapatrao Kulkarni <[email protected]>
> > > ---
> > > arch/arm64/kernel/perf_event.c | 8 --------
> > > 1 file changed, 8 deletions(-)
> >
> > The "generic" events are really implemented on a best-effort basis, as
> > they rarely tend to map exactly to what the hardware supports. I think
> > they originally stemmed from the x86 CPU PMU, but that doesn't really
> > help us.
>
> This works fairly well for DT based boots, since almost all SoCs have
> added remapping using custom dt object binding.
> However we have concluded in the past to drop SoC specific from the
> ACPI mapping and use json to add SoC/micro architecture specific
> events support.
> At present , When we boot with ACPI, it is misleading for these events.
>
> In fact, this has been pointed internally from benchmark team and
> reported it as hardware bug!
> IMHO, it would be much simpler to delete these misleading events
> mapping rather explaining to perf tool users.
>
> We already have proper mapping for these events,
> armv8_pmuv3_0/l1d_cache_refill/
> armv8_pmuv3_0/l1d_cache/
> [core imp def:]
> l1d_cache_rd
> l1d_cache_wr
> l1d_cache_refill_rd
> l1d_cache_refill_wr
>
> >
> > I had a discussion with Ingo back when we originally implemented perf
> > because I actually preferred not to implement the generic events at all.
> > However, he was strongly of the opinion that a best-effort approach was
> > sufficient to get casual users going with the tool, so that's what we went
> > with.
>
> thanks for the background info, these generic mapping fairly works
> except these events.
>
> >
> > Will
>
> thanks,
> Ganapat

thanks,
Ganapat

2018-10-04 12:22:06

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.

Hi Ganapat,

On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
> can you please pull this patch?

I still don't like the idea of just removing events like this, especially
when other architectures (including some x86 and Power CPUs afaict) playa
similar games for generic events, and these events do actually appear in
user code.

I also don't understand why you remove the TLB events. I think that logic
would imply we should remove all of the events, because we can't distinguish
prefetches from reads either. If we want to be consistent, then I think
we should just remove the OP_WRITE events for L1D and BPU -- would you be
ok with that instead?

Also, looking at the code, I think our PMCEID parsing is broken for 8.1
parts, where the upper 32 bits of the register are offset by 0x4000 in the
event numbering space.

Will

2018-10-04 19:41:49

by Ganapatrao Kulkarni

[permalink] [raw]
Subject: Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.

Hi Will,

On Thu, Oct 4, 2018 at 5:51 PM Will Deacon <[email protected]> wrote:
>
> Hi Ganapat,
>
> On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
> > can you please pull this patch?
>
> I still don't like the idea of just removing events like this, especially
> when other architectures (including some x86 and Power CPUs afaict) playa
> similar games for generic events, and these events do actually appear in
> user code.
>
> I also don't understand why you remove the TLB events. I think that logic
> would imply we should remove all of the events, because we can't distinguish
> prefetches from reads either. If we want to be consistent, then I think
> we should just remove the OP_WRITE events for L1D and BPU -- would you be
> ok with that instead?

IIUC, dTLB-load-misses is mapped to
ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL(event 0x05) and dTLB-loads is
mapped to ARMV8_PMUV3_PERFCTR_L1D_TLB(0x25). Which are as per spec,
counts TLB access/misses for both memory-read operation and
memory-write operation.

IMO, It won't help in keeping these events, knowingly that their
mapping is not accurate, only thing i can say to users is , dont use
events that are marked as "Hardware cache event"

>
> Also, looking at the code, I think our PMCEID parsing is broken for 8.1
> parts, where the upper 32 bits of the register are offset by 0x4000 in the
> event numbering space.

yes, i did not find any mapping in PMCEIDx registers for
implementation defined events, otherwise we would have remapped at
runtime.

>
> Will

thanks
Ganapat

2018-10-05 12:27:47

by John Garry

[permalink] [raw]
Subject: Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.

On 04/10/2018 13:22, Will Deacon wrote:
> Hi Ganapat,
>
> On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
>> can you please pull this patch?
>
> I still don't like the idea of just removing events like this, especially
> when other architectures (including some x86 and Power CPUs afaict) playa
> similar games for generic events, and these events do actually appear in
> user code.
>
> I also don't understand why you remove the TLB events. I think that logic
> would imply we should remove all of the events, because we can't distinguish
> prefetches from reads either. If we want to be consistent, then I think
> we should just remove the OP_WRITE events for L1D and BPU -- would you be
> ok with that instead?
>
> Also, looking at the code, I think our PMCEID parsing is broken for 8.1
> parts, where the upper 32 bits of the register are offset by 0x4000 in the
> event numbering space.
>

Here's something I noticed:
static ssize_t
armv8pmu_events_sysfs_show(struct device *dev,
struct device_attribute *attr, char *page)
{
struct perf_pmu_events_attr *pmu_attr;

pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);

return sprintf(page, "event=0x%03llx\n", pmu_attr->id);

Should this be min width now be 4, since event width is now 16 bits
(even though I don't know why we need to specify this width at all)?

Cheers,
John

> Will
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
> .
>



2018-10-05 12:39:51

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.

On Fri, Oct 05, 2018 at 01:27:08PM +0100, John Garry wrote:
> On 04/10/2018 13:22, Will Deacon wrote:
> >Hi Ganapat,
> >
> >On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
> >>can you please pull this patch?
> >
> >I still don't like the idea of just removing events like this, especially
> >when other architectures (including some x86 and Power CPUs afaict) playa
> >similar games for generic events, and these events do actually appear in
> >user code.
> >
> >I also don't understand why you remove the TLB events. I think that logic
> >would imply we should remove all of the events, because we can't distinguish
> >prefetches from reads either. If we want to be consistent, then I think
> >we should just remove the OP_WRITE events for L1D and BPU -- would you be
> >ok with that instead?
> >
> >Also, looking at the code, I think our PMCEID parsing is broken for 8.1
> >parts, where the upper 32 bits of the register are offset by 0x4000 in the
> >event numbering space.
> >
>
> Here's something I noticed:
> static ssize_t
> armv8pmu_events_sysfs_show(struct device *dev,
> struct device_attribute *attr, char *page)
> {
> struct perf_pmu_events_attr *pmu_attr;
>
> pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
>
> return sprintf(page, "event=0x%03llx\n", pmu_attr->id);
>
> Should this be min width now be 4, since event width is now 16 bits (even
> though I don't know why we need to specify this width at all)?

Yeah, that is pretty weird, but the 16-bit events won't be truncated, so
I think I'd rather just leave that string as-is since it's ABI. I agree
that we probably shouldn't have bothered with the width at all in hindsight.

Will

2018-10-05 13:35:07

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.

On Fri, Oct 05, 2018 at 01:09:33AM +0530, Ganapatrao Kulkarni wrote:
> On Thu, Oct 4, 2018 at 5:51 PM Will Deacon <[email protected]> wrote:
> > On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
> > > can you please pull this patch?
> >
> > I still don't like the idea of just removing events like this, especially
> > when other architectures (including some x86 and Power CPUs afaict) playa
> > similar games for generic events, and these events do actually appear in
> > user code.
> >
> > I also don't understand why you remove the TLB events. I think that logic
> > would imply we should remove all of the events, because we can't distinguish
> > prefetches from reads either. If we want to be consistent, then I think
> > we should just remove the OP_WRITE events for L1D and BPU -- would you be
> > ok with that instead?
>
> IIUC, dTLB-load-misses is mapped to
> ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL(event 0x05) and dTLB-loads is
> mapped to ARMV8_PMUV3_PERFCTR_L1D_TLB(0x25). Which are as per spec,
> counts TLB access/misses for both memory-read operation and
> memory-write operation.
>
> IMO, It won't help in keeping these events, knowingly that their
> mapping is not accurate, only thing i can say to users is , dont use
> events that are marked as "Hardware cache event"

Right, but my point is that all of the events are inaccurate by this line
of reasoning, because we don't support PREFETCH for any of them. So I'd
rather just drop the duplicate events from the WRITE entries, like other
CPUs and architectures do.

I'm about to pack my desk up because we're moving office, but I pushed out
some patches I hacked up on my perf/updates branch:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=perf/updates

I'll post them to the list next week if it's not the merge window.

Will