2010-11-01 14:41:23

by Robert Schöne

Subject: [PATCH] wrong PERF_COUNT_HW_CACHE_REFERENCES and PERF_COUNT_HW_CACHE_MISSES for AMD

The current arch/x86/kernel/cpu/perf_event_amd.c file maps
PERF_COUNT_HW_CACHE_MISSES and PERF_COUNT_HW_CACHE_REFERENCES to the
L1 instruction cache misses and accesses, respectively.

This patch uses the L2 cache misses and accesses instead. (Real LLC
events would be better, but Northbridge events on AMD come with some
restrictions.)

The event codes are copied from the list of cache events from the same
file.


Signed-off-by: Robert Schoene <[email protected]>


--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -100,8 +100,8 @@ static const u64 amd_perfmon_event_map[] =
 {
   [PERF_COUNT_HW_CPU_CYCLES]           = 0x0076,
   [PERF_COUNT_HW_INSTRUCTIONS]         = 0x00c0,
-  [PERF_COUNT_HW_CACHE_REFERENCES]     = 0x0080,
-  [PERF_COUNT_HW_CACHE_MISSES]         = 0x0081,
+  [PERF_COUNT_HW_CACHE_REFERENCES]     = 0x037D,
+  [PERF_COUNT_HW_CACHE_MISSES]         = 0x037E,
   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]  = 0x00c2,
   [PERF_COUNT_HW_BRANCH_MISSES]        = 0x00c3,
 };
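
For readers new to these values: each entry packs an AMD event select and a unit mask. The low byte is EventSelect[7:0] and the next byte is the unit mask, so 0x037D is event 0x7D with unit mask 0x03 (per the AMD BKDG, 0x7D counts requests to the L2 cache and 0x7E counts L2 cache misses; unit mask 0x03 selects instruction- and data-cache fills). Extended event selects such as 4E1, mentioned later in this thread, carry EventSelect[11:8] in bits 32-35 of the raw config. A minimal sketch, assuming that PERF_CTL layout; the helper names are hypothetical and not kernel API:

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not kernel API). Assumed AMD PERF_CTL layout:
 *   bits  0-7  : EventSelect[7:0]
 *   bits  8-15 : UnitMask
 *   bits 32-35 : EventSelect[11:8]
 */

/* Extract the 12-bit event select from a raw config value. */
static unsigned amd_event_select(uint64_t config)
{
    return (unsigned)((config & 0xffULL) | ((config >> 24) & 0xf00ULL));
}

/* Extract the 8-bit unit mask from a raw config value. */
static unsigned amd_unit_mask(uint64_t config)
{
    return (unsigned)((config >> 8) & 0xffULL);
}

/* Build a raw config value from an event select and a unit mask. */
static uint64_t amd_make_config(unsigned event, unsigned umask)
{
    return (uint64_t)(event & 0xffu)
         | ((uint64_t)(umask & 0xffu) << 8)
         | ((uint64_t)((event >> 8) & 0xfu) << 32);
}
```

With these helpers, amd_event_select(0x037D) is 0x7D and amd_unit_mask(0x037D) is 0x03, while amd_make_config(0x4E1, 0) yields 0x4000000E1 (a unit mask of 0 is only illustrative).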


2010-11-02 01:55:51

by Stephane Eranian

Subject: Re: [PATCH] wrong PERF_COUNT_HW_CACHE_REFERENCES and PERF_COUNT_HW_CACHE_MISSES for AMD

Hi,



On Mon, Nov 1, 2010 at 3:11 PM, Robert Schöne
<[email protected]> wrote:
>
> The current arch/x86/kernel/cpu/perf_event_amd.c file maps
> PERF_COUNT_HW_CACHE_MISSES and PERF_COUNT_HW_CACHE_REFERENCES to the
> L1 instruction cache misses and accesses, respectively.
>
I always thought PERF_COUNT_HW_CACHE_* was about data cache misses.
But given that there is no clear definition of those events, this
creates confusion.

If you change the meaning of HW_CACHE_MISSES, then it seems to me that you
also need to change the mapping in the perf tool, because it would now
include both data and code.


> This patch uses the L2 cache misses and accesses instead. (Real LLC
> events would be better, but Northbridge events on AMD come with some
> restrictions.)
>
And those constraints are handled correctly by the kernel.

The constraint is that no more than 4 Northbridge events can be active
at the same time per Northbridge, since all cores attached to it share
the same counters. If you exceed that with events issued from different
cores, one of them will starve.
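
To illustrate the constraint: the kernel keeps a per-Northbridge allocation table, so a Northbridge event may only take a shared counter slot that no other core on the same Northbridge holds, and with all four slots taken the event cannot be scheduled. The following is a simplified user-space model of that idea using C11 atomics, not the kernel code; struct nb_table, nb_claim() and nb_release() are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NB_NUM_COUNTERS 4  /* shared counter slots per Northbridge */

/* Hypothetical per-Northbridge table: owner[i] holds the id of the
 * event occupying shared counter i, or 0 if the slot is free. */
struct nb_table {
    atomic_int owner[NB_NUM_COUNTERS];
};

static void nb_init(struct nb_table *nb)
{
    for (size_t i = 0; i < NB_NUM_COUNTERS; i++)
        atomic_init(&nb->owner[i], 0);
}

/* Claim a free slot for event_id (must be non-zero) on behalf of any
 * core attached to this Northbridge. Returns the slot index, or -1 if
 * all slots are taken, i.e. the event cannot be scheduled (it starves). */
static int nb_claim(struct nb_table *nb, int event_id)
{
    for (int i = 0; i < NB_NUM_COUNTERS; i++) {
        int expected = 0;
        /* compare-and-swap so two cores can never grab the same slot */
        if (atomic_compare_exchange_strong(&nb->owner[i], &expected,
                                           event_id))
            return i;
    }
    return -1;
}

/* Release a slot, but only if event_id still owns it. */
static void nb_release(struct nb_table *nb, int slot, int event_id)
{
    int expected = event_id;
    atomic_compare_exchange_strong(&nb->owner[slot], &expected, 0);
}
```

In this model a fifth Northbridge event, from whichever core, gets -1 until one of the first four releases its slot.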



2010-11-02 11:08:23

by Robert Schöne

Subject: Re: [PATCH] wrong PERF_COUNT_HW_CACHE_REFERENCES and PERF_COUNT_HW_CACHE_MISSES for AMD

Hi,
On Tuesday, 02.11.2010, at 02:55 +0100, Stephane Eranian wrote:
> Hi,
>
>
>
> On Mon, Nov 1, 2010 at 3:11 PM, Robert Schöne
> <[email protected]> wrote:
> >
> > The current arch/x86/kernel/cpu/perf_event_amd.c file maps
> > PERF_COUNT_HW_CACHE_MISSES and PERF_COUNT_HW_CACHE_REFERENCES to the
> > L1 instruction cache misses and accesses, respectively.
> >
> I always thought PERF_COUNT_HW_CACHE_* was about data cache misses.
> But given that there is no clear definition of those events, this
> creates confusion.
>
That's what I thought too, before reading the AMD BKDG for Family 10h.
It always seemed to me that the "hardware" event type was a kind of
mapping to the Intel "architectural events", and in Intel's definition
it reads as LLC.
>
> If you change the meaning of HW_CACHE_MISSES, then it seems to me that you
> also need to change the mapping in the perf tool, because it would now
> include both data and code.
>
So does the Intel implementation. It simply counts LLC misses, without
specifying what was accessed.
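
To make the comparison concrete: as far as I know, the Intel map in perf_event_intel.c uses the architectural "LLC Reference" and "LLC Misses" events (event 0x2E with unit masks 0x4F and 0x41) for the two generic cache events, while the patched AMD map points them at L2 events. The sketch below puts the relevant entries side by side in user space; enum gen_event is a hypothetical stand-in for the PERF_COUNT_HW_* constants, and only a subset of entries is reproduced.

```c
#include <stdint.h>

/* Hypothetical stand-in for the PERF_COUNT_HW_* generic event ids. */
enum gen_event {
    GEN_CPU_CYCLES,
    GEN_INSTRUCTIONS,
    GEN_CACHE_REFERENCES,
    GEN_CACHE_MISSES,
    GEN_MAX
};

/* Intel: architectural LLC Reference / LLC Misses (event 0x2E). */
static const uint64_t intel_map[GEN_MAX] = {
    [GEN_CPU_CYCLES]       = 0x003c,
    [GEN_INSTRUCTIONS]     = 0x00c0,
    [GEN_CACHE_REFERENCES] = 0x4f2e,
    [GEN_CACHE_MISSES]     = 0x412e,
};

/* AMD with the patch applied: L2 requests / L2 misses (IC+DC). */
static const uint64_t amd_map[GEN_MAX] = {
    [GEN_CPU_CYCLES]       = 0x0076,
    [GEN_INSTRUCTIONS]     = 0x00c0,
    [GEN_CACHE_REFERENCES] = 0x037D,
    [GEN_CACHE_MISSES]     = 0x037E,
};

/* In both tables: low byte = event select, next byte = unit mask. */
static unsigned event_of(uint64_t config) { return (unsigned)(config & 0xff); }
static unsigned umask_of(uint64_t config) { return (unsigned)((config >> 8) & 0xff); }
```

The generic id itself carries no notion of data versus code; that is decided entirely by which raw event each vendor table picks.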
>
> > This patch uses the L2 cache misses and accesses instead. (Real LLC
> > events would be better, but Northbridge events on AMD come with some
> > restrictions.)
> >
> And those constraints are handled correctly by the kernel.
>
> The constraint is that no more than 4 Northbridge events can be active
> at the same time per Northbridge, since all cores attached to it share
> the same counters. If you exceed that with events issued from different
> cores, one of them will starve.
>
Yes, we could use event 4E1 (L3 Cache Misses), but we would need
different event IDs for the different AMD families. Not all of them have
an L3 cache, and even some implementations of Family 10h lack one.
Since this event ID is a static definition, we would have to introduce a
"placeholder" definition that is replaced, whenever a cache
misses/accesses event is initialized, by the "Last Level Cache" event ID
of the processor currently in the system.
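
A minimal sketch of the placeholder idea, under stated assumptions: the generic map stores a sentinel for the cache entries, and the sentinel is swapped for a real event code when the event is initialized. The sentinel values, struct amd_cpu_info, its has_l3 flag and AMD_L3_MISS_UMASK are all hypothetical; the L2 codes come from the patch and event 4E1 from the paragraph above, while a real L3 unit mask would have to be taken from the BKDG.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinels: values that are never real AMD configs. */
#define GEN_LLC_REFERENCES_PLACEHOLDER 0xFFFFFFFFFFFFFFFEULL
#define GEN_LLC_MISSES_PLACEHOLDER     0xFFFFFFFFFFFFFFFFULL

/* L2 events from the patch. */
#define AMD_L2_REQUESTS 0x037DULL
#define AMD_L2_MISSES   0x037EULL

/* Hypothetical: event 0x4E1 (L3 Cache Misses). The unit mask must be
 * chosen from the BKDG; 0x00 here is only a placeholder. */
#define AMD_L3_MISS_UMASK 0x00ULL
#define AMD_L3_MISSES \
    (0xE1ULL | (AMD_L3_MISS_UMASK << 8) | (0x4ULL << 32))

/* Hypothetical description of the running CPU. */
struct amd_cpu_info {
    bool has_l3;
};

/* Resolve a value taken from the generic map at event-init time. */
static uint64_t resolve_generic_event(uint64_t mapped,
                                      const struct amd_cpu_info *cpu)
{
    switch (mapped) {
    case GEN_LLC_MISSES_PLACEHOLDER:
        return cpu->has_l3 ? AMD_L3_MISSES : AMD_L2_MISSES;
    case GEN_LLC_REFERENCES_PLACEHOLDER:
        /* No L3 request event is named here, so this sketch falls back
         * to L2 requests; a real version would pick an L3 event when
         * has_l3 is set. */
        return AMD_L2_REQUESTS;
    default:
        return mapped;  /* ordinary entries pass through unchanged */
    }
}
```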

2010-11-22 11:08:58

by Stephane Eranian

Subject: Re: [PATCH] wrong PERF_COUNT_HW_CACHE_REFERENCES and PERF_COUNT_HW_CACHE_MISSES for AMD

Robert,

Has there been any progress on this issue?

On Tue, Nov 2, 2010 at 12:08 PM, Robert Schöne
<[email protected]> wrote:
>
> >
> Yes, we could use event 4E1 (L3 Cache Misses), but we would need
> different event IDs for the different AMD families. Not all of them have
> an L3 cache, and even some implementations of Family 10h lack one.

I think you could introduce several generic event mapping tables, as is
done for the various Intel processors, i.e., have variations of the
amd_perfmon_event_map[] table. Then the kernel would auto-detect the
host CPU and pick the correct table. The same would have to be done
for the LL generic cache events if some of their mappings use Northbridge
events.
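
The per-CPU table approach could look roughly like the sketch below. It assumes the standard x86 rule for the displayed family (base family from EAX[11:8] of CPUID leaf 1, plus the extended family from EAX[27:20] when the base is 0xF). The table names, pick_amd_event_map() and the has_l3 flag are hypothetical, the L3 table's values are placeholders, and, as Robert points out, family alone is not enough since some Family 10h parts lack an L3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical generic ids mirroring PERF_COUNT_HW_CACHE_*. */
enum gen_event { GEN_CACHE_REFERENCES, GEN_CACHE_MISSES, GEN_MAX };

/* L2-based mapping, as in the patch. */
static const uint64_t amd_l2_event_map[GEN_MAX] = {
    [GEN_CACHE_REFERENCES] = 0x037D,
    [GEN_CACHE_MISSES]     = 0x037E,
};

/* Hypothetical L3-based mapping: misses use event 0x4E1 (unit mask
 * left at 0 as a placeholder); 0 marks an entry this sketch leaves
 * unsupported. */
static const uint64_t amd_l3_event_map[GEN_MAX] = {
    [GEN_CACHE_REFERENCES] = 0,
    [GEN_CACHE_MISSES]     = 0x4000000E1ULL,
};

/* Displayed family from a CPUID leaf 1 EAX signature. */
static unsigned x86_family(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xf;
    if (family == 0xf)
        family += (eax >> 20) & 0xff;  /* add extended family */
    return family;
}

/* Pick a table for the running CPU (selection rule is illustrative). */
static const uint64_t *pick_amd_event_map(uint32_t cpuid1_eax, bool has_l3)
{
    if (x86_family(cpuid1_eax) >= 0x10 && has_l3)
        return amd_l3_event_map;
    return amd_l2_event_map;
}
```

For example, a signature of 0x00100F42 decodes to family 0x10, and 0x00020F12 to family 0xF, which falls back to the L2 table.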

In general, however, I would recommend not using those generic cache
events to begin with. I think you understand why now. When dealing with
PMU events, you should read the documentation first. Micro-architectures
vary greatly even within the same processor family.