2009-06-09 13:32:38

by Yong Wang

Subject: [PATCH -tip] perf_counter/x86: Correct some event and umask values for Intel processors

Correct some event and UMASK values according to Intel SDM.

Signed-off-by: Yong Wang <[email protected]>

---
arch/x86/kernel/cpu/perf_counter.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index 56001fe..40978aa 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -119,7 +119,7 @@ static const u64 nehalem_hw_cache_event_ids
},
[ C(L1I ) ] = {
[ C(OP_READ) ] = {
- [ C(RESULT_ACCESS) ] = 0x0480, /* L1I.READS */
+ [ C(RESULT_ACCESS) ] = 0x0380, /* L1I.READS */
[ C(RESULT_MISS) ] = 0x0280, /* L1I.MISSES */
},
[ C(OP_WRITE) ] = {
@@ -162,7 +162,7 @@ static const u64 nehalem_hw_cache_event_ids
[ C(ITLB) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x01c0, /* INST_RETIRED.ANY_P */
- [ C(RESULT_MISS) ] = 0x0185, /* ITLB_MISS_RETIRED */
+ [ C(RESULT_MISS) ] = 0x20c8, /* ITLB_MISS_RETIRED */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = -1,
@@ -291,7 +291,7 @@ static const u64 atom_hw_cache_event_ids
[ C(RESULT_MISS) ] = 0,
},
[ C(OP_WRITE) ] = {
- [ C(RESULT_ACCESS) ] = 0x2241, /* L1D_CACHE.ST */
+ [ C(RESULT_ACCESS) ] = 0x2240, /* L1D_CACHE.ST */
[ C(RESULT_MISS) ] = 0,
},
[ C(OP_PREFETCH) ] = {
@@ -301,8 +301,8 @@ static const u64 atom_hw_cache_event_ids
},
[ C(L1I ) ] = {
[ C(OP_READ) ] = {
- [ C(RESULT_ACCESS) ] = 0x0080, /* L1I.READS */
- [ C(RESULT_MISS) ] = 0x0081, /* L1I.MISSES */
+ [ C(RESULT_ACCESS) ] = 0x0380, /* L1I.READS */
+ [ C(RESULT_MISS) ] = 0x0280, /* L1I.MISSES */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = -1,
@@ -329,11 +329,11 @@ static const u64 atom_hw_cache_event_ids
},
[ C(DTLB) ] = {
[ C(OP_READ) ] = {
- [ C(RESULT_ACCESS) ] = 0x0f40, /* L1D_CACHE_LD.MESI (alias) */
+ [ C(RESULT_ACCESS) ] = 0x2140, /* L1D_CACHE_LD.MESI (alias) */
[ C(RESULT_MISS) ] = 0x0508, /* DTLB_MISSES.MISS_LD */
},
[ C(OP_WRITE) ] = {
- [ C(RESULT_ACCESS) ] = 0x0f41, /* L1D_CACHE_ST.MESI (alias) */
+ [ C(RESULT_ACCESS) ] = 0x2240, /* L1D_CACHE_ST.MESI (alias) */
[ C(RESULT_MISS) ] = 0x0608, /* DTLB_MISSES.MISS_ST */
},
[ C(OP_PREFETCH) ] = {


2009-06-09 14:16:37

by Ingo Molnar

Subject: Re: [PATCH -tip] perf_counter/x86: Correct some event and umask values for Intel processors


* Yong Wang <[email protected]> wrote:

> Correct some event and UMASK values according to Intel SDM.

Very nice, thanks!

were you able to test the Atom ones by any chance?

Ingo

2009-06-09 14:53:47

by Yong Wang

Subject: [tip:perfcounters/core] perf_counter, x86: Correct some event and umask values for Intel processors

Commit-ID: fecc8ac8496fce96069724f54daba8e7078b0082
Gitweb: http://git.kernel.org/tip/fecc8ac8496fce96069724f54daba8e7078b0082
Author: Yong Wang <[email protected]>
AuthorDate: Tue, 9 Jun 2009 21:15:53 +0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 9 Jun 2009 16:50:07 +0200

perf_counter, x86: Correct some event and umask values for Intel processors

Correct some event and UMASK values according to Intel SDM,
in the Nehalem and Atom tables.

Signed-off-by: Yong Wang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/kernel/cpu/perf_counter.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index 56001fe..40978aa 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -119,7 +119,7 @@ static const u64 nehalem_hw_cache_event_ids
},
[ C(L1I ) ] = {
[ C(OP_READ) ] = {
- [ C(RESULT_ACCESS) ] = 0x0480, /* L1I.READS */
+ [ C(RESULT_ACCESS) ] = 0x0380, /* L1I.READS */
[ C(RESULT_MISS) ] = 0x0280, /* L1I.MISSES */
},
[ C(OP_WRITE) ] = {
@@ -162,7 +162,7 @@ static const u64 nehalem_hw_cache_event_ids
[ C(ITLB) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x01c0, /* INST_RETIRED.ANY_P */
- [ C(RESULT_MISS) ] = 0x0185, /* ITLB_MISS_RETIRED */
+ [ C(RESULT_MISS) ] = 0x20c8, /* ITLB_MISS_RETIRED */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = -1,
@@ -291,7 +291,7 @@ static const u64 atom_hw_cache_event_ids
[ C(RESULT_MISS) ] = 0,
},
[ C(OP_WRITE) ] = {
- [ C(RESULT_ACCESS) ] = 0x2241, /* L1D_CACHE.ST */
+ [ C(RESULT_ACCESS) ] = 0x2240, /* L1D_CACHE.ST */
[ C(RESULT_MISS) ] = 0,
},
[ C(OP_PREFETCH) ] = {
@@ -301,8 +301,8 @@ static const u64 atom_hw_cache_event_ids
},
[ C(L1I ) ] = {
[ C(OP_READ) ] = {
- [ C(RESULT_ACCESS) ] = 0x0080, /* L1I.READS */
- [ C(RESULT_MISS) ] = 0x0081, /* L1I.MISSES */
+ [ C(RESULT_ACCESS) ] = 0x0380, /* L1I.READS */
+ [ C(RESULT_MISS) ] = 0x0280, /* L1I.MISSES */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = -1,
@@ -329,11 +329,11 @@ static const u64 atom_hw_cache_event_ids
},
[ C(DTLB) ] = {
[ C(OP_READ) ] = {
- [ C(RESULT_ACCESS) ] = 0x0f40, /* L1D_CACHE_LD.MESI (alias) */
+ [ C(RESULT_ACCESS) ] = 0x2140, /* L1D_CACHE_LD.MESI (alias) */
[ C(RESULT_MISS) ] = 0x0508, /* DTLB_MISSES.MISS_LD */
},
[ C(OP_WRITE) ] = {
- [ C(RESULT_ACCESS) ] = 0x0f41, /* L1D_CACHE_ST.MESI (alias) */
+ [ C(RESULT_ACCESS) ] = 0x2240, /* L1D_CACHE_ST.MESI (alias) */
[ C(RESULT_MISS) ] = 0x0608, /* DTLB_MISSES.MISS_ST */
},
[ C(OP_PREFETCH) ] = {

2009-06-10 05:53:14

by Yong Wang

Subject: Re: [PATCH -tip] perf_counter/x86: Correct some event and umask values for Intel processors

On Tue, Jun 09, 2009 at 04:16:21PM +0200, Ingo Molnar wrote:
>
> * Yong Wang <[email protected]> wrote:
>
> > Correct some event and UMASK values according to Intel SDM.
>
> Very nice, thanks!
>
> were you able to test the Atom ones by any chance?
>

You bet I was as I'm working on Moblin;-) However, some work while some
do not. I'll take a look at the problematic ones. With the previous
event and umask values, the pmc does not count at all for some events,
like l1d-write-ops.

Btw, one thing I don't quite understand is why you aliased
dtlb-write-ops to l1d-write-ops when setting event and umask values. Are
they the same event?

Thanks
-Yong

2009-06-10 10:43:04

by Ingo Molnar

Subject: Re: [PATCH -tip] perf_counter/x86: Correct some event and umask values for Intel processors


* Yong Wang <[email protected]> wrote:

> On Tue, Jun 09, 2009 at 04:16:21PM +0200, Ingo Molnar wrote:
> >
> > * Yong Wang <[email protected]> wrote:
> >
> > > Correct some event and UMASK values according to Intel SDM.
> >
> > Very nice, thanks!
> >
> > were you able to test the Atom ones by any chance?
> >
>
> You bet I was as I'm working on Moblin ;-) [...]

Heh :-)

> [...] However, some work while some do not. I'll take a look at
> the problematic ones. With the previous event and umask values,
> the pmc does not count at all for some events, like l1d-write-ops.

Interesting. I had a good look at the Atom details in the docs but
couldn't find anything suspicious. There are various umask-level
extensions (sometimes cflags-level ones), like whether to measure the
core or the thread, but the defaults (zero) seem to have OK
semantics for most of the events.

Btw., when mapping out event tables there's one little trick i used
to 'scan' an event, using 'perf stat' and raw event numbers:

  for ((i=0;i<256;i++)); do \
      perf stat -e $(printf "r%02x%02x\n" $i 0xc0) true 2>&1 | \
          grep -w raw | grep -vw 0; \
  done

This scans all 256 umask values for the main event code of 0xc0, and
displays the umask values where the counter shows some activity.

( if it's some rare event then you might want to run something else
that exercises that event, not /bin/true. )
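
The raw event numbers used in the loop above follow the same layout as
the table values in the patch: the low byte is the event select code and
the next byte is the unit mask, so 0x0380 (L1I.READS) is event 0x80 with
umask 0x03. A minimal sketch of that packing in shell follows; the helper
names raw_event and split_config are made up for illustration and are
not part of perf.

```shell
#!/bin/bash
# Hypothetical helpers (not part of perf) illustrating the raw event layout.

# Build the string passed to 'perf stat -e': "r" + umask byte + event byte.
# Arguments: umask, event select (decimal or 0x-prefixed hex).
raw_event() {
    printf 'r%02x%02x\n' "$1" "$2"
}

# Split a 16-bit table value such as 0x0380 back into its two bytes.
split_config() {
    printf 'event=0x%02x umask=0x%02x\n' \
        "$(( $1 & 0xff ))" "$(( ($1 >> 8) & 0xff ))"
}

raw_event 0x03 0x80      # L1I.READS as corrected by the patch
split_config 0x20c8      # Nehalem ITLB_MISS_RETIRED as corrected by the patch
```

The string printed by raw_event is what the scan loop above hands to
'perf stat -e'; split_config goes the other way, which helps when
comparing a table entry against the event and umask columns in the SDM.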

> Btw, one thing I don't quite understand is why you aliased
> dtlb-write-ops to l1d-write-ops when setting event and umask
> values. Are they the same event?

No, they are indeed different events - that's a bug in the table,
good spotting. Mind sending a (tested) patch for it?

Thanks,

Ingo

2009-06-11 08:44:11

by Yong Wang

Subject: Re: [PATCH -tip] perf_counter/x86: Correct some event and umask values for Intel processors

> > Btw, one thing I don't quite understand is why you aliased
> > dtlb-write-ops to l1d-write-ops when setting event and umask
> > values. Are they the same event?
>
> No, they are indeed different events - that's a bug in the table,
> good spotting. Mind sending a (tested) patch for it?
>

I'm a little confused. By dtlb-write-ops, do you want to count the
number of times that DTLB is accessed due to store operations or the
number of times that DTLB entries are written to, i.e. updated?

Btw, do you know whether virtual cache is employed or not on
atom/core2/nehalem so that tlb won't be accessed when accessing l1
caches?

-Yong

2009-06-11 11:27:03

by Ingo Molnar

Subject: Re: [PATCH -tip] perf_counter/x86: Correct some event and umask values for Intel processors


* Yong Wang <[email protected]> wrote:

> > > Btw, one thing I don't quite understand is why you aliased
> > > dtlb-write-ops to l1d-write-ops when setting event and umask
> > > values. Are they the same event?
> >
> > No, they are indeed different events - that's a bug in the table,
> > good spotting. Mind sending a (tested) patch for it?
> >
>
> I'm a little confused. By dtlb-write-ops, do you want to count the
> number of times that DTLB is accessed due to store operations or
> the number of times that DTLB entries are written to, i.e.
> updated?

ah - i think what makes most sense is the (micro-)instruction
direction: i.e. TLB entry accessed due to store operations.

Also, are TLB entries updated typically after they get established?
Things like the dirty or accessed bit in the PTE are written out to
caches immediately, so that bit probably does not linger in the PTE.

> Btw, do you know whether virtual cache is employed or not on
> atom/core2/nehalem so that tlb won't be accessed when accessing l1
> caches?

Hm, last i checked the L2 was all physically indexed. The short
experiment with (partial?) virtual indexing in P4 was a ...
spectacular failure IMO.

This doesn't mean the counters won't under-count. The TLB hotpath is
probably one of the most important critical paths in a CPU, so it's
fair for a CPU not to count those accesses in the PMU, to squeeze
out a few more gates. (I haven't validated the TLB counters on
core2/nehalem to that level yet so i don't know for sure how this is
laid out in practice.)

Ingo

2009-06-12 08:32:17

by Yong Wang

Subject: Re: [PATCH -tip] perf_counter/x86: Correct some event and umask values for Intel processors

On Wed, Jun 10, 2009 at 12:42:42PM +0200, Ingo Molnar wrote:
>
> * Yong Wang <[email protected]> wrote:
>
> > On Tue, Jun 09, 2009 at 04:16:21PM +0200, Ingo Molnar wrote:
> > >
> > > * Yong Wang <[email protected]> wrote:
> > >
> > > > Correct some event and UMASK values according to Intel SDM.
> > >
> > > Very nice, thanks!
> > >
> > > were you able to test the Atom ones by any chance?
> > >
> >
> > You bet I was as I'm working on Moblin ;-) [...]
>
> Heh :-)
>
> > [...] However, some work while some do not. I'll take a look at
> > the problematic ones. With the previous event and umask values,
> > the pmc does not count at all for some events, like l1d-write-ops.
>
> Interesting. I had a good look at the Atom details in the docs but
> couldn't find anything suspicious. There are various umask-level
> extensions (sometimes cflags-level ones), like whether to measure the
> core or the thread, but the defaults (zero) seem to have OK
> semantics for most of the events.
>
> Btw., when mapping out event tables there's one little trick i used
> to 'scan' an event, using 'perf stat' and raw event numbers:
>
> for ((i=0;i<256;i++)); do \
> perf stat -e $(printf "r%02x%02x\n" $i 0xc0) true 2>&1 | \
> grep -w raw | grep -vw 0; \
> done
>
> This scans all 256 umask values for the main event code of 0xc0, and
> displays the umask values where the counter shows some activity.
>
> ( if it's some rare event then you might want to run something else
> that exercises that event, not /bin/true. )
>

Just took a look at the problematic ones and found that the
fixed-function PMCs do not work on current Atom processors. I tested on
3 Atom netbooks and the results are the same. Just sent a quirk patch
for that.

-Yong