2009-06-22 20:26:24

by Stephane Eranian

Subject: perf_counter Atom patch

Hi,


You recently submitted a patch for perf_counter to disable
use of fixed counters on Atom because you claim they do
not work.

--------------------------------------------------------------------------------------
author Yong Wang <[email protected]>
Fri, 12 Jun 2009 08:08:55 +0000 (16:08 +0800)
committer Ingo Molnar <[email protected]>
Fri, 12 Jun 2009 11:48:32 +0000 (13:48 +0200)
commit dff5da6d09daaab40a8741dce0ed3c2e94079de2
tree c1f4ce70e4a566a231ba00c775de4e96fb8acb39
parent faafec1e61e61d350248af2a7e5f047606adab6e
perf_counter/x86: Add a quirk for Atom processors

The fixed-function performance counters do not work on current Atom
processors. Use the general-purpose ones instead.
--------------------------------------------------------------------------------------


I would like to better understand what makes you think
this is the case.

Perfmon is working on Atom and there, fixed counters work perfectly:
$ head -6 /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 28
model name : Intel(R) Atom(TM) CPU 230 @ 1.60GHz
stepping : 2
...
$ pfmon -v --us-c -e unhalted_core_cycles,unhalted_reference_cycles,instructions_retired noploop 10
[FIXED_CTRL(pmc16)=0xaaa pmi0=1 en0=0x2 any0=0 pmi1=1 en1=0x2 any1=0 pmi2=1 en2=0x2 any2=0] INSTRUCTIONS_RETIRED UNHALTED_CORE_CYCLES UNHALTED_REFERENCE_CYCLES
[FIXED_CTR0(pmd16)]
[FIXED_CTR1(pmd17)]
[FIXED_CTR2(pmd18)]
noploop for 10 seconds
15,902,604,169 UNHALTED_CORE_CYCLES
15,902,586,180 UNHALTED_REFERENCE_CYCLES
7,941,842,505 INSTRUCTIONS_RETIRED
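
The FIXED_CTRL value in the verbose output can be checked by hand. The
following is a minimal sketch, assuming the usual IA32_FIXED_CTR_CTRL layout
of one 4-bit field per fixed counter (enable in bits 1:0, AnyThread in bit 2,
PMI in bit 3). The struct and function names are made up for illustration
and are not pfmon or kernel APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One 4-bit control field per fixed counter (assumed layout, see above). */
struct fixed_ctrl_field {
    unsigned int en;   /* bits 1:0: 0 = off, 1 = ring 0, 2 = ring > 0, 3 = all */
    unsigned int any;  /* bit 2: AnyThread */
    unsigned int pmi;  /* bit 3: interrupt on overflow */
};

/* Hypothetical helper: extract the field for fixed counter idx. */
static struct fixed_ctrl_field decode_fixed_ctrl(uint64_t ctrl, unsigned int idx)
{
    struct fixed_ctrl_field f;
    uint64_t nibble;

    assert(idx < 16); /* sixteen 4-bit fields fit in 64 bits */
    nibble = (ctrl >> (4 * idx)) & 0xf;
    f.en  = (unsigned int)(nibble & 0x3);
    f.any = (unsigned int)((nibble >> 2) & 0x1);
    f.pmi = (unsigned int)((nibble >> 3) & 0x1);
    return f;
}

/* Print the first three fields in the style of the pfmon -v line. */
static void print_fixed_ctrl(uint64_t ctrl)
{
    for (unsigned int i = 0; i < 3; i++) {
        struct fixed_ctrl_field f = decode_fixed_ctrl(ctrl, i);

        printf("pmi%u=%u en%u=0x%x any%u=%u\n", i, f.pmi, i, f.en, i, f.any);
    }
}
```

With ctrl = 0xaaa, each of the three fields decodes to en=0x2 (user-level
counting only), any=0 and pmi=1, which matches the pfmon line above.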


I seem to recall that what may be bogus on Atom is what is returned by
CPUID(0xa) for the fixed counters. But they are there and they work. Thus, I
believe, the quirk should be at the location where CPUID(0xa) is invoked, not
where you've put it.


2009-06-23 03:39:53

by tip-bot for Yong Wang

Subject: RE: perf_counter Atom patch

> From: stephane eranian [mailto:[email protected]]
>
> I would like to better understand what makes you think
> this is the case.
>

Because I observed that the output of 'perf stat -e 0:0 -e 0:1 -e 0:6 <cmd>'
is always like below without the quirk.

Performance counter stats for '<cmd>':

0 cycles
0 instructions
0 bus-cycles

> Perfmon is working on Atom and there, fixed counters work perfectly:
> $ head -6 /proc/cpuinfo
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 28
> model name : Intel(R) Atom(TM) CPU 230 @ 1.60GHz
> stepping : 2
> ...

My cpuinfo is below and the only difference I can see is 270 vs 230.

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 28
model name : Intel(R) Atom(TM) CPU N270 @ 1.60GHz
stepping : 2

> $ pfmon -v --us-c -e
> unhalted_core_cycles,unhalted_reference_cycles,instructions_retired
> noploop 10
> [FIXED_CTRL(pmc16)=0xaaa pmi0=1 en0=0x2 any0=0 pmi1=1 en1=0x2 any1=0
> pmi2=1 en2=0x2 any2=0] INSTRUCTIONS_RETIRED UNHALTED_CORE_CYCLES
> UNHALTED_REFERENCE_CYCLES
> [FIXED_CTR0(pmd16)]
> [FIXED_CTR1(pmd17)]
> [FIXED_CTR2(pmd18)]
> noploop for 10 seconds
> 15,902,604,169 UNHALTED_CORE_CYCLES
> 15,902,586,180 UNHALTED_REFERENCE_CYCLES
> 7,941,842,505 INSTRUCTIONS_RETIRED
>

Could you pls try to revert my patch, run 'perf stat -e 0:0 -e 0:1 -e 0:6 <cmd>' and see
whether the counters count or not? I tried pfmon on my Atom box but it always runs into
a segfault. If the fixed counters work for you, I will ask the Atom hw folks here at Intel why this
is the case and revise the code accordingly.

>
> I seem to recall that what may be bogus on Atom is what is returned by
> CPUID(0xa)
> for the fixed counters. But they are there and they work. Thus, I
> believe, the quirk
> should be at the location where CPUID(0xa) is invoked not
> where you've put it.
>

The return value of CPUID(0xa) is indeed bogus too, and there is another quirk for that in
intel_pmu_init() in arch/x86/kernel/cpu/perf_counter.c:

x86_pmu.num_counters_fixed = max((int)edx.split.num_counters_fixed, 3);

Is this what you were talking about?

2009-06-23 06:00:44

by tip-bot for Yong Wang

Subject: RE: perf_counter Atom patch

> From: Wang, Yong Y
>
> > From: stephane eranian [mailto:[email protected]]
> >
> > I would like to better understand what makes you think
> > this is the case.
> >
>
> Because I observed that the output of 'perf stat -e 0:0 -e
> 0:1 -e 0:6 <cmd>'
> is always like below without the quirk.
>
> Performance counter stats for '<cmd>':
>
> 0 cycles
> 0 instructions
> 0 bus-cycles
>
> > Perfmon is working on Atom and there, fixed counters work perfectly:
> > $ head -6 /proc/cpuinfo
> > processor : 0
> > vendor_id : GenuineIntel
> > cpu family : 6
> > model : 28
> > model name : Intel(R) Atom(TM) CPU 230 @ 1.60GHz
> > stepping : 2
> > ...
>
> My cpuinfo is below and the only difference I can see is 270 vs 230.
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 28
> model name : Intel(R) Atom(TM) CPU N270 @ 1.60GHz
> stepping : 2
>
> > $ pfmon -v --us-c -e
> > unhalted_core_cycles,unhalted_reference_cycles,instructions_retired
> > noploop 10
> > [FIXED_CTRL(pmc16)=0xaaa pmi0=1 en0=0x2 any0=0 pmi1=1 en1=0x2 any1=0
> > pmi2=1 en2=0x2 any2=0] INSTRUCTIONS_RETIRED UNHALTED_CORE_CYCLES
> > UNHALTED_REFERENCE_CYCLES
> > [FIXED_CTR0(pmd16)]
> > [FIXED_CTR1(pmd17)]
> > [FIXED_CTR2(pmd18)]
> > noploop for 10 seconds
> > 15,902,604,169 UNHALTED_CORE_CYCLES
> > 15,902,586,180 UNHALTED_REFERENCE_CYCLES
> > 7,941,842,505 INSTRUCTIONS_RETIRED
> >
>
> Could you pls try to revert my patch, run 'perf stat -e 0:0
> -e 0:1 -e 0:6 <cmd>' and see
> whether the counters count or not? I tried pfmon on my atom
> box but it always runs into
> segfault. If the fixed counters work for you, I will ask Atom
> hw folks here in Intel why this
> is the case and revise the code accordingly.
>

I just found an Atom 230 based nettop in our lab, and it looks like the fixed counters
do not work on it either. Below is the output on a kernel without my patch.

atom@atom-desktop:~$ head -n 6 /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 28
model name : Intel(R) Atom(TM) CPU 230 @ 1.60GHz
stepping : 2
atom@atom-desktop:~$ ./perf stat -e 0:0 -e 0:1 -e 0:6 true

Performance counter stats for 'true':

0 cycles
0 instructions
0 bus-cycles

0.004089458 seconds time elapsed.

Are you aware of any microcode update for the Atom processor you are using?

2009-06-23 07:51:22

by Stephane Eranian

Subject: Re: perf_counter Atom patch

Hi,

On Tue, Jun 23, 2009 at 5:38 AM, Wang, Yong Y<[email protected]> wrote:
>> From: stephane eranian [mailto:[email protected]]
>>
>> I would like to better understand what makes you think
>> this is the case.
>>
>
> Because I observed that the output of 'perf stat -e 0:0 -e 0:1 -e 0:6 <cmd>'
> is always like below without the quirk.
>
>  Performance counter stats for '<cmd>':
>
>              0  cycles
>              0  instructions
>              0  bus-cycles
>
>> Perfmon is working on Atom and there, fixed counters work perfectly:
>> $ head -6 /proc/cpuinfo
>> processor     : 0
>> vendor_id     : GenuineIntel
>> cpu family    : 6
>> model         : 28
>> model name    : Intel(R) Atom(TM) CPU  230   @ 1.60GHz
>> stepping      : 2
>> ...
>
> My cpuinfo is below and the only difference I can see is 270 vs 230.
>
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 28
> model name      : Intel(R) Atom(TM) CPU N270   @ 1.60GHz
> stepping        : 2
>
Unfortunately, I don't have a N270 to compare with your results.
We need to verify whether or not N270 implements the fixed counters.
Does it report architected perfmon v3 or v1?

> The return value of CPUID(0xa) is indeed bogus, too and there is another quirk for that in
> intel_pmu_init() in arch/x86/kernel/cpu/perf_counter.c
>
> x86_pmu.num_counters_fixed      = max((int)edx.split.num_counters_fixed, 3);
>
> Is this what you were talking about?

Not quite, because with the max() you'd have a problem on Intel Core Duo/Solo
processors, as they implement the first generation of architected perfmon, and
that one did not have fixed counters. So you'd have to special-case family=6
model=14.

2009-06-23 08:16:55

by Yong Wang

Subject: Re: perf_counter Atom patch

On Tue, Jun 23, 2009 at 09:45:03AM +0200, stephane eranian wrote:
>
> Unfortunately, I don't have a N270 to compare with your results.
> We need to verify whether or not N270 implements the fixed counters.
> Does it report architected perfmon v3 or v1?
>

All Atom processors report perfmon v3 as specified in SDM. N270 is no
exception.

> > The return value of CPUID(0xa) is indeed bogus, too and there is another quirk for that in
> > intel_pmu_init() in arch/x86/kernel/cpu/perf_counter.c
> >
> > x86_pmu.num_counters_fixed = max((int)edx.split.num_counters_fixed, 3);
> >
> > Is this what you were talking about?
>
> Not quite, because with the max() you'd have a problem on Intel Core
> Duo/Solo processors
> as they do implement the first generation of architected perfmon and
> that one did not have
> fixed counters. So you'd have to special case family=6 model=14.

That has been taken into account actually. Only perfmon v2 and above are
supported as you see in intel_pmu_init().

        if (version < 2)
                return -ENODEV;
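
Taken together, the version check and the max() quirk amount to the logic
below. This is a simplified sketch, not the actual intel_pmu_init() code;
fixed_counters_or_error() is a hypothetical name, and its arguments stand in
for the values read from CPUID(0xa).

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for the logic quoted from intel_pmu_init():
 * version     - architected perfmon version (CPUID.0AH:EAX[7:0])
 * cpuid_fixed - fixed counter count as reported (CPUID.0AH:EDX[4:0])
 */
static int fixed_counters_or_error(unsigned int version, unsigned int cpuid_fixed)
{
    assert(cpuid_fixed <= 0x1f); /* the CPUID field is 5 bits wide */

    /* Perfmon v1 parts (e.g. Core Duo/Solo) are rejected up front. */
    if (version < 2)
        return -ENODEV;

    /* Quirk: never trust a reported count below 3. */
    return cpuid_fixed > 3 ? (int)cpuid_fixed : 3;
}
```

So a v1 processor never reaches the max() and cannot be handed fixed counters
it does not have, while a v2 or v3 part that reports fewer than 3 is treated
as having 3.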

2009-06-23 08:28:13

by Stephane Eranian

Subject: Re: perf_counter Atom patch

Hi,

On Tue, Jun 23, 2009 at 9:59 AM, Yong Wang<[email protected]> wrote:
> On Tue, Jun 23, 2009 at 09:45:03AM +0200, stephane eranian wrote:
>>
>> Unfortunately, I don't have a N270 to compare with your results.
>> We need to verify whether or not N270 implements the fixed counters.
>> Does it report architected perfmon v3 or v1?
>>
>
> All Atom processors report perfmon v3 as specified in SDM. N270 is no
> exception.
>
V3 does not set a minimal number of fixed counters; it could be zero. But
that seems odd. Let me ask around.

>> > The return value of CPUID(0xa) is indeed bogus, too and there is another quirk for that in
>> > intel_pmu_init() in arch/x86/kernel/cpu/perf_counter.c
>> >
>> > x86_pmu.num_counters_fixed = max((int)edx.split.num_counters_fixed, 3);
>> >
>> > Is this what you were talking about?
>>
>> Not quite, because with the max() you'd have a problem on Intel Core
>> Duo/Solo processors
>> as they do implement the first generation of architected perfmon and
>> that one did not have
>> fixed counters. So you'd have to special case family=6 model=14.
>
> That has been taken into account actually. Only perfmon v2 and above are
> supported as you see in intel_pmu_init().
>
>        if (version < 2)
>                return -ENODEV;
>
I assume this is a current limitation of the implementation. If you see
version < 2, you could simply consider having 0 fixed counters and everything
else would work as expected. But there is a catch, unfortunately, in that there
is erratum AE49, which says that there is only one enable bit to control the two
generic counters on Core Duo/Solo.

2009-06-23 08:40:53

by Stephane Eranian

Subject: Re: perf_counter Atom patch

Yong,

On Tue, Jun 23, 2009 at 10:27 AM, stephane
eranian<[email protected]> wrote:
> Hi,
>
> On Tue, Jun 23, 2009 at 9:59 AM, Yong Wang<[email protected]> wrote:
>> On Tue, Jun 23, 2009 at 09:45:03AM +0200, stephane eranian wrote:
>>>
>>> Unfortunately, I don't have a N270 to compare with your results.
>>> We need to verify whether or not N270 implements the fixed counters.
>>> Does it report architected perfmon v3 or v1?
>>>
>>
>> All Atom processors report perfmon v3 as specified in SDM. N270 is no
>> exception.
>>
> V3 does not set a minimal number of fixed counters, could be zero. But
> that seems
> odd. Let me ask around.
>
Second thought on this:
        x86_pmu.num_counters_fixed = max((int)edx.split.num_counters_fixed, 3);

        rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl);


Forcing num_counters_fixed is not enough; you need to make sure the fixed
counters are actually activated in GLOBAL_CTRL, i.e., make sure bits 32-34 are
set in intel_ctrl. The power-on value of GLOBAL_CTRL changes depending on which
machine you're on. The correct power-on value should be that ONLY generic
counters are on by default.
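
To make the bit positions concrete, here is a minimal sketch of how such an
enable mask could be built from the counter counts instead of being taken
from the MSR's power-on contents. The helper name and the macro are
hypothetical; only the layout (generic counters from bit 0, fixed counters
from bit 32) comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define GLOBAL_CTRL_FIXED_SHIFT 32 /* fixed counter enable bits start at bit 32 */

/* Hypothetical helper: build the GLOBAL_CTRL enable mask from counter counts. */
static uint64_t global_ctrl_enable_mask(unsigned int num_generic,
                                        unsigned int num_fixed)
{
    uint64_t generic_bits, fixed_bits;

    assert(num_generic < 32 && num_fixed < 32); /* keeps both shifts well defined */

    generic_bits = (1ULL << num_generic) - 1;   /* bits 0..num_generic-1 */
    fixed_bits = ((1ULL << num_fixed) - 1) << GLOBAL_CTRL_FIXED_SHIFT;
    return generic_bits | fixed_bits;
}
```

For 2 generic and 3 fixed counters this gives 0x700000003, i.e. bits 0-1 and
bits 32-34. The idea would be to use that for intel_ctrl rather than relying
on whatever rdmsrl() happened to return.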

2009-06-23 08:51:11

by Yong Wang

Subject: Re: perf_counter Atom patch

On Tue, Jun 23, 2009 at 10:27:47AM +0200, stephane eranian wrote:
> Hi,
>
> On Tue, Jun 23, 2009 at 9:59 AM, Yong Wang<[email protected]> wrote:
> > On Tue, Jun 23, 2009 at 09:45:03AM +0200, stephane eranian wrote:
> >>
> >> Unfortunately, I don't have a N270 to compare with your results.
> >> We need to verify whether or not N270 implements the fixed counters.
> >> Does it report architected perfmon v3 or v1?
> >>
> >
> > All Atom processors report perfmon v3 as specified in SDM. N270 is no
> > exception.
> >
> V3 does not set a minimal number of fixed counters, could be zero. But
> that seems
> odd. Let me ask around.
>

The Atom spec update says in its errata section that CPUID.0AH.EDX should be
0x0503. I assume future offerings are more likely to add new counters than to
remove existing ones. Correct me if I'm wrong ;-)
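
For reference, the 0x0503 value can be decoded as follows. This is a small
sketch assuming the documented CPUID.0AH:EDX layout (fixed counter count in
bits 4:0, fixed counter bit width in bits 12:5); the function names are made
up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-function counters: CPUID.0AH:EDX[4:0]. */
static unsigned int cpuid_0a_num_fixed(uint32_t edx)
{
    return edx & 0x1f;
}

/* Bit width of the fixed-function counters: CPUID.0AH:EDX[12:5]. */
static unsigned int cpuid_0a_fixed_width(uint32_t edx)
{
    return (edx >> 5) & 0xff;
}

/* Largest value a fixed counter of the reported width can hold. */
static uint64_t cpuid_0a_fixed_max(uint32_t edx)
{
    unsigned int width = cpuid_0a_fixed_width(edx);

    assert(width > 0 && width < 64); /* avoid an undefined 64-bit shift */
    return (1ULL << width) - 1;
}
```

For edx = 0x0503 this yields 3 fixed counters, each 40 bits wide.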

> >> > The return value of CPUID(0xa) is indeed bogus, too and there is another quirk for that in
> >> > intel_pmu_init() in arch/x86/kernel/cpu/perf_counter.c
> >> >
> >> > x86_pmu.num_counters_fixed = max((int)edx.split.num_counters_fixed, 3);
> >> >
> >> > Is this what you were talking about?
> >>
> >> Not quite, because with the max() you'd have a problem on Intel Core
> >> Duo/Solo processors
> >> as they do implement the first generation of architected perfmon and
> >> that one did not have
> >> fixed counters. So you'd have to special case family=6 model=14.
> >
> > That has been taken into account actually. Only perfmon v2 and above are
> > supported as you see in intel_pmu_init().
> >
> > if (version < 2)
> > return -ENODEV;
> >
> I assume this is a current limitation of the implementation. If you
> see version < 2
> you could simply consider having 0 fixed counters and everything else would work
> as expected. But there is a catch, unfortunately, in that there is erratum AE49
> which says that there is only one enable bit to control the two generic counters
> on Core Duo/Solo.
>

Ideally we could consider having 0 fixed counters for Core Duo/Solo. But
as you said, the erratum you pointed out complicates things considerably.

2009-06-23 09:10:09

by Yong Wang

Subject: Re: perf_counter Atom patch

On Tue, Jun 23, 2009 at 10:40:45AM +0200, stephane eranian wrote:
> Yong,
>
> On Tue, Jun 23, 2009 at 10:27 AM, stephane
> eranian<[email protected]> wrote:
> > Hi,
> >
> > On Tue, Jun 23, 2009 at 9:59 AM, Yong Wang<[email protected]> wrote:
> >> On Tue, Jun 23, 2009 at 09:45:03AM +0200, stephane eranian wrote:
> >>>
> >>> Unfortunately, I don't have a N270 to compare with your results.
> >>> We need to verify whether or not N270 implements the fixed counters.
> >>> Does it report architected perfmon v3 or v1?
> >>>
> >>
> >> All Atom processors report perfmon v3 as specified in SDM. N270 is no
> >> exception.
> >>
> > V3 does not set a minimal number of fixed counters, could be zero. But
> > that seems
> > odd. Let me ask around.
> >
> Second thought on this:
> x86_pmu.num_counters_fixed =
> max((int)edx.split.num_counters_fixed, 3);
>
> rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl);
>
>
> Forcing num_counter_fixed is not enough, you need to make sure they are actually
> activated in GLOBAL_CTRL, i.e., make sure bits 32-34 are set in intel_ctrl.
> Depending on which machine you're on, the power on value for GLOBAL_CTRL
> changes. The correct value for it should be that ONLY generic counters are on
> by default.
>

Oh, this might be why the fixed counters do not work on my Atom box. I will
look into it. BTW, why should ONLY generic counters be on by default?

2009-06-23 09:10:26

by Peter Zijlstra

Subject: Re: perf_counter Atom patch

On Tue, 2009-06-23 at 16:34 +0800, Yong Wang wrote:
> > you could simply consider having 0 fixed counters and everything else would work
> > as expected. But there is a catch, unfortunately, in that there is erratum AE49
> > which says that there is only one enable bit to control the two generic counters
> > on Core Duo/Solo.

Ah, that's similar to P6-like machines. The P6 docs say that to disable
a counter you should simply write all zeros (except the EN bit for ctr0)
to the control register (IIRC).

I suppose we could do something similar on these errata cores, make
x86_pmu_disable_counter() write ARCH_PERFMON_EVENTSEL0_ENABLE instead.

Would that work?
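
One possible reading of that proposal, as a rough sketch only: when a single
EN bit in the counter 0 event select gates both generic counters, disabling
counter 0 must leave that bit set so that counter 1 keeps running. The names
below are hypothetical. EVENTSEL_EN mirrors the EN bit (bit 22) that
ARCH_PERFMON_EVENTSEL0_ENABLE refers to, and the function only computes the
value that would be written, without touching any MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EVENTSEL_EN (1ULL << 22) /* EN bit of an event select register */

/*
 * Hypothetical helper: value to write to the event select of generic
 * counter idx in order to disable it.
 * shared_enable - true on parts where one EN bit controls both counters
 */
static uint64_t eventsel_value_on_disable(unsigned int idx, bool shared_enable)
{
    assert(idx < 2); /* only the two generic counters are modelled */

    /* Clear the event but keep EN, so the other counter is not stopped. */
    if (shared_enable && idx == 0)
        return EVENTSEL_EN;

    /* Otherwise all zeros is enough to stop this counter. */
    return 0;
}
```

This only covers the value written on disable. It says nothing about how
events would have to be scheduled onto the two counters.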

2009-06-23 09:20:10

by Stephane Eranian

Subject: Re: perf_counter Atom patch

On Tue, Jun 23, 2009 at 10:53 AM, Yong Wang<[email protected]> wrote:
> On Tue, Jun 23, 2009 at 10:40:45AM +0200, stephane eranian wrote:
>> Yong,
>>
>> On Tue, Jun 23, 2009 at 10:27 AM, stephane
>> eranian<[email protected]> wrote:
>> > Hi,
>> >
>> > On Tue, Jun 23, 2009 at 9:59 AM, Yong Wang<[email protected]> wrote:
>> >> On Tue, Jun 23, 2009 at 09:45:03AM +0200, stephane eranian wrote:
>> >>>
>> >>> Unfortunately, I don't have a N270 to compare with your results.
>> >>> We need to verify whether or not N270 implements the fixed counters.
>> >>> Does it report architected perfmon v3 or v1?
>> >>>
>> >>
>> >> All Atom processors report perfmon v3 as specified in SDM. N270 is no
>> >> exception.
>> >>
>> > V3 does not set a minimal number of fixed counters, could be zero. But
>> > that seems
>> > odd. Let me ask around.
>> >
>> Second thought on this:
>>        x86_pmu.num_counters_fixed      =
>> max((int)edx.split.num_counters_fixed, 3);
>>
>>         rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl);
>>
>>
>> Forcing num_counter_fixed is not enough, you need to make sure they are actually
>> activated in GLOBAL_CTRL, i.e., make sure bits 32-34 are set in intel_ctrl.
>> Depending on which machine you're on, the power on value for GLOBAL_CTRL
>> changes. The correct value for it should be that ONLY generic counters are on
>> by default.
>>
>
> Oh, this might be why fixed counter do not work on my Atom box. I will
> look into it. BTW, why should ONLY generic counters be on by default?
>
Glad you asked.
I think it's for backward compatibility with processors such as Core Duo with
architected perfmon v1, which did not have the GLOBAL_* controls.

2009-06-23 09:41:04

by Stephane Eranian

Subject: Re: perf_counter Atom patch

On Tue, Jun 23, 2009 at 11:10 AM, Peter Zijlstra<[email protected]> wrote:
> On Tue, 2009-06-23 at 16:34 +0800, Yong Wang wrote:
>> > you could simply consider having 0 fixed counters and everything else would work
>> > as expected. But there is a catch, unfortunately, in that there is erratum AE49
>> > which says that there is only one enable bit to control the two generic counters
>> > on Core Duo/Solo.
>
> Ah, that's similar to P6 like machines. The P6 docs say that to disable
> a counter you should simply write all zeros (except the EN bit for ctr0)
> to the control register (IIRC).
>
> I suppose we could do something similar on these errata cores, make
> x86_pmu_disable_counter() write ARCH_PERFMON_EVENTSEL0_ENABLE instead.
>
> Would that work?
>
I suspect that to make this work correctly on P6 and Core Duo, you will have to
enforce only one event/group to maintain the independence you expose at the
user level. An alternative would be to ensure that:
- the group leader is always in counter0
- sibling events are created with disabled=0
- ioctl(ENABLE/DISABLE) on siblings always fails

Of course, this does not work if the group leader event requires counter1. But
I have to check whether such a restriction exists on Core Duo.

2009-06-23 09:47:47

by Ingo Molnar

Subject: Re: perf_counter Atom patch


* Yong Wang <[email protected]> wrote:

> On Tue, Jun 23, 2009 at 10:40:45AM +0200, stephane eranian wrote:
> > Yong,
> >
> > On Tue, Jun 23, 2009 at 10:27 AM, stephane
> > eranian<[email protected]> wrote:
> > > Hi,
> > >
> > > On Tue, Jun 23, 2009 at 9:59 AM, Yong Wang<[email protected]> wrote:
> > >> On Tue, Jun 23, 2009 at 09:45:03AM +0200, stephane eranian wrote:
> > >>>
> > >>> Unfortunately, I don't have a N270 to compare with your results.
> > >>> We need to verify whether or not N270 implements the fixed counters.
> > >>> Does it report architected perfmon v3 or v1?
> > >>>
> > >>
> > >> All Atom processors report perfmon v3 as specified in SDM. N270 is no
> > >> exception.
> > >>
> > > V3 does not set a minimal number of fixed counters, could be zero. But
> > > that seems
> > > odd. Let me ask around.
> > >
> > Second thought on this:
> > x86_pmu.num_counters_fixed =
> > max((int)edx.split.num_counters_fixed, 3);
> >
> > rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl);
> >
> >
> > Forcing num_counter_fixed is not enough, you need to make sure
> > they are actually activated in GLOBAL_CTRL, i.e., make sure bits
> > 32-34 are set in intel_ctrl. Depending on which machine you're
> > on, the power on value for GLOBAL_CTRL changes. The correct
> > value for it should be that ONLY generic counters are on by
> > default.
> >
>
> Oh, this might be why fixed counter do not work on my Atom box. I
> will look into it. [...]

Thanks - having a different bootup default for the global ctrl
indeed sounds like a good and plausible explanation - please send a
patch for that if you've tested it, removing that quirk and adding
the global-enable ctrl logic.

Ingo

2009-06-24 02:36:35

by Yong Wang

Subject: Re: perf_counter Atom patch

On Tue, Jun 23, 2009 at 11:47:17AM +0200, Ingo Molnar wrote:
>
> * Yong Wang <[email protected]> wrote:
>
> > On Tue, Jun 23, 2009 at 10:40:45AM +0200, stephane eranian wrote:
> > > Yong,
> > >
> > > On Tue, Jun 23, 2009 at 10:27 AM, stephane
> > > eranian<[email protected]> wrote:
> > > > Hi,
> > > >
> > > > On Tue, Jun 23, 2009 at 9:59 AM, Yong Wang<[email protected]> wrote:
> > > >> On Tue, Jun 23, 2009 at 09:45:03AM +0200, stephane eranian wrote:
> > > >>>
> > > >>> Unfortunately, I don't have a N270 to compare with your results.
> > > >>> We need to verify whether or not N270 implements the fixed counters.
> > > >>> Does it report architected perfmon v3 or v1?
> > > >>>
> > > >>
> > > >> All Atom processors report perfmon v3 as specified in SDM. N270 is no
> > > >> exception.
> > > >>
> > > > V3 does not set a minimal number of fixed counters, could be zero. But
> > > > that seems
> > > > odd. Let me ask around.
> > > >
> > > Second thought on this:
> > > x86_pmu.num_counters_fixed =
> > > max((int)edx.split.num_counters_fixed, 3);
> > >
> > > rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl);
> > >
> > >
> > > Forcing num_counter_fixed is not enough, you need to make sure
> > > they are actually activated in GLOBAL_CTRL, i.e., make sure bits
> > > 32-34 are set in intel_ctrl. Depending on which machine you're
> > > on, the power on value for GLOBAL_CTRL changes. The correct
> > > value for it should be that ONLY generic counters are on by
> > > default.
> > >
> >
> > Oh, this might be why fixed counter do not work on my Atom box. I
> > will look into it. [...]
>
> Thanks - having a different bootup default for the global ctrl
> indeed sounds like a good and plausible explanation - please send a
> patch for that if you've tested it, removing that quirk and adding
> the global-enable ctrl logic.
>

The root cause of fixed counters not working on Atom is indeed related
to the global counter control MSR. The power-on value on Atom is 0x3, which
means only general-purpose counters are enabled by default. The power-on
value on Core2 is 0xffffffffffffffff, which I believe is also the case
for Nehalem. That's why Core2 and Nehalem do not have the problem.
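
The difference between the two power-on values can be checked with a few
lines of C. This is an illustrative sketch; fixed_counters_enabled() is a
hypothetical helper, and it assumes the fixed counter enable bits start at
bit 32 of the global control MSR, as discussed earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: count how many of the first num_fixed fixed
 * counters are enabled in a given GLOBAL_CTRL value.
 */
static unsigned int fixed_counters_enabled(uint64_t global_ctrl,
                                           unsigned int num_fixed)
{
    unsigned int i, count = 0;

    assert(num_fixed <= 32); /* bits 32..63 are the only candidates */

    for (i = 0; i < num_fixed; i++) {
        if (global_ctrl & (1ULL << (32 + i)))
            count++;
    }
    return count;
}
```

With 3 fixed counters, 0x3 gives 0 enabled and 0xffffffffffffffff gives 3,
which is consistent with the behaviour seen on Atom versus Core2.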

2009-06-24 08:52:40

by Ingo Molnar

Subject: Re: perf_counter Atom patch


* Yong Wang <[email protected]> wrote:

> On Tue, Jun 23, 2009 at 11:47:17AM +0200, Ingo Molnar wrote:
> >
> > * Yong Wang <[email protected]> wrote:
> >
> > > On Tue, Jun 23, 2009 at 10:40:45AM +0200, stephane eranian wrote:
> > > > Yong,
> > > >
> > > > On Tue, Jun 23, 2009 at 10:27 AM, stephane
> > > > eranian<[email protected]> wrote:
> > > > > Hi,
> > > > >
> > > > > On Tue, Jun 23, 2009 at 9:59 AM, Yong Wang<[email protected]> wrote:
> > > > >> On Tue, Jun 23, 2009 at 09:45:03AM +0200, stephane eranian wrote:
> > > > >>>
> > > > >>> Unfortunately, I don't have a N270 to compare with your results.
> > > > >>> We need to verify whether or not N270 implements the fixed counters.
> > > > >>> Does it report architected perfmon v3 or v1?
> > > > >>>
> > > > >>
> > > > >> All Atom processors report perfmon v3 as specified in SDM. N270 is no
> > > > >> exception.
> > > > >>
> > > > > V3 does not set a minimal number of fixed counters, could be zero. But
> > > > > that seems
> > > > > odd. Let me ask around.
> > > > >
> > > > Second thought on this:
> > > > x86_pmu.num_counters_fixed =
> > > > max((int)edx.split.num_counters_fixed, 3);
> > > >
> > > > rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl);
> > > >
> > > >
> > > > Forcing num_counter_fixed is not enough, you need to make sure
> > > > they are actually activated in GLOBAL_CTRL, i.e., make sure bits
> > > > 32-34 are set in intel_ctrl. Depending on which machine you're
> > > > on, the power on value for GLOBAL_CTRL changes. The correct
> > > > value for it should be that ONLY generic counters are on by
> > > > default.
> > > >
> > >
> > > Oh, this might be why fixed counter do not work on my Atom box. I
> > > will look into it. [...]
> >
> > Thanks - having a different bootup default for the global ctrl
> > indeed sounds like a good and plausible explanation - please send a
> > patch for that if you've tested it, removing that quirk and adding
> > the global-enable ctrl logic.
> >
>
> The root cause of fixed counters not working on Atom is indeed
> related to global counter control MSR. The power-on value on Atom
> is 0x3 which means only general purpose counters are enabled by
> default. The power-on value on Core2 is 0xffffffffffffffff which I
> believe is also the case for Nehalem. That's why Core2 and Nehalem
> do not have the problem.

I suspect it can also be firmware/BIOS and microcode version
dependent - it's better to not rely on bootup state like that
indeed.

thanks guys,

Ingo

2009-06-25 10:39:31

by Stephane Eranian

Subject: Re: perf_counter Atom patch

On Wed, Jun 24, 2009 at 10:52 AM, Ingo Molnar<[email protected]> wrote:
>
> * Yong Wang <[email protected]> wrote:
>
>> On Tue, Jun 23, 2009 at 11:47:17AM +0200, Ingo Molnar wrote:
>> >
>> > * Yong Wang <[email protected]> wrote:
>> >
>> > > On Tue, Jun 23, 2009 at 10:40:45AM +0200, stephane eranian wrote:
>> > > > Yong,
>> > > >
>> > > > On Tue, Jun 23, 2009 at 10:27 AM, stephane
>> > > > eranian<[email protected]> wrote:
>> > > > > Hi,
>> > > > >
>> > > > > On Tue, Jun 23, 2009 at 9:59 AM, Yong Wang<[email protected]> wrote:
>> > > > >> On Tue, Jun 23, 2009 at 09:45:03AM +0200, stephane eranian wrote:
>> > > > >>>
>> > > > >>> Unfortunately, I don't have a N270 to compare with your results.
>> > > > >>> We need to verify whether or not N270 implements the fixed counters.
>> > > > >>> Does it report architected perfmon v3 or v1?
>> > > > >>>
>> > > > >>
>> > > > >> All Atom processors report perfmon v3 as specified in SDM. N270 is no
>> > > > >> exception.
>> > > > >>
>> > > > > V3 does not set a minimal number of fixed counters, could be zero. But
>> > > > > that seems
>> > > > > odd. Let me ask around.
>> > > > >
>> > > > Second thought on this:
>> > > >        x86_pmu.num_counters_fixed      =
>> > > > max((int)edx.split.num_counters_fixed, 3);
>> > > >
>> > > >         rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl);
>> > > >
>> > > >
>> > > > Forcing num_counter_fixed is not enough, you need to make sure
>> > > > they are actually activated in GLOBAL_CTRL, i.e., make sure bits
>> > > > 32-34 are set in intel_ctrl. Depending on which machine you're
>> > > > on, the power on value for GLOBAL_CTRL changes. The correct
>> > > > value for it should be that ONLY generic counters are on by
>> > > > default.
>> > > >
>> > >
>> > > Oh, this might be why fixed counter do not work on my Atom box. I
>> > > will look into it. [...]
>> >
>> > Thanks - having a different bootup default for the global ctrl
>> > indeed sounds like a good and plausible explanation - please send a
>> > patch for that if you've tested it, removing that quirk and adding
>> > the global-enable ctrl logic.
>> >
>>
>> The root cause of fixed counters not working on Atom is indeed
>> related to global counter control MSR. The power-on value on Atom
>> is 0x3 which means only general purpose counters are enabled by
>> default. The power-on value on Core2 is 0xffffffffffffffff which I
>> believe is also the case for Nehalem. That's why Core2 and Nehalem
>> do not have the problem.
>
> I suspect it can also be firmware/BIOS and microcode version
> dependent - it's better to not rely on bootup state like that
> indeed.

I would recommend you ignore boot-up values and set up GLOBAL_CTRL*
the way you need. So I agree.