2010-07-13 08:13:12

by Yanmin Zhang

Subject: perf failed with kernel 2.6.35-rc

Peter,

perf doesn't work on my Nehalem EX machine.
1) The first run of 'perf top' is OK;
2) Kill the first perf and restart it. It doesn't work: no data is shown.

I tracked it down to the commit below:
commit 1ac62cfff252fb668405ef3398a1fa7f4a0d6d15
Author: Peter Zijlstra <[email protected]>
Date: Fri Mar 26 14:08:44 2010 +0100

perf, x86: Add Nehelem PMU programming errata workaround

workaround From: Peter Zijlstra <[email protected]>
Date: Fri Mar 26 13:59:41 CET 2010

Implement the workaround for Intel Errata AAK100 and AAP53.

Also, remove the Core-i7 name for Nehalem events since there are
also Westmere based i7 chips.


If I comment out the workaround in function intel_pmu_nhm_enable_all,
perf works.

A quick glance shows:
wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
should be:
wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
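
(A minimal sketch of why, assuming the architectural layout where bit n of
MSR_CORE_PERF_GLOBAL_CTRL gates general-purpose counter n; the helper name is
only illustrative:)

/*
 * Sketch only: the workaround programs three event selects, PMC0..PMC2,
 * so the global enable mask needs the low three bits set.  0x3 ungates
 * only PMC0/PMC1 and leaves PMC2 gated.
 */
static inline u64 global_ctrl_mask(int nr_counters)
{
        return (1ULL << nr_counters) - 1;       /* 3 counters -> 0x7 */
}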


I triggered sysrq to dump the PMU registers and found that the last bit of the
global status register is 1. I added a status reset operation, as in the patch below:

--- linux-2.6.35-rc5/arch/x86/kernel/cpu/perf_event_intel.c     2010-07-14 09:38:11.000000000 +0800
+++ linux-2.6.35-rc5_fork/arch/x86/kernel/cpu/perf_event_intel.c        2010-07-14 14:41:42.000000000 +0800
@@ -505,8 +505,13 @@ static void intel_pmu_nhm_enable_all(int
                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
 
-               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
+               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
                wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
+               /*
+                * Reset the last 3 bits of global status register in case
+                * previous enabling causes overflows.
+                */
+               wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);
 
                for (i = 0; i < 3; i++) {
                        struct perf_event *event = cpuc->events[i];



However, it still doesn't work. For now the only way to get perf working is to
comment out the workaround.

Yanmin


2010-07-13 08:50:29

by Ingo Molnar

Subject: Re: perf failed with kernel 2.6.35-rc


* Zhang, Yanmin <[email protected]> wrote:

> Peter,
>
> perf doesn't work on my Nehalem EX machine.
> 1) The 1st start of 'perf top' is ok;
> 2) Kill the 1st perf and restart it. It doesn't work. No data is showed.
>
> I located below commit:
> commit 1ac62cfff252fb668405ef3398a1fa7f4a0d6d15
> Author: Peter Zijlstra <[email protected]>
> Date: Fri Mar 26 14:08:44 2010 +0100
>
> perf, x86: Add Nehelem PMU programming errata workaround
>
> workaround From: Peter Zijlstra <[email protected]>
> Date: Fri Mar 26 13:59:41 CET 2010
>
> Implement the workaround for Intel Errata AAK100 and AAP53.
>
> Also, remove the Core-i7 name for Nehalem events since there are
> also Westmere based i7 chips.
>
>
> If I comment out the workaround in function intel_pmu_nhm_enable_all,
> perf could work.
>
> A quick glance shows:
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> should be:
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);

> I triggered sysrq to dump PMU registers and found the last bit of
> global status register is 1. I added a status reset operation like below patch:
>
> --- linux-2.6.35-rc5/arch/x86/kernel/cpu/perf_event_intel.c 2010-07-14 09:38:11.000000000 +0800
> +++ linux-2.6.35-rc5_fork/arch/x86/kernel/cpu/perf_event_intel.c 2010-07-14 14:41:42.000000000 +0800
> @@ -505,8 +505,13 @@ static void intel_pmu_nhm_enable_all(int
> wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
> wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
>
> - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
> + /*
> + * Reset the last 3 bits of global status register in case
> + * previous enabling causes overflows.
> + */
> + wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);
>
> for (i = 0; i < 3; i++) {
> struct perf_event *event = cpuc->events[i];
>
>
> However, it still doesn't work. Current right way is to comment out
> the workaround.

Well, how about doing it like this:

        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
        /*
         * Reset the last 3 bits of global status register in case
         * previous enabling causes overflows.
         */
        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);

        for (i = 0; i < 3; i++) {
                struct perf_event *event = cpuc->events[i];
                ...
        }

        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);

I.e. global-mask, overflow-clear, explicit-enable, then global-enable?

Thanks,

Ingo

2010-07-13 15:16:21

by Stephane Eranian

Subject: Re: perf failed with kernel 2.6.35-rc

On Tue, Jul 13, 2010 at 10:14 AM, Zhang, Yanmin
<[email protected]> wrote:
> Peter,
>
> perf doesn't work on my Nehalem EX machine.
> 1) The 1st start of 'perf top' is ok;
> 2) Kill the 1st perf and restart it. It doesn't work. No data is showed.
>
> I located below commit:
> commit 1ac62cfff252fb668405ef3398a1fa7f4a0d6d15
> Author: Peter Zijlstra <[email protected]>
> Date:   Fri Mar 26 14:08:44 2010 +0100
>
>    perf, x86: Add Nehelem PMU programming errata workaround
>
>    workaround From: Peter Zijlstra <[email protected]>
>    Date: Fri Mar 26 13:59:41 CET 2010
>
>    Implement the workaround for Intel Errata AAK100 and AAP53.
>
>    Also, remove the Core-i7 name for Nehalem events since there are
>    also Westmere based i7 chips.
>
>
> If I comment out the workaround in function intel_pmu_nhm_enable_all,
> perf could work.
>
> A quick glance shows:
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> should be:
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
>
>
> I triggered sysrq to dump PMU registers and found the last bit of
> global status register is 1. I added a status reset operation like below patch:
>
What do you call the last bit? bit0 or bit63?

> --- linux-2.6.35-rc5/arch/x86/kernel/cpu/perf_event_intel.c     2010-07-14 09:38:11.000000000 +0800
> +++ linux-2.6.35-rc5_fork/arch/x86/kernel/cpu/perf_event_intel.c        2010-07-14 14:41:42.000000000 +0800
> @@ -505,8 +505,13 @@ static void intel_pmu_nhm_enable_all(int
>                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
>                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
>
> -               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> +               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
>                wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
> +               /*
> +                * Reset the last 3 bits of global status register in case
> +                * previous enabling causes overflows.
> +                */

The workaround cannot cause an overflow, because the associated counters
won't count anything given that their umask value is 0 (which does not correspond
to anything for event 0xB1; event 0xB5 is undocumented). This is for the events
described in table A.2. If NHM-EX has a different definition for 0xB1 and 0xB5,
then that's another story.
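
(For reference, a minimal sketch of the architectural PERFEVTSEL decoding behind
that reading: bits 0-7 event select, bits 8-15 unit mask, bit 16 USR, bit 17 OS,
bit 22 EN:)

#include <stdio.h>

/* Decode 0x4300B1 with the architectural IA32_PERFEVTSELx field layout. */
int main(void)
{
        unsigned long long evtsel = 0x4300B1ULL;

        printf("event select: 0x%02llx\n", evtsel & 0xff);        /* 0xb1 */
        printf("unit mask:    0x%02llx\n", (evtsel >> 8) & 0xff); /* 0x00: "non-counting" */
        printf("USR: %llu  OS: %llu  EN: %llu\n",
               (evtsel >> 16) & 1, (evtsel >> 17) & 1, (evtsel >> 22) & 1);
        return 0;
}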


> +               wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);
>
>                for (i = 0; i < 3; i++) {
>                        struct perf_event *event = cpuc->events[i];
>
>
>
> However, it still doesn't work. Current right way is to comment out
> the workaround.
>
> Yanmin
>
>
>

2010-07-14 00:12:40

by Yanmin Zhang

Subject: Re: perf failed with kernel 2.6.35-rc

On Tue, 2010-07-13 at 17:16 +0200, Stephane Eranian wrote:
> On Tue, Jul 13, 2010 at 10:14 AM, Zhang, Yanmin
> <[email protected]> wrote:
> > Peter,
> >
> > perf doesn't work on my Nehalem EX machine.
> > 1) The 1st start of 'perf top' is ok;
> > 2) Kill the 1st perf and restart it. It doesn't work. No data is showed.
> >
> > I located below commit:
> > commit 1ac62cfff252fb668405ef3398a1fa7f4a0d6d15
> > Author: Peter Zijlstra <[email protected]>
> > Date: Fri Mar 26 14:08:44 2010 +0100
> >
> > perf, x86: Add Nehelem PMU programming errata workaround
> >
> > workaround From: Peter Zijlstra <[email protected]>
> > Date: Fri Mar 26 13:59:41 CET 2010
> >
> > Implement the workaround for Intel Errata AAK100 and AAP53.
> >
> > Also, remove the Core-i7 name for Nehalem events since there are
> > also Westmere based i7 chips.
> >
> >
> > If I comment out the workaround in function intel_pmu_nhm_enable_all,
> > perf could work.
> >
> > A quick glance shows:
> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> > should be:
> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
> >
> >
> > I triggered sysrq to dump PMU registers and found the last bit of
> > global status register is 1. I added a status reset operation like below patch:
> >
> What do you call the last bit? bit0 or bit63?
Sorry for the confusion. It's bit0, the one mapping to PERFMON_EVENTSEL0.

>
> > --- linux-2.6.35-rc5/arch/x86/kernel/cpu/perf_event_intel.c 2010-07-14 09:38:11.000000000 +0800
> > +++ linux-2.6.35-rc5_fork/arch/x86/kernel/cpu/perf_event_intel.c 2010-07-14 14:41:42.000000000 +0800
> > @@ -505,8 +505,13 @@ static void intel_pmu_nhm_enable_all(int
> > wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
> > wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
> >
> > - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> > + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
> > + /*
> > + * Reset the last 3 bits of global status register in case
> > + * previous enabling causes overflows.
> > + */
>
> The workaround cannot cause on overflow because the associated counters
> won't count anything given their umask value is 0 (which does not correspond
> to anything for event 0xB1, event 0xB5 is undocumented). This is for the events
> described in table A.2. If NHM-EX has a different definition for 0xB1, 0xB5,
> then that's another story.
I found the status bit set by triggering sysrq to dump the PMU registers.

If I start perf under gdb, perf sometimes works. In that case, one processor's
first status bit is 0 while the other processors' are 1. If I just start perf
directly, the first status bit is 1 on all processors.

>
>
> > + wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);
> >
> > for (i = 0; i < 3; i++) {
> > struct perf_event *event = cpuc->events[i];
> >
> >
> >
> > However, it still doesn't work. Current right way is to comment out
> > the workaround.

2010-07-14 00:36:25

by Stephane Eranian

Subject: Re: perf failed with kernel 2.6.35-rc

What about running simpler commands like perf stat?


On Wed, Jul 14, 2010 at 2:13 AM, Zhang, Yanmin
<[email protected]> wrote:
> On Tue, 2010-07-13 at 17:16 +0200, Stephane Eranian wrote:
>> On Tue, Jul 13, 2010 at 10:14 AM, Zhang, Yanmin
>> <[email protected]> wrote:
>> > Peter,
>> >
>> > perf doesn't work on my Nehalem EX machine.
>> > 1) The 1st start of 'perf top' is ok;
>> > 2) Kill the 1st perf and restart it. It doesn't work. No data is showed.
>> >
>> > I located below commit:
>> > commit 1ac62cfff252fb668405ef3398a1fa7f4a0d6d15
>> > Author: Peter Zijlstra <[email protected]>
>> > Date:   Fri Mar 26 14:08:44 2010 +0100
>> >
>> >    perf, x86: Add Nehelem PMU programming errata workaround
>> >
>> >    workaround From: Peter Zijlstra <[email protected]>
>> >    Date: Fri Mar 26 13:59:41 CET 2010
>> >
>> >    Implement the workaround for Intel Errata AAK100 and AAP53.
>> >
>> >    Also, remove the Core-i7 name for Nehalem events since there are
>> >    also Westmere based i7 chips.
>> >
>> >
>> > If I comment out the workaround in function intel_pmu_nhm_enable_all,
>> > perf could work.
>> >
>> > A quick glance shows:
>> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
>> > should be:
>> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
>> >
>> >
>> > I triggered sysrq to dump PMU registers and found the last bit of
>> > global status register is 1. I added a status reset operation like below patch:
>> >
>> What do you call the last bit? bit0 or bit63?
> Sorry for confusing you. It's bit0, mapping to PERFMON_EVENTSEL0.
>
>>
>> > --- linux-2.6.35-rc5/arch/x86/kernel/cpu/perf_event_intel.c     2010-07-14 09:38:11.000000000 +0800
>> > +++ linux-2.6.35-rc5_fork/arch/x86/kernel/cpu/perf_event_intel.c        2010-07-14 14:41:42.000000000 +0800
>> > @@ -505,8 +505,13 @@ static void intel_pmu_nhm_enable_all(int
>> >                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
>> >                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
>> >
>> > -               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
>> > +               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
>> >                wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
>> > +               /*
>> > +                * Reset the last 3 bits of global status register in case
>> > +                * previous enabling causes overflows.
>> > +                */
>>
>> The workaround cannot cause on overflow because the associated counters
>> won't count anything given their umask value is 0 (which does not correspond
>> to anything for event 0xB1, event 0xB5 is undocumented). This is for the events
>> described in table A.2. If NHM-EX has a different definition for 0xB1, 0xB5,
>> then that's another story.
> I found the status bit is set by triggering sysrq to dump PMU registers.
>
> If I start perf by gdb, sometimes, perf could work. I found one processor's 1st status
> register is equal to 0 while other processors' are 1. If just starting perf, all 1st
> status registers are equal to 1.
>
>>
>>
>> > +               wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);
>> >
>> >                for (i = 0; i < 3; i++) {
>> >                        struct perf_event *event = cpuc->events[i];
>> >
>> >
>> >
>> > However, it still doesn't work. Current right way is to comment out
>> > the workaround.
>
>
>

2010-07-14 00:48:50

by Yanmin Zhang

Subject: Re: perf failed with kernel 2.6.35-rc

On Tue, 2010-07-13 at 10:50 +0200, Ingo Molnar wrote:
> * Zhang, Yanmin <[email protected]> wrote:
>
> > Peter,
> >
> > perf doesn't work on my Nehalem EX machine.
> > 1) The 1st start of 'perf top' is ok;
> > 2) Kill the 1st perf and restart it. It doesn't work. No data is showed.
> >
> > I located below commit:
> > commit 1ac62cfff252fb668405ef3398a1fa7f4a0d6d15
> > Author: Peter Zijlstra <[email protected]>
> > Date: Fri Mar 26 14:08:44 2010 +0100
> >
> > perf, x86: Add Nehelem PMU programming errata workaround
> >
> > workaround From: Peter Zijlstra <[email protected]>
> > Date: Fri Mar 26 13:59:41 CET 2010
> >
> > Implement the workaround for Intel Errata AAK100 and AAP53.
> >
> > Also, remove the Core-i7 name for Nehalem events since there are
> > also Westmere based i7 chips.
> >
> >
> > If I comment out the workaround in function intel_pmu_nhm_enable_all,
> > perf could work.
> >
> > A quick glance shows:
> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> > should be:
> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
>
> > I triggered sysrq to dump PMU registers and found the last bit of
> > global status register is 1. I added a status reset operation like below patch:
> >
> > --- linux-2.6.35-rc5/arch/x86/kernel/cpu/perf_event_intel.c 2010-07-14 09:38:11.000000000 +0800
> > +++ linux-2.6.35-rc5_fork/arch/x86/kernel/cpu/perf_event_intel.c 2010-07-14 14:41:42.000000000 +0800
> > @@ -505,8 +505,13 @@ static void intel_pmu_nhm_enable_all(int
> > wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
> > wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
> >
> > - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> > + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
> > + /*
> > + * Reset the last 3 bits of global status register in case
> > + * previous enabling causes overflows.
> > + */
> > + wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);
> >
> > for (i = 0; i < 3; i++) {
> > struct perf_event *event = cpuc->events[i];
> >
> >
> > However, it still doesn't work. Current right way is to comment out
> > the workaround.
>
> Well, how about doing it like this:
>
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
> /*
> * Reset the last 3 bits of global status register in case
> * previous enabling causes overflows.
> */
> wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);
>
> for (i = 0; i < 3; i++) {
> struct perf_event *event = cpuc->events[i];
> ...
> }
>
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
>

> I.e. global-mask, overflow-clear, explicit-enable, then global-enable?
Ingo,

wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7) is global enable. So it's
global-enable, overflow-clear, explicit-enable, then global-disable?

Yanmin

2010-07-14 02:21:56

by Yanmin Zhang

Subject: Re: perf failed with kernel 2.6.35-rc

On Wed, 2010-07-14 at 02:36 +0200, Stephane Eranian wrote:
> What about running simpler commands like perf stat?
'perf stat ls' seems OK.
I compared the PMU register dumps before and after 'perf stat ls'.
Some processors' status register bit0 changed from 1 to 0.

I can't guarantee that every 'perf stat' run is OK, because some processors
don't seem to collect perf statistics correctly while others do.

'gdb perf' can appear to work because one processor is OK while the others
aren't, so 'gdb perf' actually misses some data.

>
>
> On Wed, Jul 14, 2010 at 2:13 AM, Zhang, Yanmin
> <[email protected]> wrote:
> > On Tue, 2010-07-13 at 17:16 +0200, Stephane Eranian wrote:
> >> On Tue, Jul 13, 2010 at 10:14 AM, Zhang, Yanmin
> >> <[email protected]> wrote:
> >> > Peter,
> >> >
> >> > perf doesn't work on my Nehalem EX machine.
> >> > 1) The 1st start of 'perf top' is ok;
> >> > 2) Kill the 1st perf and restart it. It doesn't work. No data is showed.
> >> >
> >> > I located below commit:
> >> > commit 1ac62cfff252fb668405ef3398a1fa7f4a0d6d15
> >> > Author: Peter Zijlstra <[email protected]>
> >> > Date: Fri Mar 26 14:08:44 2010 +0100
> >> >
> >> > perf, x86: Add Nehelem PMU programming errata workaround
> >> >
> >> > workaround From: Peter Zijlstra <[email protected]>
> >> > Date: Fri Mar 26 13:59:41 CET 2010
> >> >
> >> > Implement the workaround for Intel Errata AAK100 and AAP53.
> >> >
> >> > Also, remove the Core-i7 name for Nehalem events since there are
> >> > also Westmere based i7 chips.
> >> >
> >> >
> >> > If I comment out the workaround in function intel_pmu_nhm_enable_all,
> >> > perf could work.
> >> >
> >> > A quick glance shows:
> >> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> >> > should be:
> >> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
> >> >
> >> >
> >> > I triggered sysrq to dump PMU registers and found the last bit of
> >> > global status register is 1. I added a status reset operation like below patch:
> >> >
> >> What do you call the last bit? bit0 or bit63?
> > Sorry for confusing you. It's bit0, mapping to PERFMON_EVENTSEL0.
> >
> >>
> >> > --- linux-2.6.35-rc5/arch/x86/kernel/cpu/perf_event_intel.c 2010-07-14 09:38:11.000000000 +0800
> >> > +++ linux-2.6.35-rc5_fork/arch/x86/kernel/cpu/perf_event_intel.c 2010-07-14 14:41:42.000000000 +0800
> >> > @@ -505,8 +505,13 @@ static void intel_pmu_nhm_enable_all(int
> >> > wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
> >> > wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
> >> >
> >> > - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> >> > + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
> >> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
> >> > + /*
> >> > + * Reset the last 3 bits of global status register in case
> >> > + * previous enabling causes overflows.
> >> > + */
> >>
> >> The workaround cannot cause on overflow because the associated counters
> >> won't count anything given their umask value is 0 (which does not correspond
> >> to anything for event 0xB1, event 0xB5 is undocumented). This is for the events
> >> described in table A.2. If NHM-EX has a different definition for 0xB1, 0xB5,
> >> then that's another story.
> > I found the status bit is set by triggering sysrq to dump PMU registers.
> >
> > If I start perf by gdb, sometimes, perf could work. I found one processor's 1st status
> > register is equal to 0 while other processors' are 1. If just starting perf, all 1st
> > status registers are equal to 1.
> >
> >>
> >>
> >> > + wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);
> >> >
> >> > for (i = 0; i < 3; i++) {
> >> > struct perf_event *event = cpuc->events[i];
> >> >
> >> >
> >> >
> >> > However, it still doesn't work. Current right way is to comment out
> >> > the workaround.

2010-07-14 08:14:03

by Yanmin Zhang

Subject: Re: perf failed with kernel 2.6.35-rc

On Wed, 2010-07-14 at 08:49 +0800, Zhang, Yanmin wrote:
> On Tue, 2010-07-13 at 10:50 +0200, Ingo Molnar wrote:
> > * Zhang, Yanmin <[email protected]> wrote:
> >
> > > Peter,
> > >
> > > perf doesn't work on my Nehalem EX machine.
> > > 1) The 1st start of 'perf top' is ok;
> > > 2) Kill the 1st perf and restart it. It doesn't work. No data is showed.
> > >
> > > I located below commit:
> > > commit 1ac62cfff252fb668405ef3398a1fa7f4a0d6d15
> > > Author: Peter Zijlstra <[email protected]>
> > > Date: Fri Mar 26 14:08:44 2010 +0100
> > >
> > > perf, x86: Add Nehelem PMU programming errata workaround
> > >
> > > workaround From: Peter Zijlstra <[email protected]>
> > > Date: Fri Mar 26 13:59:41 CET 2010
> > >
> > > Implement the workaround for Intel Errata AAK100 and AAP53.
> > >
> > > Also, remove the Core-i7 name for Nehalem events since there are
> > > also Westmere based i7 chips.
> > >
> > >
> > > If I comment out the workaround in function intel_pmu_nhm_enable_all,
> > > perf could work.
> > >
> > > A quick glance shows:
> > > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> > > should be:
> > > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
> >
> > > I triggered sysrq to dump PMU registers and found the last bit of
> > > global status register is 1. I added a status reset operation like below patch:
> > >
> > > --- linux-2.6.35-rc5/arch/x86/kernel/cpu/perf_event_intel.c 2010-07-14 09:38:11.000000000 +0800
> > > +++ linux-2.6.35-rc5_fork/arch/x86/kernel/cpu/perf_event_intel.c 2010-07-14 14:41:42.000000000 +0800
> > > @@ -505,8 +505,13 @@ static void intel_pmu_nhm_enable_all(int
> > > wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
> > > wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
> > >
> > > - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> > > + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
> > > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
> > > + /*
> > > + * Reset the last 3 bits of global status register in case
> > > + * previous enabling causes overflows.
> > > + */
> > > + wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);
> > >
> > > for (i = 0; i < 3; i++) {
> > > struct perf_event *event = cpuc->events[i];
> > >
> > >
> > > However, it still doesn't work. Current right way is to comment out
> > > the workaround.
> >
> > Well, how about doing it like this:
> >
> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
> > /*
> > * Reset the last 3 bits of global status register in case
> > * previous enabling causes overflows.
> > */
> > wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);
> >
> > for (i = 0; i < 3; i++) {
> > struct perf_event *event = cpuc->events[i];
> > ...
> > }
> >
> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
> >
>
> > I.e. global-mask, overflow-clear, explicit-enable, then global-enable?
> Ingo,
>
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7) is global enable. So it's
> global-enable, overflow-clear, explicit-enable, then global-disable?
It doesn't work.

I copied the PMU register dump of logical processor 0.

Besides the status register's bit0 being stuck at 1, the PMC0 count is small, so
it would take a long time to overflow.
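
(For context, a simplified sketch of how perf arms a sampling counter, modeled
loosely on x86_perf_event_set_period(); the 48-bit width matches the
general-purpose counters on these parts:)

/*
 * The counter is written with -left (modulo the counter width), so it
 * wraps, sets its bit in GLOBAL_STATUS and raises the PMI after 'left'
 * more events.  A raw count far below the wrap point therefore means
 * the counter is nowhere near overflowing.
 */
static void arm_counter(unsigned int msr, s64 left)
{
        u64 cntval_mask = (1ULL << 48) - 1;

        wrmsrl(msr, (u64)(-left) & cntval_mask);
}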


CPU#0: ctrl: 000000070000000f
CPU#0: status: 0000000000000001
CPU#0: overflow: 0000000000000000
CPU#0: fixed: 0000000000000000
CPU#0: pebs: 0000000000000000
CPU#0: active: 0000000000000001
CPU#0: gen-PMC0 ctrl: 000000000053003c
CPU#0: gen-PMC0 count: 00000000549bffd9
CPU#0: gen-PMC0 left: 0000000000000002
CPU#0: gen-PMC1 ctrl: 00000000004300b1
CPU#0: gen-PMC1 count: 0000000000000000
CPU#0: gen-PMC1 left: 0000000000000000
CPU#0: gen-PMC2 ctrl: 00000000004300b5
CPU#0: gen-PMC2 count: 0000000000000000
CPU#0: gen-PMC2 left: 0000000000000000
CPU#0: gen-PMC3 ctrl: 0000000000000000
CPU#0: gen-PMC3 count: 0000000000000000
CPU#0: gen-PMC3 left: 0000000000000000
CPU#0: fixed-PMC0 count: 0000000000000000
CPU#0: fixed-PMC1 count: 0000000000000000
CPU#0: fixed-PMC2 count: 0000000000000000
SysRq : Show Regs


I instrumented function intel_pmu_nhm_enable_all to check the
PMC0 count register before the workaround and after clearing
the 3 bits of MSR_CORE_PERF_GLOBAL_CTRL. It changes unexpectedly.
Below is debug output for processor 0.

PMU register counter is changed. before[281474976710654] after[1]

So I think the event 0x4300D2 overflows. We need to do a save and restore.
The patch below fixes it.

Yanmin

---

--- linux-2.6.35-rc5/arch/x86/kernel/cpu/perf_event_intel.c     2005-01-01 13:19:50.800000253 +0800
+++ linux-2.6.35-rc5_perf/arch/x86/kernel/cpu/perf_event_intel.c        2005-01-01 16:01:35.324000300 +0800
@@ -499,21 +499,34 @@ static void intel_pmu_nhm_enable_all(int
 {
        if (added) {
                struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+               struct perf_event *event;
                int i;
 
+               for (i = 0; i < 3; i++) {
+                       event = cpuc->events[i];
+                       if (!event)
+                               continue;
+                       x86_perf_event_update(event);
+               }
                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 0, 0x4300D2);
                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
 
-               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
+               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
                wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
+               /*
+                * Reset the last 3 bits of global status register in case
+                * previous enabling causes overflows.
+                */
+               wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);
 
                for (i = 0; i < 3; i++) {
-                       struct perf_event *event = cpuc->events[i];
+                       event = cpuc->events[i];
 
                        if (!event)
                                continue;
 
+                       x86_perf_event_set_period(event);
                        __x86_pmu_enable_event(&event->hw,
                                        ARCH_PERFMON_EVENTSEL_ENABLE);
                }
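
(The point of the save/restore pair: x86_perf_event_update() folds the live
hardware count into the event's count before the magic events clobber the PMCs,
and x86_perf_event_set_period() then re-arms each counter with its remaining
period, so no counts are lost across the workaround.)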

2010-08-03 12:20:55

by Peter Zijlstra

Subject: Re: perf failed with kernel 2.6.35-rc

On Tue, 2010-07-13 at 16:14 +0800, Zhang, Yanmin wrote:

> If I comment out the workaround in function intel_pmu_nhm_enable_all,
> perf could work.
>
> A quick glance shows:
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> should be:
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);

Correct, I've got a patch for that lying around; I've posted it several
times in the struct pmu rework series.

> I triggered sysrq to dump PMU registers and found the last bit of
> global status register is 1. I added a status reset operation like below patch:
>
> --- linux-2.6.35-rc5/arch/x86/kernel/cpu/perf_event_intel.c 2010-07-14 09:38:11.000000000 +0800
> +++ linux-2.6.35-rc5_fork/arch/x86/kernel/cpu/perf_event_intel.c 2010-07-14 14:41:42.000000000 +0800
> @@ -505,8 +505,13 @@ static void intel_pmu_nhm_enable_all(int
> wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
> wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
>
> - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
> + /*
> + * Reset the last 3 bits of global status register in case
> + * previous enabling causes overflows.
> + */
> + wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);

Like Stephane already pointed out (and the comment above this function
mentions), these counters should be non-counting, and thus should not
cause an overflow (and this function gets called with GLOBAL_CTRL
cleared, so it shouldn't be counting to begin with).

The Nehalem-EX errata BA72 seems to agree (assuming Xeon-7500 is indeed
the EX, Intel really should do something about this naming madness)
http://www.intel.com/Assets/en_US/PDF/specupdate/323344.pdf

[ Also see the trace output below, PERFCTR[12] stay 0 ]

> for (i = 0; i < 3; i++) {
> struct perf_event *event = cpuc->events[i];
>
>
>
> However, it still doesn't work. Current right way is to comment out
> the workaround.

So I frobbed the patchlet below to debug things, as it appears I can
indeed reproduce this on my Westmere.

---
arch/x86/kernel/cpu/perf_event.c | 24 +++++++++++++++++++++++-
arch/x86/kernel/cpu/perf_event_intel.c | 2 +-
2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index f2da20f..2ca41f7 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -31,7 +31,7 @@
 #include <asm/nmi.h>
 #include <asm/compat.h>
 
-#if 0
+#if 1
 #undef wrmsrl
 #define wrmsrl(msr, val)                                       \
 do {                                                           \
@@ -42,6 +42,16 @@ do {                                                         \
 } while (0)
 #endif
 
+#if 1
+#undef rdmsrl
+#define rdmsrl(msr, val)                                       \
+do {                                                           \
+       (val) = native_read_msr(msr);                           \
+       trace_printk("rdmsrl(%lx, %lx)\n", (unsigned long)(msr),\
+                       (unsigned long)(val));                  \
+} while (0)
+#endif
+
 /*
  * best effort, GUP based copy_from_user() that assumes IRQ or NMI context
  */
@@ -583,6 +593,14 @@ static void x86_pmu_disable_all(void)
        }
 }
 
+static void noinline check_status(void)
+{
+       u64 status;
+       rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, status);
+       if (WARN_ON(status))
+               perf_event_print_debug();
+}
+
 void hw_perf_disable(void)
 {
        struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -598,6 +616,8 @@ void hw_perf_disable(void)
        barrier();
 
        x86_pmu.disable_all();
+
+       check_status();
 }
 
 static void x86_pmu_enable_all(int added)
@@ -816,6 +836,8 @@ void hw_perf_enable(void)
        if (cpuc->enabled)
                return;
 
+       check_status();
+
        if (cpuc->n_added) {
                int n_running = cpuc->n_events - cpuc->n_added;
                /*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 214ac86..56ce7dc 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -505,7 +505,7 @@ static void intel_pmu_nhm_enable_all(int added)
                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
 
-               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
+               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
                wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
 
                for (i = 0; i < 3; i++) {
---

It results in the trace below (I edited out some superfluous stack traces):

<...>-1871 [000] 96.759853: intel_pmu_disable_all: wrmsrl(38f, 0)
<...>-1871 [000] 96.759853: <stack trace>
=> hw_perf_disable
=> perf_disable
=> perf_ctx_adjust_freq
=> perf_event_task_tick
=> scheduler_tick
=> update_process_times
=> tick_sched_timer
=> __run_hrtimer
<...>-1871 [000] 96.759872: check_status: rdmsrl(38e, 1)
<...>-1871 [000] 96.759872: <stack trace>
=> hw_perf_disable
=> perf_disable
=> perf_ctx_adjust_freq
=> perf_event_task_tick
=> scheduler_tick
=> update_process_times
=> tick_sched_timer
=> __run_hrtimer
<...>-1871 [000] 96.760180: perf_event_print_debug: rdmsrl(38f, 0)
<...>-1871 [000] 96.760186: perf_event_print_debug: rdmsrl(38e, 1)
<...>-1871 [000] 96.760191: perf_event_print_debug: rdmsrl(390, 0)
<...>-1871 [000] 96.760196: perf_event_print_debug: rdmsrl(38d, 0)
<...>-1871 [000] 96.760201: perf_event_print_debug: rdmsrl(3f1, 0)
<...>-1871 [000] 96.760248: perf_event_print_debug: rdmsrl(186, 53003c)
<...>-1871 [000] 96.760258: perf_event_print_debug: rdmsrl(c1, d1e91)
<...>-1871 [000] 96.760285: perf_event_print_debug: rdmsrl(187, 4300b1)
<...>-1871 [000] 96.760290: perf_event_print_debug: rdmsrl(c2, 0)
<...>-1871 [000] 96.760320: perf_event_print_debug: rdmsrl(188, 4300b5)
<...>-1871 [000] 96.760340: perf_event_print_debug: rdmsrl(c3, 0)
<...>-1871 [000] 96.760359: perf_event_print_debug: rdmsrl(189, 0)
<...>-1871 [000] 96.760379: perf_event_print_debug: rdmsrl(c4, 0)
<...>-1871 [000] 96.760398: perf_event_print_debug: rdmsrl(309, 0)
<...>-1871 [000] 96.760417: perf_event_print_debug: rdmsrl(30a, 0)
<...>-1871 [000] 96.760432: perf_event_print_debug: rdmsrl(30b, 0)
<...>-1871 [000] 96.760432: <stack trace>
=> check_status
=> hw_perf_disable
=> perf_disable
=> perf_ctx_adjust_freq
=> perf_event_task_tick
=> scheduler_tick
=> update_process_times
=> tick_sched_timer

Which seems to indicate we somehow disable the PMU (clear GLOBAL_CTRL)
with a pending overflow (as indicated by GLOBAL_STATUS and PERFCTR0).
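
(To decode the trace, the MSR addresses map to the architectural perfmon
registers, as defined in the kernel's msr-index.h:)

#define MSR_ARCH_PERFMON_PERFCTR0       0x0c1   /* 0xc1-0xc4: PMC0-3 counts */
#define MSR_ARCH_PERFMON_EVENTSEL0      0x186   /* 0x186-0x189: EVTSEL0-3 */
#define MSR_CORE_PERF_FIXED_CTR0        0x309   /* 0x309-0x30b: fixed counters 0-2 */
#define MSR_CORE_PERF_FIXED_CTR_CTRL    0x38d
#define MSR_CORE_PERF_GLOBAL_STATUS     0x38e   /* rdmsrl(38e, 1): PMC0 overflow pending */
#define MSR_CORE_PERF_GLOBAL_CTRL       0x38f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
#define MSR_IA32_PEBS_ENABLE            0x3f1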

So there are two questions:
 1) how can we end up in the above state?
 2) why does the fixup change things?

most confusing..

2010-08-04 15:49:04

by Stephane Eranian

Subject: Re: perf failed with kernel 2.6.35-rc

On Tue, Aug 3, 2010 at 2:20 PM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2010-07-13 at 16:14 +0800, Zhang, Yanmin wrote:
>
>> If I comment out the workaround in function intel_pmu_nhm_enable_all,
>> perf could work.
>>
>> A quick glance shows:
>> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
>> should be:
>> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
>
> Correct, I've got a patch for that laying around,.. posted it several
> times in the struct pmu rework series.
>
>> I triggered sysrq to dump PMU registers and found the last bit of
>> global status register is 1. I added a status reset operation like below patch:
>>
>> --- linux-2.6.35-rc5/arch/x86/kernel/cpu/perf_event_intel.c   2010-07-14 09:38:11.000000000 +0800
>> +++ linux-2.6.35-rc5_fork/arch/x86/kernel/cpu/perf_event_intel.c      2010-07-14 14:41:42.000000000 +0800
>> @@ -505,8 +505,13 @@ static void intel_pmu_nhm_enable_all(int
>>               wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
>>               wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
>>
>> -             wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
>> +             wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
>>               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
>> +             /*
>> +              * Reset the last 3 bits of global status register in case
>> +              * previous enabling causes overflows.
>> +              */
>> +             wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);
>
> Like Stephane already pointed out (and the comment above this function
> mentions), these counters should be non-counting, and thus should not
> cause an overflow (and this function gets called with GLOBAL_CTRL
> cleared, so it shouldn't be counting to begin with).
>
> The Nehalem-EX errata BA72 seems to agree (assuming Xeon-7500 is indeed
> the EX, Intel really should do something about this naming madness)
> http://www.intel.com/Assets/en_US/PDF/specupdate/323344.pdf
>
> [ Also see the trace outout below, PERFCTR[12] stay 0 ]
>
>>               for (i = 0; i < 3; i++) {
>>                       struct perf_event *event = cpuc->events[i];
>>
>>
>>
>> However, it still doesn't work. Current right way is to comment out
>> the workaround.
>
> So I frobbed the below patchlet to debug things, as it appears I can
> indeed reproduce on my westmere.
>
> ---
>  arch/x86/kernel/cpu/perf_event.c       |   24 +++++++++++++++++++++++-
>  arch/x86/kernel/cpu/perf_event_intel.c |    2 +-
>  2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index f2da20f..2ca41f7 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -31,7 +31,7 @@
>  #include <asm/nmi.h>
>  #include <asm/compat.h>
>
> -#if 0
> +#if 1
>  #undef wrmsrl
>  #define wrmsrl(msr, val)                                       \
>  do {                                                           \
> @@ -42,6 +42,16 @@ do {                                                         \
>  } while (0)
>  #endif
>
> +#if 1
> +#undef rdmsrl
> +#define rdmsrl(msr, val)                                       \
> +do {                                                           \
> +       (val) = native_read_msr(msr);                           \
> +       trace_printk("rdmsrl(%lx, %lx)\n", (unsigned long)(msr),\
> +                       (unsigned long)(val));                  \
> +} while (0)
> +#endif
> +
>  /*
>  * best effort, GUP based copy_from_user() that assumes IRQ or NMI context
>  */
> @@ -583,6 +593,14 @@ static void x86_pmu_disable_all(void)
>        }
>  }
>
> +static void noinline check_status(void)
> +{
> +       u64 status;
> +       rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, status);
> +       if (WARN_ON(status))
> +               perf_event_print_debug();
> +}
> +
>  void hw_perf_disable(void)
>  {
>        struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> @@ -598,6 +616,8 @@ void hw_perf_disable(void)
>        barrier();
>
>        x86_pmu.disable_all();
> +
> +       check_status();
>  }
>
>  static void x86_pmu_enable_all(int added)
> @@ -816,6 +836,8 @@ void hw_perf_enable(void)
>        if (cpuc->enabled)
>                return;
>
> +       check_status();
> +
>        if (cpuc->n_added) {
>                int n_running = cpuc->n_events - cpuc->n_added;
>                /*
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index 214ac86..56ce7dc 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -505,7 +505,7 @@ static void intel_pmu_nhm_enable_all(int added)
>                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
>                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
>
> -               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> +               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
>                wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
>
>                for (i = 0; i < 3; i++) {
> ---
>
> It results in the below trace (edited out some superfluous stack traces:
>
>           <...>-1871  [000]    96.759853: intel_pmu_disable_all: wrmsrl(38f, 0)
>           <...>-1871  [000]    96.759853: <stack trace>
>  => hw_perf_disable
>  => perf_disable
>  => perf_ctx_adjust_freq
>  => perf_event_task_tick
>  => scheduler_tick
>  => update_process_times
>  => tick_sched_timer
>  => __run_hrtimer
>           <...>-1871  [000]    96.759872: check_status: rdmsrl(38e, 1)
>           <...>-1871  [000]    96.759872: <stack trace>
>  => hw_perf_disable
>  => perf_disable
>  => perf_ctx_adjust_freq
>  => perf_event_task_tick
>  => scheduler_tick
>  => update_process_times
>  => tick_sched_timer
>  => __run_hrtimer
>           <...>-1871  [000]    96.760180: perf_event_print_debug: rdmsrl(38f, 0)
>           <...>-1871  [000]    96.760186: perf_event_print_debug: rdmsrl(38e, 1)
>           <...>-1871  [000]    96.760191: perf_event_print_debug: rdmsrl(390, 0)
>           <...>-1871  [000]    96.760196: perf_event_print_debug: rdmsrl(38d, 0)
>           <...>-1871  [000]    96.760201: perf_event_print_debug: rdmsrl(3f1, 0)
>           <...>-1871  [000]    96.760248: perf_event_print_debug: rdmsrl(186, 53003c)
>           <...>-1871  [000]    96.760258: perf_event_print_debug: rdmsrl(c1, d1e91)
>           <...>-1871  [000]    96.760285: perf_event_print_debug: rdmsrl(187, 4300b1)
>           <...>-1871  [000]    96.760290: perf_event_print_debug: rdmsrl(c2, 0)
>           <...>-1871  [000]    96.760320: perf_event_print_debug: rdmsrl(188, 4300b5)
>           <...>-1871  [000]    96.760340: perf_event_print_debug: rdmsrl(c3, 0)
>           <...>-1871  [000]    96.760359: perf_event_print_debug: rdmsrl(189, 0)
>           <...>-1871  [000]    96.760379: perf_event_print_debug: rdmsrl(c4, 0)
>           <...>-1871  [000]    96.760398: perf_event_print_debug: rdmsrl(309, 0)
>           <...>-1871  [000]    96.760417: perf_event_print_debug: rdmsrl(30a, 0)
>           <...>-1871  [000]    96.760432: perf_event_print_debug: rdmsrl(30b, 0)
>           <...>-1871  [000]    96.760432: <stack trace>
>  => check_status
>  => hw_perf_disable
>  => perf_disable
>  => perf_ctx_adjust_freq
>  => perf_event_task_tick
>  => scheduler_tick
>  => update_process_times
>  => tick_sched_timer
>
> Which seems to indicate we somehow disable the PMU (clear GLOBAL_CTRL)
> with a pending overflow (as indicated by GLOBAL_STATUS and PERFCTR0).
>
At the time you stop the PMU, you could have an in-flight overflow, i.e., it may
take some time for the overflow status bit to be set. But this seems far-fetched,
because it would be very hard to reproduce. Do you get the same problem if you
restrict the event to counting at the user level, for instance?
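
(For concreteness, a minimal user-space sketch of restricting an event to user
level via perf_event_open(2); the perf tool's ':u' event modifier is the
equivalent:)

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Count cycles for the current task, excluding ring-0 execution. */
static int open_user_only_cycles(void)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.exclude_kernel = 1;        /* user level only */
        attr.exclude_hv = 1;

        /* pid 0 = current task, cpu -1 = any CPU, no group, no flags */
        return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}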

Another good source of information is the Intel VTune/PTU driver, called SEP. The
source is provided when you download PTU. I checked the code and they do
have a fix for the same errata; it seems to issue many more wrmsrl() calls.

2010-08-06 05:36:10

by Yanmin Zhang

Subject: Re: perf failed with kernel 2.6.35-rc

On Thu, 2010-08-05 at 12:32 +0200, Peter Zijlstra wrote:
> On Thu, 2010-08-05 at 14:40 +0800, Zhang, Yanmin wrote:
>
> > Here is the new patch.
I did more tracing. It seems only PMC0 overflows unexpectedly with the errata workaround.

> >
> > ---
> >
> > --- linux-2.6.35/arch/x86/kernel/cpu/perf_event_intel.c 2010-08-02 06:11:14.000000000 +0800
> > +++ linux-2.6.35_perferrata/arch/x86/kernel/cpu/perf_event_intel.c 2010-08-06 13:43:34.000000000 +0800
> > @@ -499,21 +499,40 @@ static void intel_pmu_nhm_enable_all(int
> > {
> > if (added) {
> > struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> > + struct perf_event *event;
> > + int num_counters;
> > int i;
> >
> > - wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 0, 0x4300D2);
> > - wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
> > - wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
> > + /* We always operate 4 pairs of PERF Counters */
> > + num_counters = 4;
>
> Either hardcode 4 (it's a model-specific errata, after all), or use
> x86_pmu.num_counters.
Ok.

>
> Also, please update the comment describing the new work-around.
I will add detailed steps.

>
> > + for (i = 0; i < num_counters; i++) {
> > + event = cpuc->events[i];
> > + if (!event)
> > + continue;
> > + x86_perf_event_update(event);
> > + }
> >
> > - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> > + wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 0, 0x4300B5);
> > + wrmsrl(MSR_ARCH_PERFMON_PERFCTR0 + 0, 0x0);
> > + wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300D2);
> > + wrmsrl(MSR_ARCH_PERFMON_PERFCTR0 + 1, 0x0);
> > + wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B1);
> > + wrmsrl(MSR_ARCH_PERFMON_PERFCTR0 + 2, 0x0);
> > + wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 3, 0x4300B1);
> > + wrmsrl(MSR_ARCH_PERFMON_PERFCTR0 + 3, 0x0);
> > +
>
> Maybe write that as:
>
> static const unsigned long magic[4] = { 0x4300B5, 0x4300D2, 0x4300B1, 0x4300B1 };
>
> for (i = 0; i < 4; i++) {
>         wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, magic[i]);
>         wrmsrl(MSR_ARCH_PERFMON_PERFCTR0 + i, 0);
> }
>
> > + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0xf);
> > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
> >
> > - for (i = 0; i < 3; i++) {
> > - struct perf_event *event = cpuc->events[i];
> > + for (i = 0; i < num_counters; i++) {
> > + event = cpuc->events[i];
> >
> > - if (!event)
> > + if (!event) {
> > + wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, 0x0);
>
> Indeed, if they are counting we need to at least stop them, and clearing
> them is indeed the simplest way.
>
> > continue;
> > + }
> >
> > + x86_perf_event_set_period(event);
> > __x86_pmu_enable_event(&event->hw,
> > ARCH_PERFMON_EVENTSEL_ENABLE);
> > }
>
> Please send the updated patch for inclusion, Thanks!
Below is the patch against 2.6.35.

Fix Errata AAK100/AAP53/BD53.

Signed-off-by: Zhang Yanmin <[email protected]>

---

--- linux-2.6.35/arch/x86/kernel/cpu/perf_event_intel.c 2010-08-02 06:11:14.000000000 +0800
+++ linux-2.6.35_perferrata/arch/x86/kernel/cpu/perf_event_intel.c 2010-08-07 10:06:03.000000000 +0800
@@ -492,32 +492,73 @@ static void intel_pmu_enable_all(int add
  * Intel Errata BD53 (model 44)
  *
  * These chips need to be 'reset' when adding counters by programming
- * the magic three (non counting) events 0x4300D2, 0x4300B1 and 0x4300B5
+ * the magic three (non counting) events 0x4300B5, 0x4300D2, and 0x4300B1
  * either in sequence on the same PMC or on different PMCs.
  */
-static void intel_pmu_nhm_enable_all(int added)
+static void intel_pmu_nhm_workaround(void)
 {
-       if (added) {
-               struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-               int i;
-
-               wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 0, 0x4300D2);
-               wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
-               wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
+       struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+       static const unsigned long nhm_magic[4] = {
+               0x4300B5,
+               0x4300D2,
+               0x4300B1,
+               0x4300B1
+       };
+       struct perf_event *event;
+       int i;
+
+       /*
+        * The Errata requires below steps:
+        * 1) Clear MSR_IA32_PEBS_ENABLE and MSR_CORE_PERF_GLOBAL_CTRL;
+        * 2) Configure 4 PERFEVTSELx with the magic number and clear
+        *    the corresponding PMCx;
+        * 3) set bit0~bit3 of MSR_CORE_PERF_GLOBAL_CTRL;
+        * 4) Clear MSR_CORE_PERF_GLOBAL_CTRL;
+        * 5) Clear 4 pairs of PERFEVTSELx and PMCx;
+        */
 
-               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
-               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
+       /*
+        * The real steps we choose are a little different from above.
+        * A) To reduce MSR operations, we don't run step 1) as they
+        *    are already cleared before this function is called;
+        * B) Call x86_perf_event_update to save PMCx before configuring
+        *    PERFEVTSELx with magic number;
+        * C) With step 5), we do clear only when the PERFEVTSELx is
+        *    not used currently.
+        * D) Call x86_perf_event_set_period to restore PMCx;
+        */
 
-               for (i = 0; i < 3; i++) {
-                       struct perf_event *event = cpuc->events[i];
+       /* We always operate 4 pairs of PERF Counters */
+       for (i = 0; i < 4; i++) {
+               event = cpuc->events[i];
+               if (event)
+                       x86_perf_event_update(event);
+       }
 
-                       if (!event)
-                               continue;
+       for (i = 0; i < 4; i++) {
+               wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, nhm_magic[i]);
+               wrmsrl(MSR_ARCH_PERFMON_PERFCTR0 + i, 0x0);
+       }
 
+       wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0xf);
+       wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
+
+       for (i = 0; i < 4; i++) {
+               event = cpuc->events[i];
+
+               if (event) {
+                       x86_perf_event_set_period(event);
                        __x86_pmu_enable_event(&event->hw,
-                                       ARCH_PERFMON_EVENTSEL_ENABLE);
-               }
+                                       ARCH_PERFMON_EVENTSEL_ENABLE);
+               } else
+                       wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, 0x0);
        }
+}
+
+static void intel_pmu_nhm_enable_all(int added)
+{
+       if (added)
+               intel_pmu_nhm_workaround();
        intel_pmu_enable_all(added);
 }


2010-08-18 10:28:20

by Yanmin Zhang

Subject: [tip:perf/urgent] perf, x86: Fix Intel-nhm PMU programming errata workaround

Commit-ID: 351af0725e5222e35741011d1ea62215c1ed06db
Gitweb: http://git.kernel.org/tip/351af0725e5222e35741011d1ea62215c1ed06db
Author: Zhang, Yanmin <[email protected]>
AuthorDate: Fri, 6 Aug 2010 13:39:08 +0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 18 Aug 2010 11:17:39 +0200

perf, x86: Fix Intel-nhm PMU programming errata workaround

Fix the Errata AAK100/AAP53/BD53 workaround. The officially documented
workaround we implemented in:

11164cd: perf, x86: Add Nehelem PMU programming errata workaround

doesn't actually work fully and causes a stuck PMU state
under load and non-functioning perf profiling.

A functional workaround was found by trial & error.

Affects all Nehalem-class Intel PMUs.

Signed-off-by: Zhang Yanmin <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: <[email protected]> # .35.x
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel.c | 81 +++++++++++++++++++++++++-------
1 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 214ac86..d8d86d0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -491,33 +491,78 @@ static void intel_pmu_enable_all(int added)
  * Intel Errata AAP53 (model 30)
  * Intel Errata BD53 (model 44)
  *
- * These chips need to be 'reset' when adding counters by programming
- * the magic three (non counting) events 0x4300D2, 0x4300B1 and 0x4300B5
- * either in sequence on the same PMC or on different PMCs.
+ * The official story:
+ * These chips need to be 'reset' when adding counters by programming the
+ * magic three (non-counting) events 0x4300B5, 0x4300D2, and 0x4300B1 either
+ * in sequence on the same PMC or on different PMCs.
+ *
+ * In practice it appears some of these events do in fact count, and
+ * we need to program all 4 events.
  */
-static void intel_pmu_nhm_enable_all(int added)
+static void intel_pmu_nhm_workaround(void)
 {
-       if (added) {
-               struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-               int i;
+       struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+       static const unsigned long nhm_magic[4] = {
+               0x4300B5,
+               0x4300D2,
+               0x4300B1,
+               0x4300B1
+       };
+       struct perf_event *event;
+       int i;
+
+       /*
+        * The Errata requires below steps:
+        * 1) Clear MSR_IA32_PEBS_ENABLE and MSR_CORE_PERF_GLOBAL_CTRL;
+        * 2) Configure 4 PERFEVTSELx with the magic events and clear
+        *    the corresponding PMCx;
+        * 3) set bit0~bit3 of MSR_CORE_PERF_GLOBAL_CTRL;
+        * 4) Clear MSR_CORE_PERF_GLOBAL_CTRL;
+        * 5) Clear 4 pairs of PERFEVTSELx and PMCx;
+        */
+
+       /*
+        * The real steps we choose are a little different from above.
+        * A) To reduce MSR operations, we don't run step 1) as they
+        *    are already cleared before this function is called;
+        * B) Call x86_perf_event_update to save PMCx before configuring
+        *    PERFEVTSELx with magic number;
+        * C) With step 5), we do clear only when the PERFEVTSELx is
+        *    not used currently.
+        * D) Call x86_perf_event_set_period to restore PMCx;
+        */
 
-               wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 0, 0x4300D2);
-               wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
-               wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
+       /* We always operate 4 pairs of PERF Counters */
+       for (i = 0; i < 4; i++) {
+               event = cpuc->events[i];
+               if (event)
+                       x86_perf_event_update(event);
+       }
 
-               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
-               wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
+       for (i = 0; i < 4; i++) {
+               wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, nhm_magic[i]);
+               wrmsrl(MSR_ARCH_PERFMON_PERFCTR0 + i, 0x0);
+       }
 
-               for (i = 0; i < 3; i++) {
-                       struct perf_event *event = cpuc->events[i];
+       wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0xf);
+       wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
 
-                       if (!event)
-                               continue;
+       for (i = 0; i < 4; i++) {
+               event = cpuc->events[i];
 
+               if (event) {
+                       x86_perf_event_set_period(event);
                        __x86_pmu_enable_event(&event->hw,
-                                       ARCH_PERFMON_EVENTSEL_ENABLE);
-               }
+                                       ARCH_PERFMON_EVENTSEL_ENABLE);
+               } else
+                       wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, 0x0);
        }
+}
+
+static void intel_pmu_nhm_enable_all(int added)
+{
+       if (added)
+               intel_pmu_nhm_workaround();
        intel_pmu_enable_all(added);
 }