2011-04-06 22:31:01

by Shaun Ruffell

[permalink] [raw]
Subject: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

Hello Don,

With 2.6.39-rc2 I was seeing the following NMIs when building the kernel:

[ 191.647131] Uhhuh. NMI received for unknown reason 21 on CPU 3.
[ 191.650068] Do you have a strange power saving mode enabled?
[ 191.650068] Dazed and confused, but trying to continue
[ 676.020001] Uhhuh. NMI received for unknown reason 21 on CPU 1.
[ 676.020001] Do you have a strange power saving mode enabled?
[ 676.020001] Dazed and confused, but trying to continue
[ 892.520335] Starting new kernel

I'm running on a Dell PowerEdge 2600 with the following processor:

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 3.06GHz
stepping : 7
...

I was able to bisect it down to commit 242214f9c1eeaae40, but I'm not
certain where to go from here. Is this something that is already known
or is there more information I should try to collect?

Here is the commit for reference:

commit 242214f9c1eeaae40eca11e3b4d37bfce960a7cd
Author: Don Zickus <[email protected]>
Date: Thu Mar 24 23:36:25 2011 +0300

perf, x86: P4 PMU - Read proper MSR register to catch unflagged overflows

The read of a proper MSR register was missed and instead of
counter the configration register was tested (it has
ARCH_P4_UNFLAGGED_BIT always cleared) leading to unknown NMI
hitting the system. As result the user may obtain "Dazed and
confused, but trying to continue" message. Fix it by reading a
proper MSR register.

When an NMI happens on a P4, the perf nmi handler checks the
configuration register to see if the overflow bit is set or not
before taking appropriate action. Unfortunately, various P4
machines had a broken overflow bit, so a backup mechanism was
implemented. This mechanism checked to see if the counter
rolled over or not.

A previous commit that implemented this backup mechanism was
broken. Instead of reading the counter register, it used the
configuration register to determine if the counter rolled over
or not. Reading that bit would give incorrect results.

This would lead to 'Dazed and confused' messages for the end
user when using the perf tool (or if the nmi watchdog is
running).

The fix is to read the counter register before determining if
the counter rolled over or not.

Signed-off-by: Don Zickus <[email protected]>
Signed-off-by: Cyrill Gorcunov <[email protected]>
Cc: Lin Ming <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 3769ac8..d3d7b59 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -777,6 +777,7 @@ static inline int p4_pmu_clear_cccr_ovf(struct hw_perf_event *hwc)
* the counter has reached zero value and continued counting before
* real NMI signal was received:
*/
+ rdmsrl(hwc->event_base, v);
if (!(v & ARCH_P4_UNFLAGGED_BIT))
return 1;


2011-04-07 00:17:10

by Don Zickus

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On Wed, Apr 06, 2011 at 05:30:36PM -0500, Shaun Ruffell wrote:
> Hello Don,
>
> I was able to bisect it down to commit 242214f9c1eeaae40, but I'm not
> certain where to go from here. Is this something that is already known
> or is there more information I should try to collect?

Nope, this is an ongoing issue. What happened was the perf P4 nmi handler
was swallowing all the NMIs. My patch fixed that and exposed a double NMI
problem. We have been chasing it for a couple of months. I think Cyril
was finally able to duplicate it (as he wrote the P4 code). I have
confidence that he will find a fix for it soon. :-)

Thanks for the report though!

Cheers,
Don

2011-04-07 03:18:52

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On Thursday, April 7, 2011, Don Zickus <[email protected]> wrote:
> On Wed, Apr 06, 2011 at 05:30:36PM -0500, Shaun Ruffell wrote:
>> Hello Don,
>>
>> I was able to bisect it down to commit 242214f9c1eeaae40, but I'm not
>> certain where to go from here. ?Is this something that is already known
>> or is there more information I should try to collect?
>
> Nope, this is an ongoing issue. ?What happened was the perf P4 nmi handler
> was swallowing all the NMIs. ?My patch fixed that and exposed a double NMI
> problem. ?We have been chasing it for a couple of months. ?I think Cyril
> was finally able to duplicate it (as he wrote the P4 code). ?I have
> confidence that he will find a fix for it soon. :-)
>
> Thanks for the report though!
>
> Cheers,
> Don
>

Hi, yeah, i got it too and i hope to fix this issue soon. Will ping as
only get working fix.

2011-04-07 14:39:05

by Shaun Ruffell

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On Thu, Apr 07, 2011 at 07:18:50AM +0400, Cyrill Gorcunov wrote:
> On Thursday, April 7, 2011, Don Zickus <[email protected]> wrote:
>> On Wed, Apr 06, 2011 at 05:30:36PM -0500, Shaun Ruffell wrote:
>>>
>>> I was able to bisect it down to commit 242214f9c1eeaae40, but I'm not
>>> certain where to go from here. ?Is this something that is already known
>>> or is there more information I should try to collect?
>>
>> Nope, this is an ongoing issue. ?What happened was the perf P4 nmi handler
>> was swallowing all the NMIs. ?My patch fixed that and exposed a double NMI
>> problem. ?We have been chasing it for a couple of months. ?I think Cyril
>> was finally able to duplicate it (as he wrote the P4 code). ?I have
>> confidence that he will find a fix for it soon. :-)
>>
>> Thanks for the report though!
>
> Hi, yeah, i got it too and i hope to fix this issue soon. Will ping as
> only get working fix.

Don, Cyrill,

Thanks for the explanation and my apologies for not relating the
previous discussions about this to what I was seeing. This issue would
be a blocker for any 2.6.39 final right?

Cyrill, I would be more than happy to test any patches. It's relatively
quick for me to reproduce.

Thanks,
Shaun

2011-04-07 14:43:09

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On 04/07/2011 06:38 PM, Shaun Ruffell wrote:
> On Thu, Apr 07, 2011 at 07:18:50AM +0400, Cyrill Gorcunov wrote:
>> On Thursday, April 7, 2011, Don Zickus <[email protected]> wrote:
>>> On Wed, Apr 06, 2011 at 05:30:36PM -0500, Shaun Ruffell wrote:
>>>>
>>>> I was able to bisect it down to commit 242214f9c1eeaae40, but I'm not
>>>> certain where to go from here. Is this something that is already known
>>>> or is there more information I should try to collect?
>>>
>>> Nope, this is an ongoing issue. What happened was the perf P4 nmi handler
>>> was swallowing all the NMIs. My patch fixed that and exposed a double NMI
>>> problem. We have been chasing it for a couple of months. I think Cyril
>>> was finally able to duplicate it (as he wrote the P4 code). I have
>>> confidence that he will find a fix for it soon. :-)
>>>
>>> Thanks for the report though!
>>
>> Hi, yeah, i got it too and i hope to fix this issue soon. Will ping as
>> only get working fix.
>
> Don, Cyrill,
>
> Thanks for the explanation and my apologies for not relating the
> previous discussions about this to what I was seeing. This issue would
> be a blocker for any 2.6.39 final right?

Well, could be, I didn't find the real reason for doubled nmi.
Still investigating. As a workaround you could simply disable
nmi-watchdog for a while in command line if it bothers you.

>
> Cyrill, I would be more than happy to test any patches. It's relatively
> quick for me to reproduce.

OK, I'll prepare some patches for you to test. Still think on where it
could fails :( Stay tuned.

>
> Thanks,
> Shaun

--
Cyrill

2011-04-13 19:33:59

by Maciej Rutecki

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=33252
for your bug report, please add your address to the CC list in there, thanks!

On czwartek, 7 kwietnia 2011 o 00:30:36 Shaun Ruffell wrote:
> Hello Don,
>
> With 2.6.39-rc2 I was seeing the following NMIs when building the kernel:
>
> [ 191.647131] Uhhuh. NMI received for unknown reason 21 on CPU 3.
> [ 191.650068] Do you have a strange power saving mode enabled?
> [ 191.650068] Dazed and confused, but trying to continue
> [ 676.020001] Uhhuh. NMI received for unknown reason 21 on CPU 1.
> [ 676.020001] Do you have a strange power saving mode enabled?
> [ 676.020001] Dazed and confused, but trying to continue
> [ 892.520335] Starting new kernel
>
> I'm running on a Dell PowerEdge 2600 with the following processor:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 15
> model : 2
> model name : Intel(R) Xeon(TM) CPU 3.06GHz
> stepping : 7
> ...
>
> I was able to bisect it down to commit 242214f9c1eeaae40, but I'm not
> certain where to go from here. Is this something that is already known
> or is there more information I should try to collect?
>
> Here is the commit for reference:
>
> commit 242214f9c1eeaae40eca11e3b4d37bfce960a7cd
> Author: Don Zickus <[email protected]>
> Date: Thu Mar 24 23:36:25 2011 +0300
>
> perf, x86: P4 PMU - Read proper MSR register to catch unflagged
> overflows
>
> The read of a proper MSR register was missed and instead of
> counter the configration register was tested (it has
> ARCH_P4_UNFLAGGED_BIT always cleared) leading to unknown NMI
> hitting the system. As result the user may obtain "Dazed and
> confused, but trying to continue" message. Fix it by reading a
> proper MSR register.
>
> When an NMI happens on a P4, the perf nmi handler checks the
> configuration register to see if the overflow bit is set or not
> before taking appropriate action. Unfortunately, various P4
> machines had a broken overflow bit, so a backup mechanism was
> implemented. This mechanism checked to see if the counter
> rolled over or not.
>
> A previous commit that implemented this backup mechanism was
> broken. Instead of reading the counter register, it used the
> configuration register to determine if the counter rolled over
> or not. Reading that bit would give incorrect results.
>
> This would lead to 'Dazed and confused' messages for the end
> user when using the perf tool (or if the nmi watchdog is
> running).
>
> The fix is to read the counter register before determining if
> the counter rolled over or not.
>
> Signed-off-by: Don Zickus <[email protected]>
> Signed-off-by: Cyrill Gorcunov <[email protected]>
> Cc: Lin Ming <[email protected]>
> LKML-Reference: <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> diff --git a/arch/x86/kernel/cpu/perf_event_p4.c
> b/arch/x86/kernel/cpu/perf_event_p4.c index 3769ac8..d3d7b59 100644
> --- a/arch/x86/kernel/cpu/perf_event_p4.c
> +++ b/arch/x86/kernel/cpu/perf_event_p4.c
> @@ -777,6 +777,7 @@ static inline int p4_pmu_clear_cccr_ovf(struct
> hw_perf_event *hwc) * the counter has reached zero value and continued
> counting before * real NMI signal was received:
> */
> + rdmsrl(hwc->event_base, v);
> if (!(v & ARCH_P4_UNFLAGGED_BIT))
> return 1;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Maciej Rutecki
http://www.maciek.unixy.pl

2011-04-13 20:01:38

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On 04/13/2011 11:33 PM, Maciej Rutecki wrote:
> I created a Bugzilla entry at
> https://bugzilla.kernel.org/show_bug.cgi?id=33252
> for your bug report, please add your address to the CC list in there, thanks!
>

Here is a patch flying around which I tuned a bit, when all reporters confirm it
works for them we could close the bug. Thanks.

Cyrill
--
From: Don Zickus <[email protected]>
Subject: [PATCH] perf, x86: fix unknown NMIs on a Pentium4 box v2

When using perf on a Pentium4 box, lots of unknown NMIs would be generated.
This is the result of a P4 quirk that is subtle. The P4 generates an NMI
when the counter overflow and unlike other arches where the NMI is a one time
event, the P4 continues to assert its NMI until clear by the OS.

As a side effect to this quirk, the NMI on the apic is masked off to prevent
a stream of NMIs until the overflow flag is cleared. During the perf
re-design, this subtle-ness was overlooked and the apic was unmasked _before_
the overflow flag was cleared. As a result, this generated an extra NMI on
the P4 mchines.

The fix is trivial, wait until the NMI is properly handled before un-masking
the apic.

Sadly, in the old nmi watchdog there was a note that explained this exact
behaviour.

v2: Unmask LVT entry iif IRQ being handled by perf subsystem and add a comment.

Signed-off-by: Don Zickus <[email protected]>
Signed-off-by: Cyrill Gorcunov <[email protected]>
---

Don, Shaun, Ming, I've tested it on my non-HT machine, so if you have a chance
to test it on HT machine -- this would be a great thing!

Don, note the version v2 changes, thanks. I've tuned the former a bit
but left your From field untouched, are you OK with that?

arch/x86/kernel/cpu/perf_event.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
@@ -1370,12 +1370,19 @@ perf_event_nmi_handler(struct notifier_b
return NOTIFY_DONE;
}

- apic_write(APIC_LVTPC, APIC_DM_NMI);

handled = x86_pmu.handle_irq(args->regs);
if (!handled)
return NOTIFY_DONE;

+ /*
+ * Unmasking should be done after IRQ handled, otherwise
+ * there is a race between clearing of counter overflow
+ * flag and LTV entry unmasking (which might lead to double
+ * NMIs generation).
+ */
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+
this_nmi = percpu_read(irq_stat.__nmi_count);
if ((handled > 1) ||
/* the next nmi could be a back-to-back nmi */

2011-04-13 20:35:20

by Shaun Ruffell

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On Thu, Apr 14, 2011 at 12:01:35AM +0400, Cyrill Gorcunov wrote:
> On 04/13/2011 11:33 PM, Maciej Rutecki wrote:
>> I created a Bugzilla entry at
>> https://bugzilla.kernel.org/show_bug.cgi?id=33252
>> for your bug report, please add your address to the CC list in there, thanks!
>
> Here is a patch flying around which I tuned a bit, when all reporters confirm it
> works for them we could close the bug. Thanks.
>
> Cyrill
> --
> From: Don Zickus <[email protected]>
> Subject: [PATCH] perf, x86: fix unknown NMIs on a Pentium4 box v2
>
> When using perf on a Pentium4 box, lots of unknown NMIs would be generated.
> This is the result of a P4 quirk that is subtle. The P4 generates an NMI
> when the counter overflow and unlike other arches where the NMI is a one time
> event, the P4 continues to assert its NMI until clear by the OS.
>
> As a side effect to this quirk, the NMI on the apic is masked off to prevent
> a stream of NMIs until the overflow flag is cleared. During the perf
> re-design, this subtle-ness was overlooked and the apic was unmasked _before_
> the overflow flag was cleared. As a result, this generated an extra NMI on
> the P4 mchines.
>
> The fix is trivial, wait until the NMI is properly handled before un-masking
> the apic.
>
> Sadly, in the old nmi watchdog there was a note that explained this exact
> behaviour.
>
> v2: Unmask LVT entry iif IRQ being handled by perf subsystem and add a comment.
>
> Signed-off-by: Don Zickus <[email protected]>
> Signed-off-by: Cyrill Gorcunov <[email protected]>
> ---
>
> Don, Shaun, Ming, I've tested it on my non-HT machine, so if you have a chance
> to test it on HT machine -- this would be a great thing!
>
> Don, note the version v2 changes, thanks. I've tuned the former a bit
> but left your From field untouched, are you OK with that?
>
> arch/x86/kernel/cpu/perf_event.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
> =====================================================================
> --- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
> +++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
> @@ -1370,12 +1370,19 @@ perf_event_nmi_handler(struct notifier_b
> return NOTIFY_DONE;
> }
>
> - apic_write(APIC_LVTPC, APIC_DM_NMI);
>
> handled = x86_pmu.handle_irq(args->regs);
> if (!handled)
> return NOTIFY_DONE;
>
> + /*
> + * Unmasking should be done after IRQ handled, otherwise
> + * there is a race between clearing of counter overflow
> + * flag and LTV entry unmasking (which might lead to double
> + * NMIs generation).
> + */
> + apic_write(APIC_LVTPC, APIC_DM_NMI);
> +
> this_nmi = percpu_read(irq_stat.__nmi_count);
> if ((handled > 1) ||
> /* the next nmi could be a back-to-back nmi */

I had the first version of the patch running the test builds all night without
any NMIs. I installed this one and ran it through the case where I would
reliably get early NMIs and it still no NMIs.

So for v2:
Tested-by: Shaun Ruffell <[email protected]>

Thanks!

2011-04-13 20:53:06

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On 04/14/2011 12:35 AM, Shaun Ruffell wrote:
...
>
> I had the first version of the patch running the test builds all night without
> any NMIs. I installed this one and ran it through the case where I would
> reliably get early NMIs and it still no NMIs.
>
> So for v2:
> Tested-by: Shaun Ruffell <[email protected]>
>
> Thanks!

Thanks a huge Shaun. The thing is (if only I don't miss something) at moment there is
no much difference in which patch to pick up. But as only kgdb dives in or any other
subsystem (which say would use same manner of nmi delivery) we might be unmasking
lvt entry even if nothing were handled at all, so I bias to a second version.

--
Cyrill

2011-04-13 21:22:20

by Don Zickus

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On Thu, Apr 14, 2011 at 12:43:47AM +0400, Cyrill Gorcunov wrote:
> On 04/14/2011 12:35 AM, Shaun Ruffell wrote:
> ...
> >
> > I had the first version of the patch running the test builds all night without
> > any NMIs. I installed this one and ran it through the case where I would
> > reliably get early NMIs and it still no NMIs.
> >
> > So for v2:
> > Tested-by: Shaun Ruffell <[email protected]>
> >
> > Thanks!
>
> Thanks a huge Shaun. The thing is (if only I don't miss something) at moment there is
> no much difference in which patch to pick up. But as only kgdb dives in or any other
> subsystem (which say would use same manner of nmi delivery) we might be unmasking
> lvt entry even if nothing were handled at all, so I bias to a second version.

I agree with the second version. Initially I wanted to enable it in the
case of the !handled path. But your reasoning makes sense to me, don't
enable it in the !handled case because you might accidentally do something
bad.

Cheers,
Don

2011-04-13 21:25:55

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On 04/14/2011 01:22 AM, Don Zickus wrote:
> On Thu, Apr 14, 2011 at 12:43:47AM +0400, Cyrill Gorcunov wrote:
>> On 04/14/2011 12:35 AM, Shaun Ruffell wrote:
>> ...
>>>
>>> I had the first version of the patch running the test builds all night without
>>> any NMIs. I installed this one and ran it through the case where I would
>>> reliably get early NMIs and it still no NMIs.
>>>
>>> So for v2:
>>> Tested-by: Shaun Ruffell <[email protected]>
>>>
>>> Thanks!
>>
>> Thanks a huge Shaun. The thing is (if only I don't miss something) at moment there is
>> no much difference in which patch to pick up. But as only kgdb dives in or any other
>> subsystem (which say would use same manner of nmi delivery) we might be unmasking
>> lvt entry even if nothing were handled at all, so I bias to a second version.
>
> I agree with the second version. Initially I wanted to enable it in the
> case of the !handled path. But your reasoning makes sense to me, don't
> enable it in the !handled case because you might accidentally do something
> bad.
>
> Cheers,
> Don

OK, thanks for review! Shaun please continue testing it, if all will be fine until tomorrow
we will ask Ingo to pick it up then. Sounds OK for everyone?

(CC'ed a couple of people which were involved into this code snippet)
--
Cyrill

2011-04-13 21:53:39

by Shaun Ruffell

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On Thu, Apr 14, 2011 at 01:25:51AM +0400, Cyrill Gorcunov wrote:
...
> OK, thanks for review! Shaun please continue testing it, if all will be fine until tomorrow
> we will ask Ingo to pick it up then. Sounds OK for everyone?

Sounds good. I'll let it run tonight with v2 tonight.

2011-04-14 06:47:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs


* Cyrill Gorcunov <[email protected]> wrote:

> - apic_write(APIC_LVTPC, APIC_DM_NMI);
>
> handled = x86_pmu.handle_irq(args->regs);
> if (!handled)
> return NOTIFY_DONE;
>
> + /*
> + * Unmasking should be done after IRQ handled, otherwise
> + * there is a race between clearing of counter overflow
> + * flag and LTV entry unmasking (which might lead to double
> + * NMIs generation).
> + */
> + apic_write(APIC_LVTPC, APIC_DM_NMI);

Here we could leak a masked IRQ through the !handled path. If we got a LVTPC
irq we better handle it and unmask the LVTPC unconditionally - regardless of
whether we consider it 'handled' or not from the kernel POV ...

Thanks,

Ingo

2011-04-14 07:51:12

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On Thu, Apr 14, 2011 at 10:47 AM, Ingo Molnar <[email protected]> wrote:
>
> * Cyrill Gorcunov <[email protected]> wrote:
>
>> - ? ? apic_write(APIC_LVTPC, APIC_DM_NMI);
>>
>> ? ? ? handled = x86_pmu.handle_irq(args->regs);
>> ? ? ? if (!handled)
>> ? ? ? ? ? ? ? return NOTIFY_DONE;
>>
>> + ? ? /*
>> + ? ? ?* Unmasking should be done after IRQ handled, otherwise
>> + ? ? ?* there is a race between clearing of counter overflow
>> + ? ? ?* flag and LTV entry unmasking (which might lead to double
>> + ? ? ?* NMIs generation).
>> + ? ? ?*/
>> + ? ? apic_write(APIC_LVTPC, APIC_DM_NMI);
>
> Here we could leak a masked IRQ through the !handled path. If we got a LVTPC
> irq we better handle it and unmask the LVTPC unconditionally - regardless of
> whether we consider it 'handled' or not from the kernel POV ...
>
> Thanks,
>
> ? ? ? ?Ingo

If there is no counters overflowed I believe we should not poke LVTPC until
we sure NMI comes from it (and counter overflow is the only sign that
NMI came from LVTPC as far as I may say, and I see also a possibility for race
if counter signal reaches LVTPC and it is being processed inside apic chip
{which might take some time too before real NMI signal appears in cpu} and as
result hard to tell what we get in output -- double nmi again or
something else).
I might be screwed of course :)

2011-04-14 08:05:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs


* Cyrill Gorcunov <[email protected]> wrote:

> On Thu, Apr 14, 2011 at 10:47 AM, Ingo Molnar <[email protected]> wrote:
> >
> > * Cyrill Gorcunov <[email protected]> wrote:
> >
> >> - ? ? apic_write(APIC_LVTPC, APIC_DM_NMI);
> >>
> >> ? ? ? handled = x86_pmu.handle_irq(args->regs);
> >> ? ? ? if (!handled)
> >> ? ? ? ? ? ? ? return NOTIFY_DONE;
> >>
> >> + ? ? /*
> >> + ? ? ?* Unmasking should be done after IRQ handled, otherwise
> >> + ? ? ?* there is a race between clearing of counter overflow
> >> + ? ? ?* flag and LTV entry unmasking (which might lead to double
> >> + ? ? ?* NMIs generation).
> >> + ? ? ?*/
> >> + ? ? apic_write(APIC_LVTPC, APIC_DM_NMI);
> >
> > Here we could leak a masked IRQ through the !handled path. If we got a LVTPC
> > irq we better handle it and unmask the LVTPC unconditionally - regardless of
> > whether we consider it 'handled' or not from the kernel POV ...
> >
> > Thanks,
> >
> > ? ? ? ?Ingo
>
> If there is no counters overflowed I believe we should not poke LVTPC until
> we sure NMI comes from it (and counter overflow is the only sign that NMI
> came from LVTPC as far as I may say, and I see also a possibility for race if
> counter signal reaches LVTPC and it is being processed inside apic chip
> {which might take some time too before real NMI signal appears in cpu} and as
> result hard to tell what we get in output -- double nmi again or something
> else).

Well, we unmasked unconditionally before. If we unmask conditionally now, we
risk not unmasking. We risk a completely stuck PMU (there wont ever come *any*
NMI from it if we ever forget to unmask) versus spurious NMIs.

Maybe we can do it - but it will need a lot of testing on a lot of CPU types to
make sure there's no other CPU quirks in this area ...

So unless the conditional unmasking fixes a real bug (in kgdb or elsewhere)
lets unmask unconditionally now to fix the P4 regression in .39 - and queue up
a *separate* patch that moves it even further down and makes it conditional -
but queue that up for .40.

Thanks,

Ingo

2011-04-14 09:27:35

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On Thu, Apr 14, 2011 at 12:05 PM, Ingo Molnar <[email protected]> wrote:
...
>> If there is no counters overflowed I believe we should not poke LVTPC until
>> we sure NMI comes from it (and counter overflow is the only sign that NMI
>> came from LVTPC as far as I may say, and I see also a possibility for race if
>> counter signal reaches LVTPC and it is being processed inside apic chip
>> {which might take some time too before real NMI signal appears in cpu} and as
>> result hard to tell what we get in output -- double nmi again or something
>> else).
>
> Well, we unmasked unconditionally before. If we unmask conditionally now, we
> risk not unmasking. We risk a completely stuck PMU (there wont ever come *any*
> NMI from it if we ever forget to unmask) versus spurious NMIs.
>
> Maybe we can do it - but it will need a lot of testing on a lot of CPU types to
> make sure there's no other CPU quirks in this area ...
>
> So unless the conditional unmasking fixes a real bug (in kgdb or elsewhere)
> lets unmask unconditionally now to fix the P4 regression in .39 - and queue up
> a *separate* patch that moves it even further down and makes it conditional -
> but queue that up for .40.
>
> Thanks,
>
> ? ? ? ?Ingo
>

OK. Ingo I'll send a patch from Don with all tested-by (including me) and my ack
as only get back home. (I don't mind if Don beat me on this ;)

2011-04-14 14:30:42

by Shaun Ruffell

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On Wed, Apr 13, 2011 at 04:53:22PM -0500, Shaun Ruffell wrote:
> On Thu, Apr 14, 2011 at 01:25:51AM +0400, Cyrill Gorcunov wrote:
> ...
>> OK, thanks for review! Shaun please continue testing it, if all will be fine until tomorrow
>> we will ask Ingo to pick it up then. Sounds OK for everyone?
>
> Sounds good. I'll let it run tonight with v2 tonight.

I know Ingo asked to merge v1 now and something else in a later version.
However, just to follow up on this thread, v2 ran all night without producing
any NMIs on the PowerEdge 2600.

Cheers,
Shaun

2011-04-14 14:33:45

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [regression 2.6.39-rc2][bisected] "perf, x86: P4 PMU - Read proper MSR register to catch" and NMIs

On 04/14/2011 06:30 PM, Shaun Ruffell wrote:
> On Wed, Apr 13, 2011 at 04:53:22PM -0500, Shaun Ruffell wrote:
>> On Thu, Apr 14, 2011 at 01:25:51AM +0400, Cyrill Gorcunov wrote:
>> ...
>>> OK, thanks for review! Shaun please continue testing it, if all will be fine until tomorrow
>>> we will ask Ingo to pick it up then. Sounds OK for everyone?
>>
>> Sounds good. I'll let it run tonight with v2 tonight.
>
> I know Ingo asked to merge v1 now and something else in a later version.
> However, just to follow up on this thread, v2 ran all night without producing
> any NMIs on the PowerEdge 2600.
>
> Cheers,
> Shaun

Thanks Shaun, I'll send v1 shortly, just to be on safe side.
Then we will update it for .40 as Ingo proposed. Thanks again
for testing!

--
Cyrill