2014-02-15 00:44:15

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] perf, nmi: fix unknown NMI warning

From: Markus Metzger <[email protected]>

[From Markus, just sending for him because he had problems with his mail]

When using BTS on Core i7-4*, I get the below kernel warning.

$ perf record -c 1 -e branches:u ls
Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
kernel:[ 438.317893] Uhhuh. NMI received for unknown reason 31 on CPU 2.

Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
kernel:[ 438.317920] Do you have a strange power saving mode enabled?

Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
kernel:[ 438.317945] Dazed and confused, but trying to continue

Make intel_pmu_handle_irq() take the full exit path when returning early.

Signed-off-by: Markus Metzger <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 0fa4f24..698ae77 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1361,10 +1361,8 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
intel_pmu_disable_all();
handled = intel_pmu_drain_bts_buffer();
status = intel_pmu_get_status();
- if (!status) {
- intel_pmu_enable_all(0);
- return handled;
- }
+ if (!status)
+ goto done;

loops = 0;
again:
--
1.8.5.3


2014-02-15 09:58:58

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf, nmi: fix unknown NMI warning

On Fri, Feb 14, 2014 at 04:44:08PM -0800, Andi Kleen wrote:
>
> Make intel_pmu_handle_irq() take the full exit path when returning early.

This reminds me of the late-ack stuff;

The way I understand interrupts to work is that when you raise the
interrupt it gets latched, when you ACK you drop the latch. Then when it
gets re-raised while its still in progress, it gets latched again and
the irq-enable at the end of the running handler will get it to trigger
again.

So by late-ACK-ing the PMI we can miss PMIs that happen between enabling
the PMU and ACKing the PMI.

We should either re-check the overflow mask after the ACK or do the ACK
while the PMU is disabled.

2014-02-16 18:38:53

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] perf, nmi: fix unknown NMI warning

> This reminds me of the late-ack stuff;
>
> The way I understand interrupts to work is that when you raise the
> interrupt it gets latched, when you ACK you drop the latch. Then when it
> gets re-raised while its still in progress, it gets latched again and
> the irq-enable at the end of the running handler will get it to trigger
> again.
>
> So by late-ACK-ing the PMI we can miss PMIs that happen between enabling
> the PMU and ACKing the PMI.

My understanding is that all these things are different latches/states, like
semaphores in a queue. pending-state, not-acked-state, interrupts disabled
state. There's also some delay in propagating between the states, which
was the reason we needed the late-ack in the first place.

Your argument relies on (1) and (2) being the same physical latch,
right?

The late-ack method was originally blessed by the hardware architects.

Also I don't think it would matter in any case because:

>
> We should either re-check the overflow mask after the ACK or do the ACK
> while the PMU is disabled.

For PMU that would be just a back-to-back PMI. We filter those
out anyways.

And if we're in a state that PMIs get re-raised quickly, we should either
regulate the period down or start throttling.

Plus also if we even get it wrong occassionally sampling is just
statistics and the law of the large numbers wins in the end.

BTW low period sampling is really a perf oddity, most other
profilers don't do that for hw sampling, as it just causes a lot of overhead
and doesn't give better sampling results.

Currently the adaptive period algorithm unfortunately has a tendency
to go very low occasionally before recovery, but I always considered that
a bug.

I suspect the only case where low period makes sense is s/w tracepoints.

-Andi

--
[email protected] -- Speaking for myself only.

2014-02-16 19:23:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf, nmi: fix unknown NMI warning

On Sun, Feb 16, 2014 at 07:38:50PM +0100, Andi Kleen wrote:
> > This reminds me of the late-ack stuff;
> >
> > The way I understand interrupts to work is that when you raise the
> > interrupt it gets latched, when you ACK you drop the latch. Then when it
> > gets re-raised while its still in progress, it gets latched again and
> > the irq-enable at the end of the running handler will get it to trigger
> > again.
> >
> > So by late-ACK-ing the PMI we can miss PMIs that happen between enabling
> > the PMU and ACKing the PMI.
>
> My understanding is that all these things are different latches/states, like
> semaphores in a queue. pending-state, not-acked-state, interrupts disabled
> state. There's also some delay in propagating between the states, which
> was the reason we needed the late-ack in the first place.
>
> Your argument relies on (1) and (2) being the same physical latch,
> right?

Indeed so; if they're separate states then things are fine. Are any of
these details documented someplace?

> The late-ack method was originally blessed by the hardware architects.
>
> Also I don't think it would matter in any case because:
>
> >
> > We should either re-check the overflow mask after the ACK or do the ACK
> > while the PMU is disabled.
>
> For PMU that would be just a back-to-back PMI. We filter those
> out anyways.

In this case the latter NMI will actually have an overflow state to
process so it's not a spurious NMI.

> And if we're in a state that PMIs get re-raised quickly, we should either
> regulate the period down or start throttling.

It could be a different counter; where both run at 'normal' periods but
just near miss each other by accident.

And sure; its all stats and over all it shouldn't matter that much, but
we should still try and do our best regardless.

2014-02-16 19:43:33

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] perf, nmi: fix unknown NMI warning


The best APIC documentation are the old data sheets for the external
APIC chips. I don't know if they cover things in such detail.

> In this case the latter NMI will actually have an overflow state to
> process so it's not a spurious NMI.

But we cannot distinguish it right? The spurious detector would
trigger in any case.

>
> > And if we're in a state that PMIs get re-raised quickly, we should either
> > regulate the period down or start throttling.
>
> It could be a different counter; where both run at 'normal' periods but
> just near miss each other by accident.

That's true.

It would be only a problem if they somehow become synchronized that
this happens very commonly. The usual defense against things like
that is to add a little randomization (I remember Stephane had
a patch for that some time ago). Also I believe it helps to have
the periods be prime numbers. But right now don't have any evidence
it's a real problem. I presume there's enough noise on a typical
setup that any such states disappear again quickly enough.

-Andi

Subject: [tip:perf/urgent] perf, nmi: Fix unknown NMI warning

Commit-ID: a3ef2229c94ff70998724cb64b9cb4c77db9e950
Gitweb: http://git.kernel.org/tip/a3ef2229c94ff70998724cb64b9cb4c77db9e950
Author: Markus Metzger <[email protected]>
AuthorDate: Fri, 14 Feb 2014 16:44:08 -0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Fri, 21 Feb 2014 22:09:01 +0100

perf, nmi: Fix unknown NMI warning

When using BTS on Core i7-4*, I get the below kernel warning.

$ perf record -c 1 -e branches:u ls
Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
kernel:[ 438.317893] Uhhuh. NMI received for unknown reason 31 on CPU 2.

Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
kernel:[ 438.317920] Do you have a strange power saving mode enabled?

Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
kernel:[ 438.317945] Dazed and confused, but trying to continue

Make intel_pmu_handle_irq() take the full exit path when returning early.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Markus Metzger <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 0fa4f24..698ae77 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1361,10 +1361,8 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
intel_pmu_disable_all();
handled = intel_pmu_drain_bts_buffer();
status = intel_pmu_get_status();
- if (!status) {
- intel_pmu_enable_all(0);
- return handled;
- }
+ if (!status)
+ goto done;

loops = 0;
again: