2015-04-01 07:46:06

by Jiri Kosina

[permalink] [raw]
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

On Fri, 27 Mar 2015, Borislav Petkov wrote:

> From: Jiri Kosina <[email protected]>
>
> Since GHES sources are global, we theoretically need only a single CPU
> reading them per NMI instead of a thundering herd of CPUs waiting on a
> spinlock in NMI context for no reason at all.

I originally wasn't 100% sure whether GHES sources are global (i.e. if it
really doesn't matter which CPU is reading the registers), but looking at
the code more it actually seems that this is really the right thing to do.

Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page
with the registers, performs apei_read() (which is ghes-source specific,
but not CPU-specific) and unmaps the page again.

There is nothing that would make this CPU-specific. Adding Huang Ying (the
original author of the code) to confirm this. Huang?

> Do that.

I think this should indeed be pushed forward. It fixes horrible spinlock
contention on systems which are under NMI storm (such as when perf is
active) unrelated to GHES.

Thanks,

--
Jiri Kosina
SUSE Labs


2015-04-01 13:51:15

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

On Wed, Apr 01, 2015 at 09:45:53AM +0200, Jiri Kosina wrote:
> On Fri, 27 Mar 2015, Borislav Petkov wrote:
>
> > From: Jiri Kosina <[email protected]>
> >
> > Since GHES sources are global, we theoretically need only a single CPU
> > reading them per NMI instead of a thundering herd of CPUs waiting on a
> > spinlock in NMI context for no reason at all.
>
> I originally wasn't 100% sure whether GHES sources are global (i.e. if it
> really doesn't matter which CPU is reading the registers), but looking at
> the code more it actually seems that this is really the right thing to do.
>
> Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page
> with the registers, performs apei_read() (which is ghes-source specific,
> but not CPU-specific) and unmaps the page again.
>
> There is nothing that would make this CPU-specific. Adding Huang Ying (the
> original author of the code) to confirm this. Huang?
>
> > Do that.
>
> I think this should indeed be pushed forward. It fixes horrible spinlock
> contention on systems which are under NMI storm (such as when perf is
> active) unrelated to GHES.

Right, so I tested injecting an error without your patch and same
behavior. So it all points at global sources AFAICT. It would be cool,
though, if someone who knows the fw confirms unambiguously.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-23 08:40:06

by Jiri Kosina

[permalink] [raw]
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

On Wed, 1 Apr 2015, Borislav Petkov wrote:

> > > From: Jiri Kosina <[email protected]>
> > >
> > > Since GHES sources are global, we theoretically need only a single CPU
> > > reading them per NMI instead of a thundering herd of CPUs waiting on a
> > > spinlock in NMI context for no reason at all.
> >
> > I originally wasn't 100% sure whether GHES sources are global (i.e. if it
> > really doesn't matter which CPU is reading the registers), but looking at
> > the code more it actually seems that this is really the right thing to do.
> >
> > Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page
> > with the registers, performs apei_read() (which is ghes-source specific,
> > but not CPU-specific) and unmaps the page again.
> >
> > There is nothing that would make this CPU-specific. Adding Huang Ying (the
> > original author of the code) to confirm this. Huang?
> >
> > > Do that.
> >
> > I think this should indeed be pushed forward. It fixes horrible spinlock
> > contention on systems which are under NMI storm (such as when perf is
> > active) unrelated to GHES.
>
> Right, so I tested injecting an error without your patch and same
> behavior. So it all points at global sources AFAICT. It would be cool,
> though, if someone who knows the fw confirms unambiguously.

Three weeks have passed, therefore I find this an appropriate time for a
friendly ping :)

Rafael? Naoya? Huang?

This fixes a contention spinlock problem in NMI observed on a real HW, so
it would be really nice to have it fixed.

Thanks,

--
Jiri Kosina
SUSE Labs

2015-04-23 08:59:54

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

On Thu, Apr 23, 2015 at 10:39:58AM +0200, Jiri Kosina wrote:
> Three weeks have passed, therefore I find this an appropriate time for a
> friendly ping :)
>
> Rafael? Naoya? Huang?
>
> This fixes a contention spinlock problem in NMI observed on a real HW, so
> it would be really nice to have it fixed.

I think we should apply this.

Here's why: nothing in the ghes_notify_nmi() handler does CPU-specific
accesses. It iterates over the list of ghes sources which do NMI
notification but those sources are the *same* regardless of which core
does the access as their addresses are per-source, i.e. in that

struct acpi_generic_address error_status_address;

thing.

And it is a safe bet to say that all that error information is
serialized in the firmware for the error source to consume.

So I'm going to route this through the RAS tree unless Rafael wants to
take it.

Ok?

Tony, objections?

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-23 18:00:23

by Tony Luck

[permalink] [raw]
Subject: RE: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

> I think we should apply this.
>
> Here's why: nothing in the ghes_notify_nmi() handler does CPU-specific
> accesses

This looks to be true.

> Tony, objections?

No objections.

-Tony
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-04-27 20:24:11

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

On Thu, Apr 23, 2015 at 06:00:15PM +0000, Luck, Tony wrote:
> > I think we should apply this.
> >
> > Here's why: nothing in the ghes_notify_nmi() handler does CPU-specific
> > accesses
>
> This looks to be true.
>
> > Tony, objections?
>
> No objections.

Thanks, queued for 4.2, pending one last test I'm doing on a machine
which says

[ 24.332560] INFO: NMI handler (ghes_notify_nmi) took too long to run: 3.265 msecs
[ 24.332567] INFO: NMI handler (ghes_notify_nmi) took too long to run: 5.946 msecs
[ 24.332568] INFO: NMI handler (ghes_notify_nmi) took too long to run: 5.948 msecs

during boot.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-28 14:30:19

by Don Zickus

[permalink] [raw]
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

On Wed, Apr 01, 2015 at 09:45:53AM +0200, Jiri Kosina wrote:
> On Fri, 27 Mar 2015, Borislav Petkov wrote:
>
> > From: Jiri Kosina <[email protected]>
> >
> > Since GHES sources are global, we theoretically need only a single CPU
> > reading them per NMI instead of a thundering herd of CPUs waiting on a
> > spinlock in NMI context for no reason at all.
>
> I originally wasn't 100% sure whether GHES sources are global (i.e. if it
> really doesn't matter which CPU is reading the registers), but looking at
> the code more it actually seems that this is really the right thing to do.
>
> Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page
> with the registers, performs apei_read() (which is ghes-source specific,
> but not CPU-specific) and unmaps the page again.
>
> There is nothing that would make this CPU-specific. Adding Huang Ying (the
> original author of the code) to confirm this. Huang?

Hi,

I believe the answer to this question is no, they are not global but
instead external. All external NMIs are routed to one cpu, normally cpu0.
This spinlock was made global to handle the 'someday' case of hotplugging
the bsp cpu (cpu0).

The other external NMIs (IO_CHECK and SERR) suffer from the same spinlock
problem. I tried using an irq_workqueue to work around quirks there and
PeterZ wasn't very fond of it (though he couldn't think of a better way to
solve it).

This patch seems interesting but you might still run into the thundering
herd problem with the global spinlock in
arch/x86/kernel/nmi.c::default_do_nmi(). That functions grabs a global
spinlock before processing the external NMI list (which GHES is a part of).


So I am assuming this patch solves the 'thundering herd' problem by
minimizing all the useless writes the spinlock would do for each cpu that
noticed it had no work to do?

In that case, I am in favor of this solution and would like to apply a
similar solution to arch/x86/kernel/nmi.c, to see if that helps there too.

Cheers,
Don


>
> > Do that.
>
> I think this should indeed be pushed forward. It fixes horrible spinlock
> contention on systems which are under NMI storm (such as when perf is
> active) unrelated to GHES.
>
> Thanks,
>
> --
> Jiri Kosina
> SUSE Labs
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2015-04-28 14:42:45

by Don Zickus

[permalink] [raw]
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

On Tue, Apr 28, 2015 at 10:30:09AM -0400, Don Zickus wrote:
> On Wed, Apr 01, 2015 at 09:45:53AM +0200, Jiri Kosina wrote:
> > On Fri, 27 Mar 2015, Borislav Petkov wrote:
> >
> > > From: Jiri Kosina <[email protected]>
> > >
> > > Since GHES sources are global, we theoretically need only a single CPU
> > > reading them per NMI instead of a thundering herd of CPUs waiting on a
> > > spinlock in NMI context for no reason at all.
> >
> > I originally wasn't 100% sure whether GHES sources are global (i.e. if it
> > really doesn't matter which CPU is reading the registers), but looking at
> > the code more it actually seems that this is really the right thing to do.
> >
> > Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page
> > with the registers, performs apei_read() (which is ghes-source specific,
> > but not CPU-specific) and unmaps the page again.
> >
> > There is nothing that would make this CPU-specific. Adding Huang Ying (the
> > original author of the code) to confirm this. Huang?
>
> Hi,
>
> I believe the answer to this question is no, they are not global but
> instead external. All external NMIs are routed to one cpu, normally cpu0.
> This spinlock was made global to handle the 'someday' case of hotplugging
> the bsp cpu (cpu0).
>
> The other external NMIs (IO_CHECK and SERR) suffer from the same spinlock
> problem. I tried using an irq_workqueue to work around quirks there and
> PeterZ wasn't very fond of it (though he couldn't think of a better way to
> solve it).
>
> This patch seems interesting but you might still run into the thundering
> herd problem with the global spinlock in
> arch/x86/kernel/nmi.c::default_do_nmi(). That functions grabs a global
> spinlock before processing the external NMI list (which GHES is a part of).

Grr, I mispoke. I sent a patchset a year ago to split out internal and
external NMIs to simplify the problem. So I wrote the above paragraph
thinking the GHES NMI handler was wrapped with the external NMI spinlock,
when in fact it isn't. However, with perf running with lots of events, it
is possible to start 'swallowing' NMIs which requires passing through the
spinlock I just mentioned. This might cause random delays in your
measurements and is still worth modifying.

Cheers,
Don

>
>
> So I am assuming this patch solves the 'thundering herd' problem by
> minimizing all the useless writes the spinlock would do for each cpu that
> noticed it had no work to do?
>
> In that case, I am in favor of this solution and would like to apply a
> similar solution to arch/x86/kernel/nmi.c, to see if that helps there too.
>
> Cheers,
> Don
>
>
> >
> > > Do that.
> >
> > I think this should indeed be pushed forward. It fixes horrible spinlock
> > contention on systems which are under NMI storm (such as when perf is
> > active) unrelated to GHES.
> >
> > Thanks,
> >
> > --
> > Jiri Kosina
> > SUSE Labs
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/

2015-04-28 14:56:11

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

On Tue, Apr 28, 2015 at 10:30:09AM -0400, Don Zickus wrote:
> On Wed, Apr 01, 2015 at 09:45:53AM +0200, Jiri Kosina wrote:
> > On Fri, 27 Mar 2015, Borislav Petkov wrote:
> >
> > > From: Jiri Kosina <[email protected]>
> > >
> > > Since GHES sources are global, we theoretically need only a single CPU
> > > reading them per NMI instead of a thundering herd of CPUs waiting on a
> > > spinlock in NMI context for no reason at all.
> >
> > I originally wasn't 100% sure whether GHES sources are global (i.e. if it
> > really doesn't matter which CPU is reading the registers), but looking at
> > the code more it actually seems that this is really the right thing to do.
> >
> > Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page
> > with the registers, performs apei_read() (which is ghes-source specific,
> > but not CPU-specific) and unmaps the page again.
> >
> > There is nothing that would make this CPU-specific. Adding Huang Ying (the
> > original author of the code) to confirm this. Huang?
>
> Hi,
>
> I believe the answer to this question is no, they are not global but
> instead external. All external NMIs are routed to one cpu, normally cpu0.

Actually, the things we're talking about here are not global NMIs but
global error sources. I.e., the GHES sources which use an NMI to report
errors with and which we noodle over in ghes_notify_nmi() twice:

list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
...

> This spinlock was made global to handle the 'someday' case of hotplugging
> the bsp cpu (cpu0).
>
> The other external NMIs (IO_CHECK and SERR) suffer from the same spinlock
> problem. I tried using an irq_workqueue to work around quirks there and
> PeterZ wasn't very fond of it (though he couldn't think of a better way to
> solve it).
>
> This patch seems interesting but you might still run into the thundering
> herd problem with the global spinlock in
> arch/x86/kernel/nmi.c::default_do_nmi(). That functions grabs a global
> spinlock before processing the external NMI list (which GHES is a part of).

nmi_reason_lock? And our handler is registered with
register_nmi_handler() and so we iterate over it in nmi_handle(). It'd
be interesting to know what NMI reason we get for those GHES NMIs so
that we know whether nmi_handle() is called under the lock or not...

> So I am assuming this patch solves the 'thundering herd' problem by
> minimizing all the useless writes the spinlock would do for each cpu that
> noticed it had no work to do?

Not that spinlock. It used to take another one in ghes_notify_nmi().
Removing it solved the issue. There were even boxes which would do:

[ 24.332560] INFO: NMI handler (ghes_notify_nmi) took too long to run: 3.265 msecs
[ 24.332567] INFO: NMI handler (ghes_notify_nmi) took too long to run: 5.946 msecs
[ 24.332568] INFO: NMI handler (ghes_notify_nmi) took too long to run: 5.948 msecs

just like that during boot.

It was basically a lot of purely useless lock grabbing for no reason
whatsoever.

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-28 15:36:06

by Don Zickus

[permalink] [raw]
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

On Tue, Apr 28, 2015 at 04:55:48PM +0200, Borislav Petkov wrote:
> On Tue, Apr 28, 2015 at 10:30:09AM -0400, Don Zickus wrote:
> > On Wed, Apr 01, 2015 at 09:45:53AM +0200, Jiri Kosina wrote:
> > > On Fri, 27 Mar 2015, Borislav Petkov wrote:
> > >
> > > > From: Jiri Kosina <[email protected]>
> > > >
> > > > Since GHES sources are global, we theoretically need only a single CPU
> > > > reading them per NMI instead of a thundering herd of CPUs waiting on a
> > > > spinlock in NMI context for no reason at all.
> > >
> > > I originally wasn't 100% sure whether GHES sources are global (i.e. if it
> > > really doesn't matter which CPU is reading the registers), but looking at
> > > the code more it actually seems that this is really the right thing to do.
> > >
> > > Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page
> > > with the registers, performs apei_read() (which is ghes-source specific,
> > > but not CPU-specific) and unmaps the page again.
> > >
> > > There is nothing that would make this CPU-specific. Adding Huang Ying (the
> > > original author of the code) to confirm this. Huang?
> >
> > Hi,
> >
> > I believe the answer to this question is no, they are not global but
> > instead external. All external NMIs are routed to one cpu, normally cpu0.
>
> Actually, the things we're talking about here are not global NMIs but
> global error sources. I.e., the GHES sources which use an NMI to report
> errors with and which we noodle over in ghes_notify_nmi() twice:
>
> list_for_each_entry_rcu(ghes, &ghes_nmi, list) {

Sure, most systems use NMI to report them I believe (at least the ones I
have played with). Regardless, I have hit performance problems on this lock
while doing perf testing.

I tried to work around it by splitting the NMI list into an
nmi_register(INTERNAL,...) and nmi_register(EXTERNAL,...) to allow perf to
avoid hitting this lock. But the fallout of that change included missed
NMIs and various solutions from that resulted in complexities that blocked
inclusion last year.

Your solution seems much simpler. :-)

> ...
>
> > This spinlock was made global to handle the 'someday' case of hotplugging
> > the bsp cpu (cpu0).
> >
> > The other external NMIs (IO_CHECK and SERR) suffer from the same spinlock
> > problem. I tried using an irq_workqueue to work around quirks there and
> > PeterZ wasn't very fond of it (though he couldn't think of a better way to
> > solve it).
> >
> > This patch seems interesting but you might still run into the thundering
> > herd problem with the global spinlock in
> > arch/x86/kernel/nmi.c::default_do_nmi(). That functions grabs a global
> > spinlock before processing the external NMI list (which GHES is a part of).
>
> nmi_reason_lock? And our handler is registered with
> register_nmi_handler() and so we iterate over it in nmi_handle(). It'd
> be interesting to know what NMI reason we get for those GHES NMIs so
> that we know whether nmi_handle() is called under the lock or not...

I followed up in another email stating I mis-spoke. I forgot this still
uses the NMI_LOCAL shared NMI. So every perf NMI, will also call the GHES
handler to make sure NMIs did not piggy back each other. So I don't believe
the NMI reason lock is called a majority of the time (except when the NMI is
swallowed, but that is under heavy perf load...).


>
> > So I am assuming this patch solves the 'thundering herd' problem by
> > minimizing all the useless writes the spinlock would do for each cpu that
> > noticed it had no work to do?
>
> Not that spinlock. It used to take another one in ghes_notify_nmi().
> Removing it solved the issue. There were even boxes which would do:
>
> [ 24.332560] INFO: NMI handler (ghes_notify_nmi) took too long to run: 3.265 msecs
> [ 24.332567] INFO: NMI handler (ghes_notify_nmi) took too long to run: 5.946 msecs
> [ 24.332568] INFO: NMI handler (ghes_notify_nmi) took too long to run: 5.948 msecs
>
> just like that during boot.
>
> It was basically a lot of purely useless lock grabbing for no reason
> whatsoever.

Yes, I meant the 'raw_spin_lock(&ghes_nmi_lock);' one. I poorly typed my
email and confused you into thinking I was referring to the wrong one.

Well, it isn't exactly no reason, but more along the lines of 'we do not
know who gets the external NMI, so everyone tries to talk with the GHES
firmware'. But I think that is just me squabbling.

We both agree the mechanics of the spinlock are overkill here and cause much
cache contention. Simplifying it to just 'reads' and return removes most of
the problem.

Cheers,
Don

2015-04-28 16:22:52

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

On Tue, Apr 28, 2015 at 11:35:21AM -0400, Don Zickus wrote:
> Your solution seems much simpler. :-)

... and I love simpler :-)

> I followed up in another email stating I mis-spoke. I forgot this still
> uses the NMI_LOCAL shared NMI. So every perf NMI, will also call the GHES
> handler to make sure NMIs did not piggy back each other. So I don't believe

And this is something we should really fix - perf and RAS should
not have anything to do with each other. But I don't know the NMI
code to even have an idea how. I don't even know whether we can
differentiate NMIs, hell, I can't imagine the hardware giving us a
different NMI reason through get_nmi_reason(). Maybe that byte returned
from NMI_REASON_PORT is too small and hangs on too much legacy crap to
even be usable. Questions over questions...

> the NMI reason lock is called a majority of the time (except when the NMI is
> swallowed, but that is under heavy perf load...).

..

> We both agree the mechanics of the spinlock are overkill here and cause much
> cache contention. Simplifying it to just 'reads' and return removes most of
> the problem.

Right.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-28 18:45:05

by Don Zickus

[permalink] [raw]
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

On Tue, Apr 28, 2015 at 06:22:29PM +0200, Borislav Petkov wrote:
> On Tue, Apr 28, 2015 at 11:35:21AM -0400, Don Zickus wrote:
> > Your solution seems much simpler. :-)
>
> ... and I love simpler :-)
>
> > I followed up in another email stating I mis-spoke. I forgot this still
> > uses the NMI_LOCAL shared NMI. So every perf NMI, will also call the GHES
> > handler to make sure NMIs did not piggy back each other. So I don't believe
>
> And this is something we should really fix - perf and RAS should
> not have anything to do with each other. But I don't know the NMI
> code to even have an idea how. I don't even know whether we can
> differentiate NMIs, hell, I can't imagine the hardware giving us a
> different NMI reason through get_nmi_reason(). Maybe that byte returned
> from NMI_REASON_PORT is too small and hangs on too much legacy crap to
> even be usable. Questions over questions...

:-) Well, let me first clear up some of your questions.

RAS doesn't go through the legacy ports (ie get_nmi_reason()). Instead it
triggers the external NMI through a different bit (ioapic I think).

The nmi code has no idea what io_remap'ed address apei is using to map its
error handling register that GHES uses. Unlike the legacy port which is
always port 0x61.

So, with NMI being basically a shared interrupt, with no ability to discern
who sent the interrupt (and even worse no ability to know how _many_ were sent as
the NMI is edge triggered instead of level triggered). As a result we rely
on the NMI handlers to talk to their address space/registers to determine if
they were they source of the interrupt.

Now I can agree that perf and RAS have nothing to do with each other, but
they both use NMI to interrupt. Perf is fortunate enough to be internal to
each cpu and therefore needs no global lock unlike GHES (hence part of the
problem).

The only way to determine who sent the NMI is to have each handler read its
register, which is time consuming for GHES.

Of course, we could go back to playing tricks knowing that external NMIs
like GHES and IO_CHECK/SERR are only routed to one cpu (cpu0 mainly) and
optimize things that way, but that inhibits the bsp cpu hotplugging folks.



I also played tricks like last year's patchset that split out the
nmi_handlers into LOCAL and EXTERNAL queues. Perf would be part of the
LOCAL queue while GHES was part of the EXTERNAL queue. The thought was to
never touch the EXTERNAL queue if perf claimed an NMI. This lead to all
sorts of missed external NMIs, so it didn't work out.


Anyway, any ideas or thoughts for improvement are always welcomed. :-)


Cheers,
Don

>
> > the NMI reason lock is called a majority of the time (except when the NMI is
> > swallowed, but that is under heavy perf load...).
>
> ..
>
> > We both agree the mechanics of the spinlock are overkill here and cause much
> > cache contention. Simplifying it to just 'reads' and return removes most of
> > the problem.
>
> Right.
>
> --
> Regards/Gruss,
> Boris.
>
> ECO tip #101: Trim your mails when you reply.
> --

2015-05-04 15:41:11

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

On Tue, Apr 28, 2015 at 02:44:28PM -0400, Don Zickus wrote:
> RAS doesn't go through the legacy ports (ie get_nmi_reason()). Instead it
> triggers the external NMI through a different bit (ioapic I think).

Well, I see it getting registered with __register_nmi_handler() which
adds it to the NMI_LOCAL type, i.e., ghes_notify_nmi() gets called by

default_do_nmi
|-> nmi_handle(NMI_LOCAL, regs, b2b);

AFAICT.

Which explains also the issue we were seeing as that handler is called
on each NMI, even when the machine is running a perf workload.

> The nmi code has no idea what io_remap'ed address apei is using to map its
> error handling register that GHES uses. Unlike the legacy port which is
> always port 0x61.
>
> So, with NMI being basically a shared interrupt, with no ability to discern
> who sent the interrupt (and even worse no ability to know how _many_ were sent as
> the NMI is edge triggered instead of level triggered). As a result we rely
> on the NMI handlers to talk to their address space/registers to determine if
> they were they source of the interrupt.

I was afraid it would be something like that. We probably should poke hw
people to extend that NMI fun so that we can know who caused it.

<snip stuff I agree with>

> Anyway, any ideas or thoughts for improvement are always welcomed. :-)

Yeah, I'm afraid without hw support, that won't be doable. We need the
hw to tell us who caused the NMI. Otherwise we'll be round-robining
(:-)) through handlers like nuts.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--