2019-07-16 14:01:10

by Neil Horman

[permalink] [raw]
Subject: [PATCH] x86: Add irq spillover warning

On Intel hardware, cpus are limited in the number of irqs they can
have affined to them (currently 240), based on section 10.5.2 of:
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.pdf

If a cpu has more than this number of interrupts affined to it, they
will spill over to other cpus, which potentially may be outside of their
affinity mask. Given that this might cause unexpected behavior on
performance sensitive systems, warn the user should this condition occur
so that corrective action can be taken

Signed-off-by: Neil Horman <[email protected]>
Reported-by: [email protected]
CC: Thomas Gleixner <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: [email protected]
---
arch/x86/kernel/irq.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 9b68b5b00ac9..ac7ed32de3d5 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -244,6 +244,14 @@ __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)

desc = __this_cpu_read(vector_irq[vector]);

+ /*
+ * Intel processors are limited in the number of irqs they can address. If we affine
+ * too many irqs to a given cpu, they can silently spill to another cpu outside of
+ * their affinity mask. Warn the user when this occurs
+ */
+ if (unlikely(!cpumask_test_cpu(smp_processor_id(), &desc->irq_common_data.affinity)))
+ pr_emerg_ratelimited("%s: %d.%d handled outside of affinity mask\n");
+
if (!handle_irq(desc, regs)) {
ack_APIC_irq();

--
2.21.0


2019-07-16 15:58:20

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86: Add irq spillover warning

Neil,

On Tue, 16 Jul 2019, Neil Horman wrote:

> On Intel hardware, cpus are limited in the number of irqs they can
> have affined to them (currently 240), based on section 10.5.2 of:
> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.pdf

That reference is really not useful to explain the problem and the number
of vectors is neither. Please explain the conceptual issue.

> If a cpu has more than this number of interrupts affined to it, they
> will spill over to other cpus, which potentially may be outside of their
> affinity mask.

Spill over?

The kernel decides to pick a vector on a CPU outside of the affinity when
it runs out of vectors on the CPUs in the affinity mask.

Please explain issues technically correct.

> Given that this might cause unexpected behavior on
> performance sensitive systems, warn the user should this condition occur
> so that corrective action can be taken

> @@ -244,6 +244,14 @@ __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)

Why on earth warn in the interrupt delivery hotpath? Just because it's the
place which really needs extra instructions and extra cache lines on
performance sensitive systems, right?

The fact that the kernel ran out of vectors for the CPUs in the affinity
mask is already known when the vector is allocated in activate_reserved().

So there is an obvious place to put such a warning and it's certainly not
do_IRQ().

Thanks,

tglx

2019-07-16 16:09:06

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH] x86: Add irq spillover warning

On Tue, Jul 16, 2019 at 05:57:31PM +0200, Thomas Gleixner wrote:
> Neil,
>
> On Tue, 16 Jul 2019, Neil Horman wrote:
>
> > On Intel hardware, cpus are limited in the number of irqs they can
> > have affined to them (currently 240), based on section 10.5.2 of:
> > https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.pdf
>
> That reference is really not useful to explain the problem and the number
> of vectors is neither. Please explain the conceptual issue.
>
You seem to have already done that below. Not really sure what more you are
asking for here.

> > If a cpu has more than this number of interrupts affined to it, they
> > will spill over to other cpus, which potentially may be outside of their
> > affinity mask.
>
> Spill over?
>
> The kernel decides to pick a vector on a CPU outside of the affinity when
> it runs out of vectors on the CPUs in the affinity mask.
>
Yes.

> Please explain issues technically correct.
>
I don't know what you mean by this. I explained it above, and you clearly
understood it.

> > Given that this might cause unexpected behavior on
> > performance sensitive systems, warn the user should this condition occur
> > so that corrective action can be taken
>
> > @@ -244,6 +244,14 @@ __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
>
> Why on earth warn in the interrupt delivery hotpath? Just because it's the
> place which really needs extra instructions and extra cache lines on
> performance sensitive systems, right?
>
Because theres already a check of the same variety in do_IRQ, but if the
information is available outside the hotpath, I was unaware, and am happy to
update this patch to refelct that.

> The fact that the kernel ran out of vectors for the CPUs in the affinity
> mask is already known when the vector is allocated in activate_reserved().
>
> So there is an obvious place to put such a warning and it's certainly not
> do_IRQ().
>
Sure

Thanks
Neil

> Thanks,
>
> tglx
>

2019-07-16 19:07:34

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86: Add irq spillover warning

Neil,

On Tue, 16 Jul 2019, Neil Horman wrote:
> On Tue, Jul 16, 2019 at 05:57:31PM +0200, Thomas Gleixner wrote:
> > On Tue, 16 Jul 2019, Neil Horman wrote:
> > > If a cpu has more than this number of interrupts affined to it, they
> > > will spill over to other cpus, which potentially may be outside of their
> > > affinity mask.
> >
> > Spill over?
> >
> > The kernel decides to pick a vector on a CPU outside of the affinity when
> > it runs out of vectors on the CPUs in the affinity mask.
> >
> Yes.
>
> > Please explain issues technically correct.
> >
> I don't know what you mean by this. I explained it above, and you clearly
> understood it.

It took me a while to grok it. Simply because I first thought it's some
hardware issue. And of course after confusion settled I knew what it is,
but just because I know that code like the back of my hand.

> > > Given that this might cause unexpected behavior on
> > > performance sensitive systems, warn the user should this condition occur
> > > so that corrective action can be taken
> >
> > > @@ -244,6 +244,14 @@ __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
> >
> > Why on earth warn in the interrupt delivery hotpath? Just because it's the
> > place which really needs extra instructions and extra cache lines on
> > performance sensitive systems, right?
> >
> Because theres already a check of the same variety in do_IRQ, but if the
> information is available outside the hotpath, I was unaware, and am happy to
> update this patch to refelct that.

Which check are you referring to?

Thanks,

tglx

2019-07-16 20:40:41

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH] x86: Add irq spillover warning

On Tue, Jul 16, 2019 at 09:05:30PM +0200, Thomas Gleixner wrote:
> Neil,
>
> On Tue, 16 Jul 2019, Neil Horman wrote:
> > On Tue, Jul 16, 2019 at 05:57:31PM +0200, Thomas Gleixner wrote:
> > > On Tue, 16 Jul 2019, Neil Horman wrote:
> > > > If a cpu has more than this number of interrupts affined to it, they
> > > > will spill over to other cpus, which potentially may be outside of their
> > > > affinity mask.
> > >
> > > Spill over?
> > >
> > > The kernel decides to pick a vector on a CPU outside of the affinity when
> > > it runs out of vectors on the CPUs in the affinity mask.
> > >
> > Yes.
> >
> > > Please explain issues technically correct.
> > >
> > I don't know what you mean by this. I explained it above, and you clearly
> > understood it.
>
> It took me a while to grok it. Simply because I first thought it's some
> hardware issue. And of course after confusion settled I knew what it is,
> but just because I know that code like the back of my hand.
>
> > > > Given that this might cause unexpected behavior on
> > > > performance sensitive systems, warn the user should this condition occur
> > > > so that corrective action can be taken
> > >
> > > > @@ -244,6 +244,14 @@ __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
> > >
> > > Why on earth warn in the interrupt delivery hotpath? Just because it's the
> > > place which really needs extra instructions and extra cache lines on
> > > performance sensitive systems, right?
> > >
> > Because theres already a check of the same variety in do_IRQ, but if the
> > information is available outside the hotpath, I was unaware, and am happy to
> > update this patch to refelct that.
>
> Which check are you referring to?
>
This one:
if (desc != VECTOR_RETRIGGERED) {
pr_emerg_ratelimited("%s: %d.%d No irq handler for vector\n",
__func__, smp_processor_id(),
vector);
I figured it was already checking one condition, another wouldn't hurt too much,
but no worries, I'm redoing this in activate_reserved now.

Best
Neil

> Thanks,
>
> tglx
>

2019-07-16 21:07:34

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86: Add irq spillover warning

Neil,

On Tue, 16 Jul 2019, Neil Horman wrote:
> On Tue, Jul 16, 2019 at 09:05:30PM +0200, Thomas Gleixner wrote:
> > > Because theres already a check of the same variety in do_IRQ, but if the
> > > information is available outside the hotpath, I was unaware, and am happy to
> > > update this patch to refelct that.
> >
> > Which check are you referring to?
> >
> This one:
> if (desc != VECTOR_RETRIGGERED) {
> pr_emerg_ratelimited("%s: %d.%d No irq handler for vector\n",
> __func__, smp_processor_id(),
> vector);
> I figured it was already checking one condition, another wouldn't hurt too much,
> but no worries, I'm redoing this in activate_reserved now.

That's in the slow path, when handle_irq() failed which is the unlikely case.

Thanks,

tglx