2008-06-24 10:47:32

by David Vrabel

[permalink] [raw]
Subject: PCI: MSI interrupts masked using prohibited method

PCI MSI interrupts are masked and unmasked using a method (by writing
the MSI Enable capability bit) that is prohibited by the PCI specification.

See PCI Local Bus Specification Revision 3.0, section 6.8.1.3. Message
Control for MSI on page 236.

"MSI Enable: If 1 and the MSI-X Enable bit in the MSI-X Message
Control register (see Section 6.8.2.3) is 0, the
function is permitted to use MSI to request service
and is prohibited from using its INTx# pin (if
implemented; see Section 6.2.4 Interrupt pin register).
System configuration software sets this bit to enable
MSI. A device driver is prohibited from writing this bit
to mask a function?s service request."

This behaviour can cause missed interrupts with some devices if the
interrupt is asserted by the hardware while MSI is disabled.

I believe the interrupt should be masked/unmasked on the interrupt
controller (the APIC on x86, for example). I'm going to test this now
and see if it works.

David
--
David Vrabel, Senior Software Engineer, Drivers
CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562
Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/


2008-06-25 21:21:04

by Jesse Barnes

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Tuesday, June 24, 2008 3:46 am David Vrabel wrote:
> PCI MSI interrupts are masked and unmasked using a method (by writing
> the MSI Enable capability bit) that is prohibited by the PCI specification.

Yeah, it's probably quite a bit slower too (I assume you're talking about
io_apic_64's msi_mask_irq). Seems like masking this at the ioapic level
would make more sense anyway...

> This behaviour can cause missed interrupts with some devices if the
> interrupt is asserted by the hardware while MSI is disabled.
>
> I believe the interrupt should be masked/unmasked on the interrupt
> controller (the APIC on x86, for example). I'm going to test this now
> and see if it works.

Great, thanks.

Jesse

2008-06-27 12:22:05

by David Vrabel

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

Jesse Barnes wrote:
> On Tuesday, June 24, 2008 3:46 am David Vrabel wrote:
>> PCI MSI interrupts are masked and unmasked using a method (by writing
>> the MSI Enable capability bit) that is prohibited by the PCI specification.
>
> Yeah, it's probably quite a bit slower too (I assume you're talking about
> io_apic_64's msi_mask_irq). Seems like masking this at the ioapic level
> would make more sense anyway...
>
>> This behaviour can cause missed interrupts with some devices if the
>> interrupt is asserted by the hardware while MSI is disabled.
>>
>> I believe the interrupt should be masked/unmasked on the interrupt
>> controller (the APIC on x86, for example). I'm going to test this now
>> and see if it works.

After further research it seems that MSI interrupts aren't routed via
the IO-APIC, so this cannot be done.

I think the only solution is to not perform any sort of masking and rely
on the device driver being able to handle this.

David
--
David Vrabel, Senior Software Engineer, Drivers
CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562
Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/


Attachments:
pci-dont-mask-msi-with-msi-enable-bit.patch (1.88 kB)

2008-06-27 17:08:24

by Jesse Barnes

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Friday, June 27, 2008 5:17 am David Vrabel wrote:
> Jesse Barnes wrote:
> > On Tuesday, June 24, 2008 3:46 am David Vrabel wrote:
> >> PCI MSI interrupts are masked and unmasked using a method (by writing
> >> the MSI Enable capability bit) that is prohibited by the PCI
> >> specification.
> >
> > Yeah, it's probably quite a bit slower too (I assume you're talking about
> > io_apic_64's msi_mask_irq). Seems like masking this at the ioapic level
> > would make more sense anyway...
> >
> >> This behaviour can cause missed interrupts with some devices if the
> >> interrupt is asserted by the hardware while MSI is disabled.
> >>
> >> I believe the interrupt should be masked/unmasked on the interrupt
> >> controller (the APIC on x86, for example). I'm going to test this now
> >> and see if it works.
>
> After further research it seems that MSI interrupts aren't routed via
> the IO-APIC, so this cannot be done.
>
> I think the only solution is to not perform any sort of masking and rely
> on the device driver being able to handle this.

On x86, they're targetted at the LAPIC block (see section 8 of the IA SDM);
maybe we could modify the message address or data such that it won't generate
an interrupt instead? I think this latest approach is correct in the sense
that both the system and drivers have to take care that
1) we don't miss interrupts, and
2) we don't generate spurious unhandled interrupts (as might happen if we
disable MSI and the device generates a legacy IRQ on a different vector).

But it looks like the real problem is in the system interrupt code that
handles MSIs. We should only be disabling MSIs using the capability bit at
device enable or disable time, not during the normal course of interrupt
handling, since if we do we may miss device interrupts or have them routed to
the wrong (legacy) vector.

Cc'ing Ingo & Thomas since they know the core interrupt code pretty well.

Jesse

2008-07-16 19:43:19

by Jesse Barnes

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Friday, June 27, 2008 10:07 am Jesse Barnes wrote:
> On Friday, June 27, 2008 5:17 am David Vrabel wrote:
> > Jesse Barnes wrote:
> > > On Tuesday, June 24, 2008 3:46 am David Vrabel wrote:
> > >> PCI MSI interrupts are masked and unmasked using a method (by writing
> > >> the MSI Enable capability bit) that is prohibited by the PCI
> > >> specification.
> > >
> > > Yeah, it's probably quite a bit slower too (I assume you're talking
> > > about io_apic_64's msi_mask_irq). Seems like masking this at the
> > > ioapic level would make more sense anyway...
> > >
> > >> This behaviour can cause missed interrupts with some devices if the
> > >> interrupt is asserted by the hardware while MSI is disabled.
> > >>
> > >> I believe the interrupt should be masked/unmasked on the interrupt
> > >> controller (the APIC on x86, for example). I'm going to test this
> > >> now and see if it works.
> >
> > After further research it seems that MSI interrupts aren't routed via
> > the IO-APIC, so this cannot be done.
> >
> > I think the only solution is to not perform any sort of masking and rely
> > on the device driver being able to handle this.
>
> On x86, they're targetted at the LAPIC block (see section 8 of the IA SDM);
> maybe we could modify the message address or data such that it won't
> generate an interrupt instead? I think this latest approach is correct in
> the sense that both the system and drivers have to take care that
> 1) we don't miss interrupts, and
> 2) we don't generate spurious unhandled interrupts (as might happen if we
> disable MSI and the device generates a legacy IRQ on a different vector).
>
> But it looks like the real problem is in the system interrupt code that
> handles MSIs. We should only be disabling MSIs using the capability bit at
> device enable or disable time, not during the normal course of interrupt
> handling, since if we do we may miss device interrupts or have them routed
> to the wrong (legacy) vector.
>
> Cc'ing Ingo & Thomas since they know the core interrupt code pretty well.

Ingo or Matthew, any ideas about this? The fundamental issue is that if we go
poke at a device's MSI cap bits during interrupt handling, the device may
start using regular IRQs instead, potentially on a different vector. It
would be good if we could come up with a better way of masking MSIs during
handling...

Thanks,
Jesse

2008-07-16 19:59:06

by Matthew Wilcox

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Wed, Jul 16, 2008 at 12:43:03PM -0700, Jesse Barnes wrote:
> On Friday, June 27, 2008 10:07 am Jesse Barnes wrote:
> > On Friday, June 27, 2008 5:17 am David Vrabel wrote:
> > > Jesse Barnes wrote:
> > > > On Tuesday, June 24, 2008 3:46 am David Vrabel wrote:
> > > >> PCI MSI interrupts are masked and unmasked using a method (by writing
> > > >> the MSI Enable capability bit) that is prohibited by the PCI
> > > >> specification.
> > > >
> > > > Yeah, it's probably quite a bit slower too (I assume you're talking
> > > > about io_apic_64's msi_mask_irq). Seems like masking this at the
> > > > ioapic level would make more sense anyway...
> > > >
> > > >> This behaviour can cause missed interrupts with some devices if the
> > > >> interrupt is asserted by the hardware while MSI is disabled.
> > > >>
> > > >> I believe the interrupt should be masked/unmasked on the interrupt
> > > >> controller (the APIC on x86, for example). I'm going to test this
> > > >> now and see if it works.
> > >
> > > After further research it seems that MSI interrupts aren't routed via
> > > the IO-APIC, so this cannot be done.
> > >
> > > I think the only solution is to not perform any sort of masking and rely
> > > on the device driver being able to handle this.
> >
> > On x86, they're targetted at the LAPIC block (see section 8 of the IA SDM);
> > maybe we could modify the message address or data such that it won't
> > generate an interrupt instead? I think this latest approach is correct in
> > the sense that both the system and drivers have to take care that
> > 1) we don't miss interrupts, and
> > 2) we don't generate spurious unhandled interrupts (as might happen if we
> > disable MSI and the device generates a legacy IRQ on a different vector).
> >
> > But it looks like the real problem is in the system interrupt code that
> > handles MSIs. We should only be disabling MSIs using the capability bit at
> > device enable or disable time, not during the normal course of interrupt
> > handling, since if we do we may miss device interrupts or have them routed
> > to the wrong (legacy) vector.
> >
> > Cc'ing Ingo & Thomas since they know the core interrupt code pretty well.
>
> Ingo or Matthew, any ideas about this? The fundamental issue is that if we go
> poke at a device's MSI cap bits during interrupt handling, the device may
> start using regular IRQs instead, potentially on a different vector. It
> would be good if we could come up with a better way of masking MSIs during
> handling...

OK, I didn't see the initial report, but I've looked at the PCI MSI
masking a little bit due to the multiple MSI work.

The first thing is that some MSI implementations permit masking on the
PCI device. That's handled through the

if (entry->msi_attrib.maskbit) {

part of msi_set_mask_bits() in drivers/pci/msi.c.

David's clearly talking about devices which don't support mask bits.
For these devices, we call
msi_set_enable(entry->dev, !flag);

which turns off the PCI_MSI_FLAGS_ENABLE bit. I'm not sure I see the
bit in the PCI spec that says devices are then allowed to ignore the
Interrupt Disable bit in the device control register, but I concede such
devices may be out there.

We have some infrastructure for resending IRQs, so we could soft-mask an
MSI and resend it when the irq is unmasked, if it has triggered.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-07-16 20:35:38

by David Miller

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

From: Matthew Wilcox <[email protected]>
Date: Wed, 16 Jul 2008 13:58:53 -0600

> David's clearly talking about devices which don't support mask bits.
> For these devices, we call
> msi_set_enable(entry->dev, !flag);
>
> which turns off the PCI_MSI_FLAGS_ENABLE bit. I'm not sure I see the
> bit in the PCI spec that says devices are then allowed to ignore the
> Interrupt Disable bit in the device control register, but I concede such
> devices may be out there.

Keep in mind also the msi intx disable bug we already have
workarounds for. It shows that there are tons of questionable
behavior in this area.

2008-07-17 12:16:36

by Krzysztof Halasa

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

Matthew Wilcox <[email protected]> writes:

> For these devices, we call
> msi_set_enable(entry->dev, !flag);
>
> which turns off the PCI_MSI_FLAGS_ENABLE bit. I'm not sure I see the
> bit in the PCI spec that says devices are then allowed to ignore the
> Interrupt Disable bit in the device control register, but I concede such
> devices may be out there.

Remember that MSI predates INTX disable bit.
--
Krzysztof Halasa

2008-07-17 12:43:47

by Matthew Wilcox

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Thu, Jul 17, 2008 at 02:16:03PM +0200, Krzysztof Halasa wrote:
> Remember that MSI predates INTX disable bit.

That's true. MSI was introduced in PCI 2.2 and INTX disable was
introduced in PCI 2.3; a 3-year window.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-07-17 13:15:37

by David Vrabel

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

Matthew Wilcox wrote:
> On Wed, Jul 16, 2008 at 12:43:03PM -0700, Jesse Barnes wrote:
>> On Friday, June 27, 2008 10:07 am Jesse Barnes wrote:
>>> On Friday, June 27, 2008 5:17 am David Vrabel wrote:
>>>> Jesse Barnes wrote:
>>>>> On Tuesday, June 24, 2008 3:46 am David Vrabel wrote:
>>>>>> PCI MSI interrupts are masked and unmasked using a method (by writing
>>>>>> the MSI Enable capability bit) that is prohibited by the PCI
>>>>>> specification.
>>>>> Yeah, it's probably quite a bit slower too (I assume you're talking
>>>>> about io_apic_64's msi_mask_irq). Seems like masking this at the
>>>>> ioapic level would make more sense anyway...
>>>>>
>>>>>> This behaviour can cause missed interrupts with some devices if the
>>>>>> interrupt is asserted by the hardware while MSI is disabled.
>>>>>>
>>>>>> I believe the interrupt should be masked/unmasked on the interrupt
>>>>>> controller (the APIC on x86, for example). I'm going to test this
>>>>>> now and see if it works.
>>>> After further research it seems that MSI interrupts aren't routed via
>>>> the IO-APIC, so this cannot be done.
>>>>
>>>> I think the only solution is to not perform any sort of masking and rely
>>>> on the device driver being able to handle this.
>>> On x86, they're targetted at the LAPIC block (see section 8 of the IA SDM);
>>> maybe we could modify the message address or data such that it won't
>>> generate an interrupt instead? I think this latest approach is correct in
>>> the sense that both the system and drivers have to take care that
>>> 1) we don't miss interrupts, and
>>> 2) we don't generate spurious unhandled interrupts (as might happen if we
>>> disable MSI and the device generates a legacy IRQ on a different vector).
>>>
>>> But it looks like the real problem is in the system interrupt code that
>>> handles MSIs. We should only be disabling MSIs using the capability bit at
>>> device enable or disable time, not during the normal course of interrupt
>>> handling, since if we do we may miss device interrupts or have them routed
>>> to the wrong (legacy) vector.
>>>
>>> Cc'ing Ingo & Thomas since they know the core interrupt code pretty well.
>> Ingo or Matthew, any ideas about this? The fundamental issue is that if we go
>> poke at a device's MSI cap bits during interrupt handling, the device may
>> start using regular IRQs instead, potentially on a different vector. It
>> would be good if we could come up with a better way of masking MSIs during
>> handling...
>
> OK, I didn't see the initial report, but I've looked at the PCI MSI
> masking a little bit due to the multiple MSI work.
>
> The first thing is that some MSI implementations permit masking on the
> PCI device. That's handled through the
>
> if (entry->msi_attrib.maskbit) {
>
> part of msi_set_mask_bits() in drivers/pci/msi.c.
>
> David's clearly talking about devices which don't support mask bits.
> For these devices, we call
> msi_set_enable(entry->dev, !flag);
>
> which turns off the PCI_MSI_FLAGS_ENABLE bit. I'm not sure I see the
> bit in the PCI spec that says devices are then allowed to ignore the
> Interrupt Disable bit in the device control register, but I concede such
> devices may be out there.

>From the PCI spec:

"This bit disables the device/function from asserting INTx#. A value of
0 enables the assertion of its INTx# signal. A value of 1 disables the
assertion of its INTx# signal. This bit?s state after RST# is 0. Refer
to Section 6.8.1.3 for control of MSI."

So the interrupt disable bit is only for the line interrupts, and not MSI.

> We have some infrastructure for resending IRQs, so we could soft-mask an
> MSI and resend it when the irq is unmasked, if it has triggered.

Is this really necessary? Any PCI device with MSI that doesn't have the
hardware MSI mask bits must have some sort of device specific
handshaking for managing when MSIs can be generated.

Regardless, doesn't __do_IRQ() handle this already anyway?

David
--
David Vrabel, Senior Software Engineer, Drivers
CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562
Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/

2008-07-17 15:39:27

by Matthew Wilcox

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Thu, Jul 17, 2008 at 02:14:40PM +0100, David Vrabel wrote:
> Matthew Wilcox wrote:
> > David's clearly talking about devices which don't support mask bits.
> > For these devices, we call
> > msi_set_enable(entry->dev, !flag);
> >
> > which turns off the PCI_MSI_FLAGS_ENABLE bit. I'm not sure I see the
> > bit in the PCI spec that says devices are then allowed to ignore the
> > Interrupt Disable bit in the device control register, but I concede such
> > devices may be out there.
>
> From the PCI spec:
>
> "This bit disables the device/function from asserting INTx#. A value of
> 0 enables the assertion of its INTx# signal. A value of 1 disables the
> assertion of its INTx# signal. This bit?s state after RST# is 0. Refer
> to Section 6.8.1.3 for control of MSI."
>
> So the interrupt disable bit is only for the line interrupts, and not MSI.

Yes it is, but we don't touch that bit at this time.

When we enable MSIs, we set the INTx disable bit (or at least do a write
to it ... as Krzysztof Halasa pointed out, not all devices implement
this bit). When we mask MSIs, we clear the MSI enable bit. So a device
conforming to PCI 2.3 has both INTx and MSI disabled. Unfortunately, a
device conforming to PCI 2.2 has MSI disabled and INTx implicitly
re-enabled. Oops.

> > We have some infrastructure for resending IRQs, so we could soft-mask an
> > MSI and resend it when the irq is unmasked, if it has triggered.
>
> Is this really necessary? Any PCI device with MSI that doesn't have the
> hardware MSI mask bits must have some sort of device specific
> handshaking for managing when MSIs can be generated.

Maybe so, but we don't control that. Here's the flow that leads to
the problem you've observed (note: only x86-64 analysed here, other
architectures may vary):

The mask_msi_irq() function is the heart of what's going on. This is
set to be the ->mask() function pointer in the MSI irq_chip struct.

The device generates an MSI interrupt.
Some magic happens in assembly, and the processor ends up in do_IRQ()
which calls generic_handle_irq(). Because it's an MSI IRQ,
handle_edge_irq() is called instead of __do_IRQ().

There are two opportunities for calling the ->mask() here, one is:

if (unlikely((desc->status & (IRQ_INPROGRESS | IRQ_DISABLED)) ||
!desc->action)) {
desc->status |= (IRQ_PENDING | IRQ_MASKED);
mask_ack_irq(desc, irq);

The other is:

if (unlikely(!action)) {
desc->chip->mask(irq);

This is all in generic code which knows nothing about any device-specific
method of masking interrupts from the chip.

> Regardless, doesn't __do_IRQ() handle this already anyway?

This is a good thought, let's follow it through. What if we simply make
->mask a no-op for devices which don't support mask bits?

MSIs are 'edge triggered' interrupts. They're sent once and then
forgotten by the hardware (as opposed to level triggered which will
continue to be triggered until deasserted).

The first time through (assuming ->action is non-NULL ...), we won't
try to mask the irq. The second time through, IRQ_INPROGRESS will be set,
so we try to mask_ack_irq().

How about we simply don't ack the irq at this point? That should prevent
it being triggered again, right?

Working on a patch to do this now ...

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-07-17 15:58:52

by Thomas Gleixner

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Thu, 17 Jul 2008, Matthew Wilcox wrote:
>
> The device generates an MSI interrupt.
> Some magic happens in assembly, and the processor ends up in do_IRQ()
> which calls generic_handle_irq(). Because it's an MSI IRQ,
> handle_edge_irq() is called instead of __do_IRQ().

__do_IRQ() is the old all-in-one handler which is not called on
platforms which have GENERIC_HARDIRQS set. You can safely ignore what
__do_IRQ() does.

> There are two opportunities for calling the ->mask() here, one is:
>
> if (unlikely((desc->status & (IRQ_INPROGRESS | IRQ_DISABLED)) ||
> !desc->action)) {
> desc->status |= (IRQ_PENDING | IRQ_MASKED);
> mask_ack_irq(desc, irq);
>
> The other is:
>
> if (unlikely(!action)) {
> desc->chip->mask(irq);
>
> This is all in generic code which knows nothing about any device-specific
> method of masking interrupts from the chip.

Right and it is not supposed to know anything about the hardware
details at all. The per irq setting can provide NOOP functions for all
the mask/mask_ack/unmask things when thats the right way for the
particular irq line.

> > Regardless, doesn't __do_IRQ() handle this already anyway?

That's irrelevant. All the interrupts are handled via
irq_desc[irq].handle_irq() when GENERIC_HARDIRQS is set.

> This is a good thought, let's follow it through. What if we simply make
> ->mask a no-op for devices which don't support mask bits?

Yep. You can also use fasteoi_handler, which just calls ->eoi() after
the handler.

> MSIs are 'edge triggered' interrupts. They're sent once and then
> forgotten by the hardware (as opposed to level triggered which will
> continue to be triggered until deasserted).
>
> The first time through (assuming ->action is non-NULL ...), we won't
> try to mask the irq. The second time through, IRQ_INPROGRESS will be set,
> so we try to mask_ack_irq().
>
> How about we simply don't ack the irq at this point? That should prevent
> it being triggered again, right?
>
> Working on a patch to do this now ...

You want to use the fasteoi_handler.

Thanks,
tglx

2008-07-17 16:11:47

by Matthew Wilcox

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Thu, Jul 17, 2008 at 05:58:26PM +0200, Thomas Gleixner wrote:
> > This is a good thought, let's follow it through. What if we simply make
> > ->mask a no-op for devices which don't support mask bits?
>
> Yep. You can also use fasteoi_handler, which just calls ->eoi() after
> the handler.

I think that exposes us to a race.

CPU takes the first interrupt, calls handle_fasteoi_irq(). That
calls handle_IRQ_event() which calls the device's interrupt handler.
Interrupt handler reads status register to determine what to do next.
Device generates second interrupt and changes status register. Second
interrupt is never delivered because the ->eoi hasn't been called yet.

I plan to keep using the edge handler which solves this race by
calling mask_ack(). For MSIs without mask bits, it will do nothing.
Then when it calls unmask (before handling the second interrupt), we
call the chip->ack() for that IRQ. We'll never miss a pending interrupt.
The machine has booted ... let's see if it'll work under stress.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-07-17 16:56:57

by Matthew Wilcox

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method


OK, here's a patch which does the trick for me. I put a printk_ratelimit
into the new mask_ack_msi_irq() function, then hammered my AHCI controller
with 16 threads doing directio reads to the same track of a disc. It came
up pretty reliably, so I would say it's at least minmally tested. David,
does this solve your problem?

diff --git a/arch/x86/kernel/io_apic_64.c b/arch/x86/kernel/io_apic_64.c
index 6510cde..693de6c 100644
--- a/arch/x86/kernel/io_apic_64.c
+++ b/arch/x86/kernel/io_apic_64.c
@@ -2051,6 +2051,7 @@ static struct irq_chip msi_chip = {
.name = "PCI-MSI",
.unmask = unmask_msi_irq,
.mask = mask_msi_irq,
+ .mask_ack = mask_ack_msi_irq,
.ack = ack_apic_edge,
#ifdef CONFIG_SMP
.set_affinity = set_msi_irq_affinity,
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 8c61304..81c9bc6 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -123,7 +123,7 @@ static void msix_flush_writes(unsigned int irq)
}
}

-static void msi_set_mask_bits(unsigned int irq, u32 mask, u32 flag)
+static int msi_set_mask_bits(unsigned int irq, u32 mask, u32 flag)
{
struct msi_desc *entry;

@@ -141,7 +141,7 @@ static void msi_set_mask_bits(unsigned int irq, u32 mask, u32 flag)
mask_bits |= flag & mask;
pci_write_config_dword(entry->dev, pos, mask_bits);
} else {
- msi_set_enable(entry->dev, !flag);
+ return 0;
}
break;
case PCI_CAP_ID_MSIX:
@@ -157,6 +157,7 @@ static void msi_set_mask_bits(unsigned int irq, u32 mask, u32 flag)
break;
}
entry->msi_attrib.masked = !!flag;
+ return 1;
}

void read_msi_msg(unsigned int irq, struct msi_msg *msg)
@@ -245,10 +246,27 @@ void mask_msi_irq(unsigned int irq)
msix_flush_writes(irq);
}

+/*
+ * If we can't mask the MSI, decline to ack it. This has the same
+ * effect, only masking in the interrupt controller instead of the
+ * device. In order to unmask it, we have to ack the interrupt.
+ */
+void mask_ack_msi_irq(unsigned int irq)
+{
+ struct irq_desc *desc = irq_desc + irq;
+ if (msi_set_mask_bits(irq, 1, 1))
+ return;
+ desc->chip->ack(irq);
+}
+
void unmask_msi_irq(unsigned int irq)
{
- msi_set_mask_bits(irq, 1, 0);
- msix_flush_writes(irq);
+ struct irq_desc *desc = irq_desc + irq;
+ if (!msi_set_mask_bits(irq, 1, 0)) {
+ msix_flush_writes(irq);
+ return;
+ }
+ desc->chip->ack(irq);
}

static int msi_free_irqs(struct pci_dev* dev);
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 8f29392..316598b 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -11,6 +11,7 @@ struct msi_msg {

/* Helper functions */
extern void mask_msi_irq(unsigned int irq);
+extern void mask_ack_msi_irq(unsigned int irq);
extern void unmask_msi_irq(unsigned int irq);
extern void read_msi_msg(unsigned int irq, struct msi_msg *msg);
extern void write_msi_msg(unsigned int irq, struct msi_msg *msg);

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-07-17 17:04:48

by Thomas Gleixner

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Thu, 17 Jul 2008, Matthew Wilcox wrote:

> On Thu, Jul 17, 2008 at 05:58:26PM +0200, Thomas Gleixner wrote:
> > > This is a good thought, let's follow it through. What if we simply make
> > > ->mask a no-op for devices which don't support mask bits?
> >
> > Yep. You can also use fasteoi_handler, which just calls ->eoi() after
> > the handler.
>
> I think that exposes us to a race.
>
> CPU takes the first interrupt, calls handle_fasteoi_irq(). That
> calls handle_IRQ_event() which calls the device's interrupt handler.
> Interrupt handler reads status register to determine what to do next.
> Device generates second interrupt and changes status register. Second
> interrupt is never delivered because the ->eoi hasn't been called yet.

Yeah, I know. The question is how the hardware works; there is fasteoi
capable hardware around (not on x86) which works with edge type
interrupts.

> I plan to keep using the edge handler which solves this race by
> calling mask_ack(). For MSIs without mask bits, it will do nothing.

Ah, there are ones w/o a mask bit. That detail slipped through.

Thanks,

tglx

2008-07-17 19:49:19

by Matthew Wilcox

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Thu, Jul 17, 2008 at 06:14:34PM +0100, David Vrabel wrote:
> Obviously io_apic_32.c will also need a similar change, and are there
> other architectures that will also need fixing similarly?

Yes, several.

> > } else {
> > - msi_set_enable(entry->dev, !flag);
> > + return 0;
[...]
> > entry->msi_attrib.masked = !!flag;
> > + return 1;
> > }
[...]
> > +/*
> > + * If we can't mask the MSI, decline to ack it. This has the same
> > + * effect, only masking in the interrupt controller instead of the
> > + * device. In order to unmask it, we have to ack the interrupt.
> > + */
> > +void mask_ack_msi_irq(unsigned int irq)
> > +{
> > + struct irq_desc *desc = irq_desc + irq;
> > + if (msi_set_mask_bits(irq, 1, 1))
> > + return;
> > + desc->chip->ack(irq);
> > +}
>
> This code doesn't match the comment. Since msi_set_mask_bits() returns
> true if it masked.
>
> I think you want
>
> if (!msi_set_mask_bits(irq, 1, 1))
> return;
> desc->chip->ack(irq);

Quite right. Somehow I managed to test and then send out an earlier
version of this patch.

Unfortunately, testing the right patch results in my machine locking up.
The Intel System Programming Guide, volume 3A points out that x86
interrupts really do have priorities associated with them. And since
EOI simply clears the highest priority bit, delaying calling ->ack()
just isn't possible.

I've also found some other distressing facts about LAPICs in the guide:

If more than one interrupt is generated with the same vector number,
the local APIC can set the bit for the vector both in the IRR and
the ISR. This means that for the Pentium 4 and Intel Xeon processors,
the IRR and ISR can queue two interrupts for each interrupt vector:
one in the IRR and one in the ISR. Any additional interrupts issued for
the same interrupt vector are collapsed into the single bit in the IRR.

For the P6 family and Pentium processors, the IRR and ISR registers can
queue no more than two interrupts per priority level, and will reject
other interrupts that are received within the same priority level.

So I think I'll disable multiple MSI for processors predating the P4.

I think David's original patch (just declining to mask the interrupt)
is the best approach to take. Perhaps architectures with saner
interrupt hardware would like to try the approach I've mentioned here.

I don't like the comment in http://lkml.org/lkml/2008/6/27/199 as it's
not prohibited ... just a bad idea. How about this patch?

---

David Vrabel points out that using the MSI Enable bit to disable
interrupts is a bad idea when the PCI device doesn't implement the INTx
disable bit. It will cause spurious interrupts to be generated on the
INTx pin.

Masking the interrupt is simply a performance optimisation; we can
happily let the device continue to interrupt us. In order to support
future optimisations, report success / failure from msi_set_mask_bits().

Signed-off-by: Matthew Wilcox <[email protected]>

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 8c61304..ba0fd05 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -123,7 +123,7 @@ static void msix_flush_writes(unsigned int irq)
}
}

-static void msi_set_mask_bits(unsigned int irq, u32 mask, u32 flag)
+static int msi_set_mask_bits(unsigned int irq, u32 mask, u32 flag)
{
struct msi_desc *entry;

@@ -141,7 +141,7 @@ static void msi_set_mask_bits(unsigned int irq, u32 mask, u32 flag)
mask_bits |= flag & mask;
pci_write_config_dword(entry->dev, pos, mask_bits);
} else {
- msi_set_enable(entry->dev, !flag);
+ return 0;
}
break;
case PCI_CAP_ID_MSIX:
@@ -157,6 +157,7 @@ static void msi_set_mask_bits(unsigned int irq, u32 mask, u32 flag)
break;
}
entry->msi_attrib.masked = !!flag;
+ return 1;
}

void read_msi_msg(unsigned int irq, struct msi_msg *msg)

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-07-18 10:33:51

by David Vrabel

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

Matthew Wilcox wrote:
>
> I think David's original patch (just declining to mask the interrupt)
> is the best approach to take. Perhaps architectures with saner
> interrupt hardware would like to try the approach I've mentioned here.
>
> I don't like the comment in http://lkml.org/lkml/2008/6/27/199 as it's
> not prohibited ... just a bad idea. How about this patch?

The PCI specification is quite clear that it's prohibited. The problem
also is more severe than simply having spurious interrupts -- with some
devices if a line interrupt is generated (regardless of whether it ends
up on the bus) then no more interrupts are generated.

I also think that the change requires a comment in the code. It odd to
have a mask function that doesn't really mask so a comment is necessary
to explain why this is.

Please apply this instead.

David
--
David Vrabel, Senior Software Engineer, Drivers
CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562
Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/


Attachments:
pci-dont-mask-msi-with-msi-enable-bit.patch (2.22 kB)

2008-07-22 13:56:47

by Michal Schmidt

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Fri, 18 Jul 2008 11:33:10 +0100
David Vrabel <[email protected]> wrote:

> Matthew Wilcox wrote:
> >
> > I think David's original patch (just declining to mask the
> > interrupt) is the best approach to take. Perhaps architectures
> > with saner interrupt hardware would like to try the approach I've
> > mentioned here.
> >
> > I don't like the comment in http://lkml.org/lkml/2008/6/27/199 as
> > it's not prohibited ... just a bad idea. How about this patch?
>
> The PCI specification is quite clear that it's prohibited. The
> problem also is more severe than simply having spurious interrupts --
> with some devices if a line interrupt is generated (regardless of
> whether it ends up on the bus) then no more interrupts are generated.
>
> I also think that the change requires a comment in the code. It odd
> to have a mask function that doesn't really mask so a comment is
> necessary to explain why this is.
>
> Please apply this instead.
>
> David

This breaks the setting of SMP affinity for MSI interrupts :-(
With the patch, writes to /proc/irq/<n>/smp_affinity are ignored for an
MSI interrupt.

Michal

2008-07-22 17:52:43

by Jesse Barnes

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Tuesday, July 22, 2008 6:56 am Michal Schmidt wrote:
> On Fri, 18 Jul 2008 11:33:10 +0100
>
> David Vrabel <[email protected]> wrote:
> > Matthew Wilcox wrote:
> > > I think David's original patch (just declining to mask the
> > > interrupt) is the best approach to take. Perhaps architectures
> > > with saner interrupt hardware would like to try the approach I've
> > > mentioned here.
> > >
> > > I don't like the comment in http://lkml.org/lkml/2008/6/27/199 as
> > > it's not prohibited ... just a bad idea. How about this patch?
> >
> > The PCI specification is quite clear that it's prohibited. The
> > problem also is more severe than simply having spurious interrupts --
> > with some devices if a line interrupt is generated (regardless of
> > whether it ends up on the bus) then no more interrupts are generated.
> >
> > I also think that the change requires a comment in the code. It odd
> > to have a mask function that doesn't really mask so a comment is
> > necessary to explain why this is.
> >
> > Please apply this instead.
> >
> > David
>
> This breaks the setting of SMP affinity for MSI interrupts :-(
> With the patch, writes to /proc/irq/<n>/smp_affinity are ignored for an
> MSI interrupt.

It should only break it for devices that don't provide a mask bit. But given
that we can't really mask generically on those devices, maybe that's ok given
that it fixes the other problems mentioned in this thread...

Jesse

2008-07-23 13:02:50

by Michal Schmidt

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Tue, 22 Jul 2008 10:52:26 -0700
Jesse Barnes <[email protected]> wrote:

> On Tuesday, July 22, 2008 6:56 am Michal Schmidt wrote:
> > This breaks the setting of SMP affinity for MSI interrupts :-(
> > With the patch, writes to /proc/irq/<n>/smp_affinity are ignored
> > for an MSI interrupt.
>
> It should only break it for devices that don't provide a mask bit.

Yes, smp_affinity works for devices with a mask bit.

> But given that we can't really mask generically on those devices,
> maybe that's ok given that it fixes the other problems mentioned in
> this thread...

Does it mean there is no way IRQ migration could work reliably for
devices without a mask bit?

Michal

2008-07-25 13:29:36

by Michal Schmidt

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Tue, 22 Jul 2008 10:52:26 -0700
Jesse Barnes <[email protected]> wrote:

> On Tuesday, July 22, 2008 6:56 am Michal Schmidt wrote:
> > This breaks the setting of SMP affinity for MSI interrupts :-(
> > With the patch, writes to /proc/irq/<n>/smp_affinity are ignored
> > for an MSI interrupt.
>
> It should only break it for devices that don't provide a mask bit.
> But given that we can't really mask generically on those devices,
> maybe that's ok given that it fixes the other problems mentioned in
> this thread...

I looked a bit more into why exactly it breaks. The device for which
MSI IRQ affinity breaks is a "Broadcom Corporation NetXtreme II BCM5708
Gigabit Ethernet" (14e4:164c rev 12) in HP DL360 G5.

The interesting thing is that I can see Destination ID bits of MSI
Message Address change correctly in lspci output. But the interrupt is
still delivered load-balanced to all CPUs even though the Destination
ID identifies the single CPU I asked for. It seems the device only
takes the new Message Address setting into account when the MSI Enable
bit in the Message Control register is changed from 0 to 1. I tested
this by setting the MSI enable bit to 0 and then immediately back to 1
at the end of io_apic_64.c:set_msi_irq_affinity().

Is this a permitted behaviour for the device? I couldn't find anything
in the PCI specification that would mentioned it.

Later I tested it with a different device which does not have maskbits
either (an Intel Ethernet controller). For this device MSI IRQ
migration works without problems and no hackery with the MSI enable bit
was necessary.

Michal

2008-07-25 13:43:20

by Matthew Wilcox

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Fri, Jul 25, 2008 at 03:29:18PM +0200, Michal Schmidt wrote:
> The interesting thing is that I can see Destination ID bits of MSI
> Message Address change correctly in lspci output. But the interrupt is
> still delivered load-balanced to all CPUs even though the Destination
> ID identifies the single CPU I asked for. It seems the device only
> takes the new Message Address setting into account when the MSI Enable
> bit in the Message Control register is changed from 0 to 1. I tested
> this by setting the MSI enable bit to 0 and then immediately back to 1
> at the end of io_apic_64.c:set_msi_irq_affinity().
>
> Is this a permitted behaviour for the device? I couldn't find anything
> in the PCI specification that would mentioned it.

I don't think that's necessary. However, the thought occurs that we
ought to disable MSI, then write the address, then re-enable MSI. It
doesn't cause a problem at the moment because we don't change the
top 32 bits of the address (at least on any of my systems ..) but
theoretically if we were to use a 64-bit address, we would experience
MSIs being sent to an address that was a mixture of the top 32 bits of
the old address and the bottom 32 bits of the new address.

We definitely can already get tearing when we've written the lower
address register but not the data register yet (also true for MSIX, by
the way). So we ought to fix this properly.

We have the problem that we might still get interrupts on the old
pin-based interrupt line (ie David's original problem). I have a
feeling somebody needs to register a handler for the pin-based interrupt
to handle this. One possibility would be for the MSI code to register a
handler that calls the driver's MSI handler. I don't think that's a
good idea though -- the driver's MSI handler is able to make different
assumptions from the pin handler. Do we want to make drivers register
an interrupt handler for the original interrupt number before they try
to set up MSI? It's certainly not what the PCI spec people had in mind,
but they seem to have overlooked this problem.

Yuck.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-07-25 13:53:49

by Michal Schmidt

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Fri, 25 Jul 2008 07:42:52 -0600
Matthew Wilcox <[email protected]> wrote:

> On Fri, Jul 25, 2008 at 03:29:18PM +0200, Michal Schmidt wrote:
> > The interesting thing is that I can see Destination ID bits of MSI
> > Message Address change correctly in lspci output. But the interrupt
> > is still delivered load-balanced to all CPUs even though the
> > Destination ID identifies the single CPU I asked for. It seems the
> > device only takes the new Message Address setting into account when
> > the MSI Enable bit in the Message Control register is changed from
> > 0 to 1. I tested this by setting the MSI enable bit to 0 and then
> > immediately back to 1 at the end of
> > io_apic_64.c:set_msi_irq_affinity().
> >
> > Is this a permitted behaviour for the device? I couldn't find
> > anything in the PCI specification that would mentioned it.
>
> I don't think that's necessary. However, the thought occurs that we
> ought to disable MSI, then write the address, then re-enable MSI. It
> doesn't cause a problem at the moment because we don't change the
> top 32 bits of the address (at least on any of my systems ..) but
> theoretically if we were to use a 64-bit address, we would experience
> MSIs being sent to an address that was a mixture of the top 32 bits of
> the old address and the bottom 32 bits of the new address.
>
> We definitely can already get tearing when we've written the lower
> address register but not the data register yet (also true for MSIX, by
> the way). So we ought to fix this properly.
>
> We have the problem that we might still get interrupts on the old
> pin-based interrupt line (ie David's original problem). I have a
> feeling somebody needs to register a handler for the pin-based
> interrupt to handle this. One possibility would be for the MSI code
> to register a handler that calls the driver's MSI handler. I don't
> think that's a good idea though -- the driver's MSI handler is able
> to make different assumptions from the pin handler. Do we want to
> make drivers register an interrupt handler for the original interrupt
> number before they try to set up MSI? It's certainly not what the
> PCI spec people had in mind, but they seem to have overlooked this
> problem.
>
> Yuck.

Maybe we should just not use MSI for devices without maskbits.
What would you think of this patch?:

This adds the parameter pci=msimaskbits to enable MSI only for devices with
maskbits. It is useful when setting of IRQ SMP affinity is needed, because
smp_affinity does not work reliably for MSI interrupts of devices without
maskbits.

And maybe this behaviour should be the default?

Michal

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 15af618..2771356 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -23,6 +23,7 @@
#include "pci.h"
#include "msi.h"

+/* 0 = disabled, 1 = enabled, 2 = enabled only for devices with mask bits */
static int pci_msi_enable = 1;

/* Arch hooks */
@@ -356,8 +357,13 @@ static int msi_capability_init(struct pci_dev *dev)

msi_set_enable(dev, 0); /* Ensure msi is disabled as I set it up */

- pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
+ pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
pci_read_config_word(dev, msi_control_reg(pos), &control);
+ if (pci_msi_enable == 2 && !is_mask_bit_support(control)) {
+ dev_info(&dev->dev,
+ "does not support maskbits, will not use MSI\n");
+ return -ENODEV;
+ }
/* MSI Entry Initialization */
entry = alloc_msi_entry();
if (!entry)
@@ -749,6 +755,11 @@ void pci_no_msi(void)
pci_msi_enable = 0;
}

+void pci_msi_require_mask_bits(void)
+{
+ pci_msi_enable = 2;
+}
+
void pci_msi_init_pci_dev(struct pci_dev *dev)
{
INIT_LIST_HEAD(&dev->msi_list);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d00f0e0..0e6019d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1868,6 +1868,8 @@ static int __devinit pci_setup(char *str)
if (*str && (str = pcibios_setup(str)) && *str) {
if (!strcmp(str, "nomsi")) {
pci_no_msi();
+ } else if (!strcmp(str, "msimaskbits")) {
+ pci_msi_require_mask_bits();
} else if (!strcmp(str, "noaer")) {
pci_no_aer();
} else if (!strcmp(str, "nodomains")) {
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d807cd7..2af4750 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -85,9 +85,11 @@ extern unsigned int pci_pm_d3_delay;

#ifdef CONFIG_PCI_MSI
void pci_no_msi(void);
+void pci_msi_require_mask_bits(void);
extern void pci_msi_init_pci_dev(struct pci_dev *dev);
#else
static inline void pci_no_msi(void) { }
+static inline void pci_msi_require_mask_bits(void) { }
static inline void pci_msi_init_pci_dev(struct pci_dev *dev) { }
#endif

2008-07-25 15:52:18

by Matthew Wilcox

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Fri, Jul 25, 2008 at 03:53:29PM +0200, Michal Schmidt wrote:
> Maybe we should just not use MSI for devices without maskbits.

That seems excessive. I wouldn't object to forbidding CPU affinity
changes for devices:

- Without maskbits
- With the intx quirk

I'd want to look into it in a bit more detail .... perhaps we could
allow irq affinity if we don't have to change the address, only the
data.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-07-25 16:39:19

by David Vrabel

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

Michal Schmidt wrote:
> On Fri, 25 Jul 2008 07:42:52 -0600
> Matthew Wilcox <[email protected]> wrote:
>
>> On Fri, Jul 25, 2008 at 03:29:18PM +0200, Michal Schmidt wrote:
>>> The interesting thing is that I can see Destination ID bits of MSI
>>> Message Address change correctly in lspci output. But the interrupt
>>> is still delivered load-balanced to all CPUs even though the
>>> Destination ID identifies the single CPU I asked for. It seems the
>>> device only takes the new Message Address setting into account when
>>> the MSI Enable bit in the Message Control register is changed from
>>> 0 to 1. I tested this by setting the MSI enable bit to 0 and then
>>> immediately back to 1 at the end of
>>> io_apic_64.c:set_msi_irq_affinity().
>>>
>>> Is this a permitted behaviour for the device? I couldn't find
>>> anything in the PCI specification that would mentioned it.

The spec says that system software should enable MSI before setting
message address and data (PCI 3.0 section 6.8.3.1 MSI configuration).
The kernel doesn't do this.

>> I don't think that's necessary. However, the thought occurs that we
>> ought to disable MSI, then write the address, then re-enable MSI. It
>> doesn't cause a problem at the moment because we don't change the
>> top 32 bits of the address (at least on any of my systems ..) but
>> theoretically if we were to use a 64-bit address, we would experience
>> MSIs being sent to an address that was a mixture of the top 32 bits of
>> the old address and the bottom 32 bits of the new address.
>>
>> We definitely can already get tearing when we've written the lower
>> address register but not the data register yet (also true for MSIX, by
>> the way). So we ought to fix this properly.

I really don't think we should be enabling/disabling MSI while
interrupts might be being generated. There are cases where interrupts
will be lost. Consider PCIe where we might end up with a situation
where MSI is disabled and then enabled sufficiently quickly that no
periodic line interrupt message is sent by the device.

The message address and data should only be modified while the vector is
masked (to avoid the aforementioned 'tearing'). This means that setting
IRQ affinity cannot be done on devices without per-vector mask bits. I
don't think this is a problem.

In vague psuedo-code, set_affinity() should be something like this:

int did_mask = msi_mask_vector();
if (!did_mask) {
return -ENOTSUPP;
}
/* fiddle with address and mask now */
msi_unmask_vector();

David
--
David Vrabel, Software Engineer, Drivers group Tel: +44 (0)1223 692562
CSR plc, Churchill House, Cambridge Business Park, Cowley Road, CB4 0WZ

2008-07-25 16:57:21

by Matthew Wilcox

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Fri, Jul 25, 2008 at 05:37:49PM +0100, David Vrabel wrote:
> The spec says that system software should enable MSI before setting
> message address and data (PCI 3.0 section 6.8.3.1 MSI configuration).
> The kernel doesn't do this.

I think you meant "disable"? I can't find anything in 6.8.3.1 of 3.0
that refers to this.

> I really don't think we should be enabling/disabling MSI while
> interrupts might be being generated. There are cases where interrupts
> will be lost. Consider PCIe where we might end up with a situation
> where MSI is disabled and then enabled sufficiently quickly that no
> periodic line interrupt message is sent by the device.

I don't think there's a difference here between PCIe and conventional
PCI. A device raising a line based interrupt is perfectly equivalent to
a device sending an INTx message.

> The message address and data should only be modified while the vector is
> masked (to avoid the aforementioned 'tearing'). This means that setting
> IRQ affinity cannot be done on devices without per-vector mask bits. I
> don't think this is a problem.

I agree. I think it's fine to have this limitation.

> In vague psuedo-code, set_affinity() should be something like this:
>
> int did_mask = msi_mask_vector();
> if (!did_mask) {
> return -ENOTSUPP;
> }
> /* fiddle with address and mask now */
> msi_unmask_vector();

Yes, something like that.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-07-25 19:13:01

by Jesse Barnes

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Friday, July 25, 2008 9:56 am Matthew Wilcox wrote:
> On Fri, Jul 25, 2008 at 05:37:49PM +0100, David Vrabel wrote:
> > The spec says that system software should enable MSI before setting
> > message address and data (PCI 3.0 section 6.8.3.1 MSI configuration).
> > The kernel doesn't do this.
>
> I think you meant "disable"? I can't find anything in 6.8.3.1 of 3.0
> that refers to this.
>
> > I really don't think we should be enabling/disabling MSI while
> > interrupts might be being generated. There are cases where interrupts
> > will be lost. Consider PCIe where we might end up with a situation
> > where MSI is disabled and then enabled sufficiently quickly that no
> > periodic line interrupt message is sent by the device.
>
> I don't think there's a difference here between PCIe and conventional
> PCI. A device raising a line based interrupt is perfectly equivalent to
> a device sending an INTx message.
>
> > The message address and data should only be modified while the vector is
> > masked (to avoid the aforementioned 'tearing'). This means that setting
> > IRQ affinity cannot be done on devices without per-vector mask bits. I
> > don't think this is a problem.
>
> I agree. I think it's fine to have this limitation.
>
> > In vague psuedo-code, set_affinity() should be something like this:
> >
> > int did_mask = msi_mask_vector();
> > if (!did_mask) {
> > return -ENOTSUPP;
> > }
> > /* fiddle with address and mask now */
> > msi_unmask_vector();
>
> Yes, something like that.

Yeah, that reflects what we actually support...

David, can you resend your "don't mask MSIs using the MSI enable bit" patch
against the latest bits so I can apply it? Did you also want to hack up the
above for the affinity code?

Thanks,
Jesse

2008-07-28 09:54:37

by Michal Schmidt

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Fri, 25 Jul 2008 09:51:46 -0600
Matthew Wilcox <[email protected]> wrote:

> On Fri, Jul 25, 2008 at 03:53:29PM +0200, Michal Schmidt wrote:
> > Maybe we should just not use MSI for devices without maskbits.
>
> That seems excessive. I wouldn't object to forbidding CPU affinity
> changes for devices:
>
> - Without maskbits
> - With the intx quirk
>
> I'd want to look into it in a bit more detail .... perhaps we could
> allow irq affinity if we don't have to change the address, only the
> data.

At least on x86, the destination processor is encoded in the message
address (see Intel Architecture Software Developer's Manual Vol. 3A,
9.11.1), so it won't help.

Michal

2008-07-28 09:59:30

by Michal Schmidt

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

On Fri, 25 Jul 2008 10:56:55 -0600
Matthew Wilcox <[email protected]> wrote:

> On Fri, Jul 25, 2008 at 05:37:49PM +0100, David Vrabel wrote:
> > The spec says that system software should enable MSI before setting
> > message address and data (PCI 3.0 section 6.8.3.1 MSI
> > configuration). The kernel doesn't do this.
>
> I think you meant "disable"? I can't find anything in 6.8.3.1 of 3.0
> that refers to this.
>
> > I really don't think we should be enabling/disabling MSI while
> > interrupts might be being generated. There are cases where
> > interrupts will be lost. Consider PCIe where we might end up with
> > a situation where MSI is disabled and then enabled sufficiently
> > quickly that no periodic line interrupt message is sent by the
> > device.
>
> I don't think there's a difference here between PCIe and conventional
> PCI. A device raising a line based interrupt is perfectly equivalent
> to a device sending an INTx message.
>
> > The message address and data should only be modified while the
> > vector is masked (to avoid the aforementioned 'tearing'). This
> > means that setting IRQ affinity cannot be done on devices without
> > per-vector mask bits. I don't think this is a problem.
>
> I agree. I think it's fine to have this limitation.

I can imagine this being a problem e.g. for people wanting to isolate
selected CPUs from interrupts for realtime tasks.

> > In vague psuedo-code, set_affinity() should be something like this:
> >
> > int did_mask = msi_mask_vector();
> > if (!did_mask) {
> > return -ENOTSUPP;
> > }
> > /* fiddle with address and mask now */
> > msi_unmask_vector();
>
> Yes, something like that.

2008-07-28 22:04:28

by Jesse Barnes

[permalink] [raw]
Subject: Re: PCI: MSI interrupts masked using prohibited method

> > I agree. I think it's fine to have this limitation.
>
> I can imagine this being a problem e.g. for people wanting to isolate
> selected CPUs from interrupts for realtime tasks.

And they'll still be able to do this; they'll just need to use hardware that
supports proper masking or make sure their interrupts are targetted at the
right CPU from the start.

Jesse