Tejun / Jeff,
I am working on MSI support for sata_mv, and am trying to puzzle out
exactly what the kernel expects for this. Looking at other drivers,
both libata and otherwise, yields a variety of conflicting implementations.
For starters, the MSI HOW-TO suggests that drivers must be careful
to invoke pci_disable_msi() on module unload, but I don't see that
happening anywhere in libata.
Unless, Tejun, the devm_* routines automatically do this.. do they?
Next, there's no mention of a need for invoking pci_intx() in the HOW-TO,
yet some device drivers call it, and others do not.
Eg. from ahci.c, we have this:
        if ((hpriv->flags & AHCI_HFLAG_NO_MSI) || pci_enable_msi(pdev))
                pci_intx(pdev, 1);
Which agrees with the existing code in sata_mv:
        if (msi && pci_enable_msi(pdev))
                pci_intx(pdev, 1);
Which seems to call pci_intx() only when MSI is *not* used. Fine.
But then in sata_vsc.c, we do sort of the opposite:
        if (pci_enable_msi(pdev) == 0)
                pci_intx(pdev, 0);
Either that one is wrong, or pci_intx() is unnecessary in all cases.
Again, the MSI HOW-TO doesn't even mention this routine.
Looking through the network drivers, it seems that some of them
do the pci_intx(pdev,1) call for the cases where pci_enable_msi() fails,
similar to ahci.c and sata_mv.c.
But not all of them do that.
Perhaps somebody from the PCI side of things might enlighten us all.
Thanks
Mark Lord wrote:
> Tejun / Jeff,
>
> I am working on MSI support for sata_mv, and am trying to puzzle out
> exactly what the kernel expects for this. Looking at other drivers,
> both libata and otherwise, yields a variety of conflicting implementations.
>
> For starters, the MSI HOW-TO suggests that drivers must be careful
> to invoke pci_disable_msi() on module unload, but I don't see that
> happening anywhere in libata.
>
> Unless, Tejun, the devm_* routines automatically do this.. do they?
>
> Next, there's no mention of a need for invoking pci_intx() in the HOW-TO,
> yet some device drivers call it, and others do not.
>
> Eg. from ahci.c, we have this:
>
>         if ((hpriv->flags & AHCI_HFLAG_NO_MSI) || pci_enable_msi(pdev))
>                 pci_intx(pdev, 1);
>
> Which agrees with the existing code in sata_mv:
>
>         if (msi && pci_enable_msi(pdev))
>                 pci_intx(pdev, 1);
>
> Which seems to call pci_intx() only when MSI is *not* used. Fine.
> But then in sata_vsc.c, we do sort of the opposite:
>
>         if (pci_enable_msi(pdev) == 0)
>                 pci_intx(pdev, 0);
>
> Either that one is wrong, or pci_intx() is unnecessary in all cases.
> Again, the MSI HOW-TO doesn't even mention this routine.
>
> Looking through the network drivers, it seems that some of them
> do the pci_intx(pdev,1) call for the cases where pci_enable_msi() fails,
> similar to ahci.c and sata_mv.c.
>
> But not all of them do that.
..
Looking through drivers/pci/*, it appears that the call to pci_intx()
should be totally redundant. pci_enable_msi() does pci_intx(pdev,0)
on success only, and doesn't touch it otherwise.
Similarly, pci_disable_msi() does pci_intx(pdev,1).
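In other words, a probe path shouldn't need to touch pci_intx() at all.
A minimal sketch of the implied pattern (my own illustration, not code
from any driver above; the USING_MSI flag is hypothetical):

        /* Sketch: let the pci core toggle INTX; just record the outcome */
        if (pci_enable_msi(pdev) == 0)
                hpriv->flags |= USING_MSI;      /* hypothetical flag */
        /* else: legacy INTx stays enabled, nothing more to do */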
So, where does libata cause pci_disable_msi() to be invoked?
Or is that just missing at the moment?
Thanks.
On Tue, Jan 20, 2009 at 8:03 AM, Mark Lord <[email protected]> wrote:
> Tejun / Jeff,
>
> I am working on MSI support for sata_mv, and am trying to puzzle out
> exactly what the kernel expects for this. Looking at other drivers,
> both libata and otherwise, yields a variety of conflicting implementations.
>
> For starters, the MSI HOW-TO suggests that drivers must be careful
> to invoke pci_disable_msi() on module unload, but I don't see that
> happening anywhere in libata.
I don't think that's necessary if free_irq() or disable_irq() are called.
However, I'm not seeing those get called either.
>
> Unless, Tejun, the devm_* routines automatically do this.. do they?
>
> Next, there's no mention of a need for invoking pci_intx() in the HOW-TO,
> yet some device drivers call it, and others do not.
In general, I don't think drivers need to call pci_intx().
I'm really not sure why it's exported even.
>
> Eg. from ahci.c, we have this:
>
>         if ((hpriv->flags & AHCI_HFLAG_NO_MSI) || pci_enable_msi(pdev))
>                 pci_intx(pdev, 1);
>
> Which agrees with the existing code in sata_mv:
>
>         if (msi && pci_enable_msi(pdev))
>                 pci_intx(pdev, 1);
Combined with the fact that we can't find where MSI is disabled,
this bit of code in pci_enable_msi() might cause MSI reload to fail:
        /* Check whether driver already requested for MSI-X irqs */
        if (dev->msix_enabled) {
                dev_info(&dev->dev, "can't enable MSI "
                        "(MSI-X already enabled)\n");
                return -EINVAL;
        }
> Which seems to call pci_intx() only when MSI is *not* used. Fine.
> But then in sata_vsc.c, we do sort of the opposite:
>
>         if (pci_enable_msi(pdev) == 0)
>                 pci_intx(pdev, 0);
>
> Either that one is wrong, or pci_intx() is unnecessary in all cases.
Neither. The code in sata_vsc.c is redundant and can be removed.
pci_intx() is necessary in some broken cases where INTX
has to be _enabled_ for MSI to work:
http://lkml.indiana.edu/hypermail/linux/kernel/0710.3/0772.html
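For illustration only (my sketch, not code from that thread), the
driver-side workaround for such broken hardware looked roughly like:

        /* Sketch: device that only delivers MSI with INTX left enabled */
        if (pci_enable_msi(pdev) == 0)
                pci_intx(pdev, 1);      /* undo the core's pci_intx(pdev, 0) */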
> Again, the MSI HOW-TO doesn't even mention this routine.
>
> Looking through the network drivers, it seems that some of them
> do the pci_intx(pdev,1) call for the cases where pci_enable_msi() fails,
> similar to ahci.c and sata_mv.c.
>
> But not all of them do that.
Jeff Garzik seems to like that construct.
>
> Perhaps somebody from the PCI side of things might enlighten us all.
http://lists.linuxcoding.com/kernel/2005-q3/msg11296.html
Shows when pci_intx() was added. linux-pci was NOT cc'd on that email.
And no one asked for Documentation/ update. C'est la vie.
BTW, gregkh is very good about posting patches when sending a pull request
to linus. But that just means it was likely buried in a pile of 50+ patches.
grant
Grant Grundler wrote:
> On Tue, Jan 20, 2009 at 8:03 AM, Mark Lord <[email protected]> wrote:
..
>> For starters, the MSI HOW-TO suggests that drivers must be careful
>> to invoke pci_disable_msi() on module unload, but I don't see that
>> happening anywhere in libata.
>
> I don't think that's necessary if free_irq() or disable_irq() are called.
> However, I'm not seeing those get called either.
..
The linux/Documentation/PCI/MSI-HOWTO.txt information is rather explicit
about it, and the pci core code does seem to expect it.
> However, I'm not seeing those get called either.
..
The devres code takes care of the free_irq() normally,
but does not do the pci_disable_msi(), because it doesn't
know enough about the device to do it right now.
..
>> Eg. from ahci.c, we have this:
>>
>>         if ((hpriv->flags & AHCI_HFLAG_NO_MSI) || pci_enable_msi(pdev))
>>                 pci_intx(pdev, 1);
>>
>> Which agrees with the existing code in sata_mv:
>>
>>         if (msi && pci_enable_msi(pdev))
>>                 pci_intx(pdev, 1);
..
>>         if (pci_enable_msi(pdev) == 0)
>>                 pci_intx(pdev, 0);
>>
>> Either that one is wrong, or pci_intx() is unnecessary in all cases.
..
Those calls to pci_intx() are definitely redundant,
as the pci core already does that for us on the relevant paths.
>> Perhaps somebody from the PCI side of things might enlighten us all.
>
> http://lists.linuxcoding.com/kernel/2005-q3/msg11296.html
>
> Shows when pci_intx() was added. linux-pci was NOT cc'd on that email.
> And no one asked for Documentation/ update. C'est la vie.
..
I still think the documentation matches the code in the pci core, though.
Patch for sata_mv coming shortly.
Cheers
On Tue, Jan 20, 2009 at 10:16 AM, Mark Lord <[email protected]> wrote:
...
>> I don't think that's necessary if free_irq() or disable_irq() are called.
>> However, I'm not seeing those get called either.
...
> The linux/Documentation/PCI/MSI-HOWTO.txt information is rather explicit
> about it, and the pci core code does seem to expect it.
It looks like pci_disable_msi() requires free_irq() to be called first in order
to guarantee the vectors are no longer in use. But I didn't see any code to
enforce that. Which code was "expecting free_irq()" to be called?
I've learned that anything in Documentation/ is advisory at best - sometimes
stale and/or just plain missing (like you are rediscovering). Two years ago I
took a whack at Documentation/PCI/pci.txt, and it's probably time to revisit
it and add references to pci_intx().
> Those calls to pci_intx() are definitely redundant,
> as the pci core already does that for us on the relevant paths.
True. Want to submit a patch to remove the redundant code?
>
>>> Perhaps somebody from the PCI side of things might enlighten us all.
>>
>> http://lists.linuxcoding.com/kernel/2005-q3/msg11296.html
>>
>> Shows when pci_intx() was added. linux-pci was NOT cc'd on that email.
>> And no one asked for Documentation/ update. C'est la vie.
>
> ..
>
> I still think the documentation matches the code in the pci core, though.
Yeah, I think you are right.
grant
>
> Patch for sata_mv coming shortly.
>
> Cheers
>
Grant Grundler wrote:
> On Tue, Jan 20, 2009 at 10:16 AM, Mark Lord <[email protected]> wrote:
> ...
>>> I don't think that's necessary if free_irq() or disable_irq() are called.
>>> However, I'm not seeing those get called either.
> ...
>> The linux/Documentation/PCI/MSI-HOWTO.txt information is rather explicit
>> about it, and the pci core code does seem to expect it.
>
> It looks like pci_disable_msi() requires free_irq() to be called first in order
> to guarantee the vectors are no longer in use. But I didn't see any code to
> enforce that. Which code was "expecting free_irq()" to be called?
..
The devres subsystem automatically frees resources on device removal,
including calling free_irq() as necessary for any interrupt handler
that was registered with devm_request_irq(), as done in libata.
See linux/drivers/base/devres.c for the actual call to free_irq().
But it doesn't handle pci_disable_msi() in there.
Thus this code from the latest sata_mv:
+static void mv_pci_remove_one(struct pci_dev *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct ata_host *host = dev_get_drvdata(dev);
+	struct mv_host_priv *hpriv = host->private_data;
+
+	if (hpriv->hp_flags & MV_HP_FLAG_MSI) {
+		devm_free_irq(host->dev, pdev->irq, host);
+		pci_disable_msi(pdev);
+	}
+	ata_pci_remove_one(pdev);
+}
I believe that other MSI users might want something similar,
or perhaps Tejun could extend devres to include a pair
of suitable functions, devm_enable_msi() and devm_disable_msi().
Then it would be just automatic for drivers, without any fuss.
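Something like this, perhaps - a rough sketch under the naming proposed
above (devm_enable_msi() does not exist today, and the details are only
illustrative):

        #include <linux/device.h>
        #include <linux/pci.h>

        /* devres release callback: runs automatically on driver detach */
        static void devm_msi_release(struct device *dev, void *res)
        {
                pci_disable_msi(to_pci_dev(dev));
        }

        /* Hypothetical devm_enable_msi(): pairs pci_enable_msi() with an
         * automatic pci_disable_msi() when the driver goes away. */
        int devm_enable_msi(struct pci_dev *pdev)
        {
                void *dr;
                int rc;

                dr = devres_alloc(devm_msi_release, 0, GFP_KERNEL);
                if (!dr)
                        return -ENOMEM;

                rc = pci_enable_msi(pdev);
                if (rc) {
                        devres_free(dr);
                        return rc;
                }
                devres_add(&pdev->dev, dr);
                return 0;
        }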
Cheers
On Tue, 20 Jan 2009, Mark Lord wrote:
> Next, there's no mention of a need for invoking pci_intx() in the HOW-TO,
> yet some device drivers call it, and others do not.
Some devices don't work if you call it (don't send any interrupts); others
don't work if you don't call it (send both types of interrupts); most
don't care either way. In any case, the core code should deal with it
based on quirks. I think there isn't any manufacturer who has had both
kinds of quirk, and the clarified spec says that the INTX bit shouldn't
matter when MSI is enabled, so manufacturers are slowly converging to
that.
Initially, the core didn't do anything, and some drivers implemented the
need-intx-off quirk; eventually this went into the core as the default
behavior. Then some drivers implemented the need-intx-on quirk. Then some
common drivers for different hardware turned out to need it both ways, and
the core got a real quirk for it. But not all of the drivers have dropped
the local setting stuff.
The right way to go is to do nothing in the driver and use quirks if you
need it.
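As a concrete example of what "use quirks" means here, a need-intx-on
fixup in drivers/pci/quirks.c looks roughly like this (the vendor and
device IDs below are placeholders, not a real entry):

        /* Sketch: flag devices whose MSI only works with INTX enabled */
        static void quirk_msi_intx_disable_bug(struct pci_dev *dev)
        {
                dev->dev_flags |= PCI_DEV_FLAGS_MSI_INTX_DISABLE_BUG;
        }
        DECLARE_PCI_FIXUP_FINAL(0x1234, 0x5678, /* placeholder IDs */
                                quirk_msi_intx_disable_bug);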
-Daniel
*This .sig left intentionally blank*
Daniel Barkalow wrote:
..
> Initially, the core didn't do anything, and some drivers implemented the
> need-intx-off quirk; eventually this went into the core as the default
> behavior. Then some drivers implemented the need-intx-on quirk. Then some
> common drivers for different hardware turned out to need it both ways, and
> the core got a real quirk for it. But not all of the drivers have dropped
> the local setting stuff.
>
> The right way to go is to do nothing in the driver and use quirks if you
> need it.
..
Perfect, thanks.
Next.. who knows something about debugging MSI across PCI bridges?
I've got a 64-bit box here, PCIe near the core, but with full PCI-X
slots on the far side of two bridges.
The kernel happily allows my driver to setup MSI, but the interrupts
never arrive. So something somewhere in between is either
(1) not set up or quirked quite right, or
(2) one of the bridges won't pass MSI and we don't detect that.
I'll poke more at it later and post some info, if somebody out there
knows enough about this kind of thing to provide some basic hints.
Cheers!
On Tue, Jan 20, 2009 at 7:39 PM, Mark Lord <[email protected]> wrote:
...
> Next.. who knows something about debugging MSI across PCI bridges?
> I've got a 64-bit box here, PCIe near the core, but with full PCI-X
> slots on the far side of two bridges.
>
> The kernel happily allows my driver to setup MSI, but the interrupts
> never arrive. So something somewhere in between is either
>
> (1) not set up or quirked quite right, or
> (2) one of the bridges won't pass MSI and we don't detect that.
>
> I'll poke more at it later and post some info, if somebody out there
> knows enough about this kind of thing to provide some basic hints.
Basic Hints:
1) post lspci -v output to verify device (and bridges) is programmed correctly.
2) look for chipset quirks that disable global msi
3) Make sure MMIO ranges for 0xfee00000 are routed to local APIC
ie each bridge needs to route that address somehow (negative decode
is common for upstream).
4) manually trigger the MSI by doing an MMIO write to the correct
0xfee00000 address with the assigned vector in order to see if your
interrupt handler gets called.
After that, it's about collecting PCI-X or PCIe traces to verify the
device is generating the transactions correctly and the bridges are
forwarding them.
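A crude sketch of hint 4, as a fragment for a test module (the message
address and data are placeholders; read the real values out of the
device's MSI capability with lspci -vv first):

        #include <linux/io.h>

        /* Sketch: replay the device's programmed MSI message from the CPU.
         * If the routing is sane, the driver's interrupt handler fires. */
        static void msi_poke(u32 msg_addr, u32 msg_data)
        {
                void __iomem *p = ioremap(msg_addr, 4);

                if (p) {
                        writel(msg_data, p);
                        iounmap(p);
                }
        }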
hth,
grant
On Tue, 2009-01-20 at 20:02 -0800, Grant Grundler wrote:
> On Tue, Jan 20, 2009 at 7:39 PM, Mark Lord <[email protected]> wrote:
> ...
> > Next.. who knows something about debugging MSI across PCI bridges?
> > I've got a 64-bit box here, PCIe near the core, but with full PCI-X
> > slots on the far side of two bridges.
> >
> > The kernel happily allows my driver to setup MSI, but the interrupts
> > never arrive. So something somewhere in between is either
> >
> > (1) not set up or quirked quite right, or
> > (2) one of the bridges won't pass MSI and we don't detect that.
> >
> > I'll poke more at it later and post some info, if somebody out there
> > knows enough about this kind of thing to provide some basic hints.
>
>
> Basic Hints:
> 1) post lspci -v output to verify device (and bridges) is programmed correctly.
> 2) look for chipset quirks that disable global msi
The kernel shouldn't let you enable MSI if that's the case, ie.
pci_enable_msi() should fail.
It might still be worth looking at the quirks though, in case there's
one for a previous revision of your bridge or something.
> 3) Make sure MMIO ranges for 0xfee00000 are routed to local APIC
> ie each bridge needs to route that address somehow (negative decode
> is common for upstream).
> 4) manually trigger the MSI by doing a MMIO write to the correct
> 0xfee00000 address with the assigned vector in order to see if your
> interrupt handler gets called.
And can you plug something directly into the PCIe bus? If so does MSI
work on that?
cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
Hello, Mark.
Sorry about slacking off. Having way too much fun mucking around with
assembly code lately.
Mark Lord wrote:
> I believe that other MSI users might want something similar,
> or perhaps Tejun could extend devres to include a pair
> of suitable functions, devm_enable_msi() and devm_disable_msi().
> Then it would be just automatic for drivers, without any fuss.
devres does handle MSI. Please take a look at pcim_release() in
drivers/pci/pci.c. msi/msix enabledness is tracked by
pdev->msi[x]_enabled and if either one was enabled on a managed PCI
device, it would be disabled on driver detach, so there's no reason to
worry about it.
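For reference, the relevant piece of pcim_release() reads approximately
like this (trimmed; see drivers/pci/pci.c for the real thing):

        static void pcim_release(struct device *gendev, void *res)
        {
                struct pci_dev *dev = container_of(gendev, struct pci_dev, dev);

                if (dev->msi_enabled)
                        pci_disable_msi(dev);
                if (dev->msix_enabled)
                        pci_disable_msix(dev);
                /* ... region and enable-count cleanup elided ... */
        }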
Also, there is no reason for a low-level driver to fiddle with intx. In
fact, it shouldn't, as the PCI quirk machinery is supposed to handle all
the pretty quirkiness. The intx code was there before the quirk code was
in place and no one went after it since. Those calls should go, and
removing them probably won't cause any problem in the process.
Thanks.
--
tejun
Michael Ellerman wrote:
> On Tue, 2009-01-20 at 20:02 -0800, Grant Grundler wrote:
..
>> 1) post lspci -v output to verify device (and bridges) is programmed correctly.
>> 2) look for chipset quirks that disable global msi
>
> The kernel shouldn't let you enable MSI if that's the case, ie.
> pci_enable_msi() should fail.
..
Exactly. So it shouldn't be that, then.
> It might still be worth looking at the quirks though, in case there's
> one for a previous revision of your bridge or something.
>
>> 3) Make sure MMIO ranges for 0xfee00000 are routed to local APIC
>> ie each bridge needs to route that address somehow (negative decode
>> is common for upstream).
>> 4) manually trigger the MSI by doing a MMIO write to the correct
>> 0xfee00000 address with the assigned vector in order to see if your
>> interrupt handler gets called.
>
> And can you plug something directly into the PCIe bus? If so does MSI
> work on that?
..
Yup. PCIe cards have no problem with MSI in that box.
More later..
Mark Lord wrote:
> Michael Ellerman wrote:
>> On Tue, 2009-01-20 at 20:02 -0800, Grant Grundler wrote:
> ..
>>> 1) post lspci -v output to verify device (and bridges) is programmed
>>> correctly.
>>> 2) look for chipset quirks that disable global msi
>>
>> The kernel shouldn't let you enable MSI if that's the case, ie.
>> pci_enable_msi() should fail.
> ..
>
> Exactly. So it shouldn't be that, then.
>
>> It might still be worth looking at the quirks though, in case there's
>> one for a previous revision of your bridge or something.
>>
>>> 3) Make sure MMIO ranges for 0xfee00000 are routed to local APIC
>>> ie each bridge needs to route that address somehow (negative decode
>>> is common for upstream).
>>> 4) manually trigger the MSI by doing a MMIO write to the correct
>>> 0xfee00000 address with the assigned vector in order to see if your
>>> interrupt handler gets called.
>>
>> And can you plug something directly into the PCIe bus? If so does MSI
>> work on that?
> ..
>
> Yup. PCIe cards have no problem with MSI in that box.
>
> More later..
What kind of PCI-X bridge does that machine have? I know some AMD
HyperTransport to PCI-X bridges have broken MSI, but those should be
blacklisted already in the kernel..
Robert Hancock wrote:
> Mark Lord wrote:
>> Michael Ellerman wrote:
>>> On Tue, 2009-01-20 at 20:02 -0800, Grant Grundler wrote:
>> ..
..
>>> The kernel shouldn't let you enable MSI if that's the case, ie.
>>> pci_enable_msi() should fail.
>> ..
>> Exactly. So it shouldn't be that, then.
>>
>>> It might still be worth looking at the quirks though, in case there's
>>> one for a previous revision of your bridge or something.
>>>
>>>> 3) Make sure MMIO ranges for 0xfee00000 are routed to local APIC
>>>> ie each bridge needs to route that address somehow (negative decode
>>>> is common for upstream).
>>>> 4) manually trigger the MSI by doing a MMIO write to the correct
>>>> 0xfee00000 address with the assigned vector in order to see if your
>>>> interrupt handler gets called.
>>>
>>> And can you plug something directly into the PCIe bus? If so does MSI
>>> work on that?
>> ..
>>
>> Yup. PCIe cards have no problem with MSI in that box.
>>
>> More later..
>
> What kind of PCI-X bridge does that machine have? I know some AMD
> HyperTransport to PCI-X bridges have broken MSI, but those should be
> blacklisted already in the kernel..
..
Okay, I'm back looking at this now. In the interim, the box has undergone
a complete re-install, and I think the original boot logs are gone for good.
Now (with 2.6.28.1), the boot code identifies a quirk, and pci_enable_msi()
does indeed fail where it once succeeded. So all is well, but I don't know
why it wasn't working correctly earlier.
Oh well.
Cheers