I'm looking at implementing MSI/MSI-X support in a PCI device driver
I'm working on. However, I've run into an issue with the MSI API that
I would like some clarification on.
When I call pci_enable_msi, since my device is MSI-X capable, the
kernel calls msix_capability_init, which works out the memory region
where vectors should be written and then calls request_region. (In
fact it calls
request_mem_region(phys_addr,
dev_msi_cap * PCI_MSIX_ENTRY_SIZE,
"MSI-X iomap Failure"))
which leads to a bizarre entry in /proc/iomem with the name "MSI-X
iomap Failure")
The problem is that if I follow the standard route in my driver and
call pci_request_regions() during init (since I want to claim my whole
device), the request_mem_region in msix_capability_init will fail.
Now, for my device, the MSI-X table happens to fall in the middle of a
BAR, and I need to access stuff on both sides of it in that BAR. To
make things even worse for me, my device has two more BARs I want to claim.
So it seems I am forced to turn my nice clean pci_request_regions()
call into two calls to request_mem_region() (to get the beginning and
end of the BAR with the MSI-X table in it) and two more calls to
pci_request_region() (to get the other two BARs).
This isn't the end of the world but it feels suboptimal to me. Anyone
have an idea for a better way to do this? (I'm happy to write a patch
to the kernel if someone suggests how to change the MSI API)
Thanks,
Roland
Roland Dreier wrote:
> I'm looking at implementing MSI/MSI-X support in a PCI device driver
> I'm working on. However, I've run into an issue with the MSI API that
> I would like some clarification on.
>
> When I call pci_enable_msi, since my device is MSI-X capable, the
> kernel calls msix_capability_init, which works out the memory region
> where vectors should be written and then calls request_region. (In
> fact it calls
Not answering your question directly, but...
You are breaking new ground by adding MSI support to a driver. I thank
you for this -- alot -- but you should realize there will probably be a
little bit of PCI core work necessary in order to get things the way you
want them.
Feel free to propose changes to the PCI core to accomodate your MSI driver.
Jeff
Jeff> You are breaking new ground by adding MSI support to a
Jeff> driver. I thank you for this -- alot -- but you should
Jeff> realize there will probably be a little bit of PCI core work
Jeff> necessary in order to get things the way you want them.
Yes, I noticed only a couple of hotplug drivers seem to be using MSI,
and no one is using multiple vector support. My motivation is
twofold. First, because my device can generate interrupts for
different reasons, and I'm noticing that doing the MMIO read to the
cause register is taking a lot of time, so I'm wondering whether
avoiding this by having separate MSI-X messages will be a win.
Second, for the fun of trying it :)
Jeff> Feel free to propose changes to the PCI core to accomodate
Jeff> your MSI driver.
First patch coming up now :)
- Roland
Roland Dreier wrote:
> Jeff> Feel free to propose changes to the PCI core to accomodate
> Jeff> your MSI driver.
>
> First patch coming up now :)
Cool. If you are feeling generous or motivated, update
Documentation/pci.txt too ;-)
Sharing your experiences will save time for others.
Jeff
As a followup to my previous post about the request_mem_region in
msi.c, I noticed that the region is only released in
msi_remove_pci_irq_vectors(). Based on the fact that this function is
declared in linux/pci.h (and stubbed out if CONFIG_PCI_USE_VECTOR is
not defined), I'm guessing that the intent is for a device driver to
unconditionally call this when exiting.
However, a module can't call msi_remove_pci_irq_vectors unless the
symbol is exported... so if this is the way to do things, please apply
this patch.
On the other hand, MSI-HOWTO.txt seems to imply that the 0th MSI
vector should be cleaned up just by calling free_irq... so should
pci_disable_msi be calling msi_remove_pci_irq_vectors?
- Roland
Index: linux-2.6.7/drivers/pci/msi.c
===================================================================
--- linux-2.6.7.orig/drivers/pci/msi.c 2004-06-15 22:20:03.000000000 -0700
+++ linux-2.6.7/drivers/pci/msi.c 2004-06-21 20:51:33.000000000 -0700
@@ -1011,3 +1011,4 @@
EXPORT_SYMBOL(pci_enable_msi);
EXPORT_SYMBOL(msi_alloc_vectors);
EXPORT_SYMBOL(msi_free_vectors);
+EXPORT_SYMBOL(msi_remove_pci_irq_vectors);
Jeff> Cool. If you are feeling generous or motivated, update
Jeff> Documentation/pci.txt too ;-)
Will do, once I know what I'm doing :)
- R.
msix_capability_init() puts the offset of the MSI-X capability into
pos, then uses pos as a loop index to clear the MSI-X vector table,
and then tries to use pos as the offset again, which results in
writing the MSI-X enable bit off into space.
This patch fixes that by adding a new loop index variable and using
that to clear the vector table.
- Roland
Signed-off-by: Roland Dreier <[email protected]>
Index: linux-2.6.7/drivers/pci/msi.c
===================================================================
--- linux-2.6.7.orig/drivers/pci/msi.c 2004-06-21 20:51:33.000000000 -0700
+++ linux-2.6.7/drivers/pci/msi.c 2004-06-21 21:30:05.000000000 -0700
@@ -569,7 +569,7 @@
struct msi_desc *entry;
struct msg_address address;
struct msg_data data;
- int vector = 0, pos, dev_msi_cap;
+ int vector = 0, pos, dev_msi_cap, i;
u32 phys_addr, table_offset;
u32 control;
u8 bir;
@@ -629,12 +629,12 @@
writel(address.hi_address, base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
writel(*(u32*)&data, base + PCI_MSIX_ENTRY_DATA_OFFSET);
/* Initialize all entries from 1 up to 0 */
- for (pos = 1; pos < dev_msi_cap; pos++) {
- writel(0, base + pos * PCI_MSIX_ENTRY_SIZE +
+ for (i = 1; i < dev_msi_cap; i++) {
+ writel(0, base + i * PCI_MSIX_ENTRY_SIZE +
PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
- writel(0, base + pos * PCI_MSIX_ENTRY_SIZE +
+ writel(0, base + i * PCI_MSIX_ENTRY_SIZE +
PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
- writel(0, base + pos * PCI_MSIX_ENTRY_SIZE +
+ writel(0, base + i * PCI_MSIX_ENTRY_SIZE +
PCI_MSIX_ENTRY_DATA_OFFSET);
}
attach_msi_entry(entry, vector);
On Mon, Jun 21, 2004 at 09:03:22PM -0700, Roland Dreier wrote:
> On the other hand, MSI-HOWTO.txt seems to imply that the 0th MSI
> vector should be cleaned up just by calling free_irq... so should
> pci_disable_msi be calling msi_remove_pci_irq_vectors?
I think so.
On Monday, June 21, 2004 Roland Dreier wrote:
>As a followup to my previous post about the request_mem_region in
>msi.c, I noticed that the region is only released in
>msi_remove_pci_irq_vectors(). Based on the fact that this function is
>declared in linux/pci.h (and stubbed out if CONFIG_PCI_USE_VECTOR is
>not defined), I'm guessing that the intent is for a device driver to
>unconditionally call this when exiting.
The intent of msi_remove_pci_irq_vectors() is to support hot-removed
operation. This function is not for a device driver to call and
should not be exported. I acknowledged the problem of the MSI-X region
being only released in msi_remove_pci_irq_vectors(). I'm in a progress
of updating the existing MSI-X code.
Thanks,
Long
On Monday, June 21, 2004 Roland Dreier wrote:
> The problem is that if I follow the standard route in my driver and
> call pci_request_regions() during init (since I want to claim my whole
> device), the request_mem_region in msix_capability_init will fail.
> Now, for my device, the MSI-X table happens to fall in the middle of a
> BAR, and I need to access stuff on both sides of it in that BAR. To
> make things even worse for me, my device has two more BARs I want to
claim.
>
> So it seems I am forced to turn my nice clean pci_request_regions()
> call into two calls to request_mem_region() (to get the beginning and
> end of the BAR with the MSI-X table in it) and two more calls to
> pci_request_region() (to get the other two BARs).
>
The PCI 3.0 specification has implementation notes that MMIO address
space for a device's MSI-X structure should be isolated so that the
software system can set different page for controlling accesses to
the MSI-X structure. The implementation of MSI patch requires the PCI
subsystem, not a device driver, to maintain full control of the MSI-X
table/MSI-X PBA and MMIO address space of the MSI-X table/MSI-X PBA.
A device driver is prohibited from requesting the MMIO address space
of the MSI-X table/MSI-X PBA. Otherwise, the PCI subsystem will fail
enabling MSI-X on its hardware device when it calls the function
pci_enable_msi().
Thanks,
Long
On Monday, June 21, 2004 Roland Dreier wrote:
>msix_capability_init() puts the offset of the MSI-X capability into
>pos, then uses pos as a loop index to clear the MSI-X vector table,
>and then tries to use pos as the offset again, which results in
>writing the MSI-X enable bit off into space.
>
>This patch fixes that by adding a new loop index variable and using
>that to clear the vector table.
Thanks for detecting this bug and providing a fix with your patch below.
Thanks,
Long
Index: linux-2.6.7/drivers/pci/msi.c
===================================================================
--- linux-2.6.7.orig/drivers/pci/msi.c 2004-06-21 20:51:33.000000000
-0700
+++ linux-2.6.7/drivers/pci/msi.c 2004-06-21 21:30:05.000000000
-0700
@@ -569,7 +569,7 @@
struct msi_desc *entry;
struct msg_address address;
struct msg_data data;
- int vector = 0, pos, dev_msi_cap;
+ int vector = 0, pos, dev_msi_cap, i;
u32 phys_addr, table_offset;
u32 control;
u8 bir;
@@ -629,12 +629,12 @@
writel(address.hi_address, base +
PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
writel(*(u32*)&data, base + PCI_MSIX_ENTRY_DATA_OFFSET);
/* Initialize all entries from 1 up to 0 */
- for (pos = 1; pos < dev_msi_cap; pos++) {
- writel(0, base + pos * PCI_MSIX_ENTRY_SIZE +
+ for (i = 1; i < dev_msi_cap; i++) {
+ writel(0, base + i * PCI_MSIX_ENTRY_SIZE +
PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
- writel(0, base + pos * PCI_MSIX_ENTRY_SIZE +
+ writel(0, base + i * PCI_MSIX_ENTRY_SIZE +
PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
- writel(0, base + pos * PCI_MSIX_ENTRY_SIZE +
+ writel(0, base + i * PCI_MSIX_ENTRY_SIZE +
PCI_MSIX_ENTRY_DATA_OFFSET);
}
attach_msi_entry(entry, vector);
Tom> What do you think of "Failure to request the MMIO address
Tom> space of the MSI-X PBA"? Or what name do you suggest?
Unless I'm misunderstanding the code, nothing is failing, and it's not
the PBA being mapped. You're ioremapping the the MSI-X vector table
so that the msi core can write vector values, etc. So I would suggest
using a name of "MSI-X vector table" in the call to
request_mem_region. (Remember that this name will be displayed in
/proc/iomem, and I don't think it makes sense to display "MSI-X iomap
Failure" to the user when MSI-X is set up and working).
- Roland
On Tuesday, June 22 Roland Dreier wrote:
> So I would suggest using a name of "MSI-X vector table" in the
> call to request_mem_region.
Good!
Thanks,
Long
Tom> The intent of msi_remove_pci_irq_vectors() is to support
Tom> hot-removed operation. This function is not for a device
Tom> driver to call and should not be exported. I acknowledged the
Tom> problem of the MSI-X region being only released in
Tom> msi_remove_pci_irq_vectors(). I'm in a progress of updating
Tom> the existing MSI-X code.
Do you have any plans for when this should be fixed? Right now, with
the standard kernel, if I unload and then reload my driver module,
setting up MSI-X fails the second time through because the core has
not cleaned up the memory region from the first time.
- Roland
Tom> The PCI 3.0 specification has implementation notes that MMIO
Tom> address space for a device's MSI-X structure should be
Tom> isolated so that the software system can set different page
Tom> for controlling accesses to the MSI-X structure. The
Tom> implementation of MSI patch requires the PCI subsystem, not a
Tom> device driver, to maintain full control of the MSI-X
Tom> table/MSI-X PBA and MMIO address space of the MSI-X
Tom> table/MSI-X PBA. A device driver is prohibited from
Tom> requesting the MMIO address space of the MSI-X table/MSI-X
Tom> PBA. Otherwise, the PCI subsystem will fail enabling MSI-X on
Tom> its hardware device when it calls the function
Tom> pci_enable_msi().
Thanks. I guess for the time being I will have to split up my request
region calls.
Do you think the msi subsystem should use a different name for the
MSI-X memory region ("MSI-X iomap Failure" seems very strange to me).
- Roland
On Tuesday, June 22, 2004 Roland Dreier wrote:
>Do you think the msi subsystem should use a different name for the
>MSI-X memory region ("MSI-X iomap Failure" seems very strange to me).
What do you think of "Failure to request the MMIO address space of the
MSI-X PBA"?
Or what name do you suggest?
Thanks,
Long
On Tue, Jun 22, 2004 at 11:08:04AM -0700, Nguyen, Tom L wrote:
> On Tuesday, June 22, 2004 Roland Dreier wrote:
>
> >Do you think the msi subsystem should use a different name for the
> >MSI-X memory region ("MSI-X iomap Failure" seems very strange to me).
>
> What do you think of "Failure to request the MMIO address space of the
> MSI-X PBA"?
> Or what name do you suggest?
Why do you think you want "failure" in that string? It's a name that
can be used to easily identify the MMIO region, and your sentence is a)
too long and b) doesn't make sense (at least to me, maybe I'm missing
something important).
Why not just "MSI-X PBA"?
Christoph> Why do you think you want "failure" in that string?
Christoph> It's a name that can be used to easily identify the
Christoph> MMIO region, and your sentence is a) too long and b)
Christoph> doesn't make sense (at least to me, maybe I'm missing
Christoph> something important).
Agreed (and I think Long agrees now too).
Christoph> Why not just "MSI-X PBA"?
Just to be clear ... it's not actually the PBA, so I don't think
that's a good idea. I believe my suggestion of "MSI-X vector table,"
which Long accepted, is the right name.
- Roland
On Mon, Jun 21, 2004 at 09:35:24PM -0700, Roland Dreier wrote:
> msix_capability_init() puts the offset of the MSI-X capability into
> pos, then uses pos as a loop index to clear the MSI-X vector table,
> and then tries to use pos as the offset again, which results in
> writing the MSI-X enable bit off into space.
>
> This patch fixes that by adding a new loop index variable and using
> that to clear the vector table.
>
> - Roland
>
> Signed-off-by: Roland Dreier <[email protected]>
Applied, thanks.
greg k-h
Greg> Applied, thanks.
Great... I would suggest looking at applying the big patch that Long
posted today, too, as it has lots of improvements.
- Roland
On Tue, Jun 22, 2004 at 04:57:53PM -0700, Roland Dreier wrote:
> Greg> Applied, thanks.
>
> Great... I would suggest looking at applying the big patch that Long
> posted today, too, as it has lots of improvements.
Yes, I saw it, but it's just out for review, not ready to apply just
yet. :)
thanks,
greg k-h