Hi,
I encountered a problem that some PCI devices don't work on my system
which have huge number of PCI devices.
It is mandatory for all PCI device drivers to enable the device by
calling pci_enable_device() which enables all regions probed from the
device's BARs. If pci_enable_device() failes to enable any regions
probed from BARs, it returns as error. On the large servers, I/O port
resource could not be assigned to all PCI devices because it is
limited (64KB on Intel Architecture[1]) and it would be fragmented
(I/O base register of PCI-to-PCI bridge will usually be aligned to a
4KB boundary[2]). In this case, the devices which have no I/O port
resource assigned don't work because pci_enable_device() for those
devices failes. This is what happened on my machine.
---
[1]: Some machines support 64KB I/O port space per PCI segment.
[2]: Some P2P bridges support optional 1KB aligned I/O base.
Here, there are many PCI devices that provide both I/O port and MMIO
interface, and some of those devices can be handled without using I/O
port interface. The reason why such devices provide I/O port interface
is for compatibility to legacy OSs. So this kind of devices should
work even if enough I/O port resources are not assigned. The "PCI
Local Bus Specification Revision 3.0" also mentions about this topic
(Please see p.44, "IMPLEMENTATION NOTE"). On the current linux,
unfortunately, this kind of devices don't work if I/O port resources
are not assigned, because pci_enable_device() for those devices fails.
To solve this problem, this series of patches introduces a new
interface pci_set_bar_mask() and pci_set_bar_mask_by_resource() for
PCI device drivers to tell the kernel what regions they really want to
use. Once the driver call pci_set_bar_mask*(), following
pci_enable_device() and pci_request_regions() call handles only the
specific regions. If the driver doesn't use pci_set_bar_mask*(),
pci_enable_device() and pci_request_regions() handle all regions as
they currently are. By using pci_set_bar_mask*(), we can make PCI
drivers legacy I/O port free with very small change.
I'm attaching the following four patches:
[patch 1/4] Inntroduce pci_set_bar_mask*()
[patch 2/4] Update Documantion/pci.txt
[patch 3/4] Make Intel e1000 driver legacy I/O port free
[patch 4/4] Make Emulex lpfc driver legacy I/O port free
I would very much appreciate giving me any comments and suggestions.
Thanks,
Kenji Kaneshige
This patch introduces a new interface pci_select_resource() for PCI
device drivers to tell kernel what resources they want to use. This
interface enables some PCI device drivers to handle the devices even
if no I/O resources are allocated to the devices.
Signed-off-by: Kenji Kaneshige <[email protected]>
drivers/pci/pci.c | 14 +++++++++-----
drivers/pci/probe.c | 1 +
include/linux/pci.h | 15 +++++++++++++++
3 files changed, 25 insertions(+), 5 deletions(-)
Index: linux-2.6.16-rc3/drivers/pci/pci.c
===================================================================
--- linux-2.6.16-rc3.orig/drivers/pci/pci.c 2006-02-14 12:25:10.000000000 +0900
+++ linux-2.6.16-rc3/drivers/pci/pci.c 2006-02-14 12:27:59.000000000 +0900
@@ -497,7 +497,7 @@
{
int err;
- if ((err = pci_enable_device_bars(dev, (1 << PCI_NUM_RESOURCES) - 1)))
+ if ((err = pci_enable_device_bars(dev, dev->bar_mask)))
return err;
pci_fixup_device(pci_fixup_enable, dev);
dev->is_enabled = 1;
@@ -535,6 +535,7 @@
pcibios_disable_device(dev);
dev->is_enabled = 0;
+ pci_set_bar_mask(dev, (1 << PCI_NUM_RESOURCES) - 1);
}
/**
@@ -681,7 +682,8 @@
int i;
for (i = 0; i < 6; i++)
- pci_release_region(pdev, i);
+ if (pdev->bar_mask & (1 << i))
+ pci_release_region(pdev, i);
}
/**
@@ -702,13 +704,15 @@
int i;
for (i = 0; i < 6; i++)
- if(pci_request_region(pdev, i, res_name))
- goto err_out;
+ if (pdev->bar_mask & (1 << i))
+ if (pci_request_region(pdev, i, res_name))
+ goto err_out;
return 0;
err_out:
while(--i >= 0)
- pci_release_region(pdev, i);
+ if (pdev->bar_mask & (1 << i))
+ pci_release_region(pdev, i);
return -EBUSY;
}
Index: linux-2.6.16-rc3/drivers/pci/probe.c
===================================================================
--- linux-2.6.16-rc3.orig/drivers/pci/probe.c 2006-02-14 12:25:10.000000000 +0900
+++ linux-2.6.16-rc3/drivers/pci/probe.c 2006-02-14 12:27:59.000000000 +0900
@@ -803,6 +803,7 @@
dev->vendor = l & 0xffff;
dev->device = (l >> 16) & 0xffff;
dev->cfg_size = pci_cfg_space_size(dev);
+ pci_set_bar_mask(dev, (1 << PCI_NUM_RESOURCES) - 1);
/* Assume 32-bit PCI; let 64-bit PCI cards (which are far rarer)
set this higher, assuming the system even supports it. */
Index: linux-2.6.16-rc3/include/linux/pci.h
===================================================================
--- linux-2.6.16-rc3.orig/include/linux/pci.h 2006-02-14 12:25:13.000000000 +0900
+++ linux-2.6.16-rc3/include/linux/pci.h 2006-02-14 12:27:59.000000000 +0900
@@ -143,6 +143,7 @@
*/
unsigned int irq;
struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
+ int bar_mask; /* bitmask of BAR's to be enabled */
/* These fields are used by common fixups */
unsigned int transparent:1; /* Transparent PCI bridge */
@@ -695,6 +696,20 @@
}
#endif /* HAVE_ARCH_PCI_RESOURCE_TO_USER */
+static inline void pci_set_bar_mask(struct pci_dev *dev, int mask)
+{
+ dev->bar_mask = mask;
+}
+
+static inline void pci_set_bar_mask_by_resource(struct pci_dev *dev,
+ unsigned long mask)
+{
+ int i, bar_mask = 0;
+ for (i = 0; i < PCI_NUM_RESOURCES; i++)
+ if (pci_resource_flags(dev, i) & mask)
+ bar_mask |= (1 << i);
+ pci_set_bar_mask(dev, bar_mask);
+}
/*
* The world is not perfect and supplies us with broken PCI devices.
This patch adds the description about pci_select_resource() into
Documenation/pci.txt.
Signed-off-by: Kenji Kaneshige <[email protected]>
Documentation/pci.txt | 25 +++++++++++++++++++++++++
1 files changed, 25 insertions(+)
Index: linux-2.6.16-rc3/Documentation/pci.txt
===================================================================
--- linux-2.6.16-rc3.orig/Documentation/pci.txt 2006-01-03 12:21:10.000000000 +0900
+++ linux-2.6.16-rc3/Documentation/pci.txt 2006-02-14 12:28:03.000000000 +0900
@@ -169,6 +169,31 @@
needed and wakes up the device if it was in suspended state. Please note
that this function can fail.
+ If you want to enable only the specific type of regions of the device,
+you can tell it to the kernel by calling pci_set_bar_mask() or
+pci_set_bar_mask_by_resource() before calling pci_enable_device(). Once
+you tell it to the kernel, the following pci_enable_device() and
+pci_request_regions() call will handles only the regions you specified.
+The kernel will enables all regions of the device if you don't use
+pci_set_bar_mask*(). The pci_set_bar_mask*() would be needed to make some
+drivers legacy I/O port free. On the large servers, I/O port resource could
+not be assigned to all PCI devices because it is limited (64KB on Intel
+Architecture[1]) and it would be fragmented (I/O base register of
+PCI-to-PCI bridge will usually be aligned to a 4KB boundary[2]). In this
+case, pci_enable_device() for those devices will fail if you try to enable
+all the regions. However, it is a problem for some PCI devices that provide
+both I/O port and MMIO interface because some of them can be handled
+without using I/O port interface. The reason why such devices provide I/O
+port interface is for compatibility to legacy OSs. So this kind of devices
+should work even if enough I/O port resources are not assigned. The "PCI
+Local Bus Specification Revision 3.0" also mentions about this topic
+(Please see p.44, "IMPLEMENTATION NOTE"). You can solve this problem by
+using pci_set_bar_mask*(). Please note that the information specified
+through pci_set_bar_mask*() will be cleared at pci_disable_device() time.
+---
+[1] Some machines support 64KB I/O port space per PCI segment.
+[2] Some P2P bridges support optional 1KB aligned I/O base.
+
If you want to use the device in bus mastering mode, call pci_set_master()
which enables the bus master bit in PCI_COMMAND register and also fixes
the latency timer value if it's set to something bogus by the BIOS.
This patch makes e1000 driver pci legacy free.
Signed-off-by: Kenji Kaneshige <[email protected]>
drivers/net/e1000/e1000_main.c | 34 ++++++++++++++++++++++++++++------
1 files changed, 28 insertions(+), 6 deletions(-)
Index: linux-2.6.16-rc3/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.16-rc3.orig/drivers/net/e1000/e1000_main.c 2006-02-14 12:25:08.000000000 +0900
+++ linux-2.6.16-rc3/drivers/net/e1000/e1000_main.c 2006-02-14 12:28:06.000000000 +0900
@@ -642,6 +642,18 @@
}
}
+static inline int need_io_port(struct pci_dev *pdev)
+{
+ switch (pdev->device) {
+ case 0x1008: case 0x1009: case 0x100c: case 0x100d: case 0x100e:
+ case 0x1015: case 0x1016: case 0x1017: case 0x101e: case 0x100f:
+ case 0x1011: case 0x1010: case 0x1012: case 0x101d: case 0x1013:
+ case 0x1018: case 0x1078: case 0x1076: case 0x107c: case 0x1077:
+ return 1;
+ }
+ return 0;
+}
+
/**
* e1000_probe - Device Initialization Routine
* @pdev: PCI device information struct
@@ -666,6 +678,11 @@
int i, err, pci_using_dac;
uint16_t eeprom_data;
uint16_t eeprom_apme_mask = E1000_EEPROM_APME;
+ int need_io = need_io_port(pdev);
+
+ if (!need_io)
+ pci_set_bar_mask_by_resource(pdev, IORESOURCE_MEM);
+
if ((err = pci_enable_device(pdev)))
return err;
@@ -709,12 +726,15 @@
goto err_ioremap;
}
- for (i = BAR_1; i <= BAR_5; i++) {
- if (pci_resource_len(pdev, i) == 0)
- continue;
- if (pci_resource_flags(pdev, i) & IORESOURCE_IO) {
- adapter->hw.io_base = pci_resource_start(pdev, i);
- break;
+ if (need_io) {
+ for (i = BAR_1; i <= BAR_5; i++) {
+ if (pci_resource_len(pdev, i) == 0)
+ continue;
+ if (pci_resource_flags(pdev, i) & IORESOURCE_IO) {
+ adapter->hw.io_base =
+ pci_resource_start(pdev, i);
+ break;
+ }
}
}
@@ -4683,6 +4703,8 @@
if (retval)
DPRINTK(PROBE, ERR, "Error in setting power state\n");
e1000_pci_restore_state(adapter);
+ if (!need_io_port(pdev))
+ pci_set_bar_mask_by_resource(pdev, IORESOURCE_MEM);
ret_val = pci_enable_device(pdev);
pci_set_master(pdev);
This patch makes lpfc driver pci legacy free.
Signed-off-by: Kenji Kaneshige <[email protected]>
drivers/scsi/lpfc/lpfc_init.c | 1 +
1 files changed, 1 insertion(+)
Index: linux-2.6.16-rc3/drivers/scsi/lpfc/lpfc_init.c
===================================================================
--- linux-2.6.16-rc3.orig/drivers/scsi/lpfc/lpfc_init.c 2006-02-14 12:25:10.000000000 +0900
+++ linux-2.6.16-rc3/drivers/scsi/lpfc/lpfc_init.c 2006-02-14 12:28:08.000000000 +0900
@@ -1368,6 +1368,7 @@
int i;
uint16_t iotag;
+ pci_set_bar_mask_by_resource(pdev, IORESOURCE_MEM);
if (pci_enable_device(pdev))
goto out;
if (pci_request_regions(pdev, LPFC_DRIVER_NAME))
Kenji Kaneshige <[email protected]> writes:
> I encountered a problem that some PCI devices don't work on my system
> which have huge number of PCI devices.
Is that a large IA64 system?
[...]
The basic concept looks good to me, but I would suggest you use
the Linux bitmap functions (DECLARE_BITMAP(), set_bit, test_bit etc.)
instead of open coding all that.
And for the e1000 change - instead of adding a big switch with
magic numbers that will likely bitrot it's better to use
the driver_data field in pci_device_id for such device specific flags.
-Andi
Andi Kleen wrote:
> Kenji Kaneshige <[email protected]> writes:
>
>
>>I encountered a problem that some PCI devices don't work on my system
>>which have huge number of PCI devices.
>
>
> Is that a large IA64 system?
>
Yes. My IA64 system can have maximum 128 PCI slots, but
currently many of devices on those slots don't work...
> [...]
>
> The basic concept looks good to me, but I would suggest you use
> the Linux bitmap functions (DECLARE_BITMAP(), set_bit, test_bit etc.)
> instead of open coding all that.
>
> And for the e1000 change - instead of adding a big switch with
> magic numbers that will likely bitrot it's better to use
> the driver_data field in pci_device_id for such device specific flags.
>
I see.
I will try to fix my patches based on your suggestion.
Thanks,
Kenji Kaneshige
Kenji Kaneshige <[email protected]> wrote:
>
> This patch introduces a new interface pci_select_resource() for PCI
> device drivers to tell kernel what resources they want to use.
It'd be nice if we didn't need to introduce any new API functions for this.
If we could just do:
struct pci_something pci_something_table[] = {
...
{
...
.dont_allocate_io_space = 1,
...
},
...
};
within each driver which wants it.
But I can't think of a suitable per-device-id structure with which we can
do that :(
Andrew Morton wrote:
> Kenji Kaneshige <[email protected]> wrote:
>
>>This patch introduces a new interface pci_select_resource() for PCI
>> device drivers to tell kernel what resources they want to use.
>
>
> It'd be nice if we didn't need to introduce any new API functions for this.
> If we could just do:
>
> struct pci_something pci_something_table[] = {
> ...
> {
> ...
> .dont_allocate_io_space = 1,
> ...
> },
> ...
> };
>
> within each driver which wants it.
>
> But I can't think of a suitable per-device-id structure with which we can
> do that :(
>
>
My another idea was to use pci quirks. In this approach, we don't
need to introduce any new API. But I gave up this idea because it
looked abuse of pci quirks.
Anyway, I try to think about new ideas we don't need to introduce
any new API.
Thanks,
Kenji Kaneshige
On Wed, Feb 15, 2006 at 03:03:56PM +0900, Kenji Kaneshige wrote:
> Andrew Morton wrote:
> >Kenji Kaneshige <[email protected]> wrote:
> >
> >>This patch introduces a new interface pci_select_resource() for PCI
> >>device drivers to tell kernel what resources they want to use.
> >
> >
> >It'd be nice if we didn't need to introduce any new API functions for this.
> >If we could just do:
> >
> >struct pci_something pci_something_table[] = {
> > ...
> > {
> > ...
> > .dont_allocate_io_space = 1,
> > ...
> > },
> > ...
> >};
> >
> >within each driver which wants it.
> >
> >But I can't think of a suitable per-device-id structure with which we can
> >do that :(
> >
> >
>
> My another idea was to use pci quirks. In this approach, we don't
> need to introduce any new API. But I gave up this idea because it
> looked abuse of pci quirks.
>
> Anyway, I try to think about new ideas we don't need to introduce
> any new API.
What about pci_enable_device_bars() ?
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
Russell King wrote:
> On Wed, Feb 15, 2006 at 03:03:56PM +0900, Kenji Kaneshige wrote:
>
>>Andrew Morton wrote:
>>
>>>Kenji Kaneshige <[email protected]> wrote:
>>>
>>>
>>>>This patch introduces a new interface pci_select_resource() for PCI
>>>>device drivers to tell kernel what resources they want to use.
>>>
>>>
>>>It'd be nice if we didn't need to introduce any new API functions for this.
>>>If we could just do:
>>>
>>>struct pci_something pci_something_table[] = {
>>> ...
>>> {
>>> ...
>>> .dont_allocate_io_space = 1,
>>> ...
>>> },
>>> ...
>>>};
>>>
>>>within each driver which wants it.
>>>
>>>But I can't think of a suitable per-device-id structure with which we can
>>>do that :(
>>>
>>>
>>
>>My another idea was to use pci quirks. In this approach, we don't
>>need to introduce any new API. But I gave up this idea because it
>>looked abuse of pci quirks.
>>
>>Anyway, I try to think about new ideas we don't need to introduce
>>any new API.
>
>
> What about pci_enable_device_bars() ?
>
Yes, it's one option (In fact, my first idea was using it),
though we need to move the following lines into
pci_enable_device_bars() from pci_enable_device().
pci_fixup_device(pci_fixup_enable, dev);
dev->is_enabled = 1;
Actually, IIRC, there are one or two existing drivers using it.
In addition to pci_enable_device_bars(), we can use
pci_request_region() for requesting each region instead of using
pci_request_regions().
The reason I didn't use this option was I thought we would need
bigger changes to drivers if we use pci_enable_devie_bars() and
pci_request_region(). For example, if we use pci_request_region()
for requesting the specific regions and if an error occurs
while doing that, we need to release only the regions we succeeded
in requesting. I think this is a little bit troublesome for driver
writers. In adition, though this is my personal opinion, only one
API to enable devices (e.g. pci_enable_device()) looks nice to me.
Anyway, I think it would be nice if we can solve the problem by
as small changes as possible to the existing drivers.
How do you think?
Thanks,
Kenji Kaneshige