2006-02-27 04:56:08

by Kenji Kaneshige

[permalink] [raw]
Subject: [PATCH 0/4] PCI legacy I/O port free driver (take 3)

Hi,

Here is an updated set of patches for PCI legacy I/O port free drivers
which incorporates feedbacks. Summary of changes from the previous
version are:

- Removed the device_flags field from struct pci_device_id, which
was introduced in the previous version of patch

- Changed e1000 driver to use the driver_data field in struct
pci_device_id to see if the device needs I/O port regions.

- Added proper messages instead of WARN_ON() at the error.

- Updated the Documentation/pci.txt

I'm attaching the following four patches:

[patch 1/4] Add no_ioport flag into pci_dev
[patch 2/4] Update Documentation/pci.txt
[patch 3/4] Make Intel e1000 driver legacy I/O port free
[patch 4/4] Make Emulex lpfc driver legacy I/O port free

I'm attaching the brief description below about what the problem I'm
trying to solve is.

Thanks,
Kenji Kaneshige


Brief Description
~~~~~~~~~~~~~~~~~
I encountered a problem that some PCI devices don't work on my system
which have huge number of PCI devices.

It is mandatory for all PCI device drivers to enable the device by
calling pci_enable_device() which enables all regions probed from the
device's BARs. If pci_enable_device() failes to enable any regions
probed from BARs, it returns as error. On the large servers, I/O port
resource could not be assigned to all PCI devices because it is
limited (64KB on Intel Architecture[1]) and it would be fragmented
(I/O base register of PCI-to-PCI bridge will usually be aligned to a
4KB boundary[2]). In this case, the devices which have no I/O port
resource assigned don't work because pci_enable_device() for those
devices failes. This is what happened on my machine.
---
[1]: Some machines support 64KB I/O port space per PCI segment.
[2]: Some P2P bridges support optional 1KB aligned I/O base.

Here, there are many PCI devices that provide both I/O port and MMIO
interface, and some of those devices can be handled without using I/O
port interface. The reason why such devices provide I/O port interface
is for compatibility to legacy OSs. So this kind of devices should
work even if enough I/O port resources are not assigned. The "PCI
Local Bus Specification Revision 3.0" also mentions about this topic
(Please see p.44, "IMPLEMENTATION NOTE"). On the current linux,
unfortunately, this kind of devices don't work if I/O port resources
are not assigned, because pci_enable_device() for those devices fails.



2006-02-27 04:54:46

by Kenji Kaneshige

[permalink] [raw]
Subject: [PATCH 1/4] PCI legacy I/O port free driver (take 3) - Add no_ioport flag into pci_dev

This patch adds the no_ioport field into struct pci_dev, which is used
to tell the kernel not to touch any I/O port regions.

Signed-off-by: Kenji Kaneshige <[email protected]>

drivers/pci/pci.c | 39 +++++++++++++++++++++++++++++++--------
include/linux/pci.h | 1 +
2 files changed, 32 insertions(+), 8 deletions(-)

Index: linux-2.6.16-rc4/include/linux/pci.h
===================================================================
--- linux-2.6.16-rc4.orig/include/linux/pci.h 2006-02-27 13:29:34.000000000 +0900
+++ linux-2.6.16-rc4/include/linux/pci.h 2006-02-27 13:29:41.000000000 +0900
@@ -152,6 +152,7 @@
unsigned int is_busmaster:1; /* device is busmaster */
unsigned int no_msi:1; /* device may not use msi */
unsigned int block_ucfg_access:1; /* userspace config space access is blocked */
+ unsigned int no_ioport:1; /* device may not use ioport */

u32 saved_config_space[16]; /* config space saved at suspend time */
struct bin_attribute *rom_attr; /* attribute descriptor for sysfs ROM entry */
Index: linux-2.6.16-rc4/drivers/pci/pci.c
===================================================================
--- linux-2.6.16-rc4.orig/drivers/pci/pci.c 2006-02-27 13:29:34.000000000 +0900
+++ linux-2.6.16-rc4/drivers/pci/pci.c 2006-02-27 13:29:41.000000000 +0900
@@ -495,9 +495,14 @@
int
pci_enable_device(struct pci_dev *dev)
{
- int err;
+ int err, i, bars = (1 << PCI_NUM_RESOURCES) - 1;

- if ((err = pci_enable_device_bars(dev, (1 << PCI_NUM_RESOURCES) - 1)))
+ if (dev->no_ioport)
+ for (i = 0; i < PCI_NUM_RESOURCES; i++)
+ if (pci_resource_flags(dev, i) & IORESOURCE_IO)
+ bars &= ~(1 << i);
+
+ if ((err = pci_enable_device_bars(dev, bars)))
return err;
pci_fixup_device(pci_fixup_enable, dev);
dev->is_enabled = 1;
@@ -617,9 +622,14 @@
{
if (pci_resource_len(pdev, bar) == 0)
return;
- if (pci_resource_flags(pdev, bar) & IORESOURCE_IO)
+ if (pci_resource_flags(pdev, bar) & IORESOURCE_IO) {
+ if (pdev->no_ioport)
+ dev_warn(&pdev->dev,
+ "Trying to release PCI I/O region #%d for "
+ "the device marked I/O resource free\n", bar);
release_region(pci_resource_start(pdev, bar),
pci_resource_len(pdev, bar));
+ }
else if (pci_resource_flags(pdev, bar) & IORESOURCE_MEM)
release_mem_region(pci_resource_start(pdev, bar),
pci_resource_len(pdev, bar));
@@ -645,6 +655,10 @@
return 0;

if (pci_resource_flags(pdev, bar) & IORESOURCE_IO) {
+ if (pdev->no_ioport)
+ dev_warn(&pdev->dev,
+ "Trying to request PCI I/O region #%d for "
+ "the device marked I/O resource free\n", bar);
if (!request_region(pci_resource_start(pdev, bar),
pci_resource_len(pdev, bar), res_name))
goto err_out;
@@ -678,10 +692,13 @@

void pci_release_regions(struct pci_dev *pdev)
{
- int i;
+ int i, no_ioport = pdev->no_ioport;

- for (i = 0; i < 6; i++)
+ for (i = 0; i < 6; i++) {
+ if (no_ioport && (pci_resource_flags(pdev, i) & IORESOURCE_IO))
+ continue;
pci_release_region(pdev, i);
+ }
}

/**
@@ -699,16 +716,22 @@
*/
int pci_request_regions(struct pci_dev *pdev, char *res_name)
{
- int i;
+ int i, no_ioport = pdev->no_ioport;

- for (i = 0; i < 6; i++)
+ for (i = 0; i < 6; i++) {
+ if (no_ioport && (pci_resource_flags(pdev, i) & IORESOURCE_IO))
+ continue;
if(pci_request_region(pdev, i, res_name))
goto err_out;
+ }
return 0;

err_out:
- while(--i >= 0)
+ while(--i >= 0) {
+ if (no_ioport && (pci_resource_flags(pdev, i) & IORESOURCE_IO))
+ continue;
pci_release_region(pdev, i);
+ }

return -EBUSY;
}

2006-02-27 04:55:01

by Kenji Kaneshige

[permalink] [raw]
Subject: [PATCH 2/4] PCI legacy I/O port free driver (take 3) - Update Documentation/pci.txt

This patch adds the description about legacy I/O port free driver into
Documentation/pci.txt.

Signed-off-by: Kenji Kaneshige <[email protected]>

Documentation/pci.txt | 28 ++++++++++++++++++++++++++++
1 files changed, 28 insertions(+)

Index: linux-2.6.16-rc4/Documentation/pci.txt
===================================================================
--- linux-2.6.16-rc4.orig/Documentation/pci.txt 2006-02-27 13:29:34.000000000 +0900
+++ linux-2.6.16-rc4/Documentation/pci.txt 2006-02-27 13:29:42.000000000 +0900
@@ -269,3 +269,31 @@
pci_find_device() Superseded by pci_get_device()
pci_find_subsys() Superseded by pci_get_subsys()
pci_find_slot() Superseded by pci_get_slot()
+
+
+9. Legacy I/O port free driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+On the large servers, I/O port resources could not be assigned to all
+PCI devices because it is limited (64KB on Intel Architecture[1]) and
+it would be fragmented (I/O base register of PCI-to-PCI bridge will
+usually be aligned to a 4KB boundary[2]). On such systems,
+pci_enable_device() and pci_request_regions() for those devices will
+fail because those functions try to enable all the regions. However,
+it is a problem for some PCI devices which provide both I/O port and
+MMIO interface because some of them can be handled without using I/O
+port interface. The reason why such devices provide I/O port interface
+is for compatibility to legacy OSs. So this kind of devices should
+work even if enough I/O port resources are not assigned. The "PCI
+Local Bus Specification Revision 3.0" also mentions about this topic
+(Please see p.44, "IMPLEMENTATION NOTE").
+
+This problem is solved by telling the kernel if your driver needs to
+use I/O port to handle the device. If your driver doesn't need any I/O
+port regions to handle the device, you can tell it to the kernel by
+setting the no_ioport flag in struct pci_dev. If the no_ioport flag is
+set, kernel will never touch I/O port regions for the corresponding
+devices. Please note that you need to set the no_ioport flag before
+calling pci_enable_device() and pci_request_regions().
+
+[1] Some systems support 64KB I/O port space per PCI segment.
+[2] Some PCI-to-PCI bridges support optional 1KB aligned I/O base.

2006-02-27 04:56:26

by Kenji Kaneshige

[permalink] [raw]
Subject: [PATCH 3/4] PCI legacy I/O port free driver (take 3) - Make Intel e1000 driver legacy I/O port free

This patch makes Intel e1000 driver legacy I/O port free.

Signed-off-by: Kenji Kaneshige <[email protected]>

---
drivers/net/e1000/e1000.h | 5 +
drivers/net/e1000/e1000_main.c | 109 +++++++++++++++++++++--------------------
2 files changed, 61 insertions(+), 53 deletions(-)

Index: linux-2.6.16-rc4/drivers/net/e1000/e1000.h
===================================================================
--- linux-2.6.16-rc4.orig/drivers/net/e1000/e1000.h 2006-02-27 13:29:34.000000000 +0900
+++ linux-2.6.16-rc4/drivers/net/e1000/e1000.h 2006-02-27 13:29:45.000000000 +0900
@@ -77,8 +77,9 @@
#define BAR_1 1
#define BAR_5 5

-#define INTEL_E1000_ETHERNET_DEVICE(device_id) {\
- PCI_DEVICE(PCI_VENDOR_ID_INTEL, device_id)}
+#define E1000_NO_IOPORT (1 << 0)
+#define INTEL_E1000_ETHERNET_DEVICE(device_id, flags) {\
+ PCI_DEVICE(PCI_VENDOR_ID_INTEL, device_id), .driver_data = flags}

struct e1000_adapter;

Index: linux-2.6.16-rc4/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.16-rc4.orig/drivers/net/e1000/e1000_main.c 2006-02-27 13:29:34.000000000 +0900
+++ linux-2.6.16-rc4/drivers/net/e1000/e1000_main.c 2006-02-27 13:29:45.000000000 +0900
@@ -115,51 +115,51 @@
* {PCI_DEVICE(PCI_VENDOR_ID_INTEL, device_id)}
*/
static struct pci_device_id e1000_pci_tbl[] = {
- INTEL_E1000_ETHERNET_DEVICE(0x1000),
- INTEL_E1000_ETHERNET_DEVICE(0x1001),
- INTEL_E1000_ETHERNET_DEVICE(0x1004),
- INTEL_E1000_ETHERNET_DEVICE(0x1008),
- INTEL_E1000_ETHERNET_DEVICE(0x1009),
- INTEL_E1000_ETHERNET_DEVICE(0x100C),
- INTEL_E1000_ETHERNET_DEVICE(0x100D),
- INTEL_E1000_ETHERNET_DEVICE(0x100E),
- INTEL_E1000_ETHERNET_DEVICE(0x100F),
- INTEL_E1000_ETHERNET_DEVICE(0x1010),
- INTEL_E1000_ETHERNET_DEVICE(0x1011),
- INTEL_E1000_ETHERNET_DEVICE(0x1012),
- INTEL_E1000_ETHERNET_DEVICE(0x1013),
- INTEL_E1000_ETHERNET_DEVICE(0x1014),
- INTEL_E1000_ETHERNET_DEVICE(0x1015),
- INTEL_E1000_ETHERNET_DEVICE(0x1016),
- INTEL_E1000_ETHERNET_DEVICE(0x1017),
- INTEL_E1000_ETHERNET_DEVICE(0x1018),
- INTEL_E1000_ETHERNET_DEVICE(0x1019),
- INTEL_E1000_ETHERNET_DEVICE(0x101A),
- INTEL_E1000_ETHERNET_DEVICE(0x101D),
- INTEL_E1000_ETHERNET_DEVICE(0x101E),
- INTEL_E1000_ETHERNET_DEVICE(0x1026),
- INTEL_E1000_ETHERNET_DEVICE(0x1027),
- INTEL_E1000_ETHERNET_DEVICE(0x1028),
- INTEL_E1000_ETHERNET_DEVICE(0x105E),
- INTEL_E1000_ETHERNET_DEVICE(0x105F),
- INTEL_E1000_ETHERNET_DEVICE(0x1060),
- INTEL_E1000_ETHERNET_DEVICE(0x1075),
- INTEL_E1000_ETHERNET_DEVICE(0x1076),
- INTEL_E1000_ETHERNET_DEVICE(0x1077),
- INTEL_E1000_ETHERNET_DEVICE(0x1078),
- INTEL_E1000_ETHERNET_DEVICE(0x1079),
- INTEL_E1000_ETHERNET_DEVICE(0x107A),
- INTEL_E1000_ETHERNET_DEVICE(0x107B),
- INTEL_E1000_ETHERNET_DEVICE(0x107C),
- INTEL_E1000_ETHERNET_DEVICE(0x107D),
- INTEL_E1000_ETHERNET_DEVICE(0x107E),
- INTEL_E1000_ETHERNET_DEVICE(0x107F),
- INTEL_E1000_ETHERNET_DEVICE(0x108A),
- INTEL_E1000_ETHERNET_DEVICE(0x108B),
- INTEL_E1000_ETHERNET_DEVICE(0x108C),
- INTEL_E1000_ETHERNET_DEVICE(0x1099),
- INTEL_E1000_ETHERNET_DEVICE(0x109A),
- INTEL_E1000_ETHERNET_DEVICE(0x10B5),
+ INTEL_E1000_ETHERNET_DEVICE(0x1000, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x1001, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x1004, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x1008, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1009, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x100C, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x100D, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x100E, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x100F, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1010, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1011, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1012, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1013, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1014, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x1015, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1016, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1017, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1018, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1019, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x101A, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x101D, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x101E, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1026, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x1027, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x1028, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x105E, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x105F, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x1060, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x1075, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x1076, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1077, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1078, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x1079, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x107A, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x107B, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x107C, 0),
+ INTEL_E1000_ETHERNET_DEVICE(0x107D, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x107E, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x107F, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x108A, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x108B, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x108C, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x1099, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x109A, E1000_NO_IOPORT),
+ INTEL_E1000_ETHERNET_DEVICE(0x10B5, E1000_NO_IOPORT),
/* required last entry */
{0,}
};
@@ -666,6 +666,10 @@
int i, err, pci_using_dac;
uint16_t eeprom_data;
uint16_t eeprom_apme_mask = E1000_EEPROM_APME;
+
+ /* See if the device needs I/O port */
+ pdev->no_ioport = !!(ent->driver_data & E1000_NO_IOPORT);
+
if ((err = pci_enable_device(pdev)))
return err;

@@ -709,12 +713,15 @@
goto err_ioremap;
}

- for (i = BAR_1; i <= BAR_5; i++) {
- if (pci_resource_len(pdev, i) == 0)
- continue;
- if (pci_resource_flags(pdev, i) & IORESOURCE_IO) {
- adapter->hw.io_base = pci_resource_start(pdev, i);
- break;
+ if (!(ent->driver_data & E1000_NO_IOPORT)) {
+ for (i = BAR_1; i <= BAR_5; i++) {
+ if (pci_resource_len(pdev, i) == 0)
+ continue;
+ if (pci_resource_flags(pdev, i) & IORESOURCE_IO) {
+ adapter->hw.io_base =
+ pci_resource_start(pdev, i);
+ break;
+ }
}
}


2006-02-27 04:57:04

by Kenji Kaneshige

[permalink] [raw]
Subject: [PATCH 4/4] PCI legacy I/O port free driver (take 3) - Make Emulex lpfc driver legacy I/O port free

This patch makes Emulex lpfc driver legacy I/O port free.

Signed-off-by: Kenji Kaneshige <[email protected]>

---
drivers/scsi/lpfc/lpfc_init.c | 3 +++
1 files changed, 3 insertions(+)

Index: linux-2.6.16-rc4/drivers/scsi/lpfc/lpfc_init.c
===================================================================
--- linux-2.6.16-rc4.orig/drivers/scsi/lpfc/lpfc_init.c 2006-02-27 13:28:34.000000000 +0900
+++ linux-2.6.16-rc4/drivers/scsi/lpfc/lpfc_init.c 2006-02-27 13:29:47.000000000 +0900
@@ -1368,6 +1368,9 @@
int i;
uint16_t iotag;

+ /* Don't need to use I/O port regions */
+ pdev->no_ioport = 1;
+
if (pci_enable_device(pdev))
goto out;
if (pci_request_regions(pdev, LPFC_DRIVER_NAME))

2006-02-27 06:43:32

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH 2/4] PCI legacy I/O port free driver (take 3) - Update Documentation/pci.txt

On Mon, Feb 27, 2006 at 01:52:59PM +0900, Kenji Kaneshige wrote:
> This patch adds the description about legacy I/O port free driver into
> Documentation/pci.txt.

Kenji,
Since you've done all the hard work so far, my small contribution
is to rewrite the pci.txt so it's easier to read.
Does this look better to you?

The new section on "Write Posting" is just a proposal (RFC).
Write Posting has been discussed on most linux driver mailing lists
but I didn't see it mentioned in Documentation/pci.txt
(which seems like a natural place to discuss it).

cheers,
grant


9. Legacy I/O port free driver

Large servers may not be able to provide I/O port resources to all
PCI devices. I/O Port space is only 64KB on Intel Architecture[1]
and is likely also fragmented since the I/O base register of PCI-to-PCI
bridge will usually be aligned to a 4KB boundary[2]. On such systems,
pci_enable_device() and pci_request_regions() will fail when attempting
to enable I/O Port regions that don't have I/O Port resources assigned.

Fortunately, many PCI devices which request I/O Port resources also
provide access to the same registers via MMIO BARs. These devices
can be handled without using I/O port space and the drivers typically
offer a CONFIG_ option to only use MMIO regions (e.g. CONFIG_TULIP_MMIO).
PCI devices typically provide I/O port interface for legacy OSs and
will work when I/O port resources are not assigned. The "PCI Local Bus
Specification Revision 3.0" discusses this on p.44, "IMPLEMENTATION NOTE".

If your PCI device driver doesn't need I/O port resources assigned to
I/O Port BARs, set the no_ioport flag in struct pci_dev before calling
pci_enable_device() and pci_request_regions(). If the no_ioport flag
is set, generic PCI support will ignore I/O port regions for the
corresponding devices.

[1] Some systems support 64KB I/O port space per PCI segment.
[2] Some PCI-to-PCI bridges support optional 1KB aligned I/O base.


10. MMIO Space and "Write Posting"

Converting a driver from using I/O Port space to using MMIO space
often requires some additional changes. Specifically, "write posting"
needs to be handled. Most drivers (e.g. tg3, acenic, sym53c8xx_2)
already do. I/O Port space guarantees write transactions reach the
PCI device before the CPU can continue. Writes to MMIO space allow
to CPU continue before the transaction reaches the PCI device.
HW weenies call this "Write Posting" because the write completion
is "posted" to the CPU before the transaction has reached it's
destination.

Thus, timing sensitive code should add readl() where the CPU
is expected to wait before doing other work. The classic "bit
banging" sequence works fine for I/O Port space:

for (i=8; --i; val >>= 1) {
outb(val & 1, ioport_reg); /* write bit */
udelay(10);
}

The same sequence for MMIO space should be:

for (i=8; --i; val >>= 1) {
writeb(val & 1, mmio_reg); /* write bit */
readb(safe_mmio_reg); /* flush posted write */
udelay(10);
}

It is important that "safe_mmio_reg" not have any side effects that
interferes with the correct operation of the device.

Another case to watch out for is when resetting a PCI device.
Use PCI Configuration space reads to flush the writel(). This will
gracefully handle the PCI master abort on all platforms if the PCI
device is expected to not respond to a readl(). Most x86 platforms
will allow MMIO reads to master abort (aka "Soft Fail") and return
garbage (e.g. ~0). But many RISC platforms will crash (aka "Hard Fail").

2006-02-27 08:29:49

by Kenji Kaneshige

[permalink] [raw]
Subject: Re: [PATCH 2/4] PCI legacy I/O port free driver (take 3) - Update Documentation/pci.txt

Grant Grundler wrote:
> On Mon, Feb 27, 2006 at 01:52:59PM +0900, Kenji Kaneshige wrote:
>
>>This patch adds the description about legacy I/O port free driver into
>>Documentation/pci.txt.
>
>
> Kenji,
> Since you've done all the hard work so far, my small contribution
> is to rewrite the pci.txt so it's easier to read.
> Does this look better to you?
>

Thank you very much! This looks much better to me.
I'll update the patch.

As you know, it is very hard work for me to write English. I usually
spend much more time for that than writing C language...


> The new section on "Write Posting" is just a proposal (RFC).
> Write Posting has been discussed on most linux driver mailing lists
> but I didn't see it mentioned in Documentation/pci.txt
> (which seems like a natural place to discuss it).
>

I think this new section is very important and it should be added
to Documentation/pci.txt. I have been wondering that some drivers
might cause spurious interrupts because IIRC, some drivers don't
do "Write Posting" when they update their interrupt status register
through MMIO, though I don't know about it very much.

Thanks,
Kenji Kaneshige


> cheers,
> grant
>
>
> 9. Legacy I/O port free driver
>
> Large servers may not be able to provide I/O port resources to all
> PCI devices. I/O Port space is only 64KB on Intel Architecture[1]
> and is likely also fragmented since the I/O base register of PCI-to-PCI
> bridge will usually be aligned to a 4KB boundary[2]. On such systems,
> pci_enable_device() and pci_request_regions() will fail when attempting
> to enable I/O Port regions that don't have I/O Port resources assigned.
>
> Fortunately, many PCI devices which request I/O Port resources also
> provide access to the same registers via MMIO BARs. These devices
> can be handled without using I/O port space and the drivers typically
> offer a CONFIG_ option to only use MMIO regions (e.g. CONFIG_TULIP_MMIO).
> PCI devices typically provide I/O port interface for legacy OSs and
> will work when I/O port resources are not assigned. The "PCI Local Bus
> Specification Revision 3.0" discusses this on p.44, "IMPLEMENTATION NOTE".
>
> If your PCI device driver doesn't need I/O port resources assigned to
> I/O Port BARs, set the no_ioport flag in struct pci_dev before calling
> pci_enable_device() and pci_request_regions(). If the no_ioport flag
> is set, generic PCI support will ignore I/O port regions for the
> corresponding devices.
>
> [1] Some systems support 64KB I/O port space per PCI segment.
> [2] Some PCI-to-PCI bridges support optional 1KB aligned I/O base.
>
>
> 10. MMIO Space and "Write Posting"
>
> Converting a driver from using I/O Port space to using MMIO space
> often requires some additional changes. Specifically, "write posting"
> needs to be handled. Most drivers (e.g. tg3, acenic, sym53c8xx_2)
> already do. I/O Port space guarantees write transactions reach the
> PCI device before the CPU can continue. Writes to MMIO space allow
> to CPU continue before the transaction reaches the PCI device.
> HW weenies call this "Write Posting" because the write completion
> is "posted" to the CPU before the transaction has reached it's
> destination.
>
> Thus, timing sensitive code should add readl() where the CPU
> is expected to wait before doing other work. The classic "bit
> banging" sequence works fine for I/O Port space:
>
> for (i=8; --i; val >>= 1) {
> outb(val & 1, ioport_reg); /* write bit */
> udelay(10);
> }
>
> The same sequence for MMIO space should be:
>
> for (i=8; --i; val >>= 1) {
> writeb(val & 1, mmio_reg); /* write bit */
> readb(safe_mmio_reg); /* flush posted write */
> udelay(10);
> }
>
> It is important that "safe_mmio_reg" not have any side effects that
> interferes with the correct operation of the device.
>
> Another case to watch out for is when resetting a PCI device.
> Use PCI Configuration space reads to flush the writel(). This will
> gracefully handle the PCI master abort on all platforms if the PCI
> device is expected to not respond to a readl(). Most x86 platforms
> will allow MMIO reads to master abort (aka "Soft Fail") and return
> garbage (e.g. ~0). But many RISC platforms will crash (aka "Hard Fail").
>
>

2006-02-27 12:21:23

by Kenji Kaneshige

[permalink] [raw]
Subject: Re: [PATCH 2/4] PCI legacy I/O port free driver (take 3) - Update Documentation/pci.txt

Here is an updated patch based on the feedback from Grant Grundler.

Thanks,
Kenji Kaneshige


This patch adds the description about legacy I/O port free driver into
Documentation/pci.txt.

Signed-off-by: Kenji Kaneshige <[email protected]>

Documentation/pci.txt | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 68 insertions(+)

Index: linux-2.6.16-rc4/Documentation/pci.txt
===================================================================
--- linux-2.6.16-rc4.orig/Documentation/pci.txt 2006-02-27 18:42:39.000000000 +0900
+++ linux-2.6.16-rc4/Documentation/pci.txt 2006-02-27 18:50:54.000000000 +0900
@@ -269,3 +269,71 @@
pci_find_device() Superseded by pci_get_device()
pci_find_subsys() Superseded by pci_get_subsys()
pci_find_slot() Superseded by pci_get_slot()
+
+
+9. Legacy I/O port free driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Large servers may not be able to provide I/O port resources to all PCI
+devices. I/O Port space is only 64KB on Intel Architecture[1] and is
+likely also fragmented since the I/O base register of PCI-to-PCI
+bridge will usually be aligned to a 4KB boundary[2]. On such systems,
+pci_enable_device() and pci_request_regions() will fail when
+attempting to enable I/O Port regions that don't have I/O Port
+resources assigned.
+
+Fortunately, many PCI devices which request I/O Port resources also
+provide access to the same registers via MMIO BARs. These devices can
+be handled without using I/O port space and the drivers typically
+offer a CONFIG_ option to only use MMIO regions
+(e.g. CONFIG_TULIP_MMIO). PCI devices typically provide I/O port
+interface for legacy OSs and will work when I/O port resources are not
+assigned. The "PCI Local Bus Specification Revision 3.0" discusses
+this on p.44, "IMPLEMENTATION NOTE".
+
+If your PCI device driver doesn't need I/O port resources assigned to
+I/O Port BARs, set the no_ioport flag in struct pci_dev before calling
+pci_enable_device() and pci_request_regions(). If the no_ioport flag
+is set, generic PCI support will ignore I/O port regions for the
+corresponding devices.
+
+[1] Some systems support 64KB I/O port space per PCI segment.
+[2] Some PCI-to-PCI bridges support optional 1KB aligned I/O base.
+
+
+10. MMIO Space and "Write Posting"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Converting a driver from using I/O Port space to using MMIO space
+often requires some additional changes. Specifically, "write posting"
+needs to be handled. Most drivers (e.g. tg3, acenic, sym53c8xx_2)
+already do. I/O Port space guarantees write transactions reach the PCI
+device before the CPU can continue. Writes to MMIO space allow to CPU
+continue before the transaction reaches the PCI device. HW weenies
+call this "Write Posting" because the write completion is "posted" to
+the CPU before the transaction has reached it's destination.
+
+Thus, timing sensitive code should add readl() where the CPU is
+expected to wait before doing other work. The classic "bit banging"
+sequence works fine for I/O Port space:
+
+ for (i=8; --i; val >>= 1) {
+ outb(val & 1, ioport_reg); /* write bit */
+ udelay(10);
+ }
+
+The same sequence for MMIO space should be:
+
+ for (i=8; --i; val >>= 1) {
+ writeb(val & 1, mmio_reg); /* write bit */
+ readb(safe_mmio_reg); /* flush posted write */
+ udelay(10);
+ }
+
+It is important that "safe_mmio_reg" not have any side effects that
+interferes with the correct operation of the device.
+
+Another case to watch out for is when resetting a PCI device. Use PCI
+Configuration space reads to flush the writel(). This will gracefully
+handle the PCI master abort on all platforms if the PCI device is
+expected to not respond to a readl(). Most x86 platforms will allow
+MMIO reads to master abort (aka "Soft Fail") and return garbage
+(e.g. ~0). But many RISC platforms will crash (aka "Hard Fail").


2006-02-27 12:31:05

by Kenji Kaneshige

[permalink] [raw]
Subject: Re: [PATCH 2/4] PCI legacy I/O port free driver (take 3) - Update Documentation/pci.txt

The patch seemed to be broken by my mailer. Resending it.
Sorry about that.

Thanks,
Kenji Kaneshige


This patch adds the description about legacy I/O port free driver into
Documentation/pci.txt.

Signed-off-by: Kenji Kaneshige <[email protected]>

Documentation/pci.txt | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 68 insertions(+)

Index: linux-2.6.16-rc4/Documentation/pci.txt
===================================================================
--- linux-2.6.16-rc4.orig/Documentation/pci.txt 2006-02-27 18:42:39.000000000 +0900
+++ linux-2.6.16-rc4/Documentation/pci.txt 2006-02-27 18:50:54.000000000 +0900
@@ -269,3 +269,71 @@
pci_find_device() Superseded by pci_get_device()
pci_find_subsys() Superseded by pci_get_subsys()
pci_find_slot() Superseded by pci_get_slot()
+
+
+9. Legacy I/O port free driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Large servers may not be able to provide I/O port resources to all PCI
+devices. I/O Port space is only 64KB on Intel Architecture[1] and is
+likely also fragmented since the I/O base register of PCI-to-PCI
+bridge will usually be aligned to a 4KB boundary[2]. On such systems,
+pci_enable_device() and pci_request_regions() will fail when
+attempting to enable I/O Port regions that don't have I/O Port
+resources assigned.
+
+Fortunately, many PCI devices which request I/O Port resources also
+provide access to the same registers via MMIO BARs. These devices can
+be handled without using I/O port space and the drivers typically
+offer a CONFIG_ option to only use MMIO regions
+(e.g. CONFIG_TULIP_MMIO). PCI devices typically provide I/O port
+interface for legacy OSs and will work when I/O port resources are not
+assigned. The "PCI Local Bus Specification Revision 3.0" discusses
+this on p.44, "IMPLEMENTATION NOTE".
+
+If your PCI device driver doesn't need I/O port resources assigned to
+I/O Port BARs, set the no_ioport flag in struct pci_dev before calling
+pci_enable_device() and pci_request_regions(). If the no_ioport flag
+is set, generic PCI support will ignore I/O port regions for the
+corresponding devices.
+
+[1] Some systems support 64KB I/O port space per PCI segment.
+[2] Some PCI-to-PCI bridges support optional 1KB aligned I/O base.
+
+
+10. MMIO Space and "Write Posting"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Converting a driver from using I/O Port space to using MMIO space
+often requires some additional changes. Specifically, "write posting"
+needs to be handled. Most drivers (e.g. tg3, acenic, sym53c8xx_2)
+already do. I/O Port space guarantees write transactions reach the PCI
+device before the CPU can continue. Writes to MMIO space allow to CPU
+continue before the transaction reaches the PCI device. HW weenies
+call this "Write Posting" because the write completion is "posted" to
+the CPU before the transaction has reached it's destination.
+
+Thus, timing sensitive code should add readl() where the CPU is
+expected to wait before doing other work. The classic "bit banging"
+sequence works fine for I/O Port space:
+
+ for (i=8; --i; val >>= 1) {
+ outb(val & 1, ioport_reg); /* write bit */
+ udelay(10);
+ }
+
+The same sequence for MMIO space should be:
+
+ for (i=8; --i; val >>= 1) {
+ writeb(val & 1, mmio_reg); /* write bit */
+ readb(safe_mmio_reg); /* flush posted write */
+ udelay(10);
+ }
+
+It is important that "safe_mmio_reg" not have any side effects that
+interferes with the correct operation of the device.
+
+Another case to watch out for is when resetting a PCI device. Use PCI
+Configuration space reads to flush the writel(). This will gracefully
+handle the PCI master abort on all platforms if the PCI device is
+expected to not respond to a readl(). Most x86 platforms will allow
+MMIO reads to master abort (aka "Soft Fail") and return garbage
+(e.g. ~0). But many RISC platforms will crash (aka "Hard Fail").


2006-02-27 17:43:23

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 0/4] PCI legacy I/O port free driver (take 3)

Kenji Kaneshige wrote:
> Hi,
>
> Here is an updated set of patches for PCI legacy I/O port free drivers
> which incorporates feedbacks. Summary of changes from the previous
> version are:
>
> - Removed the device_flags field from struct pci_device_id, which
> was introduced in the previous version of patch
>
> - Changed e1000 driver to use the driver_data field in struct
> pci_device_id to see if the device needs I/O port regions.
>
> - Added proper messages instead of WARN_ON() at the error.
>
> - Updated the Documentation/pci.txt
>
> I'm attaching the following four patches:
>
> [patch 1/4] Add no_ioport flag into pci_dev
> [patch 2/4] Update Documentation/pci.txt
> [patch 3/4] Make Intel e1000 driver legacy I/O port free
> [patch 4/4] Make Emulex lpfc driver legacy I/O port free
>
> I'm attaching the brief description below about what the problem I'm
> trying to solve is.
>
> Thanks,
> Kenji Kaneshige
>
>
> Brief Description
> ~~~~~~~~~~~~~~~~~
> I encountered a problem that some PCI devices don't work on my system
> which have huge number of PCI devices.
>
> It is mandatory for all PCI device drivers to enable the device by
> calling pci_enable_device() which enables all regions probed from the
> device's BARs. If pci_enable_device() failes to enable any regions
> probed from BARs, it returns as error. On the large servers, I/O port
> resource could not be assigned to all PCI devices because it is
> limited (64KB on Intel Architecture[1]) and it would be fragmented
> (I/O base register of PCI-to-PCI bridge will usually be aligned to a
> 4KB boundary[2]). In this case, the devices which have no I/O port
> resource assigned don't work because pci_enable_device() for those
> devices failes. This is what happened on my machine.

This series still leaves a lot to be desired, and creates unnecessary
driver churn. The better solution is:

1) pci_enable_device() enables what it can

2) Drivers, as they already do, will fail if they cannot map the desired
memory or IO resources that are needed.

Thus, the PCI layer needs only to do #1, and existing driver code
handles the rest of the situation as one currently expects.

Jeff


2006-02-27 21:32:26

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH 0/4] PCI legacy I/O port free driver (take 3)

On Mon, Feb 27, 2006 at 12:43:09PM -0500, Jeff Garzik wrote:
> This series still leaves a lot to be desired, and creates unnecessary
> driver churn.

This is a pretty small change and is not necessary for every driver.
What do you still desire that this patch set doesn't provide?

> The better solution is:
>
> 1) pci_enable_device() enables what it can
>
> 2) Drivers, as they already do, will fail if they cannot map the desired
> memory or IO resources that are needed.
>
> Thus, the PCI layer needs only to do #1, and existing driver code
> handles the rest of the situation as one currently expects.

If in #1 pci_enable_device() assigns I/O Port resources even though
the driver doesn't need it, PCI devices which _only_ support I/O Port
space will get screwed (depending on config). We are trying to avoid that.
Or do you have another way of avoiding unused resource allocation?

thanks,
grant

2006-02-27 22:23:12

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 0/4] PCI legacy I/O port free driver (take 3)

Grant Grundler wrote:
> On Mon, Feb 27, 2006 at 12:43:09PM -0500, Jeff Garzik wrote:
>
>>This series still leaves a lot to be desired, and creates unnecessary
>>driver churn.
>
>
> This is a pretty small change and is not necessary for every driver.

The latter is decidedly false. The change makes no sense at all unless
you update every conceivable driver that will be used on the target
platform. You will always be patching drivers as users stick new cards
in the target hardware.


>> The better solution is:
>>
>>1) pci_enable_device() enables what it can
>>
>>2) Drivers, as they already do, will fail if they cannot map the desired
>>memory or IO resources that are needed.
>>
>>Thus, the PCI layer needs only to do #1, and existing driver code
>>handles the rest of the situation as one currently expects.
>
>
> If in #1 pci_enable_device() assigns I/O Port resources even though
> the driver doesn't need it, PCI devices which _only_ support I/O Port
> space will get screwed (depending on config). We are trying to avoid that.
> Or do you have another way of avoiding unused resource allocation?

Fix the [firmware | device load order] to allocate I/O ports first to
the hardware that only supports IO port accesses. Problem solved with
zero kernel mods...

Jeff


2006-02-27 22:42:53

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 0/4] PCI legacy I/O port free driver (take 3)

On Monday 27 February 2006 23:23, Jeff Garzik wrote:
> Or do you have another way of avoiding unused resource allocation?
>
> Fix the [firmware | device load order] to allocate I/O ports first to
> the hardware that only supports IO port accesses.

How should the firmware know what hardware needs io ports and what hardware
doesn't? I don't think it will scale well to put long lists of PCI-IDs into
their firmware.

The driver is really the natural place to put such knowledge.

-Andi

2006-02-27 22:52:28

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 0/4] PCI legacy I/O port free driver (take 3)

Andi Kleen wrote:
> On Monday 27 February 2006 23:23, Jeff Garzik wrote:
>
>>Or do you have another way of avoiding unused resource allocation?
>>
>>Fix the [firmware | device load order] to allocate I/O ports first to
>>the hardware that only supports IO port accesses.
>
>
> How should the firmware know what hardware needs io ports and what hardware
> doesn't? I don't think it will scale well to put long lists of PCI-IDs into
> their firmware.

Its trivial to detect PCI hardware that doesn't support MMIO, the
"IO-only" hardware Grant mentioned...

Jeff



2006-02-27 22:56:33

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 0/4] PCI legacy I/O port free driver (take 3)

On Monday 27 February 2006 23:52, Jeff Garzik wrote:

> Its trivial to detect PCI hardware that doesn't support MMIO, the
> "IO-only" hardware Grant mentioned...

But there might be hardware that needs both PIO and MMIO
e.g. graphic cards.

How would the poor firmware distingush between those and
MMIO only?

-Andi

2006-02-27 23:18:19

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH 0/4] PCI legacy I/O port free driver (take 3)

On Mon, Feb 27, 2006 at 05:23:02PM -0500, Jeff Garzik wrote:
> Grant Grundler wrote:
> >On Mon, Feb 27, 2006 at 12:43:09PM -0500, Jeff Garzik wrote:
> >
> >>This series still leaves a lot to be desired, and creates unnecessary
> >>driver churn.
> >
> >
> >This is a pretty small change and is not necessary for every driver.
>
> The latter is decidedly false. The change makes no sense at all unless
> you update every conceivable driver that will be used on the target
> platform. You will always be patching drivers as users stick new cards
> in the target hardware.

It's sufficiently true if 90% of the cards plugged into the machine
are covered by 3 or 4 drivers. The remaining 10% can use IO port space
and everything will work fine.


> >If in #1 pci_enable_device() assigns I/O Port resources even though
> >the driver doesn't need it, PCI devices which _only_ support I/O Port
> >space will get screwed (depending on config). We are trying to avoid that.
> >Or do you have another way of avoiding unused resource allocation?
>
> Fix the [firmware | device load order] to allocate I/O ports first to
> the hardware that only supports IO port accesses. Problem solved with
> zero kernel mods...

After we (OS developers) have been telling firmware people to NOT
configure every device and run it's BIOS driver at power up?

OS "hotplug" now deals with resource assignment and device initialization
in most cases on these boxes. Running the BIOS/EFI driver to configure
each device just takes too long.

grant

2006-02-28 03:02:28

by Kenji Kaneshige

[permalink] [raw]
Subject: Re: [PATCH 0/4] PCI legacy I/O port free driver (take 3)

Jeff Garzik wrote:
> Kenji Kaneshige wrote:
>
>>Hi,
>>
>>Here is an updated set of patches for PCI legacy I/O port free drivers
>>which incorporates feedbacks. Summary of changes from the previous
>>version are:
>>
>> - Removed the device_flags field from struct pci_device_id, which
>> was introduced in the previous version of patch
>>
>> - Changed e1000 driver to use the driver_data field in struct
>> pci_device_id to see if the device needs I/O port regions.
>>
>> - Added proper messages instead of WARN_ON() at the error.
>>
>> - Updated the Documentation/pci.txt
>>
>>I'm attaching the following four patches:
>>
>> [patch 1/4] Add no_ioport flag into pci_dev
>> [patch 2/4] Update Documentation/pci.txt
>> [patch 3/4] Make Intel e1000 driver legacy I/O port free
>> [patch 4/4] Make Emulex lpfc driver legacy I/O port free
>>
>>I'm attaching the brief description below about what the problem I'm
>>trying to solve is.
>>
>>Thanks,
>>Kenji Kaneshige
>>
>>
>>Brief Description
>>~~~~~~~~~~~~~~~~~
>>I encountered a problem that some PCI devices don't work on my system
>>which have huge number of PCI devices.
>>
>>It is mandatory for all PCI device drivers to enable the device by
>>calling pci_enable_device() which enables all regions probed from the
>>device's BARs. If pci_enable_device() failes to enable any regions
>>probed from BARs, it returns as error. On the large servers, I/O port
>>resource could not be assigned to all PCI devices because it is
>>limited (64KB on Intel Architecture[1]) and it would be fragmented
>>(I/O base register of PCI-to-PCI bridge will usually be aligned to a
>>4KB boundary[2]). In this case, the devices which have no I/O port
>>resource assigned don't work because pci_enable_device() for those
>>devices failes. This is what happened on my machine.
>
>
> This series still leaves a lot to be desired, and creates unnecessary
> driver churn. The better solution is:
>
> 1) pci_enable_device() enables what it can
>

I guess your idea is changing pci_enable_device() not to return as error
even if it fails to enable some regions. Is it correct? If yes, we need
to change all architecture dependent code (e.g. pcibios_enable_device())
to do that, and it would need much bigger change.


> 2) Drivers, as they already do, will fail if they cannot map the desired
> memory or IO resources that are needed.
>
> Thus, the PCI layer needs only to do #1, and existing driver code
> handles the rest of the situation as one currently expects.
>

Many driver uses pci_request_regions() instead of calling pci_request_region()
for each region. That is, we need to consider the same problem at
pci_request_regions() time. Converting drivers to use pci_request_region()
instead of pci_request_regions() requires many changes and it would be
troublesome for driver writers.

Thanks,
Kenji Kaneshige