From: Arjan van de Ven <[email protected]>
Subject: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
On PCs, PCI extended configuration space (4KB) is riddled with problems
associated with the memory mapped access method (MMCONFIG). At the same
time, there are very few machines that actually need or use this extended
configuration space.
At this point in time, the only sensible action is to make access to the
extended configuration space an opt-in operation for those device drivers
that need/want access to this space, as well as for those userland
diagnostics utilities that (on admin request) want to access this space.
It's inevitable that this is done per device rather than per bus; we'll
be needing per device PCI quirks to turn this extended config space off
over time no matter what; in addition, it gives the least amount of surprise:
loading a driver for a device only impacts that one device, not a whole bus
worth of devices (although it'll be common to have one physical device per
bus on PCI-E).
The (desirable) side-effect of this patch is that all enumeration is done
using normal configuration cycles.
The patch below splits the lower level PCI config space operations (which
operate on a bus) in two: one that normally only operates on traditional
space, and one that gets used after the driver has opted in to using the
extended configuration space. This has led to a little code duplication,
but it's not all that bad (most of it is prototypes in headers and such).
Architectures that have a solid reliable way to get to extended configuration
space can just keep doing what they do now and allow extended space access
from the "traditional" bus ops, and just not fill in the new bus ops.
(This could include x86 for, say, BIOS year 2009 and later, but doesn't
right now)
This patch also adds a sysfs property for each device into which root can
write a '1' to enable extended configuration space. The kernel will print
a notice into dmesg when this happens (including the name of the app) so that
if the system crashes as a result of this action, the user can know what
action/tool caused it.
Signed-off-by: Arjan van de Ven <[email protected]>
---
arch/x86/pci/common.c | 23 ++++++++++++++++++++++
arch/x86/pci/init.c | 10 +++++++++
arch/x86/pci/mmconfig_32.c | 2 -
arch/x86/pci/mmconfig_64.c | 2 -
arch/x86/pci/pci.h | 2 +
drivers/pci/access.c | 46 ++++++++++++++++++++++++++++++++++++++++++++
drivers/pci/pci-sysfs.c | 31 +++++++++++++++++++++++++++++
drivers/pci/pci.c | 28 ++++++++++++++++++++++++++
include/linux/pci.h | 47 +++++++++++++++++++++++++++++++++++++++------
9 files changed, 183 insertions(+), 8 deletions(-)
Index: linux-2.6.24-rc5/arch/x86/pci/common.c
===================================================================
--- linux-2.6.24-rc5.orig/arch/x86/pci/common.c
+++ linux-2.6.24-rc5/arch/x86/pci/common.c
@@ -26,6 +26,7 @@ int pcibios_last_bus = -1;
unsigned long pirq_table_addr;
struct pci_bus *pci_root_bus;
struct pci_raw_ops *raw_pci_ops;
+struct pci_raw_ops *raw_pci_ops_extcfg;
static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
{
@@ -39,9 +40,31 @@ static int pci_write(struct pci_bus *bus
devfn, where, size, value);
}
+static int pci_read_ext(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
+{
+ if (raw_pci_ops_extcfg)
+ return raw_pci_ops_extcfg->read(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
+ else
+ return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
+}
+
+static int pci_write_ext(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
+{
+ if (raw_pci_ops_extcfg)
+ return raw_pci_ops_extcfg->write(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
+ else
+ return raw_pci_ops->write(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
+}
+
struct pci_ops pci_root_ops = {
.read = pci_read,
.write = pci_write,
+ .readext = pci_read_ext,
+ .writeext = pci_write_ext,
};
/*
Index: linux-2.6.24-rc5/drivers/pci/pci.c
===================================================================
--- linux-2.6.24-rc5.orig/drivers/pci/pci.c
+++ linux-2.6.24-rc5/drivers/pci/pci.c
@@ -752,6 +752,34 @@ int pci_enable_device(struct pci_dev *de
return pci_enable_device_bars(dev, (1 << PCI_NUM_RESOURCES) - 1);
}
+/**
+ * pci_enable_ext_config - Enable extended (4K) config space accesses
+ * @dev: PCI device to be changed
+ *
+ * Enable extended (4KB) configuration space accesses for a device.
+ * Extended config space is available for PCI-E devices and can
+ * be used for things like PCI AER and other features. However,
+ * due to various stability issues, this can only be done on demand.
+ *
+ * Returns: -1 on failure, 0 on success
+ */
+
+int pci_enable_ext_config(struct pci_dev *dev)
+{
+ if (dev->ext_cfg_space < 0)
+ return -1;
+ if (dev->ext_cfg_space > 0)
+ return 0;
+ dev->ext_cfg_space = 1;
+ /*
+ * now that we enabled large accesses, we
+ * need to update the config space size variable
+ */
+ dev->cfg_size = pci_cfg_space_size(dev);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pci_enable_ext_config);
+
/*
* Managed PCI resources. This manages device on/off, intx/msi/msix
* on/off and BAR regions. pci_dev itself records msi/msix status, so
Index: linux-2.6.24-rc5/include/linux/pci.h
===================================================================
--- linux-2.6.24-rc5.orig/include/linux/pci.h
+++ linux-2.6.24-rc5/include/linux/pci.h
@@ -174,6 +174,15 @@ struct pci_dev {
int cfg_size; /* Size of configuration space */
/*
+ * ext_cfg_space gets set by drivers/quirks for a device if
+ * extended (4K) config space is desired.
+ * negative values -- hard disabled (quirk etc)
+ * zero -- disabled
+ * positive values -- enable
+ */
+ int ext_cfg_space;
+
+ /*
* Instead of touching interrupt line and base address registers
* directly, use the values stored here. They might be different!
*/
@@ -302,6 +311,8 @@ struct pci_bus {
struct pci_ops {
int (*read)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val);
int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
+ int (*readext)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val);
+ int (*writeext)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
};
struct pci_raw_ops {
@@ -521,29 +532,48 @@ int pci_bus_write_config_byte (struct pc
int pci_bus_write_config_word (struct pci_bus *bus, unsigned int devfn, int where, u16 val);
int pci_bus_write_config_dword (struct pci_bus *bus, unsigned int devfn, int where, u32 val);
+int pci_bus_read_extconfig_byte(struct pci_bus *bus, unsigned int devfn, int where, u8 *val);
+int pci_bus_read_extconfig_word(struct pci_bus *bus, unsigned int devfn, int where, u16 *val);
+int pci_bus_read_extconfig_dword(struct pci_bus *bus, unsigned int devfn, int where, u32 *val);
+int pci_bus_write_extconfig_byte(struct pci_bus *bus, unsigned int devfn, int where, u8 val);
+int pci_bus_write_extconfig_word(struct pci_bus *bus, unsigned int devfn, int where, u16 val);
+int pci_bus_write_extconfig_dword(struct pci_bus *bus, unsigned int devfn, int where, u32 val);
+
static inline int pci_read_config_byte(struct pci_dev *dev, int where, u8 *val)
{
- return pci_bus_read_config_byte (dev->bus, dev->devfn, where, val);
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_read_extconfig_byte(dev->bus, dev->devfn, where, val);
+ return pci_bus_read_config_byte(dev->bus, dev->devfn, where, val);
}
static inline int pci_read_config_word(struct pci_dev *dev, int where, u16 *val)
{
- return pci_bus_read_config_word (dev->bus, dev->devfn, where, val);
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_read_extconfig_word(dev->bus, dev->devfn, where, val);
+ return pci_bus_read_config_word(dev->bus, dev->devfn, where, val);
}
static inline int pci_read_config_dword(struct pci_dev *dev, int where, u32 *val)
{
- return pci_bus_read_config_dword (dev->bus, dev->devfn, where, val);
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_read_extconfig_dword(dev->bus, dev->devfn, where, val);
+ return pci_bus_read_config_dword(dev->bus, dev->devfn, where, val);
}
static inline int pci_write_config_byte(struct pci_dev *dev, int where, u8 val)
{
- return pci_bus_write_config_byte (dev->bus, dev->devfn, where, val);
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_write_extconfig_byte(dev->bus, dev->devfn, where, val);
+ return pci_bus_write_config_byte(dev->bus, dev->devfn, where, val);
}
static inline int pci_write_config_word(struct pci_dev *dev, int where, u16 val)
{
- return pci_bus_write_config_word (dev->bus, dev->devfn, where, val);
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_write_extconfig_word(dev->bus, dev->devfn, where, val);
+ return pci_bus_write_config_word(dev->bus, dev->devfn, where, val);
}
static inline int pci_write_config_dword(struct pci_dev *dev, int where, u32 val)
{
- return pci_bus_write_config_dword (dev->bus, dev->devfn, where, val);
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_write_extconfig_dword(dev->bus, dev->devfn, where, val);
+ return pci_bus_write_config_dword(dev->bus, dev->devfn, where, val);
}
int __must_check pci_enable_device(struct pci_dev *dev);
@@ -693,6 +723,9 @@ void ht_destroy_irq(unsigned int irq);
extern void pci_block_user_cfg_access(struct pci_dev *dev);
extern void pci_unblock_user_cfg_access(struct pci_dev *dev);
+extern int pci_enable_ext_config(struct pci_dev *dev);
+
+
/*
* PCI domain support. Sometimes called PCI segment (eg by ACPI),
* a PCI domain is defined to be a set of PCI busses which share
@@ -789,6 +822,8 @@ static inline struct pci_dev *pci_get_bu
unsigned int devfn)
{ return NULL; }
+static inline int pci_enable_ext_config(struct pci_dev *dev) { return -1; }
+
#endif /* CONFIG_PCI */
/* Include architecture-dependent settings and functions */
Index: linux-2.6.24-rc5/arch/x86/pci/mmconfig_32.c
===================================================================
--- linux-2.6.24-rc5.orig/arch/x86/pci/mmconfig_32.c
+++ linux-2.6.24-rc5/arch/x86/pci/mmconfig_32.c
@@ -143,6 +143,6 @@ int __init pci_mmcfg_arch_reachable(unsi
int __init pci_mmcfg_arch_init(void)
{
printk(KERN_INFO "PCI: Using MMCONFIG\n");
- raw_pci_ops = &pci_mmcfg;
+ raw_pci_ops_extcfg = &pci_mmcfg;
return 1;
}
Index: linux-2.6.24-rc5/arch/x86/pci/mmconfig_64.c
===================================================================
--- linux-2.6.24-rc5.orig/arch/x86/pci/mmconfig_64.c
+++ linux-2.6.24-rc5/arch/x86/pci/mmconfig_64.c
@@ -152,6 +152,6 @@ int __init pci_mmcfg_arch_init(void)
return 0;
}
}
- raw_pci_ops = &pci_mmcfg;
+ raw_pci_ops_extcfg = &pci_mmcfg;
return 1;
}
Index: linux-2.6.24-rc5/drivers/pci/access.c
===================================================================
--- linux-2.6.24-rc5.orig/drivers/pci/access.c
+++ linux-2.6.24-rc5/drivers/pci/access.c
@@ -51,6 +51,45 @@ int pci_bus_write_config_##size \
return res; \
}
+#define PCI_OP_READ_EXT(size, type, len) \
+int pci_bus_read_extconfig_##size \
+ (struct pci_bus *bus, unsigned int devfn, int pos, type *value) \
+{ \
+ int res; \
+ unsigned long flags; \
+ u32 data = 0; \
+ if (PCI_##size##_BAD) \
+ return PCIBIOS_BAD_REGISTER_NUMBER; \
+ spin_lock_irqsave(&pci_lock, flags); \
+ if (bus->ops->readext) \
+ res = bus->ops->readext(bus, devfn, pos, len, &data); \
+ else \
+ res = bus->ops->read(bus, devfn, pos, len, &data); \
+ *value = (type)data; \
+ spin_unlock_irqrestore(&pci_lock, flags); \
+ return res; \
+} \
+EXPORT_SYMBOL(pci_bus_read_extconfig_##size);
+
+#define PCI_OP_WRITE_EXT(size, type, len) \
+int pci_bus_write_extconfig_##size \
+ (struct pci_bus *bus, unsigned int devfn, int pos, type value) \
+{ \
+ int res; \
+ unsigned long flags; \
+ if (PCI_##size##_BAD) \
+ return PCIBIOS_BAD_REGISTER_NUMBER; \
+ spin_lock_irqsave(&pci_lock, flags); \
+ if (bus->ops->writeext) \
+ res = bus->ops->writeext(bus, devfn, pos, len, value); \
+ else \
+ res = bus->ops->write(bus, devfn, pos, len, value); \
+ spin_unlock_irqrestore(&pci_lock, flags); \
+ return res; \
+} \
+EXPORT_SYMBOL(pci_bus_write_extconfig_##size);
+
+
PCI_OP_READ(byte, u8, 1)
PCI_OP_READ(word, u16, 2)
PCI_OP_READ(dword, u32, 4)
@@ -58,6 +97,13 @@ PCI_OP_WRITE(byte, u8, 1)
PCI_OP_WRITE(word, u16, 2)
PCI_OP_WRITE(dword, u32, 4)
+PCI_OP_READ_EXT(byte, u8, 1)
+PCI_OP_READ_EXT(word, u16, 2)
+PCI_OP_READ_EXT(dword, u32, 4)
+PCI_OP_WRITE_EXT(byte, u8, 1)
+PCI_OP_WRITE_EXT(word, u16, 2)
+PCI_OP_WRITE_EXT(dword, u32, 4)
+
EXPORT_SYMBOL(pci_bus_read_config_byte);
EXPORT_SYMBOL(pci_bus_read_config_word);
EXPORT_SYMBOL(pci_bus_read_config_dword);
Index: linux-2.6.24-rc5/arch/x86/pci/pci.h
===================================================================
--- linux-2.6.24-rc5.orig/arch/x86/pci/pci.h
+++ linux-2.6.24-rc5/arch/x86/pci/pci.h
@@ -32,6 +32,8 @@
extern unsigned int pci_probe;
extern unsigned long pirq_table_addr;
+extern struct pci_raw_ops *raw_pci_ops_extcfg;
+
enum pci_bf_sort_state {
pci_bf_sort_default,
pci_force_nobf,
Index: linux-2.6.24-rc5/arch/x86/pci/init.c
===================================================================
--- linux-2.6.24-rc5.orig/arch/x86/pci/init.c
+++ linux-2.6.24-rc5/arch/x86/pci/init.c
@@ -14,6 +14,16 @@ static __init int pci_access_init(void)
#ifdef CONFIG_PCI_MMCONFIG
pci_mmcfg_init(type);
#endif
+ /* if we ONLY have MMCONFIG, we need to use it always */
+ if (!raw_pci_ops && raw_pci_ops_extcfg) {
+ printk(KERN_INFO "No direct PCI access, using MMCONFIG always\n");
+ raw_pci_ops = raw_pci_ops_extcfg;
+ }
+
+ /*
+ * we've found a usable method; this means we can skip
+ * the potentially dangerous BIOS based methods
+ */
if (raw_pci_ops)
return 0;
#ifdef CONFIG_PCI_BIOS
Index: linux-2.6.24-rc5/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.24-rc5.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6.24-rc5/drivers/pci/pci-sysfs.c
@@ -143,6 +143,35 @@ static ssize_t is_enabled_show(struct de
return sprintf (buf, "%u\n", atomic_read(&pdev->enable_cnt));
}
+static ssize_t extended_config_space_store(struct device *dev,
+ struct device_attribute *attr, const char *buf,
+ size_t count)
+{
+ ssize_t result = -EINVAL;
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ /* this can crash the machine when done on the "wrong" device */
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (*buf == '1') {
+ printk(KERN_WARNING "Application %s enabled extended config space for device %s\n",
+ current->comm, pci_name(pdev));
+ result = pci_enable_ext_config(pdev);
+ }
+
+ return result < 0 ? result : count;
+}
+
+static ssize_t extended_config_space_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct pci_dev *pdev;
+
+ pdev = to_pci_dev(dev);
+ return sprintf(buf, "%d\n", pdev->ext_cfg_space);
+}
+
#ifdef CONFIG_NUMA
static ssize_t
numa_node_show(struct device *dev, struct device_attribute *attr, char *buf)
@@ -206,6 +235,8 @@ struct device_attribute pci_dev_attrs[]
__ATTR_RO(numa_node),
#endif
__ATTR(enable, 0600, is_enabled_show, is_enabled_store),
+ __ATTR(extended_config_space, 0600, extended_config_space_show,
+ extended_config_space_store),
__ATTR(broken_parity_status,(S_IRUGO|S_IWUSR),
broken_parity_status_show,broken_parity_status_store),
__ATTR(msi_bus, 0644, msi_bus_show, msi_bus_store),
Arjan van de Ven wrote:
> This patch also adds a sysfs property for each device into which root can
> write a '1' to enable extended configuration space. The kernel will print
> a notice into dmesg when this happens (including the name of the app) so that
> if the system crashes as a result of this action, the user can know what
> action/tool caused it.
Comments:
1) [minor] With a bit in struct pci_dev, there is no need for separate
raw_pci_ops. That will simplify your patch, with no functionality change.
"golden" arches (no pun intended) may implement raw_pci_ops that
_always_ work with extended config space, and simply ignore that bit, if
that is how their underlying non-mmconfig-nor-type1 hardware is implemented.
2) [non-minor] hmmmm.
[jgarzik@core ~]$ lspci -n | wc -l
23
So I would have to perform 23 sysfs twiddles, before I could obtain a
full and unabridged 'lspci -vvvxxx'?
For the userspace interface, the most-often-used knob for diagnostic
purposes will be the easiest one. And that's
echo 1 > enable-ext-cfg-space-for-all-buses-ACPI-says-to
lspci -vvvxxx
3) [minor] architectures must be able to override
pci_enable_ext_config(). see "golden arches".
On Thu, 27 Dec 2007 06:52:35 -0500
Jeff Garzik <[email protected]> wrote:
> Arjan van de Ven wrote:
> > This patch also adds a sysfs property for each device into which
> > root can write a '1' to enable extended configuration space. The
> > kernel will print a notice into dmesg when this happens (including
> > the name of the app) so that if the system crashes as a result of
> > this action, the user can know what action/tool caused it.
>
>
> Comments:
>
> 1) [minor] With a bit in struct pci_dev,
I have this
> there is no need for
> separate raw_pci_ops. That will simplify your patch, with no
> functionality change.
but sadly your second statement is not correct. Part of the complication is that all PCI config ops
operate on busses, not devices; at first I thought "just add a bit and be done with it", but sadly it's
not quite that simple. Due to the per-bus nature of the ops, you end up having two types of bus operations.
That's just boilerplate (prototypes, exports and such), but it makes up most of the lines of the patch.
In addition, a separate raw_pci_ops (for x86 only!) is needed anyway, since it's quite likely that
we'll have several options for each case (extended or not) and we want to pick the best one for each case,
at which point you really do need the two variables.
>
> "golden" arches (no pun intended) may implement raw_pci_ops that
> _always_ work with extended config space, and simply ignore that bit,
> if that is how their underlying non-mmconfig-nor-type1 hardware is
> implemented.
that is what I implemented already in the patch that you commented on ;-)
>
>
> 2) [non-minor] hmmmm.
>
> [jgarzik@core ~]$ lspci -n | wc -l
> 23
>
> So I would have to perform 23 sysfs twiddles, before I could obtain a
> full and unabridged 'lspci -vvvxxx'?
not you as human, but "lspci" ought to yes.
>
> For the userspace interface, the most-often-used knob for diagnostic
> purposes will be the easiest one. And that's
the easiest one is an option to lspci. Nothing more nothing less.
Making a global knob in kernel space is a lot more tricky, and in addition
really there's enough cases where userspace wants the one device anyway
Doing the "for each device I'm about to dump" in lspci is pretty much as hard as doing
the global one (if not simpler)
>
> 3) [minor] architectures must be able to override
> pci_enable_ext_config(). see "golden arches".
see the patch. All pci_enable_ext_config() does is set a flag.
The architecture decides what to do with that flag. Golden
architectures can just totally ignore the flag and always expose
the full space.
(In fact, the patch assumes all-but-x86 to be golden here; which is fair)
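
To illustrate the "flag plus fallback" idea outside the kernel, here is a
small userspace model. It is only a sketch under assumptions: all model_*
names, conf1_read() and full_read() are made up for illustration, and the
return values 0x11/0x22 are arbitrary markers showing which backend served
the access. Only the ext_cfg_space semantics and the read/readext split
mirror the patch.

```c
#include <stddef.h>
#include <stdint.h>

struct model_bus;

/* Per-bus operations; readext may be NULL on "golden" architectures. */
struct model_ops {
	int (*read)(struct model_bus *bus, int where, uint32_t *val);
	int (*readext)(struct model_bus *bus, int where, uint32_t *val);
};

struct model_bus {
	const struct model_ops *ops;
};

struct model_dev {
	struct model_bus *bus;
	int ext_cfg_space;	/* <0 hard disabled, 0 disabled, >0 enabled */
};

/* Traditional access: only the first 256 bytes are reachable. */
static int conf1_read(struct model_bus *bus, int where, uint32_t *val)
{
	(void)bus;
	if (where < 0 || where >= 256)
		return -1;
	*val = 0x11;
	return 0;
}

/* Extended-capable access: the full 4096 bytes are reachable. */
static int full_read(struct model_bus *bus, int where, uint32_t *val)
{
	(void)bus;
	if (where < 0 || where >= 4096)
		return -1;
	*val = 0x22;
	return 0;
}

/* x86-like: two separate methods. Golden: one method that does it all. */
const struct model_ops x86_like_ops = { .read = conf1_read, .readext = full_read };
const struct model_ops golden_ops = { .read = full_read, .readext = NULL };

/* All the opt-in does is set a flag, unless a quirk hard-disabled it. */
int model_enable_ext_config(struct model_dev *dev)
{
	if (dev->ext_cfg_space < 0)
		return -1;
	dev->ext_cfg_space = 1;
	return 0;
}

/* Dispatch: use readext only when opted in and the bus provides it. */
int model_read_config(struct model_dev *dev, int where, uint32_t *val)
{
	const struct model_ops *ops = dev->bus->ops;

	if (dev->ext_cfg_space > 0 && ops->readext)
		return ops->readext(dev->bus, where, val);
	return ops->read(dev->bus, where, val);
}
```

With x86_like_ops, a read at offset 0x100 fails until
model_enable_ext_config() has been called; with golden_ops the same read
works regardless of the flag, because the bus simply never fills in readext.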
On Thu, 27 Dec 2007, Jeff Garzik wrote:
>
> 2) [non-minor] hmmmm.
>
> [jgarzik@core ~]$ lspci -n | wc -l
> 23
>
> So I would have to perform 23 sysfs twiddles, before I could obtain a full and
> unabridged 'lspci -vvvxxx'?
Or you force it on with "pci=mmconfig" or something at boot-time.
But yes. The *fact* is that MMCONFIG has not just been globally broken,
but broken on a per-device basis. I don't know why (and quite frankly, I
doubt anybody does), but the PCI device ID corruption happened only for a
specific set of devices.
Whether it was a timing issue with particular devices or whether it was a
timing issue with some particular bridge (and could affect any devices
behind that bridge), who knows... It almost certainly was brought on by a
borderline (or broken) northbridge, but it apparently only affected
specific devices - which makes me suspect that it wasn't *entirely* due to
just the northbridge, and it was a combination of things.
I don't understand why you cannot seem to accept that per-device thing, in
the face of clear data that yes, it really *is* per-device. Not to mention
the fact that the way MMIO config setups work, you may well have entire
buses that simply aren't accessible with MMIO config at all (because the
MMIO config window is not large enough).
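
The window-size point can be made concrete with the usual MMCONFIG address
layout, where each bus takes 1MB of the window (32 devices x 8 functions x
4KB). The sketch below is illustrative only: struct mmcfg_window and
mmcfg_address() are hypothetical names, and it assumes the window base
corresponds to bus 0, which is an assumption about the platform rather than
something stated in this thread.

```c
#include <stdint.h>

/* One memory-mapped config window as described by firmware. */
struct mmcfg_window {
	uint64_t base;		/* address corresponding to bus 0 (assumed) */
	uint8_t start_bus;	/* first bus covered by the window */
	uint8_t end_bus;	/* last bus covered by the window */
};

/*
 * Compute the address of a config register inside the window.
 * Returns 0 and stores the address in *addr, or -1 if the bus lies
 * outside the window or an argument is out of range.
 */
int mmcfg_address(const struct mmcfg_window *w, unsigned int bus,
		  unsigned int dev, unsigned int fn, unsigned int reg,
		  uint64_t *addr)
{
	if (bus < w->start_bus || bus > w->end_bus)
		return -1;	/* window too small: bus not reachable */
	if (dev > 31 || fn > 7 || reg > 4095)
		return -1;

	*addr = w->base
	      + ((uint64_t)bus << 20)	/* 1MB per bus */
	      + ((uint64_t)dev << 15)	/* 32KB per device */
	      + ((uint64_t)fn << 12)	/* 4KB per function */
	      + reg;
	return 0;
}
```

A 64MB window covering buses 0 to 63 makes any device on bus 64 or above
return -1 here, which is the "entire buses that simply aren't accessible"
case: those devices need some other access method no matter what flag is set.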
Furthermore, please accept the fact that of those 23 devices, exactly
*none* will actually care. So yes, you'd have to enable it manually for
those individual devices, but that's only if you want to do something
totally pointless in the first place.
So stop this totally inane "it has to be global" crap. It doesn't have to
be global at all, and we have hard data showing that it really SHOULD NOT
be a global flag.
Linus
Arjan van de Ven wrote:
>> 2) [non-minor] hmmmm.
>>
>> [jgarzik@core ~]$ lspci -n | wc -l
>> 23
>>
>> So I would have to perform 23 sysfs twiddles, before I could obtain a
>> full and unabridged 'lspci -vvvxxx'?
>
> not you as human, but "lspci" ought to yes.
>
>> For the userspace interface, the most-often-used knob for diagnostic
>> purposes will be the easiest one. And that's
>
> the easiest one is an option to lspci. Nothing more nothing less.
>
> Making a global knob in kernel space is a lot more tricky, and in addition
> really there's enough cases where userspace wants the one device anyway
> Doing the "for each device I'm about to dump" in lspci is pretty much as hard as doing
> the global one (if not simpler)
So then if you have a system where MMCONFIG doesn't work and you're not
using any devices that require extended config space, then doing lspci
-vvvxxx will blow up the machine? Yuck.
Still don't like this approach. It seems like (partially) covering up
problems rather than solving them.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
Linus Torvalds wrote:
>
> On Thu, 27 Dec 2007, Jeff Garzik wrote:
>> 2) [non-minor] hmmmm.
>>
>> [jgarzik@core ~]$ lspci -n | wc -l
>> 23
>>
>> So I would have to perform 23 sysfs twiddles, before I could obtain a full and
>> unabridged 'lspci -vvvxxx'?
>
> Or you force it on with "pci=mmconfig" or something at boot-time.
>
> But yes. The *fact* is that MMCONFIG has not just been globally broken,
> but broken on a per-device basis. I don't know why (and quite frankly, I
> doubt anybody does), but the PCI device ID corruption happened only for a
> specific set of devices.
>
> Whether it was a timing issue with particular devices or whether it was a
> timing issue with some particular bridge (and could affect any devices
> behind that bridge), who knows... It almost certainly was brought on by a
> borderline (or broken) northbridge, but it apparently only affected
> specific devices - which makes me suspect that it wasn't *entirely* due to
> just the northbridge, and it was a combination of things.
Pointer to such a report? The only single-device problems I'm aware of
are with some devices within the K8 integrated northbridge, which we
already handle. Other than that, the only non-global problems I'm aware
of are devices behind host bridges which can't receive/handle MMCONFIG
requests, in which case the problem would be bus-wide.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
Robert Hancock wrote:
> Linus Torvalds wrote:
>>
>> On Thu, 27 Dec 2007, Jeff Garzik wrote:
>>> 2) [non-minor] hmmmm.
>>>
>>> [jgarzik@core ~]$ lspci -n | wc -l
>>> 23
>>>
>>> So I would have to perform 23 sysfs twiddles, before I could obtain
>>> a full and
>>> unabridged 'lspci -vvvxxx'?
>>
>> Or you force it on with "pci=mmconfig" or something at boot-time.
>>
>> But yes. The *fact* is that MMCONFIG has not just been globally
>> broken, but broken on a per-device basis. I don't know why (and quite
>> frankly, I doubt anybody does), but the PCI device ID corruption
>> happened only for a specific set of devices.
>>
>> Whether it was a timing issue with particular devices or whether it
>> was a timing issue with some particular bridge (and could affect any
>> devices behind that bridge), who knows... It almost certainly was
>> brought on by a borderline (or broken) northbridge, but it apparently
>> only affected specific devices - which makes me suspect that it
>> wasn't *entirely* due to just the northbridge, and it was a
>> combination of things.
>
> Pointer to such a report? The only single-device problems I'm aware of
> are with some devices within the K8 integrated northbridge, which we
> already handle. Other than that, the only non-global problems I'm aware
> of are devices behind host bridges which can't receive/handle MMCONFIG
> requests, in which case the problem would be bus-wide.
That would be my computer here. Whenever I do not switch off mmconfig, my
graphics card and my network card show up with vendor ID 0001.
lspci without mmconfig:
00:00.0 Host bridge: ATI Technologies Inc Unknown device 7930
00:02.0 PCI bridge: ATI Technologies Inc Unknown device 7933
00:06.0 PCI bridge: ATI Technologies Inc Unknown device 7936
00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA
00:13.0 USB Controller: ATI Technologies Inc SB600 USB (OHCI0)
00:13.1 USB Controller: ATI Technologies Inc SB600 USB (OHCI1)
00:13.2 USB Controller: ATI Technologies Inc SB600 USB (OHCI2)
00:13.3 USB Controller: ATI Technologies Inc SB600 USB (OHCI3)
00:13.4 USB Controller: ATI Technologies Inc SB600 USB (OHCI4)
00:13.5 USB Controller: ATI Technologies Inc SB600 USB Controller (EHCI)
00:14.0 SMBus: ATI Technologies Inc SB600 SMBus (rev 13)
00:14.1 IDE interface: ATI Technologies Inc SB600 IDE
00:14.2 Audio device: ATI Technologies Inc SB600 Azalia
00:14.3 ISA bridge: ATI Technologies Inc SB600 PCI to LPC Bridge
00:14.4 PCI bridge: ATI Technologies Inc SB600 PCI to PCI Bridge
01:00.0 VGA compatible controller: nVidia Corporation G80 [GeForce 8800
GTS] (rev a2)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E
Gigabit Ethernet Controller (rev 12)
03:02.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23
IEEE-1394a-2000 Controller (PHY/Link)
My network card with enabled mmconfig:
02:00.0 Ethernet controller: Unknown device 0001:4364 (rev 12)
Subsystem: Micro-Star International Co., Ltd. Unknown device 326c
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 4 bytes
Interrupt: pin A routed to IRQ 10
Region 0: Memory at fddfc000 (64-bit, non-prefetchable) [size=16K]
Region 2: I/O ports at ee00 [size=256]
[virtual] Expansion ROM at fde00000 [disabled] [size=128K]
Capabilities: [48] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data
Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+
Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [e0] Express (v1) Legacy Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
unlimited, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
LnkCap: Port #3, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0
<256ns, L1 unlimited
ClockPM+ Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
DLActive- BWMgmt- ABWMgmt-
Capabilities: [100] Advanced Error Reporting
00: 01 00 64 43 07 00 10 00 12 00 00 02 01 00 00 00
10: 04 c0 df fd 00 00 00 00 01 ee 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 62 14 6c 32
30: 00 00 00 00 48 00 00 00 00 00 00 00 0a 01 00 00
40: 00 00 f0 01 00 80 a0 01 01 50 03 fe 00 20 00 13
50: 03 5c 00 80 00 00 00 01 00 00 00 01 05 e0 80 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 82 a8 e8 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 10 00 11 00 c0 8f 00 00 00 20 19 00 11 a4 07 03
f0: 08 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
hth
Kai
--
This signature is left as an exercise for the reader.
On Thu, 27 Dec 2007, Robert Hancock wrote:
> Linus Torvalds wrote:
> >
> > But yes. The *fact* is that MMCONFIG has not just been globally broken, but
> > broken on a per-device basis. I don't know why (and quite frankly, I doubt
> > anybody does), but the PCI device ID corruption happened only for a specific
> > set of devices.
> >
> > Whether it was a timing issue with particular devices or whether it was a
> > timing issue with some particular bridge (and could affect any devices
> > behind that bridge), who knows... It almost certainly was brought on by a
> > borderline (or broken) northbridge, but it apparently only affected specific
> > devices - which makes me suspect that it wasn't *entirely* due to just the
> > northbridge, and it was a combination of things.
>
> Pointer to such a report? The only single-device problems I'm aware of
> are with some devices within the K8 integrated northbridge, which we
> already handle. Other than that, the only non-global problems I'm aware
> of are devices behind host bridges which can't receive/handle MMCONFIG
> requests, in which case the problem would be bus-wide.
There was a thread called "PCI vendor id == 1 regression?" (Kai Ruhnau was
the main reporter for that one). But looking back, it seems that one
didn't hit the kernel mailing list, so I guess google cannot pick it up. I
can forward all the emails if you want, but quite frankly, you don't
really want to. It boils down to:
Stephen Hemminger:
"There have been two reports with different hardware of the PCI vendor
id of 0001 showing up. I got a report on sky2, and Francois saw similar
problem on r8169.
In one case, it happened only with 2.6.23 kernel, the correct id was
returned by older kernels.
This is a heads up, there may be a PCI problem. Or just
some users smoking strange fall leaves."
And then one of the reporters:
"Good kernel:
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
00: ab 11 64 43 07 00 10 00 12 00 00 02 01 00 00 00
Bad kernel:
02:00.0 Ethernet controller: Unknown device 0001:4364 (rev 12)
00: 01 00 64 43 07 00 10 00 12 00 00 02 01 00 00 00"
and after I suspected it was mmconfig-related and asked him to try "lspci
-H1" to force conf1 accesses from user space on a broken kernel:
"Bare lspci gives the wrong vendor ID, lspci -H1 the right one."
for *one* single device.
In other words, just a single word in mmconfig was corrupt, but it was
consistently corrupt.
The PCI device chain for that particular report was:
00:00.0 Host bridge: ATI Technologies Inc Unknown device 7930
..
00:06.0 PCI bridge: ATI Technologies Inc Unknown device 7936 (prog-if 00 [Normal decode])
..
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
and with the bug, the PCI vendor ID for that ethernet chip was 0x0001
(bogus vendor) rather than the correct Marvell ID (0x11ab).
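
For anyone reading the hex dumps above: the vendor and device IDs are the
first two little-endian 16-bit words of config space, so the difference
between the good and the bad kernel is confined to bytes 0 and 1. A minimal
decoding sketch (struct pci_ids and decode_ids() are names made up for this
illustration):

```c
#include <stdint.h>

struct pci_ids {
	uint16_t vendor;	/* config space offset 0x00, little-endian */
	uint16_t device;	/* config space offset 0x02, little-endian */
};

/* Decode vendor and device ID from the first four config space bytes. */
struct pci_ids decode_ids(const uint8_t cfg[4])
{
	struct pci_ids ids;

	ids.vendor = (uint16_t)(cfg[0] | (cfg[1] << 8));
	ids.device = (uint16_t)(cfg[2] | (cfg[3] << 8));
	return ids;
}
```

Fed the "good kernel" bytes ab 11 64 43, this gives vendor 0x11ab and device
0x4364; fed the "bad kernel" bytes 01 00 64 43, it gives vendor 0x0001 with
the same device 0x4364, matching the "Unknown device 0001:4364" line.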
But as mentioned, there were other reports too of the exact same bug (with
different PCI devices, but the same "vendor == 0001" bogosity).
Googling for
lspci "Unknown device 0001:" mmconfig
shows reports like these:
http://lkml.org/lkml/2007/10/29/500
http://madwifi.org/ticket/1587
http://www.nvnews.net/vbulletin/showthread.php?t=103271
http://naoya.g.hatena.ne.jp/naoya/20070529/1180436756
http://bbs.archlinux.org/viewtopic.php?id=34321
...
which all seem to be due to this same bug with different cards (but the
common theme seems to be an ATI northbridge).
The point is: mmconfig is easily broken. In surprising ways. The whole
concept is stupid, it was badly done, and it has never gotten any testing
what-so-ever until Linux started using it (and now Vista). And hardware
that isn't tested is broken by *definition*.
And there's no point in blaming AMD. They may have gotten it wrong, but
hey, so did Intel (with machines that lock up completely), so it's not
like this is some "bad AMD" case. It's a "bad mmconfig" case.
Linus
On Thu, 27 Dec 2007 11:59:23 -0600
Robert Hancock <[email protected]> wrote:
> Arjan van de Ven wrote:
> >> 2) [non-minor] hmmmm.
> >>
> >> [jgarzik@core ~]$ lspci -n | wc -l
> >> 23
> >>
> >> So I would have to perform 23 sysfs twiddles, before I could
> >> obtain a full and unabridged 'lspci -vvvxxx'?
> >
> > not you as human, but "lspci" ought to yes.
> >
> >> For the userspace interface, the most-often-used knob for
> >> diagnostic purposes will be the easiest one. And that's
> >
> > the easiest one is an option to lspci. Nothing more nothing less.
> >
> > Making a global knob in kernel space is a lot more tricky, and in
> > addition really there's enough cases where userspace wants the one
> > device anyway. Doing the "for each device I'm about to dump" in
> > lspci is pretty much as hard as doing the global one (if not
> > simpler)
>
> So then if you have a system where MMCONFIG doesn't work and you're
> not using any devices that require extended config space, then doing
> lspci -vvvxxx will blow up the machine? Yuck.
ONLY in the case where you would have otherwise blown up at boot.
Let's face it: blowing up if the admin does "lspci -vvvxxx" with a clear
message in dmesg about which device was enabled last is BY FAR preferable
over just plain not booting.
In all cases where the kernel knows MMCONFIG is broken, no amount of pci_enable_ext_config()
(from drivers or sysfs) will enable MMCONFIG.
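A rough sketch of that gating, for illustration only. pci_enable_ext_config() is the name used in the patch, but everything below (the struct, the global flag, the sketch_ names, the error code) is a hypothetical stand-in and not the patch's actual implementation.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state; the real patch keeps this in the PCI core. */
struct sketch_dev {
	bool ext_cfg_enabled;	/* set once a driver or root has opted in */
};

/* Hypothetical flag, e.g. set by a platform quirk or a blacklist entry. */
static bool mmconfig_known_broken;

/* Returns 0 on success, -ENODEV when MMCONFIG is known to be broken. */
static int sketch_enable_ext_config(struct sketch_dev *dev)
{
	if (mmconfig_known_broken)
		return -ENODEV;	/* opt-in is refused, the device stays legacy-only */
	dev->ext_cfg_enabled = true;
	return 0;
}

/* Highest register offset the device may touch: 255 legacy, 4095 extended. */
static unsigned int sketch_max_reg(const struct sketch_dev *dev)
{
	return dev->ext_cfg_enabled ? 4095 : 255;
}
```

The point is only that the opt-in is a request, not a guarantee: on a platform flagged as broken the call fails and the device keeps its 256-byte view.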
>
> Still don't like this approach. It seems like (partially) covering up
> problems rather than solving them.
It's containing them rather than covering them up, to the point that the problem
becomes diagnosable and users have working systems by default.
>
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On 12/27/2007 1:58 PM, Linus Torvalds wrote:
>
> There was a thread called "PCI vendor id == 1 regression?" (Kai Ruhnau was
> the main reporter for that one). But looking back, it seems that one
> didn't hit the kernel mailing list, so I guess google cannot pick it up. I
> can forward all the emails if you want, but quite frankly, you don't
> really want to. It boils down to:
>
> Stephen Hemminger:
> "There have been two reports with different hardware of the PCI vendor
> id of 0001 showing up. I got a report on sky2, and Francois saw similar
> problem on r8169.
> In one case, it happened only with 2.6.23 kernel, the correct id was
> returned by older kernels.
>
> This is a heads up, there may be a PCI problem. Or just
> some users smoking strange fall leaves."
>
> And then one of the reporters:
>
> "Good kernel:
>
> 02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
> 00: ab 11 64 43 07 00 10 00 12 00 00 02 01 00 00 00
>
> Bad kernel:
>
> 02:00.0 Ethernet controller: Unknown device 0001:4364 (rev 12)
> 00: 01 00 64 43 07 00 10 00 12 00 00 02 01 00 00 00"
>
The root pcie port implementation is obviously buggy. But did we confirm
whether that hardware bug might be partly related to
"configuration-retry-status" pcie-root handling as introduced/described in:
http://marc.info/?l=linux-kernel&m=110541914926842&w=2
Does the 0001 vendor-id still show up if pci_enable_crs() has never
been called?
Does anybody know what the original rationale was for calling
pci_enable_crs() by default?
Loic
On Thu, Dec 27, 2007 at 04:10:33PM -0500, Loic Prylli wrote:
> The root pcie port implementation is obviously buggy. But did we confirm
> whether that hardware bug might be partly related to
> "configuration-retry-status" pcie-root handling as introduced/described in:
>
> http://marc.info/?l=linux-kernel&m=110541914926842&w=2
>
> Does the 0001 vendor-id still show up if pci_enable_crs() has never
> been called?
>
> Does anybody know what the original rationale was for calling
> pci_enable_crs() by default?
If you don't call pci_enable_crs(), no device can return the retry
status, so it would be pointless having this code in the kernel at all.
If it turns out some devices are buggy, then obviously we shouldn't
enable it for them.
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
On Thu, 27 Dec 2007, Loic Prylli wrote:
>
> The root pcie port implementation is obviously buggy. But did we confirm
> whether that hardware bug might be partly related to
> "configuration-retry-status" pcie-root handling as introduced/described in:
Ahh, yes, that sounds like an excellent explanation of what might be going
on.
Our code expects that the retry status is read as a 32-bit word that is
0xffff0001, but you're right, that "0001" in the low-order bits is an
interesting coincidence, and it may be that the ATI pcie bridge
gets the high bits wrong.
The CRS docs *do* say that it has to return 0001h for the Vendor ID, but
also that any additional bytes (ie the device ID) should be all ones. So
we're doing the right thing from a spec standpoint, but as I often say:
the spec doesn't count for anything, compared to reality.
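To spell out the difference between the two readings of the spec, here is a minimal sketch of a strict check (the whole dword) next to a lenient one (Vendor ID half only). The function names are made up; the test value 0x43640001 is simply the first dword of the "bad kernel" dump from earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Strict: the whole dword must read 0xffff0001, which is what we expect today. */
static bool crs_strict(uint32_t l)
{
	return l == 0xffff0001u;
}

/* Lenient: only the Vendor ID (low 16 bits) has to read 0x0001. */
static bool crs_vendor_only(uint32_t l)
{
	return (l & 0xffffu) == 0x0001u;
}
```

With 0x43640001 the strict check does not treat the read as a retry status, so the bogus vendor ID is accepted as real, while the lenient check would classify it as a retry.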
> Does the 0001 vendor-id still show up if pci_enable_crs() has never
> been called?
I don't believe we have ever tried, but it would be very interesting to
hear.
Kai, can you try that? Just remove the call to pci_enable_crs() in
pci_scan_bridge() in drivers/pci/probe.c, and see if mmconfig starts
working for you?
> Does anybody know what the original rationale was for calling
> pci_enable_crs() by default?
.. another good question. I don't think anybody expected it to be broken,
but if this turns out to be the thing that triggers it, I think we should
disable CRS by default.
The code doesn't actually do what CRS is supposed to help with (ie go on
to probe another device and then come back to the slow one later), so
right now it's pretty much useless *anyway*.
Matthew?
Linus
On Thu, 27 Dec 2007, Linus Torvalds wrote:
>
> Kai, can you try that? Just remove the call to pci_enable_crs() in
> pci_scan_bridge() in drivers/pci/probe.c, and see if mmconfig starts
> working for you?
We could also make the error handling more permissive, and just check for
the low 16 bits, which is the part that the CRS spec mentions the actual
value for. The whole vendor ID of 0x0001 is mentioned in the CRS spec as
being explicitly chosen exactly because it's invalid.
That said, given that we don't actually reap any benefits from CRS support
right now *anyway*, I think the right thing to do is disable it by
default. But it would be interesting to know if this patch makes it work
on those ATI bridges..
Linus
---
drivers/pci/probe.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2f75d69..94cd3a4 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -908,7 +908,7 @@ pci_scan_device(struct pci_bus *bus, int devfn)
return NULL;
/* Configuration request Retry Status */
- while (l == 0xffff0001) {
+ while ((l & 0xffff) == 0x0001) {
msleep(delay);
delay *= 2;
if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, &l))
Linus Torvalds wrote:
> On Thu, 27 Dec 2007, Loic Prylli wrote:
>
>> Does the 0001 vendor-id still show up if pci_enable_crs() has never
>> been called?
>>
>
> I don't believe we have ever tried, but it would be very interesting to
> hear.
>
> Kai, can you try that? Just remove the call to pci_enable_crs() in
> pci_scan_bridge() in drivers/pci/probe.c, and see if mmconfig starts
> working for you?
>
Removing the call to pci_enable_crs() indeed solved it. I got the right
vendor IDs.
Kai
--
This signature is left as an exercise for the reader.
Linus Torvalds wrote:
> On Thu, 27 Dec 2007, Linus Torvalds wrote:
>
>> Kai, can you try that? Just remove the call to pci_enable_crs() in
>> pci_scan_bridge() in drivers/pci/probe.c, and see if mmconfig starts
>> working for you?
>>
>
> We could also make the error handling more permissive, and just check for
> the low 16 bits, which is the part that the CRS spec mentions the actual
> value for. The whole vendor ID of 0x0001 is mentioned in the CRS spec as
> being explicitly chosen exactly because it's invalid.
>
> That said, given that we don't actually reap any benefits from CRS support
> right now *anyway*, I think the right thing to do is disable it by
> default. But it would be interesting to know if this patch makes it work
> on those ATI bridges..
>
> Linus
>
> ---
> drivers/pci/probe.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 2f75d69..94cd3a4 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -908,7 +908,7 @@ pci_scan_device(struct pci_bus *bus, int devfn)
> return NULL;
>
> /* Configuration request Retry Status */
> - while (l == 0xffff0001) {
> + while ((l & 0xffff) == 0x0001) {
> msleep(delay);
> delay *= 2;
> if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, &l))
That one did not work out so well.
I reenabled the call to pci_enable_crs() and changed the line as above.
That resulted in two timeouts (from dmesg):
[....]
ACPI: Interpreter enabled
ACPI: (supports S0 S3 S4 S5)
ACPI: Using IOACPI for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
Device 0000:01:00.0 not responding
Device 0000:02:00.0 not responding
[....]
Then, the kernel boots up normally, except that the graphics and network
cards do not show up at all in lspci.
Kai
Linus Torvalds wrote:
> But as mentioned, there were other reports too of the exact same bug (with
> different PCI devices, but the same "vendor == 0001" bogosity).
>
> Googling for
>
> lspci "Unknown device 0001:" mmconfig
>
> shows reports like these:
>
> http://lkml.org/lkml/2007/10/29/500
> http://madwifi.org/ticket/1587
> http://www.nvnews.net/vbulletin/showthread.php?t=103271
> http://naoya.g.hatena.ne.jp/naoya/20070529/1180436756
> http://bbs.archlinux.org/viewtopic.php?id=34321
> ...
>
> which all seem to be due to this same bug with different cards (but the
> common theme seems to be an ATI northbridge).
This isn't an example of a per-device breakage, though. It only shows up
on some devices, but the cause is apparently the chipset. Those devices
work fine on other boards.
As mentioned later, it appears that CRS stuff might be related to this
problem, but if it couldn't be fixed, I think the only sane solution
would be to blacklist MMCONFIG support on that chipset.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
On Thu, 27 Dec 2007, Kai Ruhnau wrote:
> Linus Torvalds wrote:
> > On Thu, 27 Dec 2007, Linus Torvalds wrote:
> >
> >> Kai, can you try that? Just remove the call to pci_enable_crs() in
> >> pci_scan_bridge() in drivers/pci/probe.c, and see if mmconfig starts
> >> working for you?
> >>
> >
> > We could also make the error handling more permissive, and just check for
> > the low 16 bits, which is the part that the CRS spec mentions the actual
> > > value for. The whole vendor ID of 0x0001 is mentioned in the CRS spec as
> > being explicitly chosen exactly because it's invalid.
> >
> > That said, given that we don't actually reap any benefits from CRS support
> > right now *anyway*, I think the right thing to do is disable it by
> > default. But it would be interesting to know if this patch makes it work
> > on those ATI bridges..
> >
> > Linus
> >
> > ---
> > drivers/pci/probe.c | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 2f75d69..94cd3a4 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -908,7 +908,7 @@ pci_scan_device(struct pci_bus *bus, int devfn)
> > return NULL;
> >
> > /* Configuration request Retry Status */
> > - while (l == 0xffff0001) {
> > + while ((l & 0xffff) == 0x0001) {
> > msleep(delay);
> > delay *= 2;
> > > if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, &l))
>
> That one did not work out so well.
> I reenabled the call to pci_enable_crs() and changed the line as above.
> That resulted in two timeouts (from dmesg):
>
> [....]
> ACPI: Interpreter enabled
> ACPI: (supports S0 S3 S4 S5)
> ACPI: Using IOACPI for interrupt routing
> ACPI: PCI Root Bridge [PCI0] (0000:00)
> Device 0000:01:00.0 not responding
> Device 0000:02:00.0 not responding
> [....]
>
> Then, the kernel boots up normally, except that the graphics and network
> cards do not show up at all in lspci.
Uh, right. We already know that your northbridge, mmconfig, CRS, and this
device combine to always return 0001 for the Vendor ID. If we loop on
getting that, we must time out.
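A userspace model of why that loop has to give up, as a sketch only. It assumes the loop bails out once the doubled delay exceeds a cap (the quoted hunk does not show the exit condition, so the 60-second figure used in the note below is an assumption), and it counts milliseconds instead of actually sleeping. The read functions are hypothetical stand-ins for the config read.

```c
#include <stdint.h>

/* Hypothetical config read: this "device" answers 0x43640001 forever. */
static uint32_t stuck_read_vendor_dword(void)
{
	return 0x43640001u;
}

/* Hypothetical config read for a device that answers correctly. */
static uint32_t healthy_read_vendor_dword(void)
{
	return 0x436411abu;
}

struct probe_result {
	int responded;		/* 1 if a non-retry value was eventually read */
	unsigned int reads;	/* number of re-reads issued inside the loop */
	unsigned long slept_ms;	/* total time the loop would have slept */
};

/*
 * Models the patched loop: sleep, double the delay, re-read, and give up
 * once the delay exceeds give_up_ms.  No real sleeping is done here.
 */
static struct probe_result probe_with_backoff(uint32_t (*read_dword)(void),
					      unsigned long give_up_ms)
{
	struct probe_result r = { 0, 0, 0 };
	unsigned long delay = 1;
	uint32_t l = read_dword();

	while ((l & 0xffffu) == 0x0001u) {
		r.slept_ms += delay;	/* stands in for msleep(delay) */
		delay *= 2;
		l = read_dword();
		r.reads++;
		if (delay > give_up_ms)
			return r;	/* the "not responding" case */
	}
	r.responded = 1;
	return r;
}
```

With a 60000 ms cap and a device that always reads back 0001, the model re-reads 16 times and "sleeps" 65535 ms before giving up, which matches the shape of the two "not responding" lines in the dmesg above.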
I'd actually bet that the hardware bug is actually that any device that
gives a CRS response the first time will have its Vendor ID appear as 0001
on subsequent mmconfig accesses, which means that it's actually a bus
quirk that probably only affects mmconfig access to something in the
conf1-visible space. The only per-device aspect would be that it uses CRS
(possibly correctly), and that doesn't mean that mmconfig won't be safe in
general for the device, or even that it won't be necessary. Actually, we
already know that per-driver enabling mmconfig is broken: sky2 is one that
wants to opt in but there are also reports of the Vendor ID 0001 bug with
it.
-Daniel
*This .sig left intentionally blank*
On Thu, 2007-12-27 at 13:37 -0800, Linus Torvalds wrote:
>
> > Does anybody know what the original rationale was for calling
> > pci_enable_crs() by default?
>
> .. another good question. I don't think anybody expected it to be
> broken,
> but if this turns out to be the thing that triggers it, I think we
> should
> disable CRS by default.
Some sane archs need CRS, and so does a lot of HW.. However, just testing for
vendor ID being 0x0001 instead of testing all bits might be a useable
workaround.
> The code doesn't actually do what CRS is supposed to help with (ie go
> on
> to probe another device and then come back to the slow one later), so
> right now it's pretty much useless *anyway*.
On Thu, 2007-12-27 at 13:37 -0800, Linus Torvalds wrote:
> The code doesn't actually do what CRS is supposed to help with (ie go
> on
> to probe another device and then come back to the slow one later), so
> right now it's pretty much useless *anyway*.
It's not totally useless... Instead of not seeing the device that hasn't
fully initialized yet at all, we end up waiting a bit and then seeing
it. Going to probe somebody else is a nice optimization we could do with
multithread PCI probe but doesn't remove the need for CRS.
I have embedded boards where proper CRS operation is critical since the
kernel brings the PCIe link up itself, and thus is likely to hit devices
still in the middle of CRS.
Note that I'm -very- surprised however that your BIOS hands control over
to the kernel with devices still issuing CRS... Unless those devices may
do it after boot but that's dodgy and will break many other things.
Ben.
On Thu, 2007-12-27 at 23:18 +0100, Kai Ruhnau wrote:
> That one did not work out so well.
> I reenabled the call to pci_enable_crs() and changed the line as above.
> That resulted in two timeouts (from dmesg):
>
> [....]
> ACPI: Interpreter enabled
> ACPI: (supports S0 S3 S4 S5)
> ACPI: Using IOACPI for interrupt routing
> ACPI: PCI Root Bridge [PCI0] (0000:00)
> Device 0000:01:00.0 not responding
> Device 0000:02:00.0 not responding
> [....]
>
> Then, the kernel boots up normally, except that the graphics and network
> cards do not show up at all in lspci.
Could be that CRS is totally broken on those bridges. Might need some
per bridge quirks to disable CRS. But don't do that by default please,
other people need it.
Cheers,
Ben.
On Thu, 27 Dec 2007, Daniel Barkalow wrote:
>
> I'd actually bet that the hardware bug is actually that any device that
> gives a CRS response the first time will have its Vendor ID appear as 0001
> on subsequent mmconfig accesses, which means that it's actually a bus
> quirk that probably only affects mmconfig access to something in the
> conf1-visible space. The only per-device aspect would be that it uses CRS
> (possibly correctly), and that doesn't mean that mmconfig won't be safe in
> general for the device, or even that it won't be necessary. Actually, we
> already know that per-driver enabling mmconfig is broken: sky2 is one that
> wants to opt in but there are also reports of the Vendor ID 0001 bug with
> it.
Actually, having it be a per-device thing would have fixed this particular
problem, if only because the device probing would have been done without
MMCONFIG (thus avoiding the bug), and then after it has been probed, it
wouldn't have mattered if the driver enabled MMCONFIG for the device,
since it would now have the right ID in "struct pci_dev".
Sure, subsequent "lspci" users would still be confused, but the kernel
itself would never have noticed anything strange.
Of course, just doing *all* initial probing without MMCONFIG would also
have fixed it, which is another thing I advocate (regardless of any
per-device setting).
Linus
On Fri, 28 Dec 2007, Benjamin Herrenschmidt wrote:
>
> I have embedded boards where proper CRS operation is critical since the
> kernel brings the PCIe link up itself, and thus is likely to hit devices
> still in the middle of CRS.
.. but that's perfectly fine. A PCI-E bridge will certainly retry it in
hardware (or it isn't a PCI-E bridge!).
The CRS bit in question is purely the *software*visible* bit - ie whether
the OS gets told about the delay or not. As long as the OS then just
retries the same device, enabling CRS SV is pointless.
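For reference, the bit in question is CRS Software Visibility Enable, bit 4 of the PCIe Root Control register. A minimal sketch of toggling it on a cached register value follows; the SKETCH_ macro and helper names are invented for this example, and no real config access is performed.

```c
#include <stdint.h>

/* Bit 4 of the PCIe Root Control register: CRS Software Visibility Enable. */
#define SKETCH_RTCTL_CRS_SV_ENABLE 0x0010u

/* Set the bit: the OS is told about retry status via the 0001 vendor ID. */
static uint16_t rtctl_enable_crs_sv(uint16_t rtctl)
{
	return (uint16_t)(rtctl | SKETCH_RTCTL_CRS_SV_ENABLE);
}

/* Clear the bit: retries stay a matter for the root complex hardware. */
static uint16_t rtctl_disable_crs_sv(uint16_t rtctl)
{
	return (uint16_t)(rtctl & ~SKETCH_RTCTL_CRS_SV_ENABLE);
}

static int rtctl_crs_sv_enabled(uint16_t rtctl)
{
	return (rtctl & SKETCH_RTCTL_CRS_SV_ENABLE) != 0;
}
```

Disabling it only changes what software sees; it does not by itself say anything about whether a given bridge retries in hardware.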
So I'm going to disable that thing. If there is some _other_ PCI-E bridge
that is simply buggy, and cannot handle the hw retry itself or is just
otherwise dodgy, we can have a white-list for cases where it really needs
to be done, but the current code is just bogus.
Linus
On Thu, 27 Dec 2007, Linus Torvalds wrote:
> On Thu, 27 Dec 2007, Daniel Barkalow wrote:
> >
> > I'd actually bet that the hardware bug is actually that any device that
> > gives a CRS response the first time will have its Vendor ID appear as 0001
> > on subsequent mmconfig accesses, which means that it's actually a bus
> > quirk that probably only affects mmconfig access to something in the
> > conf1-visible space. The only per-device aspect would be that it uses CRS
> > (possibly correctly), and that doesn't mean that mmconfig won't be safe in
> > general for the device, or even that it won't be necessary. Actually, we
> > already know that per-driver enabling mmconfig is broken: sky2 is one that
> > wants to opt in but there are also reports of the Vendor ID 0001 bug with
> > it.
>
> Actually, having it be a per-device thing would have fixed this particular
> problem, if only because the device probing would have been done without
> MMCONFIG (thus avoiding the bug), and then after it has been probed, it
> wouldn't have mattered if the driver enabled MMCONFIG for the device,
> since it would now have the right ID in "struct pci_dev".
>
> Sure, subsequent "lspci" users would still be confused, but the kernel
> itself would never have noticed anything strange.
A bug making lspci see something different from what the kernel sees
initially sounds to me like a sure way to drive maintainers insane. If
somebody had a northbridge that also screwed up the rest of the word, and
a device that a mmconfig-using driver recognized but had problems with,
the user would be reporting lspci info with 0001:ffff as the device that
doesn't work.
> Of course, just doing *all* initial probing without MMCONFIG would also
> have fixed it, which is another thing I advocate (regardless of any
> per-device setting).
So would always using conf1 for the non-extended space (unless the
platform only uses mmconfig), or at least for the first 64 bytes. I'd bet
all the subtle bugs are in the first few words, anyway. (With blatant bugs
in the rest, of course, where we want to blacklist busses and devices)
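For anyone following along, a sketch of the two addressing schemes being compared. It assumes the standard conf1 encoding (value written to port 0xCF8, 8-bit register field) and the standard mmconfig layout (4 KiB per function, offset from the MCFG base); the function names are illustrative, and the extended conf1 variants some chipsets offer are ignored.

```c
#include <stdint.h>

/* conf1: address value for I/O port 0xCF8; only reg bits 7:2 fit. */
static uint32_t conf1_address(unsigned int bus, unsigned int devfn,
			      unsigned int reg)
{
	return 0x80000000u | (bus << 16) | (devfn << 8) | (reg & 0xfcu);
}

/* mmconfig: byte offset from the MCFG base, 4096 bytes per function. */
static uint32_t mmconfig_offset(unsigned int bus, unsigned int devfn,
				unsigned int reg)
{
	return (bus << 20) | (devfn << 12) | (reg & 0xfffu);
}
```

For device 02:00.0, register 0x100 gets its own mmconfig offset (0x200100), but under conf1 it aliases back to register 0x00, which is why the non-extended space is the only part conf1 can cover.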
-Daniel
On Thu, 2007-12-27 at 21:37 -0800, Linus Torvalds wrote:
>
> On Fri, 28 Dec 2007, Benjamin Herrenschmidt wrote:
> >
> > I have embedded boards where proper CRS operation is critical since the
> > kernel brings the PCIe link up itself, and thus is likely to hit devices
> > still in the middle of CRS.
>
> .. but that's perfectly fine. A PCI-E bridge will certainly retry it in
> hardware (or it isn't a PCI-E bridge!).
Only a handful of times in many bridges I've seen.
> So I'm going to disable that thing. If there is some _other_ PCI-E bridge
> that is simply buggy, and cannot handle the hw retry itself or is just
> otherwise dodgy, we can have a white-list for cases where it really needs
> to be done, but the current code is just bogus.
If you disable it, then isn't there also a problem with PCIE->PCI-X
bridges, which will stop issuing CRS when they should? (not sure here, I
may be a bit confused).
Ben.
On Fri, Dec 28, 2007 at 01:07:09AM -0500, Daniel Barkalow wrote:
> So would always using conf1 for the non-extended space (unless the
> platform only uses mmconfig), or at least for the first 64 bytes. I'd bet
> all the subtle bugs are in the first few words, anyway. (With blatant bugs
> in the rest, of course, where we want to blacklist busses and devices)
Yes. Though limiting conf1 to the first 64 bytes is simply not worth
the pain - we would still have to deal with buses that are unreachable
via mmconf.
Always using legacy configuration mechanism for the legacy config space
and extended mechanism (mmconf) for the extended config space is a simple
and very logical approach. It's supposed to resolve *all* known mmconf
problems. And it still allows per-device quirks (tweaking dev->cfg_size).
And it does *remove* code, not add anything new/untested.
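The routing rule boils down to the following, shown here as a standalone sketch rather than kernel code. The enum and function name are made up for illustration; the bounds and the reg < 256 split mirror what the patch does.

```c
#include <errno.h>
#include <stdbool.h>

enum cfg_method { CFG_CONF1, CFG_MMCONFIG };

/*
 * Decide which mechanism serves a config access.  Returns 0 and fills in
 * *method on success, -EINVAL for out-of-range arguments or for an
 * extended-space access on a bus that mmconfig does not cover.
 */
static int pick_cfg_method(unsigned int bus, unsigned int devfn,
			   unsigned int reg, bool mmcfg_covers_bus,
			   enum cfg_method *method)
{
	if (bus > 255 || devfn > 255 || reg > 4095)
		return -EINVAL;

	if (reg < 256) {
		*method = CFG_CONF1;	/* legacy space: always legacy access */
		return 0;
	}

	if (!mmcfg_covers_bus)
		return -EINVAL;		/* no fallback for extended space */

	*method = CFG_MMCONFIG;
	return 0;
}
```

Note the asymmetry: legacy offsets never touch mmconfig, and extended offsets never fall back to conf1, so an unreachable bus shows up as a plain error instead of a wrong read.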
Signed-off-by: Ivan Kokshaysky <[email protected]>
Ivan.
arch/x86/pci/mmconfig-shared.c | 35 -----------------------------------
arch/x86/pci/mmconfig_32.c | 22 +++++++++-------------
arch/x86/pci/mmconfig_64.c | 22 ++++++++++------------
arch/x86/pci/pci.h | 7 -------
4 files changed, 19 insertions(+), 67 deletions(-)
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 4df637e..6b521d3 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -22,42 +22,9 @@
#define MMCONFIG_APER_MIN (2 * 1024*1024)
#define MMCONFIG_APER_MAX (256 * 1024*1024)
-DECLARE_BITMAP(pci_mmcfg_fallback_slots, 32*PCI_MMCFG_MAX_CHECK_BUS);
-
/* Indicate if the mmcfg resources have been placed into the resource table. */
static int __initdata pci_mmcfg_resources_inserted;
-/* K8 systems have some devices (typically in the builtin northbridge)
- that are only accessible using type1
- Normally this can be expressed in the MCFG by not listing them
- and assigning suitable _SEGs, but this isn't implemented in some BIOS.
- Instead try to discover all devices on bus 0 that are unreachable using MM
- and fallback for them. */
-static void __init unreachable_devices(void)
-{
- int i, bus;
- /* Use the max bus number from ACPI here? */
- for (bus = 0; bus < PCI_MMCFG_MAX_CHECK_BUS; bus++) {
- for (i = 0; i < 32; i++) {
- unsigned int devfn = PCI_DEVFN(i, 0);
- u32 val1, val2;
-
- pci_conf1_read(0, bus, devfn, 0, 4, &val1);
- if (val1 == 0xffffffff)
- continue;
-
- if (pci_mmcfg_arch_reachable(0, bus, devfn)) {
- raw_pci_ops->read(0, bus, devfn, 0, 4, &val2);
- if (val1 == val2)
- continue;
- }
- set_bit(i + 32 * bus, pci_mmcfg_fallback_slots);
- printk(KERN_NOTICE "PCI: No mmconfig possible on device"
- " %02x:%02x\n", bus, i);
- }
- }
-}
-
static const char __init *pci_mmcfg_e7520(void)
{
u32 win;
@@ -270,8 +237,6 @@ void __init pci_mmcfg_init(int type)
return;
if (pci_mmcfg_arch_init()) {
- if (type == 1)
- unreachable_devices();
if (known_bridge)
pci_mmcfg_insert_resources(IORESOURCE_BUSY);
pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 1bf5816..7b75e65 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -30,10 +30,6 @@ static u32 get_base_addr(unsigned int seg, int bus, unsigned devfn)
struct acpi_mcfg_allocation *cfg;
int cfg_num;
- if (seg == 0 && bus < PCI_MMCFG_MAX_CHECK_BUS &&
- test_bit(PCI_SLOT(devfn) + 32*bus, pci_mmcfg_fallback_slots))
- return 0;
-
for (cfg_num = 0; cfg_num < pci_mmcfg_config_num; cfg_num++) {
cfg = &pci_mmcfg_config[cfg_num];
if (cfg->pci_segment == seg &&
@@ -68,13 +64,16 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
u32 base;
if ((bus > 255) || (devfn > 255) || (reg > 4095)) {
- *value = -1;
+err: *value = -1;
return -EINVAL;
}
+ if (reg < 256)
+ return pci_conf1_read(seg,bus,devfn,reg,len,value);
+
base = get_base_addr(seg, bus, devfn);
if (!base)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
+ goto err;
spin_lock_irqsave(&pci_config_lock, flags);
@@ -105,9 +104,12 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if ((bus > 255) || (devfn > 255) || (reg > 4095))
return -EINVAL;
+ if (reg < 256)
+ return pci_conf1_write(seg,bus,devfn,reg,len,value);
+
base = get_base_addr(seg, bus, devfn);
if (!base)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
+ return -EINVAL;
spin_lock_irqsave(&pci_config_lock, flags);
@@ -134,12 +136,6 @@ static struct pci_raw_ops pci_mmcfg = {
.write = pci_mmcfg_write,
};
-int __init pci_mmcfg_arch_reachable(unsigned int seg, unsigned int bus,
- unsigned int devfn)
-{
- return get_base_addr(seg, bus, devfn) != 0;
-}
-
int __init pci_mmcfg_arch_init(void)
{
printk(KERN_INFO "PCI: Using MMCONFIG\n");
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index 4095e4d..c4cf318 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -40,9 +40,7 @@ static char __iomem *get_virt(unsigned int seg, unsigned bus)
static char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, unsigned int devfn)
{
char __iomem *addr;
- if (seg == 0 && bus < PCI_MMCFG_MAX_CHECK_BUS &&
- test_bit(32*bus + PCI_SLOT(devfn), pci_mmcfg_fallback_slots))
- return NULL;
+
addr = get_virt(seg, bus);
if (!addr)
return NULL;
@@ -56,13 +54,16 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
/* Why do we have this when nobody checks it. How about a BUG()!? -AK */
if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095))) {
- *value = -1;
+err: *value = -1;
return -EINVAL;
}
+ if (reg < 256)
+ return pci_conf1_read(seg,bus,devfn,reg,len,value);
+
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
+ goto err;
switch (len) {
case 1:
@@ -88,9 +89,12 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095)))
return -EINVAL;
+ if (reg < 256)
+ return pci_conf1_write(seg,bus,devfn,reg,len,value);
+
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
+ return -EINVAL;
switch (len) {
case 1:
@@ -126,12 +130,6 @@ static void __iomem * __init mcfg_ioremap(struct acpi_mcfg_allocation *cfg)
return addr;
}
-int __init pci_mmcfg_arch_reachable(unsigned int seg, unsigned int bus,
- unsigned int devfn)
-{
- return pci_dev_base(seg, bus, devfn) != NULL;
-}
-
int __init pci_mmcfg_arch_init(void)
{
int i;
diff --git a/arch/x86/pci/pci.h b/arch/x86/pci/pci.h
index ac56d39..36cb44c 100644
--- a/arch/x86/pci/pci.h
+++ b/arch/x86/pci/pci.h
@@ -98,13 +98,6 @@ extern void pcibios_sort(void);
/* pci-mmconfig.c */
-/* Verify the first 16 busses. We assume that systems with more busses
- get MCFG right. */
-#define PCI_MMCFG_MAX_CHECK_BUS 16
-extern DECLARE_BITMAP(pci_mmcfg_fallback_slots, 32*PCI_MMCFG_MAX_CHECK_BUS);
-
-extern int __init pci_mmcfg_arch_reachable(unsigned int seg, unsigned int bus,
- unsigned int devfn);
extern int __init pci_mmcfg_arch_init(void);
/*
On Fri, 28 Dec 2007 13:34:51 +0300
Ivan Kokshaysky <[email protected]> wrote:
> On Fri, Dec 28, 2007 at 01:07:09AM -0500, Daniel Barkalow wrote:
> > So would always using conf1 for the non-extended space (unless the
> > platform only uses mmconfig), or at least for the first 64 bytes.
> > I'd bet all the subtle bugs are in the first few words, anyway.
> > (With blatant bugs in the rest, of course, where we want to
> > blacklist busses and devices)
>
> Yes. Though limiting conf1 to the first 64 bytes is simply not worth
> the pain - we would still have to deal with buses that are unreachable
> via mmconf.
>
> Always using legacy configuration mechanism for the legacy config
> space and extended mechanism (mmconf) for the extended config space
> is a simple and very logical approach. It's supposed to resolve *all*
> known mmconf problems. And it still allows per-device quirks
> (tweaking dev->cfg_size). And it does *remove* code, not add anything
> new/untested.
>
it removes code by removing quirks / known not working stuff..
I really don't like it.. sorry.
On Fri, Dec 28, 2007 at 08:14:18AM -0800, Arjan van de Ven wrote:
> it removes code by removing quirks / known not working stuff..
This not working stuff gets detected at probe time - see
drivers/pci/probe.c:pci_cfg_space_size().
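Roughly, the probe-time detection works like the sketch below: try one dword read just past the legacy space and fall back to 256 bytes if it fails or reads all ones. This is a simplified model from memory, not the kernel function; the real one also checks the device type first, and the reader callbacks and the sample header value here are invented for the example.

```c
#include <stdint.h>

/* Hypothetical reader: returns 0 on success and stores the dword in *val. */
typedef int (*read_dword_fn)(unsigned int reg, uint32_t *val);

/* Returns the usable config space size: 256 (legacy) or 4096 (extended). */
static unsigned int sketch_cfg_space_size(read_dword_fn read_dword)
{
	uint32_t status;

	if (read_dword(256, &status) != 0)
		return 256;	/* access method refused the extended offset */
	if (status == 0xffffffffu)
		return 256;	/* nothing answered behind offset 256 */
	return 4096;
}

/* Three fake devices for trying the function out. */
static int read_fails(unsigned int reg, uint32_t *val)
{
	(void)reg;
	(void)val;
	return -1;
}

static int read_all_ones(unsigned int reg, uint32_t *val)
{
	(void)reg;
	*val = 0xffffffffu;
	return 0;
}

static int read_ext_header(unsigned int reg, uint32_t *val)
{
	(void)reg;
	*val = 0x00010001u;	/* made-up extended capability header */
	return 0;
}
```

So a device on a bus that mmconf cannot reach ends up with cfg_size == 256, and later extended-space lookups are skipped for it.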
Ivan.
On 12/28/2007 11:38 AM, Ivan Kokshaysky wrote:
> On Fri, Dec 28, 2007 at 08:14:18AM -0800, Arjan van de Ven wrote:
>
>> it removes code by removing quirks / known not working stuff..
>>
>
>
The only quirk I see removed is a bitmap with an arbitrary size (that we
don't really know is sufficient for every system), and that is only
built using comparison between mmconf and type1 accesses. IMHO, there
is zero knowledge in that removed code (no knowledge about specific
chipsets that work or don't work, or misleading BIOSes).
> This not working stuff gets detected at probe time - see
> drivers/pci/probe.c:pci_cfg_space_size().
>
This indeed avoids most invalid mmconf attempts (for extended-conf-space
probing that goes through pci_find_ext_capability() or from user-space).
One could think of adding a cfg_size check in
pci_{read,write}_config_{byte,word,dword}(), but IMHO that would be
useless, since a direct read/write of a known register in the
extended-config-space can only happen for a pci-express device, and
there is ample evidence that such accesses always work (more exactly,
MCFG can always be trusted for pcie devices).
One thing that could be changed in pci_cfg_space_size() is to avoid
making a special case for PCI-X 266MHz/533MHz (assume cfg_size == 256
for such devices too, and reserve extended cfg-space for pci-express
devices). There are good reasons to think no such PCI-X 266MHz/533MHz
device will ever have an extended space (no capability IDs were ever
defined in the PCI-X 2.0 spec, and no new revision is planned). Such a
check would avoid the possibility of trying extended-conf-space access
for PCI-X 2.0 devices behind an amd-8132 or similar (such accesses would
just return -1, but some objections were raised about doing anything like
that other than at initialization time, even if there are ample reasons
to argue it would be harmless).
Loic
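A minimal sketch of the policy suggested above, written as plain userspace C. The enum and function names are hypothetical; the real logic lives in drivers/pci/probe.c:pci_cfg_space_size() and differs in detail. The ext_probe_ok flag stands for "a read at offset 256 returned something other than all-ones".

```c
#include <stdbool.h>

/* Hypothetical device classification, not a kernel type. */
enum bus_kind { BUS_PCI, BUS_PCIX_MODE1, BUS_PCIX_MODE2, BUS_PCIE };

#define CFG_SPACE_SIZE     256   /* legacy config space */
#define CFG_SPACE_EXP_SIZE 4096  /* extended config space */

/*
 * Only PCI-Express devices are candidates for the 4096-byte space.
 * PCI-X 266/533 (Mode 2) devices are treated like conventional PCI,
 * so no extended access is ever attempted on them.
 */
static int cfg_space_size(enum bus_kind kind, bool ext_probe_ok)
{
    if (kind != BUS_PCIE)
        return CFG_SPACE_SIZE;
    return ext_probe_ok ? CFG_SPACE_EXP_SIZE : CFG_SPACE_SIZE;
}
```

With this shape, a PCI-X 2.0 device behind an amd-8132 would report 256 bytes without any extended access being tried.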
On 12/28/2007 1:06 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2007-12-27 at 21:37 -0800, Linus Torvalds wrote:
>
>> On Fri, 28 Dec 2007, Benjamin Herrenschmidt wrote:
>>
>>> I have embedded boards where proper CRS operation is critical since the
>>> kernel brings the PCIe link up itself, and thus is likely to hit devices
>>> still in the middle of CRS.
>>>
>> .. but that's perfectly fine. A PCI-E bridge will certainly retry it in
>> hardware (or it isn't a PCI-E bridge!).
>>
>
> Only a handful of times in many bridges I've seen.
>
Yes, retry implementations are not universal. But bridges with
CRS-visibility might be even less common (both are
optional/implementation-specific features).
To make some choices for our own devices, we experimented at Myricom
with CRS on a few chipsets to see what they implement:
broadcom/serverworks 2000/1000: no-retry/ no-visibility
nvidia CK804/IO804: no-retry/no-visibility
nvidia MPC55/IO55: no-retry/no-visibility
intel-7520: does-retry/no-visibility
intel-975X: does-retry/no-visibility
intel-5000series: does-retry/no-visibility
intel X38: does-retry/no-visibility
AMD/ATI RD790: no-retry/no-visibility
[ in all no-retry cases, when a CRS is somehow returned to the
pcie-root, the requester will get 0xffffffff; depending on the chipset,
other error bits (malformed-tlp, completion-timeout) might be triggered ]
From the data reported on this mailing-list, it seems like the ATI tries
to implement CRS-visibility, but it is obviously buggy.
Not knowing whether there is any chipset with the visibility feature,
but without the retry capability, and given that CRS is irrelevant for
most Linux platforms (it only matters just after power-on, long before
Linux is started in the common case), it seems fair that
CRS-visibility-enabling should only be added in the specific code of the
specific embedded platforms (with no BIOS/firmware of any kind) that
might need it.
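For the embedded case, a rough sketch of what polling with CRS visibility enabled could look like. It assumes the behaviour described in the PCI Express spec, where a vendor ID read that hits CRS completes with the reserved vendor ID 0x0001. The accessor type, the fake device, and wait_for_device() are hypothetical names for this illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define CRS_VENDOR_ID 0x0001u  /* reserved vendor ID seen while CRS is signalled */

/* Hypothetical accessor: returns the 32-bit value at config offset 0. */
typedef uint32_t (*read_id_fn)(void *ctx);

/*
 * Poll the vendor/device ID until the device stops answering with CRS.
 * Returns true and stores the ID once the device is ready; returns false
 * if the read comes back all-ones or the retry budget runs out.
 */
static bool wait_for_device(read_id_fn read_id, void *ctx,
                            unsigned max_tries, uint32_t *id)
{
    for (unsigned i = 0; i < max_tries; i++) {
        uint32_t v = read_id(ctx);

        if (v == 0xffffffffu)
            return false;   /* empty slot, or a root without visibility */
        if ((v & 0xffffu) != CRS_VENDOR_ID) {
            *id = v;
            return true;
        }
        /* a real implementation would sleep with backoff here */
    }
    return false;
}

/* Simulated device that answers with CRS a fixed number of times. */
struct fake_dev { unsigned crs_left; uint32_t id; };

static uint32_t fake_read_id(void *ctx)
{
    struct fake_dev *d = ctx;

    if (d->crs_left) {
        d->crs_left--;
        return 0xffff0001u;
    }
    return d->id;
}
```

On a no-retry/no-visibility chipset from the list above, the same loop would see 0xffffffff on the first read and give up, which is why this only makes sense in platform-specific code.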
>
>
>> So I'm going to disable that thing. If there is some _other_ PCI-E bridge
>> that is simply buggy, and cannot handle the hw retry itself or is just
>> otherwise dodgy, we can have a white-list for cases where it really needs
>> to be done, but the current code is just bogus.
>>
>
> If you disable it, then isn't there also a problem with PCIE->PCI-X
> bridges, which will stop issuing CRS when they should? (not sure here, I
> may be a bit confused).
>
This is mostly independent: allowing a PCIE->PCI-X bridge to generate
CRS is a different bit (bit 15 of pcie->devctl on the bridge). FWIW,
Linux does not seem to touch it, and it defaults to zero, so it
seems like most current PCIE->PCI-X bridges will never generate a CRS
(some BIOSes might set it, but not on the couple of platforms I looked at).
Again the choice of setting here seems something better left to the
specific BIOS/embedded-code for a given platform.
Loic
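As a small illustration of the bit mentioned above, assuming devctl has already been read from the bridge's PCI Express capability. The macro and helper names are invented for this sketch; only the bit position (15) comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 15 of the PCI Express Device Control register on a bridge. */
#define DEVCTL_BRIDGE_CFG_RETRY_EN (1u << 15)

/* True if the bridge is allowed to answer config requests with CRS. */
static bool bridge_may_return_crs(uint16_t devctl)
{
    return (devctl & DEVCTL_BRIDGE_CFG_RETRY_EN) != 0;
}

/* Value that platform code would write back to enable it. */
static uint16_t bridge_enable_crs(uint16_t devctl)
{
    return (uint16_t)(devctl | DEVCTL_BRIDGE_CFG_RETRY_EN);
}
```

With the default of zero, bridge_may_return_crs() is false, which matches the observation that such bridges will not generate CRS unless firmware or platform code sets the bit.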
On Fri, 2007-12-28 at 14:14 -0500, Loic Prylli wrote:
>
> Not knowing whether there is any chipset with the visibility feature,
> but without the retry capability, and given that CRS is irrelevant for
> most Linux platforms (it only matters just after power-on, long before
> Linux is started in the common case),
"common case" I suppose in your mouth means desktop machines ? :-)
In the embedded world, I would expect CRS to be something that linux has
to deal with regularly. Maybe enable_crs should be moved to quirks on
those platforms that want it...
> This is mostly independent: allowing a PCIE->PCI-X bridge to generate
> CRS is a different bit (bit 15 of pcie->devctl on the bridge). FWIW,
> Linux does not seem to touch it, and it defaults to zero, so it
> seems like most current PCIE->PCI-X bridges will never generate a CRS
> (some BIOSes might set it, but not on the couple of platforms I looked
> at).
> Again the choice of setting here seems something better left to the
> specific BIOS/embedded-code for a given platform.
Ok, I wasn't sure about that one.
Ben.
On Fri, Dec 28, 2007 at 12:40:53PM -0500, Loic Prylli wrote:
> The only quirk I see removed is a bitmap with an arbitrary size (that we
> don't really know is sufficient for every system), and that is only
> built using comparison between mmconf and type1 accesses. IMHO, there
> is zero knowledge in that removed code (no knowledge about specific
> chipsets that work or don't work, or misleading BIOSes).
Precisely.
As a side note: that code also has zero knowledge about what the generic
PCI probe code can do later, like it fails to detect this ATI/CRS breakage.
> One thing that could be changed in pci_cfg_space_size() is to avoid
> making a special case for PCI-X 266MHz/533MHz (assume cfg_size == 256
> for such devices too, reserve extended cfg-space for pci-express
> devices). There are good reasons to think no such PCI-X 266MHz/533MHz device
> will ever have an extended-space (no capability IDs were ever defined in
> the PCI-X 2.0 spec, no new revision is planned). Such a check would
> avoid the possibility of trying extended-conf-space access for PCI-X 2.0
> devices behind an amd-8132 or similar (such accesses would just return
> -1, but there were some objections raised about doing anything like that
> other than at initialization time, even if there are ample reasons to
> argue it would be harmless).
I agree, we should remove it. IIRC, this PCI-X check was written
long ago with some draft (not a final spec) in hand. Matthew?
Ivan.
On Tue, Dec 25, 2007 at 03:26:05AM -0800, Arjan van de Ven wrote:
>
> This patch also adds a sysfs property for each device into which root can
> write a '1' to enable extended configuration space. The kernel will print
> a notice into dmesg when this happens (including the name of the app) so that
> if the system crashes as a result of this action, the user can know what
> action/tool caused it.
Can you send me a follow-on patch that documents this in
Documentation/ABI please.
thanks,
greg k-h
On Fri, 11 Jan 2008 11:02:29 -0800
Greg KH <[email protected]> wrote:
> On Tue, Dec 25, 2007 at 03:26:05AM -0800, Arjan van de Ven wrote:
> >
> > This patch also adds a sysfs property for each device into which
> > root can write a '1' to enable extended configuration space. The
> > kernel will print a notice into dmesg when this happens (including
> > the name of the app) so that if the system crashes as a result of
> > this action, the user can know what action/tool caused it.
>
> Can you send me a follow-on patch that documents this in
> Documentation/ABI please.
>
once it's stable enough, say after 1 kernel release, sure
On Fri, Jan 11, 2008 at 11:09:31AM -0800, Arjan van de Ven wrote:
> On Fri, 11 Jan 2008 11:02:29 -0800
> Greg KH <[email protected]> wrote:
>
> > On Tue, Dec 25, 2007 at 03:26:05AM -0800, Arjan van de Ven wrote:
> > >
> > > This patch also adds a sysfs property for each device into which
> > > root can write a '1' to enable extended configuration space. The
> > > kernel will print a notice into dmesg when this happens (including
> > > the name of the app) so that if the system crashes as a result of
> > > this action, the user can know what action/tool caused it.
> >
> > Can you send me a follow-on patch that documents this in
> > Documentation/ABI please.
> >
>
> once it's stable enough, say after 1 kernel release, sure
That's what the Documentation/ABI/testing section is for. If you add
something new, it needs to be documented now, otherwise it will be
forgotten.
thanks,
greg k-h
On Fri, Jan 11, 2008 at 11:02:29AM -0800, Greg KH wrote:
> Can you send me a follow-on patch that documents this in
> Documentation/ABI please.
Greg, if you integrate Ivan's patch, you don't need Arjan's patch.
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
On Fri, 11 Jan 2008 12:28:20 -0700
Matthew Wilcox <[email protected]> wrote:
> On Fri, Jan 11, 2008 at 11:02:29AM -0800, Greg KH wrote:
> > Can you send me a follow-on patch that documents this in
> > Documentation/ABI please.
>
> Greg, if you integrate Ivan's patch, you don't need Arjan's patch.
>
Personally I absolutely don't agree with that.
Ivan's patch is another attempt to make MMCONFIG work somewhat better,
but does not provide the explicit opt-in that I think is required at
this point; people have tried to get MMCONFIG stable for a really long time,
and have still failed up to today. At least my patience is up and this needs
to be opt-in.
On Fri, Jan 11, 2008 at 11:40:02AM -0800, Arjan van de Ven wrote:
> On Fri, 11 Jan 2008 12:28:20 -0700
> Matthew Wilcox <[email protected]> wrote:
>
> > On Fri, Jan 11, 2008 at 11:02:29AM -0800, Greg KH wrote:
> > > Can you send me a follow-on patch that documents this in
> > > Documentation/ABI please.
> >
> > Greg, if you integrate Ivan's patch, you don't need Arjan's patch.
> >
>
> Personally I absolutely don't agree with that.
> Ivan's patch is another attempt to make MMCONFIG work somewhat better,
> but does not provide the explicit opt-in that I think is required at
> this point; people have tried to get MMCONFIG stable for a really long time,
> and have still failed up to today. At least my patience is up and this needs
> to be opt-in.
I think I agree with Arjan here, Ivan's patch should also work on top of
this one, and will help out some machines.
But as he hasn't asked for it to be included in the kernel tree, that's
a moot point right now :)
thanks,
greg k-h
On Fri, Jan 11, 2008 at 11:45:24AM -0800, Greg KH wrote:
> On Fri, Jan 11, 2008 at 11:40:02AM -0800, Arjan van de Ven wrote:
> > Personally I absolutely don't agree with that.
> > Ivan's patch is another attempt to make MMCONFIG work somewhat better,
> > but does not provide the explicit opt-in that I think is required at
> > this point; people have tried to get MMCONFIG stable for a really long time,
> > and have still failed up to today. At least my patience is up and this needs
> > to be opt-in.
So your argument is that MMCONFIG sucks, therefore Linux has to have a
horrible interface to extended PCI config space?
> I think I agree with Arjan here, Ivan's patch should also work on top of
> this one, and will help out some machines.
>
> But as he hasn't asked for it to be included in the kernel tree, that's
> a moot point right now :)
He didn't? I certainly ask for it to be included.
On Fri, 11 Jan 2008 11:02:29 -0800
Greg KH <[email protected]> wrote:
> On Tue, Dec 25, 2007 at 03:26:05AM -0800, Arjan van de Ven wrote:
> >
> > This patch also adds a sysfs property for each device into which
> > root can write a '1' to enable extended configuration space. The
> > kernel will print a notice into dmesg when this happens (including
> > the name of the app) so that if the system crashes as a result of
> > this action, the user can know what action/tool caused it.
>
> Can you send me a follow-on patch that documents this in
> Documentation/ABI please.
>
---
Documentation/ABI/testing/sysfs-pci-extended-config | 39 ++++++++++++++++++++
1 file changed, 39 insertions(+)
Index: linux-2.6.24-rc7/Documentation/ABI/testing/sysfs-pci-extended-config
===================================================================
--- /dev/null
+++ linux-2.6.24-rc7/Documentation/ABI/testing/sysfs-pci-extended-config
@@ -0,0 +1,39 @@
+What: /sys/devices/pci<bus>/<device>/extended_config_space
+Date: January 11, 2008
+Contact: Arjan van de Ven <[email protected]>
+Description:
+ This attribute is for use for system-diagnostic software
+ only.
+
+ The kernel may decide to restrict PCI configuration space
+ access for userspace to the first 64 or 256 bytes by
+ default, for stability reasons. This attribute, when
+ present, can be used to request access to the full
+ 4Kb from the kernel.
+
+ Request to get access to the full 4Kb can be done by
+ writing a '1' into this attribute file. All other values
+ are reserved for future use and should not be used by
+ software at this point.
+
+ The kernel may log the request to the various kernel
+ logging services. The kernel may decide to ignore the
+ request if the kernel deems extended configuration space
+ access not reliable enough for the system or the device.
+ The kernel may decide to not present this attribute
+ if the kernel decides extended config space is reliable
+ and made available by default, or if the kernel decides
+ that extended configuration space will never be
+ accessible.
+
+ Software needs to gracefully deal with getting the
+ access not granted. Software also needs to gracefully deal
+ with this attribute not being present.
+
+ Due to the fragility of extended configuration space,
+ system diagnostic software should only set this attribute
+ on explicit user request, or in the case of GUI like tools,
+ at least with explicit user permission.
+
+
+
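A minimal sketch of how a diagnostic tool might use the attribute documented above while coping with it being absent or refused. The path comes from the "What:" line; the enum and function names are invented for this illustration. Treating a failed write as "refused" is an assumption, since the text says the kernel may also silently ignore the request.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum ext_cfg_result { EXT_CFG_REQUESTED, EXT_CFG_ABSENT, EXT_CFG_REFUSED };

/*
 * Write '1' to the extended_config_space attribute of one device.
 * attr_path is the full sysfs path of the attribute file.
 */
static enum ext_cfg_result request_extended_config(const char *attr_path)
{
    int fd = open(attr_path, O_WRONLY);
    ssize_t n;

    if (fd < 0)  /* attribute not presented: fall back to 256 bytes */
        return errno == ENOENT ? EXT_CFG_ABSENT : EXT_CFG_REFUSED;

    n = write(fd, "1", 1);  /* all other values are reserved */
    close(fd);
    return n == 1 ? EXT_CFG_REQUESTED : EXT_CFG_REFUSED;
}
```

Even after EXT_CFG_REQUESTED, the tool should still check how many bytes of config space it can actually read, and should only call this on explicit user request.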
On Fri, 11 Jan 2008, Matthew Wilcox wrote:
>
> So your argument is that MMCONFIG sucks, therefore Linux has to have a
> horrible interface to extended PCI config space?
What's *your* point?
MMCONFIG is known broken. If we ever start enabling it more (ie start
using it even if it's not reserved in the e820 tables), all that known
breakage will come and bite us in the *ss.
We need to have some armor-plated underwear to protect against that
ass-biting, and that's what Arjan's patch is.
Tell me what *other* armor plating you could have that actually works?
Linus
On Fri, Jan 11, 2008 at 11:58:23AM -0800, Linus Torvalds wrote:
> On Fri, 11 Jan 2008, Matthew Wilcox wrote:
> >
> > So your argument is that MMCONFIG sucks, therefore Linux has to have a
> > horrible interface to extended PCI config space?
>
> What's *your* point?
>
> MMCONFIG is known broken. If we ever start enabling it more (ie start
> using it even if it's not reserved in the e820 tables), all that known
> breakage will come and bite us in the *ss.
Ivan's patch doesn't start enabling MMCONFIG in more places than we
currently do. It makes us use conf1 accesses for all accesses below
256 bytes. That fixes all known problems to date.
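The routing rule described above can be sketched as follows, in userspace C with hypothetical names. have_mmcfg_base stands for "an MCFG entry covers this segment/bus"; the error case for a missing base mirrors what Ivan's patch does in pci_mmcfg_read()/pci_mmcfg_write().

```c
enum cfg_method { CFG_INVALID, CFG_CONF1, CFG_MMCONFIG };

/* Decide which access mechanism serves a given config register. */
static enum cfg_method pick_method(unsigned bus, unsigned devfn,
                                   unsigned reg, int have_mmcfg_base)
{
    if (bus > 255 || devfn > 255 || reg > 4095)
        return CFG_INVALID;   /* out of range for either mechanism */
    if (reg < 256)
        return CFG_CONF1;     /* legacy space: never touch MMCONFIG */
    return have_mmcfg_base ? CFG_MMCONFIG : CFG_INVALID;
}
```

Because everything below 256 goes through conf1, enumeration and BAR probing no longer depend on MMCONFIG being sane.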
> We need to have some armor-plated underwear to protect against that
> ass-biting, and that's what Arjan's patch is.
>
> Tell me what *other* armor plating you could have that actually works?
The armour plating that already exists -- pci=nommconf.
On Fri, 11 Jan 2008, Matthew Wilcox wrote:
>
> Ivan's patch doesn't start enabling MMCONFIG in more places than we
> currently do. It makes us use conf1 accesses for all accesses below
> 256 bytes. That fixes all known problems to date.
.. and I agree with that patch. But there will be people who try to access
extended space by mistake, and they'll have a hard-locked machine or
something.
> > Tell me what *other* armor plating you could have that actually works?
>
> The armour plating that already exists -- pci=nommconf.
No. It needs to be automatic, OR THE OTHER WAY AROUND.
Ie we disable the unsafe feature on purpose, and then force people who
access it to do so *consciously*.
Extended config space is different, for chissake! It's not even like it's
just a bigger normal config space where normal config accesses just
overflow into it. It really does have different rules etc.
Linus
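One way to see why the two spaces are different is the address encoding of each mechanism. The sketch below uses the standard layouts (conf1 address written to port 0xCF8, and the MMCONFIG offset from the MCFG base); the function names are made up for the illustration, and vendor-specific conf1 extensions are ignored.

```c
#include <stdint.h>

/*
 * Conf1 (type 1) address: enable bit, 8-bit bus, 8-bit devfn and a
 * dword-aligned 8-bit register. Offsets of 256 and above do not fit.
 */
static uint32_t conf1_address(unsigned bus, unsigned devfn, unsigned reg)
{
    return 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)devfn << 8) |
           (reg & 0xfcu);
}

/*
 * MMCONFIG offset from the segment's base address: 4096 bytes per
 * function, so a 12-bit register field.
 */
static uint32_t mmcfg_offset(unsigned bus, unsigned devfn, unsigned reg)
{
    return ((uint32_t)bus << 20) | ((uint32_t)devfn << 12) | (reg & 0xfffu);
}
```

The register field in the conf1 encoding is only 8 bits wide, so registers at 256 and above are reachable only through the memory-mapped window.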
On Fri, Jan 11, 2008 at 12:27:06PM -0800, Linus Torvalds wrote:
> On Fri, 11 Jan 2008, Matthew Wilcox wrote:
> >
> > Ivan's patch doesn't start enabling MMCONFIG in more places than we
> > currently do. It makes us use conf1 accesses for all accesses below
> > 256 bytes. That fixes all known problems to date.
>
> .. and I agree with that patch. But there will be people who try to access
> extended space by mistake, and they'll have a hard-locked machine or
> something.
But they can't. We limit the size they can access to 256 bytes, unless
the kernel probed address 256 and it worked.
> > The armour plating that already exists -- pci=nommconf.
>
> No. It needs to be automatic, OR THE OTHER WAY AROUND.
>
> Ie we disable the unsafe feature on purpose, and then force people who
> access it to do so *consciously*.
I'd be fine with making mmconfig off by default. Make people pass
pci=mmconf to activate it.
> Extended config space is different, for chissake! It's not even like it's
> just a bigger normal config space where normal config accesses just
> overflow into it. It really does have different rules etc.
Yes, but it's also important to enable some of the PCIe features.
On Fri, Jan 11, 2008 at 11:54:56AM -0800, Arjan van de Ven wrote:
> On Fri, 11 Jan 2008 11:02:29 -0800
> Greg KH <[email protected]> wrote:
>
> > On Tue, Dec 25, 2007 at 03:26:05AM -0800, Arjan van de Ven wrote:
> > >
> > > This patch also adds a sysfs property for each device into which
> > > root can write a '1' to enable extended configuration space. The
> > > kernel will print a notice into dmesg when this happens (including
> > > the name of the app) so that if the system crashes as a result of
> > > this action, the user can know what action/tool caused it.
> >
> > Can you send me a follow-on patch that documents this in
> > Documentation/ABI please.
> >
>
> ---
> Documentation/ABI/testing/sysfs-pci-extended-config | 39 ++++++++++++++++++++
> 1 file changed, 39 insertions(+)
Thanks, I've merged this with the original one.
greg k-h
On Fri, 11 Jan 2008, Matthew Wilcox wrote:
>
> But they can't. We limit the size they can access to 256 bytes, unless
> the kernel probed address 256 and it worked.
Umm. Probing address 256 (or *any* address) using MMCONFIG will simply
lock up the machine. HARD.
What's so hard to understand about MMCONFIG being broken on certain
hardware?
Linus
On Fri, Jan 11, 2008 at 01:12:12PM -0800, Linus Torvalds wrote:
>
>
> On Fri, 11 Jan 2008, Matthew Wilcox wrote:
> >
> > But they can't. We limit the size they can access to 256 bytes, unless
> > the kernel probed address 256 and it worked.
>
> Umm. Probing address 256 (or *any* address) using MMCONFIG will simply
> lock up the machine. HARD.
Did I miss a bug report? The only problems I'm currently aware of are
the ones where using MMCONFIG during BAR probing causes a hard lockup on
some Intel machines, and the ones where we get bad config data on some
AMD machines due to the configuration retry status being mishandled.
All the other lockups I'm aware of are already handled by the existing
checks.
On Fri, 11 Jan 2008, Matthew Wilcox wrote:
>
> Did I miss a bug report? The only problems I'm currently aware of are
> the ones where using MMCONFIG during BAR probing causes a hard lockup on
> some Intel machines, and the ones where we get bad config data on some
> AMD machines due to the configuration retry status being mishandled.
Hmm. Were all those reports root-caused to just that BAR probing? If so,
we may be in better shape than I worried.
Linus
On Fri, Jan 11, 2008 at 01:28:30PM -0800, Linus Torvalds wrote:
>
>
> On Fri, 11 Jan 2008, Matthew Wilcox wrote:
> >
> > Did I miss a bug report? The only problems I'm currently aware of are
> > the ones where using MMCONFIG during BAR probing causes a hard lockup on
> > some Intel machines, and the ones where we get bad config data on some
> > AMD machines due to the configuration retry status being mishandled.
>
> Hmm. Were all those reports root-caused to just that BAR probing? If so,
> we may be in better shape than I worried.
I believe so.
On Fri, Jan 11, 2008 at 02:38:03PM -0700, Matthew Wilcox wrote:
> On Fri, Jan 11, 2008 at 01:28:30PM -0800, Linus Torvalds wrote:
> > Hmm. Were all those reports root-caused to just that BAR probing? If so,
> > we may be in better shape than I worried.
>
> I believe so.
Ditto.
One typical problem is that on "Intel(r) 3 Series Express Chipset Family"
MMCONFIG probing of the BAR #2 (frame buffer address) of integrated graphics
device locks up the machine (depending on BIOS settings, of course).
This happens because the frame buffer of IGD has higher decode priority
than MMCONFIG range, as stated in Intel docs...
Ivan.
On Sat, Jan 12, 2008 at 02:58:56AM +0300, Ivan Kokshaysky wrote:
> On Fri, Jan 11, 2008 at 02:38:03PM -0700, Matthew Wilcox wrote:
> > On Fri, Jan 11, 2008 at 01:28:30PM -0800, Linus Torvalds wrote:
> > > Hmm. Were all those reports root-caused to just that BAR probing? If so,
> > > we may be in better shape than I worried.
> >
> > I believe so.
>
> Ditto.
>
> One typical problem is that on "Intel(r) 3 Series Express Chipset Family"
> MMCONFIG probing of the BAR #2 (frame buffer address) of integrated graphics
> device locks up the machine (depending on BIOS settings, of course).
> This happens because the frame buffer of IGD has higher decode priority
> than MMCONFIG range, as stated in Intel docs...
Ok, so what would the proposed patch look like to help resolve this?
Ivan, you posted one a while ago, but never seemed to get any
confirmation if it helped or not. Should I use that and drop Arjan's?
Or use both? Or something else like the patches proposed by Tony
Camuso?
thanks,
greg k-h
On Friday, January 11, 2008 3:58 Ivan Kokshaysky wrote:
> On Fri, Jan 11, 2008 at 02:38:03PM -0700, Matthew Wilcox wrote:
> > On Fri, Jan 11, 2008 at 01:28:30PM -0800, Linus Torvalds wrote:
> > > Hmm. Were all those reports root-caused to just that BAR probing?
> > > If so, we may be in better shape than I worried.
> >
> > I believe so.
>
> Ditto.
>
> One typical problem is that on "Intel(r) 3 Series Express Chipset
> Family" MMCONFIG probing of the BAR #2 (frame buffer address) of
> integrated graphics device locks up the machine (depending on BIOS
> settings, of course). This happens because the frame buffer of IGD
> has higher decode priority than MMCONFIG range, as stated in Intel
> docs...
Yeah, I'm only aware of 3:
- the BAR overlapping w/MMCONFIG problem described above
- ATI chipset config space retry bug
- VIA (?) chipset host bridges don't respond well to having decode
disabled (they stop decoding RAM addresses as well)
That's it afaik, so I've never really known where Linus' paranoia comes
from. OTOH I haven't been too keen to challenge it either; MMCONFIG
space is only just beginning to be tested widely with the deployment of
Vista, so we'll doubtless see more problems on older chipsets if we
enable it by default.
Jesse
On Fri, Jan 11, 2008 at 04:26:38PM -0800, Greg KH wrote:
> > One typical problem is that on "Intel(r) 3 Series Express Chipset Family"
> > MMCONFIG probing of the BAR #2 (frame buffer address) of integrated graphics
> > device locks up the machine (depending on BIOS settings, of course).
> > This happens because the frame buffer of IGD has higher decode priority
> > than MMCONFIG range, as stated in Intel docs...
>
> Ok, so what would the proposed patch look like to help resolve this?
Yeah, for sure.
> Ivan, you posted one a while ago, but never seemed to get any
> confirmation if it helped or not. Should I use that and drop Arjan's?
Actually I'm strongly against Arjan's patch. First, it's based on
the assumption that the MMCONFIG thing is sort of fundamentally broken
on some systems, but none of the facts we have so far confirms that.
And second, I really don't like the implementation as it breaks all
non-x86 arches (or forces them to add a set of totally meaningless
PCI functions).
> Or use both? Or something else like the patches proposed by Tony
> Camuso?
Tony's patch is a variation of the same idea, so this patch
supersedes it. The only argument for using conf1 to access only the
first 64 bytes of the config space was some concerns about performance.
But the only driver that extensively uses config space at runtime
is tg3, and only as a workaround for some broken revisions of the chip.
And even in that case I seriously doubt that mmconf vs. conf1 would
make any measurable difference.
On the other hand, always using conf1 for the whole 256-byte legacy
config space allows us to drop all sorts of black lists, which is
a *huge* advantage.
Here is the same patch, but with an updated commit message -
proper attribution to Loic Prylli, which I somehow missed
the first time, sorry.
Ivan.
---
PCI x86: always use conf1 to access config space below 256 bytes
Thanks to Loic Prylli <[email protected]>, who originally proposed
this idea.
Always using legacy configuration mechanism for the legacy config space
and extended mechanism (mmconf) for the extended config space is
a simple and very logical approach. It's supposed to resolve all
known mmconf problems. It still allows per-device quirks (tweaking
dev->cfg_size). It also allows us to get rid of the mmconf fallback code.
Signed-off-by: Ivan Kokshaysky <[email protected]>
---
arch/x86/pci/mmconfig-shared.c | 35 -----------------------------------
arch/x86/pci/mmconfig_32.c | 22 +++++++++-------------
arch/x86/pci/mmconfig_64.c | 22 ++++++++++------------
arch/x86/pci/pci.h | 7 -------
4 files changed, 19 insertions(+), 67 deletions(-)
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 4df637e..6b521d3 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -22,42 +22,9 @@
#define MMCONFIG_APER_MIN (2 * 1024*1024)
#define MMCONFIG_APER_MAX (256 * 1024*1024)
-DECLARE_BITMAP(pci_mmcfg_fallback_slots, 32*PCI_MMCFG_MAX_CHECK_BUS);
-
/* Indicate if the mmcfg resources have been placed into the resource table. */
static int __initdata pci_mmcfg_resources_inserted;
-/* K8 systems have some devices (typically in the builtin northbridge)
- that are only accessible using type1
- Normally this can be expressed in the MCFG by not listing them
- and assigning suitable _SEGs, but this isn't implemented in some BIOS.
- Instead try to discover all devices on bus 0 that are unreachable using MM
- and fallback for them. */
-static void __init unreachable_devices(void)
-{
- int i, bus;
- /* Use the max bus number from ACPI here? */
- for (bus = 0; bus < PCI_MMCFG_MAX_CHECK_BUS; bus++) {
- for (i = 0; i < 32; i++) {
- unsigned int devfn = PCI_DEVFN(i, 0);
- u32 val1, val2;
-
- pci_conf1_read(0, bus, devfn, 0, 4, &val1);
- if (val1 == 0xffffffff)
- continue;
-
- if (pci_mmcfg_arch_reachable(0, bus, devfn)) {
- raw_pci_ops->read(0, bus, devfn, 0, 4, &val2);
- if (val1 == val2)
- continue;
- }
- set_bit(i + 32 * bus, pci_mmcfg_fallback_slots);
- printk(KERN_NOTICE "PCI: No mmconfig possible on device"
- " %02x:%02x\n", bus, i);
- }
- }
-}
-
static const char __init *pci_mmcfg_e7520(void)
{
u32 win;
@@ -270,8 +237,6 @@ void __init pci_mmcfg_init(int type)
return;
if (pci_mmcfg_arch_init()) {
- if (type == 1)
- unreachable_devices();
if (known_bridge)
pci_mmcfg_insert_resources(IORESOURCE_BUSY);
pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 1bf5816..7b75e65 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -30,10 +30,6 @@ static u32 get_base_addr(unsigned int seg, int bus, unsigned devfn)
struct acpi_mcfg_allocation *cfg;
int cfg_num;
- if (seg == 0 && bus < PCI_MMCFG_MAX_CHECK_BUS &&
- test_bit(PCI_SLOT(devfn) + 32*bus, pci_mmcfg_fallback_slots))
- return 0;
-
for (cfg_num = 0; cfg_num < pci_mmcfg_config_num; cfg_num++) {
cfg = &pci_mmcfg_config[cfg_num];
if (cfg->pci_segment == seg &&
@@ -68,13 +64,16 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
u32 base;
if ((bus > 255) || (devfn > 255) || (reg > 4095)) {
- *value = -1;
+err: *value = -1;
return -EINVAL;
}
+ if (reg < 256)
+ return pci_conf1_read(seg,bus,devfn,reg,len,value);
+
base = get_base_addr(seg, bus, devfn);
if (!base)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
+ goto err;
spin_lock_irqsave(&pci_config_lock, flags);
@@ -105,9 +104,12 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if ((bus > 255) || (devfn > 255) || (reg > 4095))
return -EINVAL;
+ if (reg < 256)
+ return pci_conf1_write(seg,bus,devfn,reg,len,value);
+
base = get_base_addr(seg, bus, devfn);
if (!base)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
+ return -EINVAL;
spin_lock_irqsave(&pci_config_lock, flags);
@@ -134,12 +136,6 @@ static struct pci_raw_ops pci_mmcfg = {
.write = pci_mmcfg_write,
};
-int __init pci_mmcfg_arch_reachable(unsigned int seg, unsigned int bus,
- unsigned int devfn)
-{
- return get_base_addr(seg, bus, devfn) != 0;
-}
-
int __init pci_mmcfg_arch_init(void)
{
printk(KERN_INFO "PCI: Using MMCONFIG\n");
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index 4095e4d..c4cf318 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -40,9 +40,7 @@ static char __iomem *get_virt(unsigned int seg, unsigned bus)
static char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, unsigned int devfn)
{
char __iomem *addr;
- if (seg == 0 && bus < PCI_MMCFG_MAX_CHECK_BUS &&
- test_bit(32*bus + PCI_SLOT(devfn), pci_mmcfg_fallback_slots))
- return NULL;
+
addr = get_virt(seg, bus);
if (!addr)
return NULL;
@@ -56,13 +54,16 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
/* Why do we have this when nobody checks it. How about a BUG()!? -AK */
if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095))) {
- *value = -1;
+err: *value = -1;
return -EINVAL;
}
+ if (reg < 256)
+ return pci_conf1_read(seg,bus,devfn,reg,len,value);
+
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
+ goto err;
switch (len) {
case 1:
@@ -88,9 +89,12 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095)))
return -EINVAL;
+ if (reg < 256)
+ return pci_conf1_write(seg,bus,devfn,reg,len,value);
+
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
+ return -EINVAL;
switch (len) {
case 1:
@@ -126,12 +130,6 @@ static void __iomem * __init mcfg_ioremap(struct acpi_mcfg_allocation *cfg)
return addr;
}
-int __init pci_mmcfg_arch_reachable(unsigned int seg, unsigned int bus,
- unsigned int devfn)
-{
- return pci_dev_base(seg, bus, devfn) != NULL;
-}
-
int __init pci_mmcfg_arch_init(void)
{
int i;
diff --git a/arch/x86/pci/pci.h b/arch/x86/pci/pci.h
index ac56d39..36cb44c 100644
--- a/arch/x86/pci/pci.h
+++ b/arch/x86/pci/pci.h
@@ -98,13 +98,6 @@ extern void pcibios_sort(void);
/* pci-mmconfig.c */
-/* Verify the first 16 busses. We assume that systems with more busses
- get MCFG right. */
-#define PCI_MMCFG_MAX_CHECK_BUS 16
-extern DECLARE_BITMAP(pci_mmcfg_fallback_slots, 32*PCI_MMCFG_MAX_CHECK_BUS);
-
-extern int __init pci_mmcfg_arch_reachable(unsigned int seg, unsigned int bus,
- unsigned int devfn);
extern int __init pci_mmcfg_arch_init(void);
/*
On Sat, 12 Jan 2008 17:40:30 +0300
Ivan Kokshaysky <[email protected]> wrote:
>
> > Ivan, you posted one a while ago, but never seemed to get any
> > confirmation if it helped or not. Should I use that and drop
> > Arjan's?
>
> Actually I'm strongly against Arjan's patch. First, it's based on
> assumption that the MMCONFIG thing is sort of fundamentally broken
> on some systems, but none of the facts we have so far does confirm
> that. And second, I really don't like the implementation as it breaks
> all non-x86 arches (or forces them to add a set of totally meaningless
> PCI functions).
no it doesn't!
Other arches need no changes.
On Sat, Jan 12, 2008 at 07:46:32AM -0800, Arjan van de Ven wrote:
> Ivan Kokshaysky <[email protected]> wrote:
> > Actually I'm strongly against Arjan's patch. First, it's based on
> > assumption that the MMCONFIG thing is sort of fundamentally broken
> > on some systems, but none of the facts we have so far does confirm
> > that. And second, I really don't like the implementation as it breaks
> > all non-x86 arches (or forces them to add a set of totally meaningless
> > PCI functions).
>
> no it doesn't!
> Other arches need no changes.
Umm, true. I misread your patch.
But it doesn't change anything - that wasn't my main objection
anyway.
Ivan.
On Sat, 12 Jan 2008 17:40:30 +0300
Ivan Kokshaysky <[email protected]> wrote:
> --- a/arch/x86/pci/mmconfig_32.c
> +++ b/arch/x86/pci/mmconfig_32.c
> @@ -30,10 +30,6 @@ static u32 get_base_addr(unsigned int seg, int
> bus, unsigned devfn) struct acpi_mcfg_allocation *cfg;
> int cfg_num;
>
> - if (seg == 0 && bus < PCI_MMCFG_MAX_CHECK_BUS &&
> - test_bit(PCI_SLOT(devfn) + 32*bus,
> pci_mmcfg_fallback_slots))
> - return 0;
> -
> for (cfg_num = 0; cfg_num < pci_mmcfg_config_num; cfg_num++)
> { cfg = &pci_mmcfg_config[cfg_num];
> if (cfg->pci_segment == seg &&
> @@ -68,13 +64,16 @@ static int pci_mmcfg_read(unsigned int seg,
> unsigned int bus, u32 base;
>
> if ((bus > 255) || (devfn > 255) || (reg > 4095)) {
> - *value = -1;
> +err: *value = -1;
> return -EINVAL;
> }
>
> + if (reg < 256)
> + return pci_conf1_read(seg,bus,devfn,reg,len,value);
> +
btw this is my main objection to your patch; it intertwines the conf1 and mmconfig code even more.
When (and I'm saying "when" not "if") systems arrive that only have MMCONFIG for some of the devices,
we'll have to detangle this again, and I'm really not looking forward to that.
On Sat, Jan 12, 2008 at 09:45:57AM -0800, Arjan van de Ven wrote:
> btw this is my main objection to your patch; it intertwines the conf1 and mmconfig code even more.
> When (and I'm saying "when" not "if") systems arrive that only have MMCONFIG for some of the devices,
> we'll have to detangle this again, and I'm really not looking forward to that.
I think this will be OK. We'll end up with three pci_ops, one for
mmconfig-only, one for mixed mmconfig-conf1 and one for conf1. We could
do with that now actually -- the machines which will definitely go berserk
if you try to use mmconfig could have the conf1 ops on those busses.
Let's take Ivan's patch for now, and do that patch for 2.6.26.
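As a rough illustration of the three-way split described above, here is a minimal user-space sketch. All names in it (cfg_method, bus_caps, pick_method, route) are hypothetical and exist only for this example; none of them are kernel symbols, and the sketch assumes conf1 is present whenever MMCONFIG is not usable.

```c
#include <stddef.h>

/* Hypothetical labels for the three pci_ops flavours discussed above. */
enum cfg_method { CFG_CONF1, CFG_MIXED, CFG_MMCONFIG };

/* Which access path ends up serving a given register. */
enum access_path { PATH_NONE = -1, PATH_CONF1 = 0, PATH_MMCONFIG = 1 };

/* Assumed per-bus facts that a platform or quirk would supply. */
struct bus_caps {
    int has_conf1;          /* port I/O config mechanism works here */
    int has_good_mmconfig;  /* bus is listed in MCFG and not known broken */
};

/* Choose one of the three ops sets for a bus. */
static enum cfg_method pick_method(const struct bus_caps *b)
{
    if (b->has_good_mmconfig && b->has_conf1)
        return CFG_MIXED;
    if (b->has_good_mmconfig)
        return CFG_MMCONFIG;
    return CFG_CONF1;  /* assumption: conf1 exists if MMCONFIG is unusable */
}

/* Decide which mechanism handles register 'reg' under a given ops set. */
static enum access_path route(enum cfg_method m, unsigned reg)
{
    if (reg > 4095)
        return PATH_NONE;
    switch (m) {
    case CFG_CONF1:
        return reg < 256 ? PATH_CONF1 : PATH_NONE;
    case CFG_MIXED:
        return reg < 256 ? PATH_CONF1 : PATH_MMCONFIG;
    case CFG_MMCONFIG:
        return PATH_MMCONFIG;
    }
    return PATH_NONE;
}
```

With this shape, a bus known to misbehave under MMCONFIG simply gets CFG_CONF1 and never sees a memory-mapped access, at the cost of losing registers 256 and up.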
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
On Sat, Jan 12, 2008 at 09:45:57AM -0800, Arjan van de Ven wrote:
> btw this is my main objection to your patch; it intertwines the conf1
> and mmconfig code even more.
There is nothing wrong with it; please realize that mmconf and conf1 are
just different cpu-side interfaces. Both produce precisely the *same* bus
cycles as far as the lower 256-byte space is concerned.
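To make the "different cpu-side interfaces" point concrete, here is a small sketch of how the same (bus, devfn, reg) triple is encoded by each mechanism. The function names are made up for this example; the bit layouts are the usual configuration mechanism #1 address word and the 4 KB-per-function MMCONFIG layout.

```c
#include <stdint.h>

/*
 * conf1: the address word written to I/O port 0xCF8. The register number
 * is dword-aligned here; the low two bits select the byte lane at 0xCFC.
 * Only 8 bits of register number fit, so reg must be below 256.
 */
static uint32_t conf1_address(unsigned bus, unsigned devfn, unsigned reg)
{
    return 0x80000000u | (bus << 16) | (devfn << 8) | (reg & 0xFCu);
}

/*
 * MMCONFIG: byte offset from the start of the memory-mapped aperture.
 * Each function gets 4096 bytes, so reg may go up to 4095.
 */
static uint32_t mmcfg_offset(unsigned bus, unsigned devfn, unsigned reg)
{
    return (bus << 20) | (devfn << 12) | reg;
}
```

Both encodings carry the same bus and devfn; they differ only in how the CPU presents them to the host bridge and in how many register bits are available.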
> When (and I'm saying "when" not "if") systems arrive that only have
> MMCONFIG for some of the devices, we'll have to detangle this again,
> and I'm really not looking forward to that.
MMCONFIG for *some* of the devices? This doesn't sound realistic
from a technical point of view.
MMCONFIG-only systems? Sure. I really hope to see these. But it won't
be PC-AT architecture anymore. It has to be something like alpha,
for instance, fully utilizing the 64-bit address space, and we'll have
to have the whole low-level PCI infrastructure completely different
for these future platforms anyway.
Right now, each and every x86 chipset *does* require working
conf1 just in order to set up the mmconf aperture. It's the very
fundamental thing, sort of design philosophy.
Ivan.
On Sun, 13 Jan 2008 00:49:11 +0300
Ivan Kokshaysky <[email protected]> wrote:
> On Sat, Jan 12, 2008 at 09:45:57AM -0800, Arjan van de Ven wrote:
> > btw this is my main objection to your patch; it intertwines the
> > conf1 and mmconfig code even more.
>
> There is nothing wrong with it; please realize that mmconf and conf1
> are just different cpu-side interfaces. Both produce precisely the
> *same* bus cycles as far as the lower 256-byte space is concerned.
>
> > When (and I'm saying "when" not "if") systems arrive that only have
> > MMCONFIG for some of the devices, we'll have to detangle this again,
> > and I'm really not looking forward to that.
>
> MMCONFIG for *some* of the devices? This doesn't sound realistic
> from a technical point of view.
you're wrong.
> MMCONFIG-only systems? Sure. I really hope to see these. But it won't
> be PC-AT architecture anymore. It has to be something like alpha,
> for instance, fully utilizing the 64-bit address space, and we'll have
> to have the whole low-level PCI infrastructure completely different
> for these future platforms anyway.
> Right now, each and every x86 chipset *does* require working
> conf1 just in order to set up the mmconf aperture. It's the very
> fundamental thing, sort of design philosophy.
s/x86/pc/
and not even that.
Really this is a huge design mistake in your patch, the hard coding of conf1,
and for that reason I really don't think it should go in.
We have 4 or so methods on PC today to access config space, probably going to 6 in the next year
or two. One of those methods *HARD PICKING* another one as "second best" for the cases it
doesn't want to deal with is WRONG. It really needs to be up to the architecture/platform
to decide which ops vector is the fallback. And yes, on your current PC that might well be conf1.
But hardcoding that is not the right thing. We have the vectors, we have the ranking code,
just make a "second rank" thing.
Oh wait, my patch did that ;)
Then let either the mmconfig code or the wrapper above it pick that fallback (it doesn't matter which;
in fact, I can see the value of making this decision in the wrapper and keeping the mmconfig code simple
and clean, because maybe mmconfig IS the thing that the architecture says needs to deal with the lower 256 bytes).
Oh wait my patch also did that pretty much ;)
The rest of my patch was defaulting to off. Is it that bit that you really hate?
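The "second rank" idea can be sketched in user space roughly as follows. This is only a model of the dispatch in the patch's pci_read_ext() and the MMCONFIG-only fixup in pci_access_init(); the struct, the demo_* readers, and the returned values are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct raw_ops {
    int (*read)(unsigned bus, unsigned devfn, unsigned reg, uint32_t *val);
};

/* Stand-in for a port I/O method: only the first 256 bytes are reachable. */
static int demo_conf1_read(unsigned bus, unsigned devfn, unsigned reg,
                           uint32_t *val)
{
    (void)bus; (void)devfn;
    if (reg > 255) { *val = 0xffffffffu; return -1; }
    *val = 0x1111u;  /* arbitrary marker value for the demo */
    return 0;
}

/* Stand-in for MMCONFIG: the full 4096 bytes are reachable. */
static int demo_mmcfg_read(unsigned bus, unsigned devfn, unsigned reg,
                           uint32_t *val)
{
    (void)bus; (void)devfn;
    if (reg > 4095) { *val = 0xffffffffu; return -1; }
    *val = 0x2222u;  /* arbitrary marker value for the demo */
    return 0;
}

static struct raw_ops demo_conf1 = { demo_conf1_read };
static struct raw_ops demo_mmcfg = { demo_mmcfg_read };

/* The platform fills these in; neither method names the other. */
static struct raw_ops *primary_ops = &demo_conf1;
static struct raw_ops *extcfg_ops = &demo_mmcfg;

/* If only the extended method exists, it has to serve everything. */
static void settle_ops(void)
{
    if (!primary_ops && extcfg_ops)
        primary_ops = extcfg_ops;
}

/* Wrapper: opted-in devices use the extended ops, others the primary. */
static int cfg_read(int ext_enabled, unsigned bus, unsigned devfn,
                    unsigned reg, uint32_t *val)
{
    struct raw_ops *ops = (ext_enabled && extcfg_ops) ? extcfg_ops
                                                      : primary_ops;
    if (!ops) { *val = 0xffffffffu; return -1; }
    return ops->read(bus, devfn, reg, val);
}
```

The point of the shape is that the choice of fallback lives in the wrapper and in the platform's initialization, so a platform without conf1 only needs different pointer assignments.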
Arjan,
I have not seen your MMCONFIG patch.
Would you mind sending me a copy?
Thanks.
Tony
On Sat, 12 Jan 2008 19:12:23 -0500
Tony Camuso <[email protected]> wrote:
> Arjan,
>
> I have not seen your MMCONFIG patch.
>
> Would you mind sending me a copy?
>
sure
----
On PCs, PCI extended configuration space (4Kb) is riddled with problems
associated with the memory mapped access method (MMCONFIG). At the same
time, there are very few machines that actually need or use this
extended configuration space.
At this point in time, the only sensible action is to make access to the
extended configuration space an opt-in operation for those device
drivers that need/want access to this space, as well as for those
userland diagnostics utilities that (on admin request) want to access
this space.
It's inevitable that this is done per device rather than per bus; we'll
be needing per device PCI quirks to turn this extended config space off
over time no matter what; in addition, it gives the least amount of
surprise: loading a driver for a device only impacts that one device,
not a whole bus worth of devices (although it'll be common to have one
physical device per bus on PCI-E).
The (desirable) side-effect of this patch is that all enumeration is
done using normal configuration cycles.
The patch below splits the lower level PCI config space operations (which
operate on a bus) in two: one that normally only operates on traditional
space, and one that gets used after the driver has opted in to using the
extended configuration space. This has led to a little code
duplication, but it's not all that bad (most of it is prototypes in
headers and such).
Architectures that have a solid reliable way to get to extended
configuration space can just keep doing what they do now and allow
extended space access from the "traditional" bus ops, and just not fill
in the new bus ops. (This could include x86 for, say, BIOS year 2009
and later, but doesn't right now)
This patch also adds a sysfs property for each device into which root
can write a '1' to enable extended configuration space. The kernel will
print a notice into dmesg when this happens (including the name of the
app) so that if the system crashes as a result of this action, the user
can know what action/tool caused it.
Signed-off-by: Arjan van de Ven <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/ABI/testing/sysfs-pci-extended-config | 39 ++++++++++++++++
arch/x86/pci/common.c | 23 +++++++++
arch/x86/pci/init.c | 10 ++++
arch/x86/pci/mmconfig_32.c | 2
arch/x86/pci/mmconfig_64.c | 2
arch/x86/pci/pci.h | 2
drivers/pci/access.c | 46 +++++++++++++++++++
drivers/pci/pci-sysfs.c | 31 +++++++++++++
drivers/pci/pci.c | 28 +++++++++++
include/linux/pci.h | 47 +++++++++++++++++---
10 files changed, 222 insertions(+), 8 deletions(-)
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-pci-extended-config
@@ -0,0 +1,39 @@
+What: /sys/devices/pci<bus>/<device>/extended_config_space
+Date: January 11, 2008
+Contact: Arjan van de Ven <[email protected]>
+Description:
+ This attribute is for use for system-diagnostic software
+ only.
+
+ The kernel may decide to restrict PCI configuration space
+ access for userspace to the first 64 or 256 bytes by
+ default, for stability reasons. This attribute, when
+ present, can be used to request access to the full
+ 4Kb from the kernel.
+
+ Request to get access to the full 4Kb can be done by
+ writing a '1' into this attribute file. All other values
+ are reserved for future use and should not be used by
+ software at this point.
+
+ The kernel may log the request to the various kernel
+ logging services. The kernel may decide to ignore the
+ request if the kernel deems extended configuration space
+ access not reliable enough for the system or the device.
+ The kernel may decide to not present this attribute
+ if the kernel decides extended config space is reliable
+ and made available by default, or if the kernel decides
+ that extended configuration space will never be
+ accessible.
+
+ Software needs to gracefully deal with getting the
+ access not granted. Software also needs to gracefully deal
+ with this attribute not being present.
+
+ Due to the fragility of extended configuration space,
+ system diagnostic software should only set this attribute
+ on explicit user request, or in the case of GUI like tools,
+ at least with explicit user permission.
+
+
+
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -26,6 +26,7 @@ int pcibios_last_bus = -1;
unsigned long pirq_table_addr;
struct pci_bus *pci_root_bus;
struct pci_raw_ops *raw_pci_ops;
+struct pci_raw_ops *raw_pci_ops_extcfg;
static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
{
@@ -39,9 +40,31 @@ static int pci_write(struct pci_bus *bus
devfn, where, size, value);
}
+static int pci_read_ext(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
+{
+ if (raw_pci_ops_extcfg)
+ return raw_pci_ops_extcfg->read(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
+ else
+ return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
+}
+
+static int pci_write_ext(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
+{
+ if (raw_pci_ops_extcfg)
+ return raw_pci_ops_extcfg->write(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
+ else
+ return raw_pci_ops->write(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
+}
+
struct pci_ops pci_root_ops = {
.read = pci_read,
.write = pci_write,
+ .readext = pci_read_ext,
+ .writeext = pci_write_ext,
};
/*
--- a/arch/x86/pci/init.c
+++ b/arch/x86/pci/init.c
@@ -14,6 +14,16 @@ static __init int pci_access_init(void)
#ifdef CONFIG_PCI_MMCONFIG
pci_mmcfg_init(type);
#endif
+ /* if we ONLY have MMCONFIG, we need to use it always */
+ if (!raw_pci_ops && raw_pci_ops_extcfg) {
+ printk(KERN_INFO "No direct PCI access, using MMCONFIG always\n");
+ raw_pci_ops = raw_pci_ops_extcfg;
+ }
+
+ /*
+ * we've found a usable method; this means we can skip
+ * the potentially dangerous BIOS based methods
+ */
if (raw_pci_ops)
return 0;
#ifdef CONFIG_PCI_BIOS
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -143,6 +143,6 @@ int __init pci_mmcfg_arch_reachable(unsi
int __init pci_mmcfg_arch_init(void)
{
printk(KERN_INFO "PCI: Using MMCONFIG\n");
- raw_pci_ops = &pci_mmcfg;
+ raw_pci_ops_extcfg = &pci_mmcfg;
return 1;
}
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -152,6 +152,6 @@ int __init pci_mmcfg_arch_init(void)
return 0;
}
}
- raw_pci_ops = &pci_mmcfg;
+ raw_pci_ops_extcfg = &pci_mmcfg;
return 1;
}
--- a/arch/x86/pci/pci.h
+++ b/arch/x86/pci/pci.h
@@ -32,6 +32,8 @@
extern unsigned int pci_probe;
extern unsigned long pirq_table_addr;
+extern struct pci_raw_ops *raw_pci_ops_extcfg;
+
enum pci_bf_sort_state {
pci_bf_sort_default,
pci_force_nobf,
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -51,6 +51,45 @@ int pci_bus_write_config_##size \
return res; \
}
+#define PCI_OP_READ_EXT(size, type, len) \
+int pci_bus_read_extconfig_##size \
+ (struct pci_bus *bus, unsigned int devfn, int pos, type *value) \
+{ \
+ int res; \
+ unsigned long flags; \
+ u32 data = 0; \
+ if (PCI_##size##_BAD) \
+ return PCIBIOS_BAD_REGISTER_NUMBER; \
+ spin_lock_irqsave(&pci_lock, flags); \
+ if (bus->ops->readext) \
+ res = bus->ops->readext(bus, devfn, pos, len, &data); \
+ else \
+ res = bus->ops->read(bus, devfn, pos, len, &data); \
+ *value = (type)data; \
+ spin_unlock_irqrestore(&pci_lock, flags); \
+ return res; \
+} \
+EXPORT_SYMBOL(pci_bus_read_extconfig_##size);
+
+#define PCI_OP_WRITE_EXT(size, type, len) \
+int pci_bus_write_extconfig_##size \
+ (struct pci_bus *bus, unsigned int devfn, int pos, type value) \
+{ \
+ int res; \
+ unsigned long flags; \
+ if (PCI_##size##_BAD) \
+ return PCIBIOS_BAD_REGISTER_NUMBER; \
+ spin_lock_irqsave(&pci_lock, flags); \
+ if (bus->ops->writeext) \
+ res = bus->ops->writeext(bus, devfn, pos, len, value); \
+ else \
+ res = bus->ops->write(bus, devfn, pos, len, value); \
+ spin_unlock_irqrestore(&pci_lock, flags); \
+ return res; \
+} \
+EXPORT_SYMBOL(pci_bus_write_extconfig_##size);
+
+
PCI_OP_READ(byte, u8, 1)
PCI_OP_READ(word, u16, 2)
PCI_OP_READ(dword, u32, 4)
@@ -58,6 +97,13 @@ PCI_OP_WRITE(byte, u8, 1)
PCI_OP_WRITE(word, u16, 2)
PCI_OP_WRITE(dword, u32, 4)
+PCI_OP_READ_EXT(byte, u8, 1)
+PCI_OP_READ_EXT(word, u16, 2)
+PCI_OP_READ_EXT(dword, u32, 4)
+PCI_OP_WRITE_EXT(byte, u8, 1)
+PCI_OP_WRITE_EXT(word, u16, 2)
+PCI_OP_WRITE_EXT(dword, u32, 4)
+
EXPORT_SYMBOL(pci_bus_read_config_byte);
EXPORT_SYMBOL(pci_bus_read_config_word);
EXPORT_SYMBOL(pci_bus_read_config_dword);
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -143,6 +143,35 @@ static ssize_t is_enabled_show(struct de
return sprintf (buf, "%u\n", atomic_read(&pdev->enable_cnt));
}
+static ssize_t extended_config_space_store(struct device *dev,
+ struct device_attribute *attr, const char *buf,
+ size_t count)
+{
+ ssize_t result = -EINVAL;
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ /* this can crash the machine when done on the "wrong" device */
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (*buf == '1') {
+ printk(KERN_WARNING "Application %s enabled extended config space for device %s\n",
+ current->comm, pci_name(pdev));
+ result = pci_enable_ext_config(pdev);
+ }
+
+ return result < 0 ? result : count;
+}
+
+static ssize_t extended_config_space_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct pci_dev *pdev;
+
+ pdev = to_pci_dev(dev);
+ return sprintf(buf, "%u\n", pdev->ext_cfg_space);
+}
+
#ifdef CONFIG_NUMA
static ssize_t
numa_node_show(struct device *dev, struct device_attribute *attr, char *buf)
@@ -206,6 +235,8 @@ struct device_attribute pci_dev_attrs[]
__ATTR_RO(numa_node),
#endif
__ATTR(enable, 0600, is_enabled_show, is_enabled_store),
+ __ATTR(extended_config_space, 0600, extended_config_space_show,
+ extended_config_space_store),
__ATTR(broken_parity_status,(S_IRUGO|S_IWUSR),
broken_parity_status_show,broken_parity_status_store),
__ATTR(msi_bus, 0644, msi_bus_show, msi_bus_store),
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -802,6 +802,34 @@ int pci_enable_device(struct pci_dev *de
return __pci_enable_device_flags(dev, IORESOURCE_MEM | IORESOURCE_IO);
}
+/**
+ * pci_enable_ext_config - Enable extended (4K) config space accesses
+ * @dev: PCI device to be changed
+ *
+ * Enable extended (4Kb) configuration space accesses for a device.
+ * Extended config space is available for PCI-E devices and can
+ * be used for things like PCI AER and other features. However,
+ * due to various stability issues, this can only be done on demand.
+ *
+ * Returns: -1 on failure, 0 on success
+ */
+
+int pci_enable_ext_config(struct pci_dev *dev)
+{
+ if (dev->ext_cfg_space < 0)
+ return -1;
+ if (dev->ext_cfg_space > 0)
+ return 0;
+ dev->ext_cfg_space = 1;
+ /*
+ * now that we enabled large accesses, we
+ * need to update the config space size variable
+ */
+ dev->cfg_size = pci_cfg_space_size(dev);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pci_enable_ext_config);
+
/*
* Managed PCI resources. This manages device on/off, intx/msi/msix
* on/off and BAR regions. pci_dev itself records msi/msix status, so
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -169,6 +169,15 @@ struct pci_dev {
int cfg_size; /* Size of configuration space */
/*
+ * ext_cfg_space gets set by drivers/quirks to device if
+ * extended (4K) config space is desired.
+ * negative values -- hard disabled (quirk etc)
+ * zero -- disabled
+ * positive values -- enable
+ */
+ int ext_cfg_space;
+
+ /*
* Instead of touching interrupt line and base address registers
* directly, use the values stored here. They might be different!
*/
@@ -297,6 +306,8 @@ struct pci_bus {
struct pci_ops {
int (*read)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val);
int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
+ int (*readext)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val);
+ int (*writeext)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
};
struct pci_raw_ops {
@@ -517,29 +528,48 @@ int pci_bus_write_config_byte (struct pc
int pci_bus_write_config_word (struct pci_bus *bus, unsigned int devfn, int where, u16 val);
int pci_bus_write_config_dword (struct pci_bus *bus, unsigned int devfn, int where, u32 val);
+int pci_bus_read_extconfig_byte(struct pci_bus *bus, unsigned int devfn, int where, u8 *val);
+int pci_bus_read_extconfig_word(struct pci_bus *bus, unsigned int devfn, int where, u16 *val);
+int pci_bus_read_extconfig_dword(struct pci_bus *bus, unsigned int devfn, int where, u32 *val);
+int pci_bus_write_extconfig_byte(struct pci_bus *bus, unsigned int devfn, int where, u8 val);
+int pci_bus_write_extconfig_word(struct pci_bus *bus, unsigned int devfn, int where, u16 val);
+int pci_bus_write_extconfig_dword(struct pci_bus *bus, unsigned int devfn, int where, u32 val);
+
static inline int pci_read_config_byte(struct pci_dev *dev, int where, u8 *val)
{
- return pci_bus_read_config_byte (dev->bus, dev->devfn, where, val);
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_read_extconfig_byte(dev->bus, dev->devfn, where, val);
+ return pci_bus_read_config_byte(dev->bus, dev->devfn, where, val);
}
static inline int pci_read_config_word(struct pci_dev *dev, int where, u16 *val)
{
- return pci_bus_read_config_word (dev->bus, dev->devfn, where, val);
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_read_extconfig_word(dev->bus, dev->devfn, where, val);
+ return pci_bus_read_config_word(dev->bus, dev->devfn, where, val);
}
static inline int pci_read_config_dword(struct pci_dev *dev, int where, u32 *val)
{
- return pci_bus_read_config_dword (dev->bus, dev->devfn, where, val);
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_read_extconfig_dword(dev->bus, dev->devfn, where, val);
+ return pci_bus_read_config_dword(dev->bus, dev->devfn, where, val);
}
static inline int pci_write_config_byte(struct pci_dev *dev, int where, u8 val)
{
- return pci_bus_write_config_byte (dev->bus, dev->devfn, where, val);
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_write_extconfig_byte(dev->bus, dev->devfn, where, val);
+ return pci_bus_write_config_byte(dev->bus, dev->devfn, where, val);
}
static inline int pci_write_config_word(struct pci_dev *dev, int where, u16 val)
{
- return pci_bus_write_config_word (dev->bus, dev->devfn, where, val);
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_write_extconfig_word(dev->bus, dev->devfn, where, val);
+ return pci_bus_write_config_word(dev->bus, dev->devfn, where, val);
}
static inline int pci_write_config_dword(struct pci_dev *dev, int where, u32 val)
{
- return pci_bus_write_config_dword (dev->bus, dev->devfn, where, val);
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_write_extconfig_dword(dev->bus, dev->devfn, where, val);
+ return pci_bus_write_config_dword(dev->bus, dev->devfn, where, val);
}
int __must_check pci_enable_device(struct pci_dev *dev);
@@ -689,6 +719,9 @@ void ht_destroy_irq(unsigned int irq);
extern void pci_block_user_cfg_access(struct pci_dev *dev);
extern void pci_unblock_user_cfg_access(struct pci_dev *dev);
+extern int pci_enable_ext_config(struct pci_dev *dev);
+
+
/*
* PCI domain support. Sometimes called PCI segment (eg by ACPI),
* a PCI domain is defined to be a set of PCI busses which share
@@ -786,6 +819,8 @@ static inline struct pci_dev *pci_get_bu
unsigned int devfn)
{ return NULL; }
+static inline int pci_enable_ext_config(struct pci_dev *dev) { return -1; }
+
#endif /* CONFIG_PCI */
/* Include architecture-dependent settings and functions */
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
Thanks, Arjan.
The problem we have been experiencing has to do with Northbridges,
not with devices.
As far as the device is concerned, after the Northbridge translates
the config access into PCI bus cycles, the device has no idea what
mechanism drove the Northbridge to the translation.
That is to say, the device does not know whether the config cycle
on the bus was caused by an MMCONFIG cycle or a legacy Port IO
cycle delivered to the Northbridge.
In systems that had Northbridges that did not respond correctly to
MMCONFIG cycles, like the AMD 8132, we (HP & RH) were blacklisting
whole platforms to limit them to Port IO PCI config.
However, when platforms emerged using both legacy PCI and PCI express,
the platforms that were limited to Port IO config cycles were not
express compliant, since the express spec requires the platform to
be able to address the full 4096 byte region of config space to
be considered express-compliant.
The patch I devised concerned itself with Northbridges and separated
MMCONFIG-compliant buses from those that could not handle MMCONFIG.
Therefore, the express bus in the platform could happily employ
MMCONFIG to access the entire 4K region, while the legacy bus
with the non-compliant Northbridge could be restricted to Port IO
config.
However, even with my patch, the problem remained where devices
requiring large displacements could overlap the BIOS-mapped
MMCONFIG region. In such a situation, where the bus has passed
the MMCONFIG test, the MMCONFIG region can get doubly mapped by
bus-sizing code, causing the system to hang.
The remedy proposed by Loic and implemented by Ivan is actually
quite elegant, in that it addresses all these problems quite
effectively while eliminating a ration of specialized and somewhat
obscure code.
In my humble opinion, Port IO config access is here to stay, having
been defined as an architected mechanism in the PCI 2.1 spec.
This is most especially true for x86.
In other words, for x86, I don't think we need to worry about Port
IO config access ever going away at all.
Linus Torvalds wrote:
>
> On Fri, 11 Jan 2008, Matthew Wilcox wrote:
>> Did I miss a bug report? The only problems I'm currently aware of are
>> the ones where using MMCONFIG during BAR probing causes a hard lockup on
>> some Intel machines, and the ones where we get bad config data on some
>> AMD machines due to the configuration retry status being mishandled.
>
> Hmm. Were all those reports root-caused to just that BAR probing? If so,
> we may be in better shape than I worried.
As far as I'm aware, the known MMCONFIG-related issues are or have
been:
-Some devices built into the AMD K8 integrated northbridge can't be
reached by MMCONFIG - already handled
-Overlap of device BAR and MMCONFIG aperture during BAR sizing causing
lockup - can be avoided by disabling device decode during BAR sizing.
-PCI Express CRS-related issues - already handled by disabling CRS by
default
-Devices behind certain host bridges (some AMD HT to PCI-X bridges,
others?) can't be reached by MMCONFIG - can be handled by Tony Camuso's
patch or something similar (note that this is really a BIOS bug, it
should not list those buses in the MCFG table if MMCONFIG cannot access
them, and if it didn't I think we could already handle that)
-Some issue with some AMD CPUs needing MMCONFIG accesses to use a
certain register I believe? already handled?
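The BAR-sizing item in the list above can be illustrated with a small simulation. This is a user-space model, not kernel code: sim_dev, sim_write_bar, and size_bar are hypothetical names, and the device is assumed to expose a single 32-bit memory BAR. The idea is that while the BAR temporarily holds all-ones, a device that is still decoding memory can claim addresses that overlap the MMCONFIG aperture, so decode is switched off for the duration of the probe.

```c
#include <stdint.h>

#define CMD_MEM_DECODE 0x2u  /* memory-space enable bit in the command register */

struct sim_dev {
    uint16_t command;          /* simulated command register */
    uint32_t bar;              /* simulated 32-bit memory BAR */
    uint32_t size;             /* power-of-two BAR size in bytes */
    int decoded_while_sizing;  /* set if all-ones was written with decode on */
};

/* The low address bits of a BAR are hardwired to zero by the device. */
static void sim_write_bar(struct sim_dev *d, uint32_t v)
{
    d->bar = v & ~(d->size - 1);
    if (v == 0xffffffffu && (d->command & CMD_MEM_DECODE))
        d->decoded_while_sizing = 1;  /* the hazardous window */
}

/* Size the BAR with memory decode turned off around the probe. */
static uint32_t size_bar(struct sim_dev *d)
{
    uint16_t orig_cmd = d->command;
    uint32_t orig_bar = d->bar;
    uint32_t probe;

    d->command = (uint16_t)(orig_cmd & ~CMD_MEM_DECODE);
    sim_write_bar(d, 0xffffffffu);   /* ask the device which bits stick */
    probe = d->bar;
    sim_write_bar(d, orig_bar);      /* restore the programmed address */
    d->command = orig_cmd;           /* re-enable decode last */

    return ~(probe & ~0xfu) + 1;     /* mask the type bits, then invert */
}
```

Restoring the BAR before restoring the command register is what keeps the device from ever decoding the temporary all-ones address.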
Of these, I think the PCI BAR/MMCONFIG overlap problem is responsible
for by far the most cases of machines thought to have "broken MMCONFIG",
when in fact they were nothing of the sort. I don't recall hearing of a
single machine where MMCONFIG really just didn't work at all.
As I've mentioned before, all of these issues (well, I suppose not the
BAR overlap one) need to be resolved whether we have Arjan's patch or
not, otherwise if a driver does opt in and tries to use extended config
space it will still break. And if they are resolved, the patch seems
quite pointless.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
On Sat, 12 Jan 2008 20:36:59 -0500
Tony Camuso <[email protected]> wrote:
> Thanks, Arjan.
>
> The problem we have been experiencing has to do with Northbridges,
> not with devices.
correct for now.
HOWEVER, and this is the point Linus has made several times:
Just about NOBODY has devices that need the extended config space. At all.
So making this opt-in for devices allows our users to boot and use
their system if they are in the majority that has no need for even getting
close to this mess.
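For readers without the patch at hand, the opt-in semantics can be modelled like this. It mirrors the three states of ext_cfg_space from the patch (negative means hard disabled, zero disabled, positive enabled), but model_dev and the two functions are hypothetical user-space stand-ins, and the has_ext_space field replaces the real size probe.

```c
struct model_dev {
    int ext_cfg_space;  /* <0 hard disabled (quirk), 0 disabled, >0 enabled */
    int cfg_size;       /* usable config space size in bytes */
    int has_ext_space;  /* assumption: device really implements 4096 bytes */
};

/* Model of the opt-in call: returns -1 on failure, 0 on success. */
static int model_enable_ext_config(struct model_dev *dev)
{
    if (dev->ext_cfg_space < 0)
        return -1;              /* a quirk vetoed extended access */
    if (dev->ext_cfg_space > 0)
        return 0;               /* already enabled, nothing to do */
    dev->ext_cfg_space = 1;
    dev->cfg_size = dev->has_ext_space ? 4096 : 256;
    return 0;
}

/* A register is reachable only inside the currently allowed window. */
static int model_reg_allowed(const struct model_dev *dev, int reg)
{
    int limit = dev->ext_cfg_space > 0 ? dev->cfg_size : 256;
    return reg >= 0 && reg < limit;
}
```

A driver that never calls the enable function stays inside the first 256 bytes, which is the case for the large majority of devices.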
>
> As far as the device is concerned, after the Northbridge translates
> the config access into PCI bus cycles, the device has no idea what
> mechanism drove the Northbridge to the translation.
Wanna bet there'll be devices that screw this up? There are devices that even screwed
up the 64-256 region after all.
> The patch I devised concerned itself with Northbridges and separated
> MMCONFIG-compliant buses from those that could not handle MMCONFIG.
This kind of patchup has been going on for the better part of a year (well 2 years)
by now and it's STILL NOT ENOUGH, as you can see by the more patchups that have
been proposed as "alternative" to my approach.
>
> In my humble opinion, Port IO config access is here to stay, having
> been defined as an architected mechanism in the PCI 2.1 spec.
>
> This is most especially true for x86.
>
> In other words, for x86, I don't think we need to worry about Port
> IO config access ever going away at all.
You're wrong there. Sad to say, but you're wrong there.
On Sat, Jan 12, 2008 at 08:42:48PM -0800, Arjan van de Ven wrote:
> Wanna bet there'll be devices that screw this up? There are devices that even screwed
> up the 64-256 region after all.
I don't know if they 'screwed it up'. There are devices that misbehave
when registers are read from pci config space. But this was never
guaranteed to be a safe thing to do; it gradually became clear that
people expected to be able to read random registers and manufacturers
responded accordingly, but I don't think you were ever guaranteed to be
able to peek at bits of config space arbitrarily.
On Sat, Dec 29, 2007 at 12:12:19AM +0300, Ivan Kokshaysky wrote:
> On Fri, Dec 28, 2007 at 12:40:53PM -0500, Loic Prylli wrote:
> > One thing that could be changed in pci_cfg_space_size() is to avoid
> > making a special case for PCI-X 266MHz/533MHz (assume cfg_size == 256
> > for such devices too, reserve extended cfg-space for pci-express
> > devices). There are good reasons to think no such PCI-X 266MHz/533MHz device
> > will ever have an extended-space (no capability IDs were ever defined in
> > the PCI-X 2.0 spec, no new revision is planned). Such a check would
> > avoid the possibility of trying extended-conf-space access for PCI-X 2.0
> > devices behind an amd-8132 or similar (such accesses would just return
> > -1, but there were some objections raised about doing anything like that
> > other than at initialization time, even if there are ample reasons to
> > argue it would be harmless).
>
> I agree, we should remove it. IIRC, this PCI-X check was written
> long ago with some draft (not a final spec) in hands. Matthew?
I have what I believe to be the released version of PCI-X 2.0a (July
22, 2003). It is quite clear that Mode 2 devices (ie those running at
266MHz or 533MHz) are required to support all 4096 bytes of extended
config space.
More to the point, I don't think we have any bug reports suggesting that
PCI-X Mode 2 devices/bridges have any problems. There are relatively
few of them in existence, and my impression is that PCI-X2 is only being
implemented on server-class machines. 'Consumer grade' equipment is
where all the problems lie anyway.
While the PCI-X 2.0a spec does not define any Extended Capability IDs,
it simply states that "This field is a PCI-SIG defined ID number that
indicates the nature and format of the Extended Capabilities List item".
The PCIe spec does define Extended Capability IDs, and I would think
it's entirely appropriate to use the same IDs for PCI-X Mode 2 devices.
So I don't believe any change in this area is appropriate.
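The rule being defended here can be written down as a tiny decision function. This is a simplified sketch, not the kernel's pci_cfg_space_size(): the bus_kind enum and the function name are invented, and the check for an all-ones dword at offset 256 is an assumption about how an unreachable extended space shows up.

```c
#include <stdint.h>

/* Hypothetical classification of how a device is attached. */
enum bus_kind { KIND_PCI, KIND_PCIX_MODE1, KIND_PCIX_MODE2, KIND_PCIE };

/*
 * Only PCIe and PCI-X Mode 2 (266MHz/533MHz) devices are candidates for
 * the 4096-byte space. 'dword_at_256' is whatever a read of offset 256
 * returned; all-ones is treated as "nothing there".
 */
static int model_cfg_space_size(enum bus_kind kind, uint32_t dword_at_256)
{
    if (kind != KIND_PCIE && kind != KIND_PCIX_MODE2)
        return 256;
    if (dword_at_256 == 0xffffffffu)
        return 256;
    return 4096;
}
```

Under this rule a Mode 2 device behind a bridge that cannot forward extended accesses still ends up with 256 bytes, because the probe read comes back as all-ones.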
Matthew Wilcox wrote:
> On Sat, Jan 12, 2008 at 08:42:48PM -0800, Arjan van de Ven wrote:
>> Wanna bet there'll be devices that screw this up? There's devices that even screwed
>> up the 64-256 region after all.
>
> I don't know if they 'screwed it up'. There are devices that misbehave
> when registers are read from pci config space. But this was never
> guaranteed to be a safe thing to do; it gradually became clear that
> people expected to be able to read random registers and manufacturers
> responded accordingly, but I don't think you were ever guaranteed to be
> able to peek at bits of config space arbitrarily.
Quite correct... Reading registers can have all sorts of side effects,
for example clearing chip conditions.
Jeff
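As an illustration of the side-effect point, a minimal sketch of a read-to-clear status register. This is a hypothetical device model, not code for any real chip; all names are made up.

```c
#include <stdint.h>

/* Hypothetical device model: a status register whose bits clear when read. */
struct fake_dev {
	uint32_t status;	/* latched event bits */
};

/*
 * Reading returns the latched bits and, as a side effect, clears them.
 * A diagnostic tool that dumps this register would silently consume
 * events the driver was still expecting to see.
 */
static uint32_t fake_read_status(struct fake_dev *dev)
{
	uint32_t val = dev->status;

	dev->status = 0;
	return val;
}
```

A second read of the same register returns 0, so "just peeking" is not neutral for such a device.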
On Sat, 2008-01-12 at 17:40 +0300, Ivan Kokshaysky wrote:
>
> Actually I'm strongly against Arjan's patch. First, it's based on
> assumption that the MMCONFIG thing is sort of fundamentally broken
> on some systems, but none of the facts we have so far does confirm
> that.
> And second, I really don't like the implementation as it breaks all
> non-x86 arches (or forces them to add a set of totally meaningless
> PCI functions).
I agree, I quite dislike it too. Even if the breakage on x86 makes us
want to totally disable it there, it can be done within the existing PCI
ops I believe.
I think Arjan's problem is to try to do it per-device since the
"standard" PCI ops don't get a pci_dev structure (for obvious reasons).
But from what I read in this thread, this per-device enabling/disabling
doesn't seem very useful at all.
Cheers,
Ben.
On 1/13/2008 1:01 AM, Matthew Wilcox wrote:
> On Sat, Dec 29, 2007 at 12:12:19AM +0300, Ivan Kokshaysky wrote:
>
>> On Fri, Dec 28, 2007 at 12:40:53PM -0500, Loic Prylli wrote:
>>
>>> One thing that could be changed in pci_cfg_space_size() is to avoid
>>> making a special case for PCI-X 266MHz/533Mhz (assume cfg_size == 256
>>> for such devices too, reserve extended cfg-space for pci-express
>>> devices).
>>>
>> I agree, we should remove it. IIRC, this PCI-X check was written
>> long ago with some draft (not a final spec) in hands. Matthew?
>>
>
> I have what I believe to be the released version of PCI-X 2.0a (July
> 22, 2003). It is quite clear that Mode 2 devices (ie those running at
> 266MHz or 533MHz) are required to support all 4096 bytes of extended
> config space.
>
> More to the point, I don't think we have any bug reports suggesting that
> PCI-X Mode 2 devices/bridges have any problems.
As for PCI-X2 bridges/chipsets, I only know of the AMD-8132 (from what I
understand it does PCI-X Mode 2), and some obscure IBM enterprise
chipset (I am sure there are a few more).
Too bad for the spec, but we know for sure the AMD-8132
doesn't do ext-space (and makes it unusable for any device behind it).
> There are relatively
> few of them in existence, and my impression is that PCI-X2 is only being
> implemented on server-class machines.
True.
> 'Consumer grade' equipment is
> where all the problems lie anyway.
>
mmconfig has been a pain on the servers too (there are a lot of server
class amd machines using one pcie/mmconfig/chipset + amd-8131/2).
> While the PCI-X 2.0a spec does not define any Extended Capability IDs,
> it simply states that "This field is a PCI-SIG defined ID number that
> indicates the nature and format of the Extended Capabilities List item".
> The PCIe spec does define Extended Capability IDs, and I would think
> it's entirely appropriate to use the same IDs for PCI-X Mode 2 devices.
>
Sure it might be needed on PCI-X2. But contrary to pcie (where the
drivers/pci/pcie/aer subsystem already uses ext-conf-space, and other
usages are bound to increase), needing ext-conf-space in the future on
pci-x2 is quite unlikely (pcie is long-lived, whereas PCI-X2 was
short-lived, obsoleted by PCI-E, and nobody has yet mentioned an example
of using ext-registers with a PCI-X2 device).
I was only mentioning that because of the very small trade-off: if you
don't exclude PCI-X2, on platforms with the amd-8132+bad-MCFG, you might
trigger a cfg-read==0xffffffff/master-abort in pci_cfg_space_size() for
such devices with Ivan's patch. This is harmless, because a lot of similar
master-aborts happen during PCI-probing anyway, so one more won't change
anything.
Anyway, I am equally happy with keeping pci_cfg_space_size() as it is.
Loic
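The trade-off described above can be sketched as a small model of the size probe. This is only loosely modelled on what the thread says pci_cfg_space_size() does (treat PCIe and PCI-X Mode 2 as candidates, then probe offset 256 and treat an all-ones read as "no extended space"); it is not the kernel's code, and the names, the callback type and the two stub readers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFG_SPACE_SIZE		256	/* legacy config space */
#define CFG_SPACE_EXT_SIZE	4096	/* extended config space */

/* Hypothetical callback: read one dword of config space at offset reg. */
typedef int (*cfg_read_fn)(int reg, uint32_t *val);

/* Stub: a bridge that master-aborts, so the read comes back as all ones. */
static int read_master_abort(int reg, uint32_t *val)
{
	(void)reg;
	*val = 0xffffffff;
	return 0;
}

/* Stub: a device that really has something at offset 256. */
static int read_ext_cap(int reg, uint32_t *val)
{
	(void)reg;
	*val = 0x14010001;
	return 0;
}

static int cfg_space_size(bool is_pcie, bool is_pcix_mode2, cfg_read_fn read)
{
	uint32_t status;

	/* Only PCIe and PCI-X Mode 2 devices are candidates at all. */
	if (!is_pcie && !is_pcix_mode2)
		return CFG_SPACE_SIZE;

	/* Probe the first dword past the legacy area. */
	if (read(CFG_SPACE_SIZE, &status) != 0)
		return CFG_SPACE_SIZE;
	if (status == 0xffffffff)
		return CFG_SPACE_SIZE;

	return CFG_SPACE_EXT_SIZE;
}
```

Dropping the is_pcix_mode2 case would skip the probe (and the single master-abort) for PCI-X 2.0 devices; keeping it costs one extra failed read on platforms like the amd-8132 one described above.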
On Sun, Jan 13, 2008 at 06:08:05PM +1100, Benjamin Herrenschmidt wrote:
> On Sat, 2008-01-12 at 17:40 +0300, Ivan Kokshaysky wrote:
> > Actually I'm strongly against Arjan's patch. First, it's based on
> > assumption that the MMCONFIG thing is sort of fundamentally broken
> > on some systems, but none of the facts we have so far does confirm
> > that.
> > And second, I really don't like the implementation as it breaks all
> > non-x86 arches (or forces them to add a set of totally meaningless
> > PCI functions).
>
> I agree, I quite dislike it too. Even if the breakage on x86 makes us
> want to totally disable it there, it can be done within the existing PCI
> ops I believe.
>
> I think Arjan's problem is to try to do it per-device since the
> "standard" PCI ops don't get a pci_dev structure (for obvious reasons).
Here's a patch (on top of Ivan's) to improve things further.
One of Arjan's big problems with Ivan's patch is the hardcoding of conf1
as the fallback. So I took an idea from Arjan's patch, crossed it
with an idea of my own and came up with this. It gets rid of the
raw_pci_ops as a generic idea, and makes it private to the x86 arch.
It also makes the whole select-which-ops private to the x86 arch without
touching the pci layer at all.
Only compile-tested on x86-64.
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 488e48a..ffaf02b 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -43,8 +43,7 @@
#define PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg) \
(((u64) seg << 28) | (bus << 20) | (devfn << 12) | (reg))
-static int
-pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
int reg, int len, u32 *value)
{
u64 addr, data = 0;
@@ -68,8 +67,7 @@ pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
return 0;
}
-static int
-pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
int reg, int len, u32 value)
{
u64 addr;
@@ -91,24 +89,17 @@ pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
return 0;
}
-static struct pci_raw_ops pci_sal_ops = {
- .read = pci_sal_read,
- .write = pci_sal_write
-};
-
-struct pci_raw_ops *raw_pci_ops = &pci_sal_ops;
-
-static int
-pci_read (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
+static int pci_read(struct pci_bus *bus, unsigned int devfn, int where,
+ int size, u32 *value)
{
- return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+ return raw_pci_read(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
-static int
-pci_write (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
+static int pci_write(struct pci_bus *bus, unsigned int devfn, int where,
+ int size, u32 value)
{
- return raw_pci_ops->write(pci_domain_nr(bus), bus->number,
+ return raw_pci_write(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
diff --git a/arch/ia64/sn/pci/tioce_provider.c b/arch/ia64/sn/pci/tioce_provider.c
index e1a3e19..f6df212 100644
--- a/arch/ia64/sn/pci/tioce_provider.c
+++ b/arch/ia64/sn/pci/tioce_provider.c
@@ -752,13 +752,13 @@ tioce_kern_init(struct tioce_common *tioce_common)
* Determine the secondary bus number of the port2 logical PPB.
* This is used to decide whether a given pci device resides on
* port1 or port2. Note: We don't have enough plumbing set up
- * here to use pci_read_config_xxx() so use the raw_pci_ops vector.
+ * here to use pci_read_config_xxx() so use raw_pci_read().
*/
seg = tioce_common->ce_pcibus.bs_persist_segment;
bus = tioce_common->ce_pcibus.bs_persist_busnum;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(2, 0), PCI_SECONDARY_BUS, 1,&tmp);
+ raw_pci_read(seg, bus, PCI_DEVFN(2, 0), PCI_SECONDARY_BUS, 1,&tmp);
tioce_kern->ce_port1_secondary = (u8) tmp;
/*
@@ -799,11 +799,11 @@ tioce_kern_init(struct tioce_common *tioce_common)
/* mem base/limit */
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_MEMORY_BASE, 2, &tmp);
base = (u64)tmp << 16;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_MEMORY_LIMIT, 2, &tmp);
limit = (u64)tmp << 16;
limit |= 0xfffffUL;
@@ -817,21 +817,21 @@ tioce_kern_init(struct tioce_common *tioce_common)
* attributes.
*/
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_MEMORY_BASE, 2, &tmp);
base = ((u64)tmp & PCI_PREF_RANGE_MASK) << 16;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_BASE_UPPER32, 4, &tmp);
base |= (u64)tmp << 32;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_MEMORY_LIMIT, 2, &tmp);
limit = ((u64)tmp & PCI_PREF_RANGE_MASK) << 16;
limit |= 0xfffffUL;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_LIMIT_UPPER32, 4, &tmp);
limit |= (u64)tmp << 32;
diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index fab30e1..b92d2e6 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -27,7 +27,7 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
pci_write_config_byte(dev, 0xf4, config|0x2);
/* read xTPR register */
- raw_pci_ops->read(0, 0, 0x40, 0x4c, 2, &word);
+ raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
if (!(word & (1 << 13))) {
printk(KERN_INFO "Intel E7520/7320/7525 detected. "
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 8627463..65a6c55 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -26,16 +26,37 @@ int pcibios_last_bus = -1;
unsigned long pirq_table_addr;
struct pci_bus *pci_root_bus;
struct pci_raw_ops *raw_pci_ops;
+struct pci_raw_ops *raw_pci_ext_ops;
+
+int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 *val)
+{
+ if (reg < 256)
+ return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
+ return -EINVAL;
+}
+
+int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 val)
+{
+ if (reg < 256)
+ return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
+ return -EINVAL;
+}
static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
{
- return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+ return raw_pci_read(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
static int pci_write(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
{
- return raw_pci_ops->write(pci_domain_nr(bus), bus->number,
+ return raw_pci_write(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
index 431c9a5..42f3e4c 100644
--- a/arch/x86/pci/direct.c
+++ b/arch/x86/pci/direct.c
@@ -14,7 +14,7 @@
#define PCI_CONF1_ADDRESS(bus, devfn, reg) \
(0x80000000 | (bus << 16) | (devfn << 8) | (reg & ~3))
-int pci_conf1_read(unsigned int seg, unsigned int bus,
+static int pci_conf1_read(unsigned int seg, unsigned int bus,
unsigned int devfn, int reg, int len, u32 *value)
{
unsigned long flags;
@@ -45,7 +45,7 @@ int pci_conf1_read(unsigned int seg, unsigned int bus,
return 0;
}
-int pci_conf1_write(unsigned int seg, unsigned int bus,
+static int pci_conf1_write(unsigned int seg, unsigned int bus,
unsigned int devfn, int reg, int len, u32 value)
{
unsigned long flags;
diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index 6cff66d..c222a1f 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -215,7 +215,8 @@ static int quirk_aspm_offset[MAX_PCIEROOT << 3];
static int quirk_pcie_aspm_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
{
- return raw_pci_ops->read(0, bus->number, devfn, where, size, value);
+ return raw_pci_read(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
}
/*
@@ -231,7 +232,8 @@ static int quirk_pcie_aspm_write(struct pci_bus *bus, unsigned int devfn, int wh
if ((offset) && (where == offset))
value = value & 0xfffffffc;
- return raw_pci_ops->write(0, bus->number, devfn, where, size, value);
+ return raw_pci_write(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
}
static struct pci_ops quirk_pcie_aspm_ops = {
diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 5565d70..b008765 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -22,7 +22,7 @@ static void __devinit pcibios_fixup_peer_bridges(void)
if (pci_find_bus(0, n))
continue;
for (devfn = 0; devfn < 256; devfn += 8) {
- if (!raw_pci_ops->read(0, n, devfn, PCI_VENDOR_ID, 2, &l) &&
+ if (!raw_pci_read(0, n, devfn, PCI_VENDOR_ID, 2, &l) &&
l != 0x0000 && l != 0xffff) {
DBG("Found device at %02x:%02x [%04x]\n", n, devfn, l);
printk(KERN_INFO "PCI: Discovered peer bus %02x\n", n);
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 6b521d3..8d54df4 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -28,7 +28,7 @@ static int __initdata pci_mmcfg_resources_inserted;
static const char __init *pci_mmcfg_e7520(void)
{
u32 win;
- pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
+ pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
win = win & 0xf000;
if(win == 0x0000 || win == 0xf000)
@@ -53,7 +53,7 @@ static const char __init *pci_mmcfg_intel_945(void)
pci_mmcfg_config_num = 1;
- pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
+ pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
/* Enable bit */
if (!(pciexbar & 1))
@@ -118,7 +118,7 @@ static int __init pci_mmcfg_check_hostbridge(void)
int i;
const char *name;
- pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0, 4, &l);
+ pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0, 4, &l);
vendor = l & 0xffff;
device = (l >> 16) & 0xffff;
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 7b75e65..37a00fb 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -68,9 +68,6 @@ err: *value = -1;
return -EINVAL;
}
- if (reg < 256)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
-
base = get_base_addr(seg, bus, devfn);
if (!base)
goto err;
@@ -104,9 +101,6 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if ((bus > 255) || (devfn > 255) || (reg > 4095))
return -EINVAL;
- if (reg < 256)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
-
base = get_base_addr(seg, bus, devfn);
if (!base)
return -EINVAL;
@@ -140,5 +134,6 @@ int __init pci_mmcfg_arch_init(void)
{
printk(KERN_INFO "PCI: Using MMCONFIG\n");
raw_pci_ops = &pci_mmcfg;
+ raw_pci_ext_ops = &pci_mmcfg;
return 1;
}
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index c4cf318..6bf47e9 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -58,9 +58,6 @@ err: *value = -1;
return -EINVAL;
}
- if (reg < 256)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
-
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
goto err;
@@ -89,9 +86,6 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095)))
return -EINVAL;
- if (reg < 256)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
-
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
return -EINVAL;
@@ -151,5 +145,6 @@ int __init pci_mmcfg_arch_init(void)
}
}
raw_pci_ops = &pci_mmcfg;
+ raw_pci_ext_ops = &pci_mmcfg;
return 1;
}
diff --git a/arch/x86/pci/pci.h b/arch/x86/pci/pci.h
index 36cb44c..3431518 100644
--- a/arch/x86/pci/pci.h
+++ b/arch/x86/pci/pci.h
@@ -85,10 +85,17 @@ extern spinlock_t pci_config_lock;
extern int (*pcibios_enable_irq)(struct pci_dev *dev);
extern void (*pcibios_disable_irq)(struct pci_dev *dev);
-extern int pci_conf1_write(unsigned int seg, unsigned int bus,
- unsigned int devfn, int reg, int len, u32 value);
-extern int pci_conf1_read(unsigned int seg, unsigned int bus,
- unsigned int devfn, int reg, int len, u32 *value);
+struct pci_raw_ops {
+ int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 *val);
+ int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 val);
+};
+
+extern struct pci_raw_ops *raw_pci_ops;
+extern struct pci_raw_ops *raw_pci_ext_ops;
+
+extern struct pci_raw_ops pci_direct_conf1;
extern int pci_direct_probe(void);
extern void pci_direct_init(int type);
diff --git a/arch/x86/pci/visws.c b/arch/x86/pci/visws.c
index 8ecb1c7..c2df4e9 100644
--- a/arch/x86/pci/visws.c
+++ b/arch/x86/pci/visws.c
@@ -13,9 +13,6 @@
#include "pci.h"
-
-extern struct pci_raw_ops pci_direct_conf1;
-
static int pci_visws_enable_irq(struct pci_dev *dev) { return 0; }
static void pci_visws_disable_irq(struct pci_dev *dev) { }
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index e3a673a..ea68ef1 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -139,15 +139,6 @@ acpi_status __init acpi_os_initialize(void)
acpi_status acpi_os_initialize1(void)
{
- /*
- * Initialize PCI configuration space access, as we'll need to access
- * it while walking the namespace (bus 0 and root bridges w/ _BBNs).
- */
- if (!raw_pci_ops) {
- printk(KERN_ERR PREFIX
- "Access to PCI configuration space unavailable\n");
- return AE_NULL_ENTRY;
- }
kacpid_wq = create_singlethread_workqueue("kacpid");
kacpi_notify_wq = create_singlethread_workqueue("kacpi_notify");
BUG_ON(!kacpid_wq);
@@ -498,11 +489,9 @@ acpi_os_read_pci_configuration(struct acpi_pci_id * pci_id, u32 reg,
return AE_ERROR;
}
- BUG_ON(!raw_pci_ops);
-
- result = raw_pci_ops->read(pci_id->segment, pci_id->bus,
- PCI_DEVFN(pci_id->device, pci_id->function),
- reg, size, value);
+ result = raw_pci_read(pci_id->segment, pci_id->bus,
+ PCI_DEVFN(pci_id->device, pci_id->function),
+ reg, size, value);
return (result ? AE_ERROR : AE_OK);
}
@@ -529,11 +518,9 @@ acpi_os_write_pci_configuration(struct acpi_pci_id * pci_id, u32 reg,
return AE_ERROR;
}
- BUG_ON(!raw_pci_ops);
-
- result = raw_pci_ops->write(pci_id->segment, pci_id->bus,
- PCI_DEVFN(pci_id->device, pci_id->function),
- reg, size, value);
+ result = raw_pci_write(pci_id->segment, pci_id->bus,
+ PCI_DEVFN(pci_id->device, pci_id->function),
+ reg, size, value);
return (result ? AE_ERROR : AE_OK);
}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0dd93bb..75029d3 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -304,14 +304,14 @@ struct pci_ops {
int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
};
-struct pci_raw_ops {
- int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
- int reg, int len, u32 *val);
- int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
- int reg, int len, u32 val);
-};
-
-extern struct pci_raw_ops *raw_pci_ops;
+/*
+ * ACPI needs to be able to access PCI config space before we've done a
+ * PCI bus scan and created pci_bus structures.
+ */
+extern int raw_pci_read(unsigned int domain, unsigned int bus,
+ unsigned int devfn, int reg, int len, u32 *val);
+extern int raw_pci_write(unsigned int domain, unsigned int bus,
+ unsigned int devfn, int reg, int len, u32 val);
struct pci_bus_region {
unsigned long start;
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
On Sun, Jan 13, 2008 at 12:24:15AM -0700, Matthew Wilcox wrote:
> Here's a patch (on top of Ivan's) to improve things further.
Oops. I forgot to check the ordering of mmconfig vs direct probing, so
that patch would end up just using mmconfig for everything. Not what we
want. Also, there are three bits of mmconfig-shared that're probing using
conf1, even if it might have failed. And if we're going to use
raw_pci_read() when conf1 might have failed and mmconf isn't set up yet,
we need to check raw_pci_ops in raw_pci_read(). Add the check in
raw_pci_write too, just for symmetry.
I don't like it that mmconfig_32 prints a message and mmconfig_64
doesn't, but fixing that is not part of this patch.
Interdiff:
diff -u b/arch/x86/pci/common.c b/arch/x86/pci/common.c
--- b/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -31,7 +31,7 @@
int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
int reg, int len, u32 *val)
{
- if (reg < 256)
+ if (reg < 256 && raw_pci_ops)
return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
if (raw_pci_ext_ops)
return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
@@ -41,7 +41,7 @@
int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
int reg, int len, u32 val)
{
- if (reg < 256)
+ if (reg < 256 && raw_pci_ops)
return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
if (raw_pci_ext_ops)
return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
diff -u b/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
--- b/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -28,7 +28,7 @@
static const char __init *pci_mmcfg_e7520(void)
{
u32 win;
- pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
+ raw_pci_read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
win = win & 0xf000;
if(win == 0x0000 || win == 0xf000)
@@ -53,7 +53,7 @@
pci_mmcfg_config_num = 1;
- pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
+ raw_pci_read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
/* Enable bit */
if (!(pciexbar & 1))
@@ -118,7 +118,7 @@
int i;
const char *name;
- pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0, 4, &l);
+ raw_pci_read(0, 0, PCI_DEVFN(0,0), 0, 4, &l);
vendor = l & 0xffff;
device = (l >> 16) & 0xffff;
diff -u b/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
--- b/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -132,8 +132,10 @@
int __init pci_mmcfg_arch_init(void)
{
- printk(KERN_INFO "PCI: Using MMCONFIG\n");
- raw_pci_ops = &pci_mmcfg;
+ printk(KERN_INFO "PCI: Using MMCONFIG for %s config space\n",
+ raw_pci_ops ? "extended" : "all");
+ if (!raw_pci_ops)
+ raw_pci_ops = &pci_mmcfg;
raw_pci_ext_ops = &pci_mmcfg;
return 1;
}
diff -u b/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
--- b/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -144,7 +144,8 @@
return 0;
}
}
- raw_pci_ops = &pci_mmcfg;
+ if (!raw_pci_ops)
+ raw_pci_ops = &pci_mmcfg;
raw_pci_ext_ops = &pci_mmcfg;
return 1;
}
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
Arjan van de Ven wrote:
> On Sat, 12 Jan 2008 20:36:59 -0500
> Tony Camuso <[email protected]> wrote:
>
>
> Just about NOBODY has devices that need the extended config space. At all.
The PCI express spec requires the platform to provide access to this space
for express-compliance. More devices will be using this space as express
becomes the dominant IO bus technology.
>> As far as the device is concerned, after the Northbridge translates
>> the config access into PCI bus cycles, the device has no idea what
>> mechanism drove the Northbridge to the translation.
>
> Wanna bet there'll be devices that screw this up? There's devices that even screwed
> up the 64-256 region after all.
>
There may have been devices that incorrectly applied the PCI spec to
various fields in the header, I'll grant you that.
However, there is no way a device can determine electrically whether the
Northbridge received Port IO or MMCONFIG cycles. This is between the CPU
and the Northbridge and is utterly opaque to the devices on the bus.
>> The patch I devised concerned itself with Northbridges and separated
>> MMCONFIG-compliant buses from those that could not handle MMCONFIG.
>
> This kind of patchup has been going on for the better part of a year (well 2 years)
> by now and it's STILL NOT ENOUGH, as you can see by the more patchups that have
> been proposed as "alternative" to my approach.
>
Which is why Loic's proposal and Ivan's implementation of it is so elegant.
It solves all these problems in one sweep, and eliminates the code rendered
cruft by Ivan's patch. A two-fer, by my reckoning.
>> In other words, for x86, I don't think we need to worry about Port
>> IO config access ever going away at all.
>
> You're wrong there. Sad to say, but you're wrong there.
>
The PCI spec provides for conf1 as an architected solution. It's not
going away, and especially not in x86 land where Port IO is built-in
to the CPU.
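For reference, the two mechanisms differ only in how the CPU side encodes bus/devfn/reg for the host bridge. A minimal sketch: the conf1 encoding follows the PCI_CONF1_ADDRESS macro visible in the patch earlier in this thread; the MMCONFIG offset assumes the usual 4 KiB-per-function layout (the same shape as PCI_SAL_EXT_ADDRESS, minus the segment). Function names are hypothetical.

```c
#include <stdint.h>

/*
 * Port I/O (conf1) address word: only 8 bits are available for the
 * register offset, so nothing past byte 255 can be addressed.
 */
static uint32_t conf1_address(uint32_t bus, uint32_t devfn, uint32_t reg)
{
	return 0x80000000u | (bus << 16) | (devfn << 8) | (reg & ~3u);
}

/*
 * MMCONFIG byte offset inside the memory-mapped window: 12 bits of
 * register offset per function, i.e. the full 4096 bytes.
 */
static uint32_t mmcfg_offset(uint32_t bus, uint32_t devfn, uint32_t reg)
{
	return (bus << 20) | (devfn << 12) | reg;
}
```

Note that passing reg = 0x100 to the conf1 encoding spills into the devfn field and aliases register 0 of the next function, which is why the callers in the patch never hand conf1 an offset of 256 or more.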
as a general thing I like where this patch is going
On Sun, 13 Jan 2008 00:24:15 -0700
Matthew Wilcox <[email protected]> wrote:
> +
> +int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
> + int reg, int len, u32 *val)
> +{
> + if (reg < 256)
> + return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
> + if (raw_pci_ext_ops)
> + return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
> + return -EINVAL;
It would be nice if the "reg >= 256 && raw_pci_ext_ops == NULL" case would just
call the raw_pci_ops-> pointer, to give that a chance of refusal
(but I guess that shouldn't really happen)
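A userspace sketch of that suggestion, under stated assumptions: the struct, the two global pointers and the stub backends below are hypothetical stand-ins for pci_raw_ops, raw_pci_ops and raw_pci_ext_ops, and the conf1 stub is assumed to refuse offsets of 256 and above itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct pci_raw_ops (read side only). */
struct cfg_ops {
	int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
		    int reg, int len, uint32_t *val);
};

/* Stub legacy backend: refuses anything past the first 256 bytes. */
static int conf1_stub_read(unsigned int domain, unsigned int bus,
			   unsigned int devfn, int reg, int len, uint32_t *val)
{
	(void)domain; (void)bus; (void)devfn; (void)len;
	if (reg >= 256)
		return -EINVAL;
	*val = 0x11;
	return 0;
}

/* Stub extended backend: serves every offset. */
static int mmcfg_stub_read(unsigned int domain, unsigned int bus,
			   unsigned int devfn, int reg, int len, uint32_t *val)
{
	(void)domain; (void)bus; (void)devfn; (void)reg; (void)len;
	*val = 0x22;
	return 0;
}

static struct cfg_ops conf1_stub = { .read = conf1_stub_read };
static struct cfg_ops mmcfg_stub = { .read = mmcfg_stub_read };

static struct cfg_ops *legacy_ops;	/* stand-in for raw_pci_ops */
static struct cfg_ops *ext_ops;		/* stand-in for raw_pci_ext_ops */

static int raw_read(unsigned int domain, unsigned int bus, unsigned int devfn,
		    int reg, int len, uint32_t *val)
{
	if (reg < 256 && legacy_ops)
		return legacy_ops->read(domain, bus, devfn, reg, len, val);
	if (ext_ops)
		return ext_ops->read(domain, bus, devfn, reg, len, val);
	/* No extended backend: let the legacy one accept or refuse. */
	if (legacy_ops)
		return legacy_ops->read(domain, bus, devfn, reg, len, val);
	return -EINVAL;
}
```

With this shape the dispatcher never decides on its own that an extended access is invalid while a backend exists; the refusal comes from the backend.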
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -28,7 +28,7 @@ static int __initdata pci_mmcfg_resources_inserted;
> static const char __init *pci_mmcfg_e7520(void)
> {
> u32 win;
> - pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
> + pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
couldn't this (at least in some next patch) use the vector if it exists?
> @@ -140,5 +134,6 @@ int __init pci_mmcfg_arch_init(void)
> {
> printk(KERN_INFO "PCI: Using MMCONFIG\n");
> raw_pci_ops = &pci_mmcfg;
> + raw_pci_ext_ops = &pci_mmcfg;
Why set BOTH vectors? You probably ONLY want to set the ext one, so
that calls to the lower 256 bytes go to the original ops.
On Sun, 13 Jan 2008 07:43:11 -0500
Tony Camuso <[email protected]> wrote:
> Arjan van de Ven wrote:
> > On Sat, 12 Jan 2008 20:36:59 -0500
> > Tony Camuso <[email protected]> wrote:
> >
> >
> > Just about NOBODY has devices that need the extended config space.
> > At all.
>
> The PCI express spec requires the platform to provide access to this
> space for express-compliance.
PLATFORM not OS :)
Windows isn't using it in the server space, and in the client space it only recently started
considering it.
> More devices will be using this space
> as express becomes the dominant IO bus technology.
sure in like 2009 maybe.
> Which is why Loic's proposal and Ivan's implementation of it is so
> elegant. It solves all these problems in one sweep, and eliminates
> the code rendered cruft by Ivan's patch. A two-fer, by my reckoning.
>
> >> In other words, for x86, I don't think we need to worry about Port
> >> IO config access ever going away at all.
> >
> > You're wrong there. Sad to say, but you're wrong there.
> >
>
> The PCI spec provides for conf1 as an architected solution. It's not
> going away, and especially not in x86 land where Port IO is built-in
> to the CPU.
again sadly you're wrong.
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On 1/12/2008 12:45 PM, Arjan van de Ven wrote:
> On Sat, 12 Jan 2008 17:40:30 +0300
> Ivan Kokshaysky <[email protected]> wrote:
>
>>
>> + if (reg < 256)
>> + return pci_conf1_read(seg,bus,devfn,reg,len,value);
>> +
>>
>
>
> btw this is my main objection to your patch; it intertwines the conf1 and mmconfig code even more.
> When (and I'm saying "when" not "if") systems arrive that only have MMCONFIG for some of the devices,
> we'll have to detangle this again, and I'm really not looking forward to that.
>
conf1 has been a hardcoded dependency of mmconfig for years. Ivan's
patch does not make it worse (in fact it considerably simplifies that
code, making it easier to untangle later).
IMHO, either your patch or Ivan's can be a good base, but:
1) For your remark above to be given any consideration, your patch
should be modified to remove the hardcoded conf1 from the *current*
mmconfig code, otherwise we end up with 3 sets of ops (mmconfig + conf1 +
a possible third set of operations) intertwined in a confusing manner.
And removing that dependency is not a straightforward operation unless
you also do 2):
2) the pci_enable_ext_config() function and dev->ext_cfg_space field,
sysfs interface should be removed from the patch. There has never been
a report of crashes or any undefined behaviour while trying to
access ext-conf-space; all the problems were with *using mmconfig to access
legacy-conf-space*. The "if (dev->cfg_space_ext > 0)" checks can instead
be replaced by "if (reg >= 256)".
Otherwise when using per-device explicit enabling, just *checking*
whether ext-conf-space is available by calling pci_enable_ext_config()
will make some of the old problems of *losing legacy conf-space* come
back: you would have introduced a new user-space and kernel API while
only solving half the problems, not a good deal.
If you do 1) and 2), then you really support the good properties you
claimed:
- You can use mmconfig for ext-space and something else for legacy-space.
- You can use mmconfig for everything (for instance if conf1 is not
implemented).
Of course it is as straightforward to modify Ivan's patch to also have
the same properties.
Loic
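The two properties listed above amount to a simple selection rule at init time. A minimal sketch, assuming the arch code only knows whether conf1 and mmconfig each probed successfully; the enum, struct and function names are hypothetical and do not appear in either patch.

```c
#include <stdbool.h>

enum cfg_method {
	CFG_NONE,	/* no access method available */
	CFG_CONF1,	/* port I/O, reaches offsets 0..255 only */
	CFG_MMCONFIG,	/* memory mapped, reaches offsets 0..4095 */
};

struct cfg_plan {
	enum cfg_method legacy;		/* used when reg < 256 */
	enum cfg_method extended;	/* used when reg >= 256 */
};

static struct cfg_plan pick_methods(bool have_conf1, bool have_mmcfg)
{
	struct cfg_plan plan;

	/* Prefer conf1 for legacy space; fall back to mmconfig for everything. */
	if (have_conf1)
		plan.legacy = CFG_CONF1;
	else if (have_mmcfg)
		plan.legacy = CFG_MMCONFIG;
	else
		plan.legacy = CFG_NONE;

	/* Extended space is only reachable through mmconfig. */
	plan.extended = have_mmcfg ? CFG_MMCONFIG : CFG_NONE;

	return plan;
}
```

The choice is made per register offset rather than per device, so no opt-in call is needed before an extended access.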
On Sun, 13 Jan 2008 13:23:35 -0500
Loic Prylli <[email protected]> wrote:
Matthew pointed to a patch that basically does what you suggested; only one comment on your mail is left after that:
>
> 2) the pci_enable_ext_config() function and dev->ext_cfg_space field,
> sysfs interface should be removed from the patch. There has never
> been a report of crashes or any undefined behaviour while
> trying to access ext-conf-space; all the problems were with *using
> mmconfig to access legacy-conf-space*.
This entirely misses the point of why I made the patch. The point is NOT
that devices are buggy. The point is that right now, 99.99% of the machines
out there do NOT need extended config space (no matter how it gets accessed),
yet at the same time they have suffered from its issues for... what, 2 years now?
The point of my patch was to make people who don't need extended config space
not have to deal with it anymore.
Note: There is not a 100% overlap between "need" and "will not be used in
the patches that use legacy for < 256". In the other patches posted,
extended config space will be used in cases where it won't be with my
patch. (Most obvious one is an "lspci -vx" from automated scripts).
Is that a problem? We've had 2 years of mess, with one not-enough patch after another.
There still are problems TODAY (e.g. in 2.6.24-rc7). The patch that falls back
to an alternative method for below 256 is no doubt a step in the right direction.
(although I'm not all that happy about mixing access types, it's not provably incorrect)
Is it enough? I'm not sure. Only time can tell I suppose, but the risk side is that
if it is not enough, users who don't need the extended config space for functionality
will suffer the bugs AGAIN.
So in short, my approach was NOT about "fix PCI", it is about "fix the user experience".
It's a stopgap for sure, until the underlying mechanism gets reliable. It's been 2 years.....
maybe this next step is "it", maybe it isn't.
On Sun, Jan 13, 2008 at 10:41:24AM -0800, Arjan van de Ven wrote:
> Note: There is not a 100% overlap between "need" and "will not be used in
> the patches that use legacy for < 256". In the other patches posted,
> extended config space will be used in cases where it won't be with my
> patch. (Most obvious one is an "lspci -vx" from automated scripts).
I believe you to be mistaken in this belief. If you take Ivan's patch,
conf1 is used for all accesses below 256 bytes. lspci -x only dumps
config space up to 64 bytes; lspci -xxxx is needed to show extended pci
config space.
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
On 1/13/2008 1:41 PM, Arjan van de Ven wrote:
> On Sun, 13 Jan 2008 13:23:35 -0500
> Loic Prylli <[email protected]> wrote:
>
> Matthew pointed to a patch that basically does what you suggested; only one comment on your mail is left after that:
>
>
>> 2) the pci_enable_ext_config() function and dev->ext_cfg_space field,
>> sysfs interface should be removed from the patch. There has never
>> been a problem reporting crashes or any undefined behaviour while
>> trying to access ext-conf-space, all the problems were *using
>> mmconfig to access legacy-conf-space*.
>>
>
>
> This entirely misses the point of why I made the patch. The point is NOT
> that devices are buggy. The point is that right now, 99.99% of the machines
> out there do NOT need extended config space (no matter how it gets accessed),
>
> The point of my patch was to make people who don't need extended config space,
> not have to deal with it anymore.
>
I think I got your point the first time, and I agree it is sound. But in
my subjective and biased opinion, I just think ext-conf-space is
already useful and widespread enough (being used is not the same as
being strictly required for basic operation) for your proposed tradeoff
to not be optimal (protecting against "future/non-proven" hardware bugs,
i.e. bringing non-proven benefits, at the expense of making life harder
for ext-conf-space users while bringing additional extra API/code).
To take an example from the linux tree: the driver/pci/pcie/aer code
uses ext-conf-space for every pcie-root (currently several distributions
enable it by default), does it mean opt-in would be automatically
activated for most pcie hierarchies (defeating most of the benefits of
being opt-in), or we just disable that code by default?
Will lspci -v automatically opt in all pcie devices (right now by default
it tries to list the extended-capabilities for pcie and pcix), or do we
now require manual explicit sysfs operations to get the whole thing? Is
it an additional flag to lspci (and if so, will that flag also apply to pcix,
possibly causing a crash for lspci -v
-<opt-in-all-potential-ext-devices> on some machines)?
> Note: There is not a 100% overlap between "need" and "will not be used in
> the patches that use legacy for < 256". In the other patches posted,
> extended config space will be used in cases where it won't be with my
> patch. (Most obvious one is an "lspci -vx" from automated scripts).
To go one step your direction, I have already argued in a couple of
emails that I would prefer to not implement ext-conf-space access for
any PCI-X devices (removing PCI-X2 from pci_ext_cfg_size), because there
we are trying to support devices that we don't really know exist or
will ever exist. And protecting against "unproven bugs" makes more
sense when it only removes "unproven benefits".
>
> Is that a problem? We've had 2 years of mess, with one not-enough patch after another.
>
> There still are problems TODAY (e.g. in 2.6.24-rc7). The patch that falls back
> to an alternative method for below 256 is no doubt a step in the right direction.
> (although I'm not all that happy about mixing access types, it's not provably incorrect)
> Is it enough? I'm not sure.
FWIW, I have in my tree a patch almost identical to Ivan's dated
"December 2005". Because of the constant activity on the mmconfig front
(that I thought would make it obsolete), I never took the effort of
suggesting it before one month ago (I am not a regular user of
linux-kernel). I admit nobody else should view it that way, but for me
rather than the last attempt at fixing mmconfig, it's a patch first used
two years ago that would have arguably prevented all problems that have
been reported since then.
Besides, recent mails show that hypothetically, we could even not change
anything to the existing conf-space code, since the only known bug
remaining is the one associated with bar probing and could be addressed by:
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/broken-out/pci-disable-decoding-during-sizing-of-bars.patch
[ Thanks to Robert Hancock and Grant Grundler for explaining to me the
history of bar-probing last month ]
Even if that bar-probing patch was applied (maybe it needs to be more
combat-proven), by default, it still seems better to not use mmconfig
for legacy-conf-space access, but going two extra precaution steps
beyond what seems necessary might be excessive.
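
For readers unfamiliar with the BAR-probing hazard discussed above: sizing a BAR means writing all-ones to it and reading back which address bits stick. While the all-ones value is in place the BAR claims a bogus address range, which may overlap the MMCONFIG window. As the patch name suggests, the proposed fix turns off I/O and memory decoding in the command register for the duration of the sizing. The following is only a user-space sketch against a simulated device, not the patch itself; `struct fake_dev`, `cfg_read()`, `cfg_write()` and `size_bar0()` are hypothetical names, while the register offsets and command bits are the standard PCI ones.

```c
#include <stdint.h>

#define PCI_COMMAND         0x04   /* standard command register offset */
#define PCI_COMMAND_IO      0x1    /* I/O space decode enable */
#define PCI_COMMAND_MEMORY  0x2    /* memory space decode enable */
#define PCI_BASE_ADDRESS_0  0x10   /* first BAR */

/* Hypothetical simulated device with one 32-bit memory BAR. */
struct fake_dev {
	uint16_t command;
	uint32_t bar0;
	uint32_t bar_size;            /* power of two, at least 16 */
	int decoded_while_sizing;     /* set if all-ones was written with decode on */
};

static uint32_t cfg_read(struct fake_dev *d, int reg)
{
	if (reg == PCI_COMMAND)
		return d->command;
	if (reg == PCI_BASE_ADDRESS_0)
		return d->bar0;
	return 0;
}

static void cfg_write(struct fake_dev *d, int reg, uint32_t val)
{
	if (reg == PCI_COMMAND) {
		d->command = (uint16_t)val;
	} else if (reg == PCI_BASE_ADDRESS_0) {
		/* Address bits below the BAR size are hard-wired to zero. */
		d->bar0 = val & ~(d->bar_size - 1);
		if (val == 0xffffffffu && (d->command & PCI_COMMAND_MEMORY))
			d->decoded_while_sizing = 1;
	}
}

/* Size BAR0 with decoding disabled, then restore the original state. */
uint32_t size_bar0(struct fake_dev *d)
{
	uint32_t orig_cmd = cfg_read(d, PCI_COMMAND);
	uint32_t orig_bar, mask;

	cfg_write(d, PCI_COMMAND,
		  orig_cmd & ~(uint32_t)(PCI_COMMAND_IO | PCI_COMMAND_MEMORY));
	orig_bar = cfg_read(d, PCI_BASE_ADDRESS_0);
	cfg_write(d, PCI_BASE_ADDRESS_0, 0xffffffffu);
	mask = cfg_read(d, PCI_BASE_ADDRESS_0);
	cfg_write(d, PCI_BASE_ADDRESS_0, orig_bar);
	cfg_write(d, PCI_COMMAND, orig_cmd);

	/* Drop the low flag bits of a memory BAR, then invert to get the size. */
	return ~(mask & ~0xfu) + 1;
}
```

In this model the `decoded_while_sizing` flag stays clear, which is the property the sizing patch is after: the device never decodes the temporary all-ones address.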
> Only time can tell I suppose, but the risk side is that
> if it is not enough, users who don't need the extended config space for functionality
> will suffer the bugs AGAIN.
>
You can indeed never exclude 100% that possibility, but if they see a
problem again, it is likely to be a new category of hardware/BIOS bugs
never seen before.
Loic
On 1/13/2008 3:43 PM, Matthew Wilcox wrote:
> On Sun, Jan 13, 2008 at 10:41:24AM -0800, Arjan van de Ven wrote:
>
>> Note: There is not a 100% overlap between "need" and "will not be used in
>> the patches that use legacy for < 256". In the other patches posted,
>> extended config space will be used in cases where it won't be with my
>> patch. (Most obvious one is an "lspci -vx" from automated scripts).
>>
>
> I believe you to be mistaken in this belief. If you take Ivan's patch,
> conf1 is used for all accesses below 256 bytes. lspci -x only dumps
> config space up to 64 bytes; lspci -xxxx is needed to show extended pci
> config space.
>
I agree with Arjan about that "not a 100% overlap". It is about the
extra ext-conf-space access done while probing in drivers/pci/probe.c:
dev->cfg_size = pci_cfg_space_size(dev);
(and lspci -v will also query/show the list of extended-caps for
pci-x/pcie-x devices that have some, provided the kernel can access
ext-conf-space).
With Ivan's patch, that line would still cause one extended-conf-space
access at offset 256 for pcie/pci-x2 devices (to check the ability to
query ext-space). Arjan's "opt-in" patch would prevent that extra access.
IMHO that access is OK and harmless in all cases, we are already
protected by MCFG/e820 checks, but I agree one can express a different
opinion based on trying to prevent "never-seen/potential" hardware/BIOS
bugs. FWIW it is also there that I suggested excluding PCI-X2
devices (when restricted to pcie, that access while probing cannot even
cause the harmless master-abort/0xffffffff), but there is a small trade-off.
Loic
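
The probe-time access described above can be pictured roughly as follows: one dword read at offset 256, where a failed read or an all-ones result (master abort) means the device is treated as having only 256 bytes of config space. This is a simplified user-space sketch, not the code in drivers/pci/probe.c; the `cfg_read_fn` callback type, `probe_cfg_size()` and the three stub accessors are hypothetical.

```c
#include <stdint.h>

#define PCI_CFG_SPACE_SIZE      256    /* legacy config space */
#define PCI_CFG_SPACE_EXP_SIZE  4096   /* extended config space */

/* Hypothetical accessor: returns 0 on success, nonzero on failure. */
typedef int (*cfg_read_fn)(int reg, uint32_t *val);

/* Decide how much config space is usable from a single read at offset 256. */
int probe_cfg_size(cfg_read_fn read_dword)
{
	uint32_t v;

	if (read_dword(PCI_CFG_SPACE_SIZE, &v) != 0)
		return PCI_CFG_SPACE_SIZE;      /* no way to reach offset 256 */
	if (v == 0xffffffffu)
		return PCI_CFG_SPACE_SIZE;      /* master abort: nothing there */
	return PCI_CFG_SPACE_EXP_SIZE;
}

/* Stub: platform with no extended access method at all. */
int read_no_ext(int reg, uint32_t *val)
{
	if (reg >= PCI_CFG_SPACE_SIZE) {
		*val = 0xffffffffu;
		return -1;
	}
	*val = 0;
	return 0;
}

/* Stub: the access succeeds but the device does not respond. */
int read_master_abort(int reg, uint32_t *val)
{
	(void)reg;
	*val = 0xffffffffu;
	return 0;
}

/* Stub: device exposing an extended capability header at offset 0x100. */
int read_with_ext_cap(int reg, uint32_t *val)
{
	*val = (reg == 0x100) ? 0x00010001u : 0;  /* cap ID 1, version 1 */
	return 0;
}
```

With the opt-in approach this single read would not happen during enumeration; with the "legacy method below 256" approach it still does, which is the non-overlap being discussed.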
Arjan van de Ven wrote:
>> The PCI spec provides for conf1 as an architected solution. It's not
>> going away, and especially not in x86 land where Port IO is built-in
>> to the CPU.
>
> again sadly you're wrong.
>
As someone gently pointed out to me, you are in a position to know this,
so I probably am wrong.
On Sun, 13 Jan 2008 16:28:08 -0500
Tony Camuso <[email protected]> wrote:
> Arjan van de Ven wrote:
>
> >> The PCI spec provides for conf1 as an architected solution. It's not
> >> going away, and especially not in x86 land where Port IO is built-in
> >> to the CPU.
> >
> > again sadly you're wrong.
> >
>
> As someone gently pointed out to me, you are in a position to know this,
> so I probably am wrong.
I suspect Arjan is wrong. It might be some Intel agenda but I still see
fairly new driver reference code that is hardcoding port accesses even
when designed for Redmond products.
Alan
On Mon, 14 Jan 2008 00:54:34 +0000
Alan Cox <[email protected]> wrote:
> On Sun, 13 Jan 2008 16:28:08 -0500
> Tony Camuso <[email protected]> wrote:
>
> > Arjan van de Ven wrote:
> >
> > >> The PCI spec provides for conf1 as an architected solution. It's
> > >> not going away, and especially not in x86 land where Port IO is
> > >> built-in to the CPU.
>
>
> I suspect Arjan is wrong. It might be some Intel agenda but I still
> see fairly new driver reference code that is hardcoding port accesses
> even when designed for Redmond products.
I find it hard to believe that even they have their drivers do PCI config access via ports directly from the drivers,
and especially in driver reference code...
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
To all ...
Well, here is what I perceive we've got so far.
. Some PCI Northbridges do not work with MMCONFIG.
. Some PCI BARs can overlap the MMCONFIG area during bus sizing.
It is hoped that new BIOSes will locate MMCONFIG in an area
safely out of the way of bus sizing code, but there can be
no guarantees.
. conf1 is going away in newer x86 implementations in the not
too distant future.
. The PCI express spec requires platforms to provide access to
the extended config area, and there are express devices today
using that area for AER.
. There is no need to provide different PCI config access
mechanisms at device granularity, since the PCI config access
mechanism between the CPU and the Northbridge is opaque to
the devices. PCI config mechanisms only need to differ at
the Northbridge level.
. We have a flurry of patches all claiming to solve all or some
of these problems.
Arjan,
I realize it may not be possible for you to answer this question,
but I feel compelled to ask it anyway. Is it possible that future
x86 architectures will be implementing a SAL-like interface to
abstract PCI config access altogether?
Or can we condense these patches down to a set that does the
following?
. If the system is capable of conf1, then PCI config access
at offsets < 256 should be confined to conf1. This solution
is most effective for existing and legacy systems.
. If the system does not support MMCONFIG, or if MMCONFIG is
not working, then accesses to offsets >= 256 return -1 and an
error status.
. For systems where the conf1 mechanism is NOT available,
then MMCONFIG should be the PCI access mechanism for all
offsets. For such systems, we must assume that the BIOS has
become smart enough to locate MMCONFIG in a region safe from
encroachment by bus sizing code.
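
The three rules proposed above can be sketched as a single dispatch function. This is an illustration under stated assumptions, not code from any of the posted patches: `struct cfg_ops`, `policy_cfg_read()` and the two stub backends are hypothetical, and the stubs merely tag the returned value so it is visible which mechanism served the access.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access-method table with a single read hook. */
struct cfg_ops {
	int (*read)(int reg, uint32_t *val);
};

/*
 * conf1 for offsets below 256 when it exists, MMCONFIG otherwise.
 * On failure, return -1 and hand back all-ones as the value.
 */
int policy_cfg_read(const struct cfg_ops *conf1, const struct cfg_ops *mmcfg,
		    int reg, uint32_t *val)
{
	if (reg < 0 || reg > 4095)
		goto fail;
	if (reg < 256 && conf1)
		return conf1->read(reg, val);
	if (mmcfg)
		return mmcfg->read(reg, val);
fail:
	*val = 0xffffffffu;
	return -1;
}

/* Stub backends: the top byte of the value identifies the mechanism. */
static int conf1_read(int reg, uint32_t *val)
{
	*val = 0xC1000000u | (uint32_t)reg;
	return 0;
}

static int mmcfg_read(int reg, uint32_t *val)
{
	*val = 0x33000000u | (uint32_t)reg;
	return 0;
}

const struct cfg_ops conf1_ops = { conf1_read };
const struct cfg_ops mmcfg_ops = { mmcfg_read };
```

Passing `NULL` for `conf1` models a platform without conf1, where MMCONFIG serves every offset; passing `NULL` for `mmcfg` models a platform where extended offsets simply fail.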
On Sun, 13 Jan 2008 22:29:23 -0500
Tony Camuso <[email protected]> wrote:
> . There is no need to provide different PCI config access
> mechanisms at device granularity, since the PCI config access
> mechanism between the CPU and the Northbridge is opaque to
> the devices. PCI config mechanisms only need to differ at
> the Northbridge level.
This ignores the "let's make it not matter for the 99% of the users" case.
>
> . If the system is capable of conf1, then PCI config access
> at offsets < 256 should be confined to conf1. This solution
> is most effective for existing and legacy systems.
not "conf1" but "what the platform thinks is the best method for < 256".
We have this nice abstraction for the platform to select the best method... we should use it.
And still, it's another attempt to get this fixed (well.. it's been 2 years in the coming so far, maybe this will
be the last one, maybe it will not be... we'll see I suppose, but it sucks to be a user who doesn't
need any of the functionality that the extended config space provides in theory but gets to suffer more of the issues)
I'm all in favor of making this more reliable, but really..
we've thought it was fixed time and time again over the last two years. Please consider
limiting the scope of the damage as well.
On Mon, 14 Jan 2008, Alan Cox wrote:
> >
> > As someone gently pointed out to me, you are in a position to know this,
> > so I probably am wrong.
>
> I suspect Arjan is wrong. It might be some Intel agenda but I still see
> fairly new driver reference code that is hardcoding port accesses even
> when designed for Redmond products.
Agreed. I suspect that the likelihood of conf1 accesses going away in the
next five years is slim to none.
Linus
> > even when designed for Redmond products.
>
> I find it hard to believe that even they have their drivers do PCI config access via ports directly from the drivers,
> and especially in driver reference code...
Microsoft may not but the standard of Taiwanese driver code (and by
reference I mean vendor reference not OS supplier reference) is not
always great. When you have weeks to write a driver for a product with a
6 month sales lifetime I guess there are other pressures on driver
authors.
Easy enough for Intel to analyse though.
Alan
Arjan van de Ven wrote:
> On Sun, 13 Jan 2008 22:29:23 -0500
> Tony Camuso <[email protected]> wrote:
>
>> . There is no need to provide different PCI config access
>> mechanisms at device granularity, since the PCI config access
>> mechanism between the CPU and the Northbridge is opaque to
>> the devices. PCI config mechanisms only need to differ at
>> the Northbridge level.
>
> This ignores the "let's make it not matter for the 99% of the users" case.
I don't understand. If we're going to differentiate MMCONFIG from some other
access mechanism, it only needs to be done at the Northbridge level. Devices
are electrically ignorant of the protocol used between CPU and Northbridge
to get the Northbridge to assert config cycles on the bus.
>> . If the system is capable of conf1, then PCI config access
>> at offsets < 256 should be confined to conf1. This solution
>> is most effective for existing and legacy systems.
>
> not "conf1" but "what the platform thinks is the best method for < 256".
>
> We have this nice abstraction for the platform to select the best method... we should use it.
>
Agreed.
So we have Loic and Ivan's patch limiting MMCONFIG accesses to
offsets >= 256.
And we have Matthew's patch that abstracts the method for config
accesses to offsets < 256.
I believe Matthew has already tested these patches for functionality
on x86. All that's needed is to test for regressions on other arches.
Is there any interest in providing the following?
1. The ability to use MMCONFIG for all accesses on systems that have
no problems with MMCONFIG.
2. For systems using both PCI and PCI express, testing each bus
for MMCONFIG compliance, to determine whether MMCONFIG can be
used for all config accesses or whether the bus must be limited
to the method abstracted for offsets < 256.
Or does that introduce unnecessary complications?
On Mon, 14 Jan 2008 08:01:01 -0500
Tony Camuso <[email protected]> wrote:
> Arjan van de Ven wrote:
> > On Sun, 13 Jan 2008 22:29:23 -0500
> > Tony Camuso <[email protected]> wrote:
> >
> >> . There is no need to provide different PCI config access
> >> mechanisms at device granularity, since the PCI config access
> >> mechanism between the CPU and the Northbridge is opaque to
> >> the devices. PCI config mechanisms only need to differ at
> >> the Northbridge level.
> >
> > This ignores the "let's make it not matter for the 99% of the users"
> > case.
>
> I don't understand.
That's clear :)
> If we're going to differentiate MMCONFIG from
> some other access mechanism, it only needs to be done at the
> Northbridge level. Devices are electrically ignorant of the protocol
> used between CPU and Northbridge to get the Northbridge to assert
> config cycles on the bus.
Again this is about having systems that don't need extended config space not use it. At all.
The only way to do that is have the drivers say they need it, and not use it otherwise.
It has NOTHING to do with how things are wired up. It's purely a kernel-level policy decision
about whether to use extended config space AT ALL.
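
To make the policy concrete, here is a minimal sketch of what per-device opt-in amounts to, independent of the access mechanism underneath. It is not the patch's code: the thread only names `pci_enable_ext_config()` and the `dev->ext_cfg_space` field, so `struct sketch_dev` and the `sketch_*` functions below are hypothetical stand-ins that model the idea in user space.

```c
#include <stdint.h>

/* Hypothetical device model; ext_cfg_space mirrors the field named above. */
struct sketch_dev {
	int ext_cfg_space;            /* 0 until a driver opts in */
	uint32_t regs[4096 / 4];      /* simulated config space */
};

/* Stand-in for the opt-in call a driver would make. */
void sketch_enable_ext_config(struct sketch_dev *dev)
{
	dev->ext_cfg_space = 1;
}

/*
 * Reads beyond offset 255 are refused unless the device has opted in.
 * Returns 0 on success, or -1 with an all-ones value on refusal.
 */
int sketch_read_config_dword(const struct sketch_dev *dev, int reg,
			     uint32_t *val)
{
	int limit = dev->ext_cfg_space ? 4096 : 256;

	if (reg < 0 || (reg & 3) || reg + 4 > limit) {
		*val = 0xffffffffu;
		return -1;
	}
	*val = dev->regs[reg / 4];
	return 0;
}
```

The gate sits above whatever mechanism performs the access, so a machine whose drivers never opt in never touches extended config space at all.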
Arjan van de Ven wrote:
> On Mon, 14 Jan 2008 08:01:01 -0500
> Tony Camuso <[email protected]> wrote:
>>
>> If we're going to differentiate MMCONFIG from
>> some other access mechanism, it only needs to be done at the
>> Northbridge level. Devices are electrically ignorant of the protocol
>> used between CPU and Northbridge to get the Northbridge to assert
>> config cycles on the bus.
>
> Again this is about having systems that don't need extended config space not use it. At all.
> The only way to do that is have the drivers say they need it, and not use it otherwise.
> It has NOTHING to do with how things are wired up. It's purely a kernel-level policy decision
> about whether to use extended config space AT ALL.
>
The problem with compelling device drivers to determine the PCI
config mechanism is that it must be forced upon arches that
have no PCI configuration quirks or don't even use the same
PCI config mechanisms as x86.
I don't think that's a good policy.
Better to confine arch-specific quirks to the arch-specific code
whenever possible.
On Mon, 14 Jan 2008 10:23:14 -0500
Tony Camuso <[email protected]> wrote:
> Arjan van de Ven wrote:
> > On Mon, 14 Jan 2008 08:01:01 -0500
> > Tony Camuso <[email protected]> wrote:
> >>
> >> If we're going to differentiate MMCONFIG from
> >> some other access mechanism, it only needs to be done at the
> >> Northbridge level. Devices are electrically ignorant of the
> >> protocol used between CPU and Northbridge to get the Northbridge
> >> to assert config cycles on the bus.
> >
> > Again this is about having systems that don't need extended config
> > space not use it. At all. The only way to do that is have the
> > drivers say they need it, and not use it otherwise. It has NOTHING
> > to do with how things are wired up. It's purely a kernel-level policy
> > decision about whether to use extended config space AT ALL.
> >
>
> The problem with compelling device drivers to determine the PCI
> config mechanism is that it must be forced upon arches that
> have no PCI configuration quirks or don't even use the same
> PCI config mechanisms as x86.
it's not pci_enable_mmconf(), it's pci_enable_extended_config_space... it's independent of the mechanism!
>
> I don't think that's a good policy.
>
> Better to confine arch-specific quirks to the arch-specific code
> whenever possible.
>
Arjan van de Ven wrote:
> it's not pci_enable_mmconf(), it's pci_enable_extended_config_space... it's independent of the mechanism!
>
Arjan, you would be foisting this call on device drivers running on
arches that don't need any such distinction between extended config
space and < 256 bytes.
I still think it's a bad policy.
Let's endeavor to confine arch-specific quirks to the arch-specific
code.
On Sun, Jan 13, 2008 at 09:01:08AM -0800, Arjan van de Ven wrote:
> would be nice if the "reg > 256 && raw_pci_ext_ops==NULL" case would just
> call the raw_pci_ops-> pointer, to give that a chance of refusal
> (but I guess that shouldn't really happen)
We don't have a situation where that can happen -- all the other current
config methods on x86 are limited to <256 bytes. If we get another
method, we can revisit this.
> > - pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
> > + pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
>
> couldn't this (at least in some next patch) use the vector if it exists?
I thought so, but due to the way that things are initialised, mmconfig
happens before conf1. conf1 is known to be usable, but hasn't set
raw_pci_ops at this point. Confusing, and not ideal, but fixing this
isn't in scope for 2.6.24.
> > printk(KERN_INFO "PCI: Using MMCONFIG\n");
> > raw_pci_ops = &pci_mmcfg;
> > + raw_pci_ext_ops = &pci_mmcfg;
>
> why set BOTH vectors? you probably ONLY want to set the ext one, so
> that calls to the lower 256 go to the original
I had misunderstood how the x86 pci init happened -- I thought conf1
would override this. It doesn't.
The following patch has been tested on ia64, x86 and x86_64.
It successfully avoids the hang on my G33 machine (ie BAR probing
problem), when applied *after* Ivan's patch.
Greg, please apply Ivan's patch and then this one.
---
PCI: Rationalise raw_pci_ops
Replace raw_pci_ops with raw_pci_read() and raw_pci_write(). This is
a better interface for ACPI, ia64 and now x86.
Make pci_raw_ops private to the x86 arch, and use it to implement
raw_pci_read/write. Add a raw_pci_ext_ops for extended config space.
Signed-off-by: Matthew Wilcox <[email protected]>
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 488e48a..8fd7e82 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -43,8 +43,7 @@
#define PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg) \
(((u64) seg << 28) | (bus << 20) | (devfn << 12) | (reg))
-static int
-pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
int reg, int len, u32 *value)
{
u64 addr, data = 0;
@@ -68,8 +67,7 @@ pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
return 0;
}
-static int
-pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
int reg, int len, u32 value)
{
u64 addr;
@@ -91,24 +89,17 @@ pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
return 0;
}
-static struct pci_raw_ops pci_sal_ops = {
- .read = pci_sal_read,
- .write = pci_sal_write
-};
-
-struct pci_raw_ops *raw_pci_ops = &pci_sal_ops;
-
-static int
-pci_read (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
+static int pci_read(struct pci_bus *bus, unsigned int devfn, int where,
+ int size, u32 *value)
{
- return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+ return raw_pci_read(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
-static int
-pci_write (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
+static int pci_write(struct pci_bus *bus, unsigned int devfn, int where,
+ int size, u32 value)
{
- return raw_pci_ops->write(pci_domain_nr(bus), bus->number,
+ return raw_pci_write(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
diff --git a/arch/ia64/sn/pci/tioce_provider.c b/arch/ia64/sn/pci/tioce_provider.c
index e1a3e19..999f14f 100644
--- a/arch/ia64/sn/pci/tioce_provider.c
+++ b/arch/ia64/sn/pci/tioce_provider.c
@@ -752,13 +752,13 @@ tioce_kern_init(struct tioce_common *tioce_common)
* Determine the secondary bus number of the port2 logical PPB.
* This is used to decide whether a given pci device resides on
* port1 or port2. Note: We don't have enough plumbing set up
- * here to use pci_read_config_xxx() so use the raw_pci_ops vector.
+ * here to use pci_read_config_xxx() so use raw_pci_read().
*/
seg = tioce_common->ce_pcibus.bs_persist_segment;
bus = tioce_common->ce_pcibus.bs_persist_busnum;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(2, 0), PCI_SECONDARY_BUS, 1,&tmp);
+ raw_pci_read(seg, bus, PCI_DEVFN(2, 0), PCI_SECONDARY_BUS, 1,&tmp);
tioce_kern->ce_port1_secondary = (u8) tmp;
/*
@@ -799,11 +799,11 @@ tioce_kern_init(struct tioce_common *tioce_common)
/* mem base/limit */
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_MEMORY_BASE, 2, &tmp);
base = (u64)tmp << 16;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_MEMORY_LIMIT, 2, &tmp);
limit = (u64)tmp << 16;
limit |= 0xfffffUL;
@@ -817,21 +817,21 @@ tioce_kern_init(struct tioce_common *tioce_common)
* attributes.
*/
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_MEMORY_BASE, 2, &tmp);
base = ((u64)tmp & PCI_PREF_RANGE_MASK) << 16;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_BASE_UPPER32, 4, &tmp);
base |= (u64)tmp << 32;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_MEMORY_LIMIT, 2, &tmp);
limit = ((u64)tmp & PCI_PREF_RANGE_MASK) << 16;
limit |= 0xfffffUL;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_LIMIT_UPPER32, 4, &tmp);
limit |= (u64)tmp << 32;
diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index fab30e1..7f73f7c 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -27,7 +27,7 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
pci_write_config_byte(dev, 0xf4, config|0x2);
/* read xTPR register */
- raw_pci_ops->read(0, 0, 0x40, 0x4c, 2, &word);
+ raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
if (!(word & (1 << 13))) {
printk(KERN_INFO "Intel E7520/7320/7525 detected. "
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 8627463..f2bd9f3 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -26,16 +26,37 @@ int pcibios_last_bus = -1;
unsigned long pirq_table_addr;
struct pci_bus *pci_root_bus;
struct pci_raw_ops *raw_pci_ops;
+struct pci_raw_ops *raw_pci_ext_ops;
+
+int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 *val)
+{
+ if (reg < 256 && raw_pci_ops)
+ return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
+ return -EINVAL;
+}
+
+int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 val)
+{
+ if (reg < 256 && raw_pci_ops)
+ return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
+ return -EINVAL;
+}
static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
{
- return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+ return raw_pci_read(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
static int pci_write(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
{
- return raw_pci_ops->write(pci_domain_nr(bus), bus->number,
+ return raw_pci_write(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
index 431c9a5..42f3e4c 100644
--- a/arch/x86/pci/direct.c
+++ b/arch/x86/pci/direct.c
@@ -14,7 +14,7 @@
#define PCI_CONF1_ADDRESS(bus, devfn, reg) \
(0x80000000 | (bus << 16) | (devfn << 8) | (reg & ~3))
-int pci_conf1_read(unsigned int seg, unsigned int bus,
+static int pci_conf1_read(unsigned int seg, unsigned int bus,
unsigned int devfn, int reg, int len, u32 *value)
{
unsigned long flags;
@@ -45,7 +45,7 @@ int pci_conf1_read(unsigned int seg, unsigned int bus,
return 0;
}
-int pci_conf1_write(unsigned int seg, unsigned int bus,
+static int pci_conf1_write(unsigned int seg, unsigned int bus,
unsigned int devfn, int reg, int len, u32 value)
{
unsigned long flags;
diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index 6cff66d..b31cd6a 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -215,7 +215,8 @@ static int quirk_aspm_offset[MAX_PCIEROOT << 3];
static int quirk_pcie_aspm_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
{
- return raw_pci_ops->read(0, bus->number, devfn, where, size, value);
+ return raw_pci_read(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
}
/*
@@ -231,7 +232,8 @@ static int quirk_pcie_aspm_write(struct pci_bus *bus, unsigned int devfn, int wh
if ((offset) && (where == offset))
value = value & 0xfffffffc;
- return raw_pci_ops->write(0, bus->number, devfn, where, size, value);
+ return raw_pci_write(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
}
static struct pci_ops quirk_pcie_aspm_ops = {
diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 5565d70..e041ced 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -22,7 +22,7 @@ static void __devinit pcibios_fixup_peer_bridges(void)
if (pci_find_bus(0, n))
continue;
for (devfn = 0; devfn < 256; devfn += 8) {
- if (!raw_pci_ops->read(0, n, devfn, PCI_VENDOR_ID, 2, &l) &&
+ if (!raw_pci_read(0, n, devfn, PCI_VENDOR_ID, 2, &l) &&
l != 0x0000 && l != 0xffff) {
DBG("Found device at %02x:%02x [%04x]\n", n, devfn, l);
printk(KERN_INFO "PCI: Discovered peer bus %02x\n", n);
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 6b521d3..8d54df4 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -28,7 +28,7 @@ static int __initdata pci_mmcfg_resources_inserted;
static const char __init *pci_mmcfg_e7520(void)
{
u32 win;
- pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
+ pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
win = win & 0xf000;
if(win == 0x0000 || win == 0xf000)
@@ -53,7 +53,7 @@ static const char __init *pci_mmcfg_intel_945(void)
pci_mmcfg_config_num = 1;
- pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
+ pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
/* Enable bit */
if (!(pciexbar & 1))
@@ -118,7 +118,7 @@ static int __init pci_mmcfg_check_hostbridge(void)
int i;
const char *name;
- pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0, 4, &l);
+ pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0, 4, &l);
vendor = l & 0xffff;
device = (l >> 16) & 0xffff;
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 7b75e65..081816a 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -68,9 +68,6 @@ err: *value = -1;
return -EINVAL;
}
- if (reg < 256)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
-
base = get_base_addr(seg, bus, devfn);
if (!base)
goto err;
@@ -104,9 +101,6 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if ((bus > 255) || (devfn > 255) || (reg > 4095))
return -EINVAL;
- if (reg < 256)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
-
base = get_base_addr(seg, bus, devfn);
if (!base)
return -EINVAL;
@@ -138,7 +132,7 @@ static struct pci_raw_ops pci_mmcfg = {
int __init pci_mmcfg_arch_init(void)
{
- printk(KERN_INFO "PCI: Using MMCONFIG\n");
- raw_pci_ops = &pci_mmcfg;
+ printk(KERN_INFO "PCI: Using MMCONFIG for extended config space\n");
+ raw_pci_ext_ops = &pci_mmcfg;
return 1;
}
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index c4cf318..9207fd4 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -58,9 +58,6 @@ err: *value = -1;
return -EINVAL;
}
- if (reg < 256)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
-
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
goto err;
@@ -89,9 +86,6 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095)))
return -EINVAL;
- if (reg < 256)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
-
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
return -EINVAL;
@@ -150,6 +144,6 @@ int __init pci_mmcfg_arch_init(void)
return 0;
}
}
- raw_pci_ops = &pci_mmcfg;
+ raw_pci_ext_ops = &pci_mmcfg;
return 1;
}
diff --git a/arch/x86/pci/pci.h b/arch/x86/pci/pci.h
index 36cb44c..3431518 100644
--- a/arch/x86/pci/pci.h
+++ b/arch/x86/pci/pci.h
@@ -85,10 +85,17 @@ extern spinlock_t pci_config_lock;
extern int (*pcibios_enable_irq)(struct pci_dev *dev);
extern void (*pcibios_disable_irq)(struct pci_dev *dev);
-extern int pci_conf1_write(unsigned int seg, unsigned int bus,
- unsigned int devfn, int reg, int len, u32 value);
-extern int pci_conf1_read(unsigned int seg, unsigned int bus,
- unsigned int devfn, int reg, int len, u32 *value);
+struct pci_raw_ops {
+ int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 *val);
+ int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 val);
+};
+
+extern struct pci_raw_ops *raw_pci_ops;
+extern struct pci_raw_ops *raw_pci_ext_ops;
+
+extern struct pci_raw_ops pci_direct_conf1;
extern int pci_direct_probe(void);
extern void pci_direct_init(int type);
diff --git a/arch/x86/pci/visws.c b/arch/x86/pci/visws.c
index 8ecb1c7..c2df4e9 100644
--- a/arch/x86/pci/visws.c
+++ b/arch/x86/pci/visws.c
@@ -13,9 +13,6 @@
#include "pci.h"
-
-extern struct pci_raw_ops pci_direct_conf1;
-
static int pci_visws_enable_irq(struct pci_dev *dev) { return 0; }
static void pci_visws_disable_irq(struct pci_dev *dev) { }
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index e3a673a..f190db9 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -139,15 +139,6 @@ acpi_status __init acpi_os_initialize(void)
acpi_status acpi_os_initialize1(void)
{
- /*
- * Initialize PCI configuration space access, as we'll need to access
- * it while walking the namespace (bus 0 and root bridges w/ _BBNs).
- */
- if (!raw_pci_ops) {
- printk(KERN_ERR PREFIX
- "Access to PCI configuration space unavailable\n");
- return AE_NULL_ENTRY;
- }
kacpid_wq = create_singlethread_workqueue("kacpid");
kacpi_notify_wq = create_singlethread_workqueue("kacpi_notify");
BUG_ON(!kacpid_wq);
@@ -498,11 +489,9 @@ acpi_os_read_pci_configuration(struct acpi_pci_id * pci_id, u32 reg,
return AE_ERROR;
}
- BUG_ON(!raw_pci_ops);
-
- result = raw_pci_ops->read(pci_id->segment, pci_id->bus,
- PCI_DEVFN(pci_id->device, pci_id->function),
- reg, size, value);
+ result = raw_pci_read(pci_id->segment, pci_id->bus,
+ PCI_DEVFN(pci_id->device, pci_id->function),
+ reg, size, value);
return (result ? AE_ERROR : AE_OK);
}
@@ -529,11 +518,9 @@ acpi_os_write_pci_configuration(struct acpi_pci_id * pci_id, u32 reg,
return AE_ERROR;
}
- BUG_ON(!raw_pci_ops);
-
- result = raw_pci_ops->write(pci_id->segment, pci_id->bus,
- PCI_DEVFN(pci_id->device, pci_id->function),
- reg, size, value);
+ result = raw_pci_write(pci_id->segment, pci_id->bus,
+ PCI_DEVFN(pci_id->device, pci_id->function),
+ reg, size, value);
return (result ? AE_ERROR : AE_OK);
}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0dd93bb..f4f1edd 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -304,14 +304,14 @@ struct pci_ops {
int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
};
-struct pci_raw_ops {
- int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
- int reg, int len, u32 *val);
- int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
- int reg, int len, u32 val);
-};
-
-extern struct pci_raw_ops *raw_pci_ops;
+/*
+ * ACPI needs to be able to access PCI config space before we've done a
+ * PCI bus scan and created pci_bus structures.
+ */
+extern int raw_pci_read(unsigned int domain, unsigned int bus,
+ unsigned int devfn, int reg, int len, u32 *val);
+extern int raw_pci_write(unsigned int domain, unsigned int bus,
+ unsigned int devfn, int reg, int len, u32 val);
struct pci_bus_region {
unsigned long start;
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
On Mon, Jan 14, 2008 at 03:52:26PM -0700, Matthew Wilcox wrote:
> On Sun, Jan 13, 2008 at 09:01:08AM -0800, Arjan van de Ven wrote:
>...
> > > - pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
> > > + pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
> >
> > couldn't this (at least in some next patch) use the vector if it exists?
>
> I thought so, but due to the way that things are initialised, mmconfig
> happens before conf1. conf1 is known to be usable, but hasn't set
> raw_pci_ops at this point. Confusing, and not ideal, but fixing this
> isn't in scope for 2.6.24.
>...
*ahem*
I don't think anything of what was discussed in this thread would be in
scope for 2.6.24 (unless Linus wants to let the bunny that brings eggs
release 2.6.24).
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
Arjan van de Ven wrote:
> On Sun, 13 Jan 2008 22:29:23 -0500
> Tony Camuso <[email protected]> wrote:
>
>> . There is no need to provide different PCI config access
>> mechanisms at device granularity, since the PCI config access
>> mechanism between the CPU and the Northbridge is opaque to
>> the devices. PCI config mechanisms only need to differ at
>> the Northbridge level.
>
> This ignores the "lets make it not matter for the 99% of the users" case.
>> . If the system is capable of conf1, then PCI config access
>> at offsets < 256 should be confined to conf1. This solution
>> is most effective for existing and legacy systems.
>
> not "conf1" but "what the platform thinks is the best method for < 256".
>
> We have this nice abstraction for the platform to select the best method... we should use it.
>
> And still, it's another attempt to get this fixed (well.. it's been 2 years in the coming so far, maybe this will
> be the last one, maybe it will not be... we'll see I suppose, but it sucks to be a user who doesn't
> need any of the functionality that the extended config space provides in theory but gets to suffer more of the issues)
There actually haven't been that many attempts to "get this fixed". It's
been more a) people complaining about it and nothing being done about
the problems and b) adding hacks to blindly disable it because of
reported problems without root-causing why those problems were showing
up. With such approaches no wonder it has not been reliable to try and
use MMCONFIG in the past.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
I just thought this might be interesting to the discussion.
I recently bought another 2 GB memory for my computer.
My hardware is as following:
Asus Commando (Intel P965 chipset)
Intel Core2 Q6600
4x1 GB Geil PC6400 memory
nVidia 8800 gts (old g80 core, 640 mb mem)
Unless I boot with pci=nommconf I have severe stability issues, and
often, when it's not crashing, I get slowdowns with the error:
kern.log:Jan 15 13:19:40 bilbo kernel: [ 132.046715] NVRM: Xid (0001:00): 6, PE0001
... repeated x times.
In addition the nVidia framebuffer seems to "leak" or not update since
i get loads of graphics artifacts.
The system works perfectly fine with 2 GB memory and without
pci=nommconf.
It works like a charm when using pci=nommconf and 4 GB memory.
In addition, I have to enable the Northbridge->PCI Memory remap feature
in the BIOS to avoid the kernel panicking when trying to access > 3 GB,
but that is understandable :)
My software is Kubuntu 7.10 stock x86_64 kernel, but i do use the
binary driver by nVidia.
It works like a charm when using pci=nommconf
If you guys need any more info about hardware/software from me, please
let me know.
--
Øyvind Vågen Jægtnes
+47 96 22 03 08
(i reject your diurnal rhythm and substitute my own)
On 1/14/2008 6:04 PM, Adrian Bunk wrote:
>> I thought so, but due to the way that things are initialised, mmconfig
>> happens before conf1. conf1 is known to be usable, but hasn't set
>> raw_pci_ops at this point. Confusing, and not ideal, but fixing this
>> isn't in scope for 2.6.24.
>> ...
>>
>
> *ahem*
>
> I don't think anything of what was discussed in this thread would be in
> scope for 2.6.24 (unless Linus wants to let the bunny that brings eggs
> release 2.6.24).
>
> cu
> Adrian
>
Why not put a simple fix for the last known remaining mmconfig problem
into 2.6.24? There have mostly been three bugs related to mmconfig:
- BIOS/hardware: exaggerated MCFG claims: solved long ago
- hardware: buggy CRS+mmconfig chipset: fix included last month
- Linux code: mmconfig incompatible with live BAR-probing: *not fixed*
It would be ironic to not fix the only one that is really confined to
the Linux code.
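For readers unfamiliar with the third item: BAR sizing writes all-ones to a base address register, and until the original value is written back the device may decode a much larger address range than usual. If that temporary range covers the MMCONFIG window, config accesses made through MMCONFIG during that interval can land on the device instead. The following is only a user-space sketch of that overlap check, restricted to 32-bit memory BARs; the struct, the helper names and any addresses fed to it are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Range a 32-bit memory BAR decodes while it holds the all-ones sizing
 * value: every writable address bit reads back as 1, so the BAR sits in
 * the highest naturally aligned slot below 4 GB. 'size' must be a power
 * of two, as BAR sizes are.
 */
static struct range bar_range_while_sizing(uint64_t size)
{
	struct range r;

	assert(size != 0 && (size & (size - 1)) == 0);
	r.end = 0x100000000ULL;
	r.start = r.end - size;
	return r;
}

/* Two half-open ranges overlap if each starts before the other ends. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.start < b.end && b.start < a.end;
}

/* True if MMCONFIG accesses could be shadowed while the BAR is being sized. */
static bool sizing_shadows_mmconfig(uint64_t bar_size, struct range mmcfg)
{
	return ranges_overlap(bar_range_while_sizing(bar_size), mmcfg);
}
```

With a made-up MMCONFIG window at 0xe0000000-0xefffffff, a 256 MB BAR stays clear of it during sizing while a 512 MB BAR covers it, which is why routing the BAR registers through conf1 sidesteps the problem.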
Everybody more or less agrees that *any* of the patches submitted so far
solves the known problems and will not cause regressions. The only long
discussion is about how best to prevent the effect of an "imaginary"
fourth bug, and by nature that's a controversial topic.
For 2.6.24, if nothing more than a few lines can be done, either make
pci=nommconf the default and add a pci=mmconf option, and/or apply one
of the easiest patches to review, i.e. Tony's, which is so small that I
copy it again below (using 0x40 or 0x100 for the comparison does not
really matter; personally I would change it to 0x100 to be like Ivan's
patch, but either is much better than nothing). Replacing some mmconfig
accesses by conf1 cannot cause any regression.
Loic
P.S.: with that patch, conf1-less x86 systems requiring mmconfig would
not be supported. But they are like UFOs: there are plenty of them in the
galaxy, but earth sightings are not convincing enough for 2.6.24
support; they can wait for 2.6.25.
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 1bf5816..4474979 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -73,7 +73,7 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
}
base = get_base_addr(seg, bus, devfn);
- if (!base)
+ if ((!base) || (reg < 0x40))
return pci_conf1_read(seg,bus,devfn,reg,len,value);
spin_lock_irqsave(&pci_config_lock, flags);
@@ -106,7 +106,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
return -EINVAL;
base = get_base_addr(seg, bus, devfn);
- if (!base)
+ if ((!base) || (reg < 0x40))
return pci_conf1_write(seg,bus,devfn,reg,len,value);
spin_lock_irqsave(&pci_config_lock, flags);
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index 4095e4d..4ad1fcb 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -61,7 +61,7 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
}
addr = pci_dev_base(seg, bus, devfn);
- if (!addr)
+ if ((!addr) || (reg < 0x40))
return pci_conf1_read(seg,bus,devfn,reg,len,value);
switch (len) {
@@ -89,7 +89,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
return -EINVAL;
addr = pci_dev_base(seg, bus, devfn);
- if (!addr)
+ if ((!addr) || (reg < 0x40))
return pci_conf1_write(seg,bus,devfn,reg,len,value);
switch (len) {
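The routing rule in the patch above is small enough to model outside the kernel. The sketch below is a user-space illustration only: `route_access()` and its `limit` parameter are hypothetical names, with `limit` standing for the 0x40 or 0x100 cutoff discussed in the message.

```c
#include <assert.h>
#include <stdbool.h>

enum cfg_method { CFG_CONF1, CFG_MMCONFIG };

/*
 * Mirror of the patched condition "(!base) || (reg < limit)":
 * fall back to conf1 when the device has no MMCONFIG mapping or when
 * the register lies below the cutoff; use MMCONFIG otherwise.
 */
static enum cfg_method route_access(bool has_mmcfg_base, int reg, int limit)
{
	assert(reg >= 0 && reg <= 4095);
	if (!has_mmcfg_base || reg < limit)
		return CFG_CONF1;
	return CFG_MMCONFIG;
}
```

With a cutoff of 0x40 only the standard header, which is where the BARs live, goes through conf1; with 0x100 the whole legacy space does. Either way the BAR registers never travel over MMCONFIG, which is the property the patch relies on.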
On Tue, Jan 15, 2008 at 11:00:37AM -0500, Loic Prylli wrote:
>
>
> On 1/14/2008 6:04 PM, Adrian Bunk wrote:
>>> I thought so, but due to the way that things are initialised, mmconfig
>>> happens before conf1. conf1 is known to be usable, but hasn't set
>>> raw_pci_ops at this point. Confusing, and not ideal, but fixing this
>>> isn't in scope for 2.6.24.
>>> ...
>>>
>>
>> *ahem*
>>
>> I don't think anything of what was discussed in this thread would be in
>> scope for 2.6.24 (unless Linus wants to let the bunny that brings eggs
>> release 2.6.24).
>>
>> cu
>> Adrian
>>
>
>
> Why not put in 2.6.24 a simple fix for the last known remaining mmconfig
> problems in 2.6.24?
Heh, no, because it is _way_ too late for such a patch that hasn't been
tested in any trees, sorry.
2.6.25 is the earliest I'll take such a fix, and if it's really as
simple as you say, I'll consider it for the -stable releases for .24 if
needed.
But so far, we have a zillion patches floating around, claiming
different things, some with signed-off-bys and others without, so for
now, I'll just stick with Arjan's patch in -mm and see if anyone
complains about those releases...
thanks,
greg k-h
On Tue, Jan 15, 2008 at 09:46:43AM -0800, Greg KH wrote:
> But so far, we have a zillion patches floating around, claiming
> different things, some with signed-off-bys and others without, so for
> now, I'll just stick with Arjan's patch in -mm and see if anyone
> complains about those releases...
I complain about Arjan's patch. For reasons which have been adequately
gone into already in this thread.
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
I agree with Matthew.
My preference is Ivan's patch using Loic's proposal.
My patch would have tested MMCONFIG before using it, but it didn't
fix the problem where the decode of large displacement devices can
overlap the MMCONFIG region.
Ivan's patch fixes that, and the problem of Northbridges that don't
respond to MMCONFIG and as a bonus cleans out some code rendered
unnecessary by his patch.
Linus is confident that conf1 is not going away for at least the
next five years.
Matthew Wilcox wrote:
> On Tue, Jan 15, 2008 at 09:46:43AM -0800, Greg KH wrote:
>> But so far, we have a zillion patches floating around, claiming
>> different things, some with signed-off-bys and others without, so for
>> now, I'll just stick with Arjan's patch in -mm and see if anyone
>> complains about those releases...
>
> I complain about Arjan's patch. For reasons which have been adequately
> gone into already in this thread.
>
On Tue, Jan 15, 2008 at 11:38:42AM -0800, Linus Torvalds wrote:
> On Tue, 15 Jan 2008, Tony Camuso wrote:
> > Linus is confident that conf1 is not going away for at least the
> > next five years.
>
> Not on PC's. Small birds tell me that there can be all these non-PC x86
> subarchitectures that may or may not have conf1.
Right -- hence my patch on top of Ivan's which removes all the assumptions
about conf1 from mmconfig (there are still *references* to conf1 in the
mmconfig code, but they'll only be used if conf1 is functional).
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
On Tue, 15 Jan 2008, Tony Camuso wrote:
>
> Linus is confident that conf1 is not going away for at least the
> next five years.
Not on PC's. Small birds tell me that there can be all these non-PC x86
subarchitectures that may or may not have conf1.
Linus
On 1/15/2008 2:38 PM, Linus Torvalds wrote:
> On Tue, 15 Jan 2008, Tony Camuso wrote:
>
>> Linus is confident that conf1 is not going away for at least the
>> next five years.
>>
>
> Not on PC's. Small birds tell me that there can be all these non-PC x86
> subarchitectures that may or may not have conf1.
>
> Linus
>
>
But is there an ACPI-compliant architecture that only offers mmconfig for
configuration-space access and no other fallback method (i.e. no conf1,
no bios,...)?
2.6.24 supports mmconfig for:
- ACPI-system with MCFG
- a couple of chipsets discovered by conf1
If a system has no conf1, but does not have e820+ACPI+MCFG, or does have
some other method than mmconfig, it was already irrelevant in the
discussion of Ivan's initial patch in December (because that system was
either never supported or not impacted, and we were trying to fix bugs,
not introduce support for a new class of systems).
Maybe Arjan could share his knowledge, and tell us what system he was
thinking about (and whether it needed to be supported by 2.6.24) when
saying:
"When (and I'm saying "when" not "if") systems arrive that only have
MMCONFIG for some of the devices."
Anyway Ivan's patch + Matthew's extensions are handling that non-PC
arch. That combination is advocated by at least:
Ivan Kokshaysky
Matthew Wilcox
Tony Camuso
Loic Prylli
Even Arjan said that while he prefers his patch (saying it's more
conservative), he does not see an existing problem with the Ivan/Matthew
combination.
[ simpler, less ambitious fixes can be forgotten if nothing can be done
for 2.6.24, I can understand that choice ]
The problems I see with Arjan's patch are:
- no word on whether the existing Linux drivers/pci/pcie/aer code should
be converted to opt-in.
- mmconfig still needs to be revisited to sort-out the mix of
mmconfig+conf1+third-method access.
- you cannot test if ext-conf-space is available without taking risks:
when pci_enable_ext_config() is called, even legacy-conf-space is
switched to the new method. So some administrator action (lspci -v
+maybe-other-flag) or some driver action (that can optionally use
ext-conf-space but does not *rely* on it) could cause some devices to
totally disappear (if some pci hierarchy is handled by mmconfig as a
0xffffffff section, as seen on many AMD machines). Matthew/Ivan will,
in the worst case, simply detect that ext-conf-space is not available
(in pci_cfg_space_size()); legacy-conf-space will still work (and that
0xffffffff section is perfectly *safe* to query, tell me if you need
more details of why).
- it introduces a new user API and a new kernel API, while in practice
there is no evidence that this brings any benefit compared to Ivan/Matthew.
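The "detect in the worst case" behaviour mentioned in the third point can be sketched as a probe that reads the first extended register and treats a failed read or an all-ones value as "no usable extended space". This is a user-space illustration loosely modelled on the idea behind pci_cfg_space_size(), not a copy of it (the kernel function performs further checks); the accessor type and the three stand-in devices are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE      256   /* legacy config space, bytes */
#define CFG_SPACE_EXP_SIZE  4096  /* extended config space, bytes */

/* Hypothetical accessor type: returns 0 on success, nonzero on failure. */
typedef int (*cfg_read32_fn)(int reg, uint32_t *val);

/*
 * Decide how much config space to expose for one device by reading the
 * first extended register. A failed read or an all-ones value is taken
 * to mean "no usable extended space"; legacy space is unaffected.
 */
static int probe_cfg_space_size(cfg_read32_fn read32)
{
	uint32_t val;

	assert(read32 != NULL);
	if (read32(CFG_SPACE_SIZE, &val) != 0)
		return CFG_SPACE_SIZE;
	if (val == 0xffffffffu)
		return CFG_SPACE_SIZE;
	return CFG_SPACE_EXP_SIZE;
}

/* Stand-in device: the extended read succeeds but returns all-ones. */
static int read_all_ones(int reg, uint32_t *val)
{
	(void)reg;
	*val = 0xffffffffu;
	return 0;
}

/* Stand-in device: the accessor has no mapping and reports an error. */
static int read_fails(int reg, uint32_t *val)
{
	(void)reg;
	*val = 0xffffffffu;
	return -1;
}

/* Stand-in device: returns a made-up extended capability header. */
static int read_ext_cap(int reg, uint32_t *val)
{
	(void)reg;
	*val = 0x00010001u;
	return 0;
}
```

The point of the design is that the probe only ever reads offset 0x100 and never changes how registers below 256 are reached, so a negative result costs nothing.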
IMHO, making "pci=nommconf" the default behaviour is better than
Arjan's patch: for the exaggerated 99.99% of users he claims don't need
ext-conf-space, that's obviously as good. And many of the others would
benefit from the ability to test whether ext-conf-space is available,
and optionally use it, without taking the risk of crashing something,
so something else is better for them.
With Arjan's patch, in 10 years, we might still have to use an extra
option (or some other action) when using lspci to display extended caps,
and we would still run the risk of crashing some old machine when doing
so (unless maybe a blacklist of some sort will be added, making the
newly introduced API completely useless soon, or unless we keep the
painful bitmaps in mmconfig, potentially ending up with 3 sets of pci-ops).
Loic
On Tue, Jan 15, 2008 at 10:56:41AM -0700, Matthew Wilcox wrote:
> On Tue, Jan 15, 2008 at 09:46:43AM -0800, Greg KH wrote:
> > But so far, we have a zillion patches floating around, claiming
> > different things, some with signed-off-bys and others without, so for
> > now, I'll just stick with Arjan's patch in -mm and see if anyone
> > complains about those releases...
>
> I complain about Arjan's patch. For reasons which have been adequately
> gone into already in this thread.
Agreed.
Greg, I think at least two better alternatives were proposed already.
Please review the thread again.
grant
Greg,
Have you given Grant's suggestion any further consideration?
I'd like to know how the MMCONFIG issues discussed in this thread are going
to be handled upstream. I have a patch implemented in RHEL 5.2, but I would
rather have the upstream patch implemented, whatever it is.
Grant Grundler wrote:
> On Tue, Jan 15, 2008 at 10:56:41AM -0700, Matthew Wilcox wrote:
>> On Tue, Jan 15, 2008 at 09:46:43AM -0800, Greg KH wrote:
>>> But so far, we have a zillion patches floating around, claiming
>>> different things, some with signed-off-bys and others without, so for
>>> now, I'll just stick with Arjan's patch in -mm and see if anyone
>>> complains about those releases...
>> I complain about Arjan's patch. For reasons which have been adequately
>> gone into already in this thread.
>
> Agreed.
> Greg, I think at least two better alternatives were proposed already.
> Please review the thread again.
>
> grant
On Mon, Jan 28, 2008 at 01:32:06PM -0500, Tony Camuso wrote:
> Greg,
>
> Have you given Grant's suggestion any further consideration?
>
> I'd like to know how the MMCONFIG issues discussed in this thread are going
> to be handled upstream. I have a patch implemented in RHEL 5.2, but I would
> rather have the upstream patch implemented, whatever it is.
Well, everyone still doesn't seem to agree on the proper way forward
here, so for me to just "pick one" isn't very appropriate.
So, can we try again?
Can people submit what they think the change should be? Right now I
have Arjan's patch in my kernel tree, but will not send it to Linus for
.25 for now, unless everyone thinks that is the best solution at the
moment (which, for me, I'm leaning toward right now...)
thanks,
greg "can't we all just get along?" k-h
On Mon, Jan 28, 2008 at 12:44:31PM -0800, Greg KH wrote:
> On Mon, Jan 28, 2008 at 01:32:06PM -0500, Tony Camuso wrote:
> > Greg,
> >
> > Have you given Grant's suggestion any further consideration?
> >
> > I'd like to know how the MMCONFIG issues discussed in this thread are going
> > to be handled upstream. I have a patch implemented in RHEL 5.2, but I would
> > rather have the upstream patch implemented, whatever it is.
>
> Well, everyone still doesn't seem to agree on the proper way forward
> here, so for me to just "pick one" isn't very appropriate.
>
> So, can we try again?
>
> Can people submit, what they think the change should be? Right now I
> have Arjan's patch in my kernel tree, but will not send it to Linus for
> .25 for now, unless everyone thinks that is the best solution at the
> moment (which, for me, I'm leaning toward right now...)
My opinion is that Ivan's patch followed by my patch is the best way
forward. I see Arjan's patch as a good prototype, but it introduces a lot
of unnecessary infrastructure (and a userspace interface that I dislike).
I would like to see Ivan's patch merged ASAP as it does fix one of
my machines. akpm has the patch from me to disable io decoding, and
intends to send it to Linus during this merge window ... that patch
becomes unnecessary if we merge Ivan's patch.
My patch is an incremental improvement that adds some of the features
of Arjan's patch without the extra infrastructure. I don't think it's
urgent, but it does make some of our internal interfaces cleaner.
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
On Mon, Jan 28, 2008 at 03:31:42PM -0700, Matthew Wilcox wrote:
> On Mon, Jan 28, 2008 at 12:44:31PM -0800, Greg KH wrote:
> > On Mon, Jan 28, 2008 at 01:32:06PM -0500, Tony Camuso wrote:
> > > Greg,
> > >
> > > Have you given Grant's suggestion any further consideration?
> > >
> > > I'd like to know how the MMCONFIG issues discussed in this thread are going
> > > to be handled upstream. I have a patch implemented in RHEL 5.2, but I would
> > > rather have the upstream patch implemented, whatever it is.
> >
> > Well, everyone still doesn't seem to agree on the proper way forward
> > here, so for me to just "pick one" isn't very appropriate.
> >
> > So, can we try again?
> >
> > Can people submit, what they think the change should be? Right now I
> > have Arjan's patch in my kernel tree, but will not send it to Linus for
> > .25 for now, unless everyone thinks that is the best solution at the
> > moment (which, for me, I'm leaning toward right now...)
>
> My opinion is that Ivan's patch followed by my patch is the best way
> forward. I see Arjan's patch as a good prototype, but it introduces a lot
> of unnecessary infrastructure (and a userspace interface that I dislike).
>
> I would like to see Ivan's patch merged ASAP as it does fix one of
> my machines. akpm has the patch from me to disable io decoding, and
> intends to send it to Linus during this merge window ... that patch
> becomes unnecessary if we merge Ivan's patch.
>
> My patch is an incremental improvement that adds some of the features
> of Arjan's patch without the extra infrastructure. I don't think it's
> urgent, but it does make some of our internal interfaces cleaner.
Please send me patches, in a form that can be merged, along with a
proper changelog entry, in the order in which you wish them to be
applied, so I know exactly what changes you are referring to.
thanks,
greg k-h
On Mon, Jan 28, 2008 at 02:53:34PM -0800, Greg KH wrote:
> Please send me patches, in a form that can be merged, along with a
> proper changelog entry, in the order in which you wish them to be
> applied, so I know exactly what changes you are referring to.
I'll send each patch as a reply to this email.
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
PCI x86: always use conf1 to access config space below 256 bytes
Thanks to Loic Prylli <[email protected]>, who originally proposed
this idea.
Always using the legacy configuration mechanism for the legacy config
space and the extended mechanism (mmconf) for the extended config space
is a simple and very logical approach. It's supposed to resolve all
known mmconf problems. It still allows per-device quirks (tweaking
dev->cfg_size). It also allows us to get rid of the mmconf fallback code.
Signed-off-by: Ivan Kokshaysky <[email protected]>
Signed-off-by: Matthew Wilcox <[email protected]>
---
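For review purposes, the control flow this patch gives pci_mmcfg_read() can be modelled in user space as follows. This is a sketch only: the two mock accessors, the FROM_* markers and the `have_base` flag (standing in for get_base_addr()/pci_dev_base() finding a mapping) are hypothetical, and the bus/devfn range checks are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FROM_CONF1  1u	/* marker: value was produced by the conf1 mock */
#define FROM_MMCFG  2u	/* marker: value was produced by the MMCONFIG mock */

/* Stand-ins for the two real access methods; they only tag the result. */
static int mock_conf1_read(int reg, uint32_t *value)
{
	(void)reg;
	*value = FROM_CONF1;
	return 0;
}

static int mock_mmcfg_hw_read(int reg, uint32_t *value)
{
	(void)reg;
	*value = FROM_MMCFG;
	return 0;
}

/*
 * Same ordering as the patched function: range check first, then conf1
 * for anything below 256, and MMCONFIG only for the extended registers.
 * A missing mapping is now an error instead of a silent conf1 fallback.
 */
static int model_mmcfg_read(int have_base, int reg, uint32_t *value)
{
	assert(value != NULL);
	if (reg < 0 || reg > 4095)
		goto err;
	if (reg < 256)
		return mock_conf1_read(reg, value);
	if (!have_base)
		goto err;
	return mock_mmcfg_hw_read(reg, value);
err:
	*value = 0xffffffffu;	/* the kernel code stores -1 here */
	return -EINVAL;
}
```

Note that a legacy register is served by conf1 whether or not the device has an MMCONFIG mapping, which is what makes the per-slot fallback bitmap unnecessary.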
arch/x86/pci/mmconfig-shared.c | 35 -----------------------------------
arch/x86/pci/mmconfig_32.c | 22 +++++++++-------------
arch/x86/pci/mmconfig_64.c | 22 ++++++++++------------
arch/x86/pci/pci.h | 7 -------
4 files changed, 19 insertions(+), 67 deletions(-)
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 4df637e..6b521d3 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -22,42 +22,9 @@
#define MMCONFIG_APER_MIN (2 * 1024*1024)
#define MMCONFIG_APER_MAX (256 * 1024*1024)
-DECLARE_BITMAP(pci_mmcfg_fallback_slots, 32*PCI_MMCFG_MAX_CHECK_BUS);
-
/* Indicate if the mmcfg resources have been placed into the resource table. */
static int __initdata pci_mmcfg_resources_inserted;
-/* K8 systems have some devices (typically in the builtin northbridge)
- that are only accessible using type1
- Normally this can be expressed in the MCFG by not listing them
- and assigning suitable _SEGs, but this isn't implemented in some BIOS.
- Instead try to discover all devices on bus 0 that are unreachable using MM
- and fallback for them. */
-static void __init unreachable_devices(void)
-{
- int i, bus;
- /* Use the max bus number from ACPI here? */
- for (bus = 0; bus < PCI_MMCFG_MAX_CHECK_BUS; bus++) {
- for (i = 0; i < 32; i++) {
- unsigned int devfn = PCI_DEVFN(i, 0);
- u32 val1, val2;
-
- pci_conf1_read(0, bus, devfn, 0, 4, &val1);
- if (val1 == 0xffffffff)
- continue;
-
- if (pci_mmcfg_arch_reachable(0, bus, devfn)) {
- raw_pci_ops->read(0, bus, devfn, 0, 4, &val2);
- if (val1 == val2)
- continue;
- }
- set_bit(i + 32 * bus, pci_mmcfg_fallback_slots);
- printk(KERN_NOTICE "PCI: No mmconfig possible on device"
- " %02x:%02x\n", bus, i);
- }
- }
-}
-
static const char __init *pci_mmcfg_e7520(void)
{
u32 win;
@@ -270,8 +237,6 @@ void __init pci_mmcfg_init(int type)
return;
if (pci_mmcfg_arch_init()) {
- if (type == 1)
- unreachable_devices();
if (known_bridge)
pci_mmcfg_insert_resources(IORESOURCE_BUSY);
pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 1bf5816..7b75e65 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -30,10 +30,6 @@ static u32 get_base_addr(unsigned int seg, int bus, unsigned devfn)
struct acpi_mcfg_allocation *cfg;
int cfg_num;
- if (seg == 0 && bus < PCI_MMCFG_MAX_CHECK_BUS &&
- test_bit(PCI_SLOT(devfn) + 32*bus, pci_mmcfg_fallback_slots))
- return 0;
-
for (cfg_num = 0; cfg_num < pci_mmcfg_config_num; cfg_num++) {
cfg = &pci_mmcfg_config[cfg_num];
if (cfg->pci_segment == seg &&
@@ -68,13 +64,16 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
u32 base;
if ((bus > 255) || (devfn > 255) || (reg > 4095)) {
- *value = -1;
+err: *value = -1;
return -EINVAL;
}
+ if (reg < 256)
+ return pci_conf1_read(seg,bus,devfn,reg,len,value);
+
base = get_base_addr(seg, bus, devfn);
if (!base)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
+ goto err;
spin_lock_irqsave(&pci_config_lock, flags);
@@ -105,9 +104,12 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if ((bus > 255) || (devfn > 255) || (reg > 4095))
return -EINVAL;
+ if (reg < 256)
+ return pci_conf1_write(seg,bus,devfn,reg,len,value);
+
base = get_base_addr(seg, bus, devfn);
if (!base)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
+ return -EINVAL;
spin_lock_irqsave(&pci_config_lock, flags);
@@ -134,12 +136,6 @@ static struct pci_raw_ops pci_mmcfg = {
.write = pci_mmcfg_write,
};
-int __init pci_mmcfg_arch_reachable(unsigned int seg, unsigned int bus,
- unsigned int devfn)
-{
- return get_base_addr(seg, bus, devfn) != 0;
-}
-
int __init pci_mmcfg_arch_init(void)
{
printk(KERN_INFO "PCI: Using MMCONFIG\n");
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index 4095e4d..c4cf318 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -40,9 +40,7 @@ static char __iomem *get_virt(unsigned int seg, unsigned bus)
static char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, unsigned int devfn)
{
char __iomem *addr;
- if (seg == 0 && bus < PCI_MMCFG_MAX_CHECK_BUS &&
- test_bit(32*bus + PCI_SLOT(devfn), pci_mmcfg_fallback_slots))
- return NULL;
+
addr = get_virt(seg, bus);
if (!addr)
return NULL;
@@ -56,13 +54,16 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
/* Why do we have this when nobody checks it. How about a BUG()!? -AK */
if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095))) {
- *value = -1;
+err: *value = -1;
return -EINVAL;
}
+ if (reg < 256)
+ return pci_conf1_read(seg,bus,devfn,reg,len,value);
+
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
+ goto err;
switch (len) {
case 1:
@@ -88,9 +89,12 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095)))
return -EINVAL;
+ if (reg < 256)
+ return pci_conf1_write(seg,bus,devfn,reg,len,value);
+
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
+ return -EINVAL;
switch (len) {
case 1:
@@ -126,12 +130,6 @@ static void __iomem * __init mcfg_ioremap(struct acpi_mcfg_allocation *cfg)
return addr;
}
-int __init pci_mmcfg_arch_reachable(unsigned int seg, unsigned int bus,
- unsigned int devfn)
-{
- return pci_dev_base(seg, bus, devfn) != NULL;
-}
-
int __init pci_mmcfg_arch_init(void)
{
int i;
diff --git a/arch/x86/pci/pci.h b/arch/x86/pci/pci.h
index ac56d39..36cb44c 100644
--- a/arch/x86/pci/pci.h
+++ b/arch/x86/pci/pci.h
@@ -98,13 +98,6 @@ extern void pcibios_sort(void);
/* pci-mmconfig.c */
-/* Verify the first 16 busses. We assume that systems with more busses
- get MCFG right. */
-#define PCI_MMCFG_MAX_CHECK_BUS 16
-extern DECLARE_BITMAP(pci_mmcfg_fallback_slots, 32*PCI_MMCFG_MAX_CHECK_BUS);
-
-extern int __init pci_mmcfg_arch_reachable(unsigned int seg, unsigned int bus,
- unsigned int devfn);
extern int __init pci_mmcfg_arch_init(void);
/*
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
We want to allow different implementations of pci_raw_ops for standard
and extended config space on x86. Rather than clutter generic code with
knowledge of this, we make pci_raw_ops private to x86 and use it to
implement the new raw interface -- raw_pci_read() and raw_pci_write().
Signed-off-by: Matthew Wilcox <[email protected]>
---
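One way the x86 side could implement raw_pci_read() on top of two ops pointers is sketched below as a stand-alone user-space model. The arch/x86/pci/common.c hunk is not reproduced in this part of the thread, so the selection logic here is an assumption; `model_raw_ops`, `legacy_ops`, `ext_ops` and the mock accessors are hypothetical names, and the write side is omitted for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Read half of an ops table shaped like struct pci_raw_ops. */
struct model_raw_ops {
	int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
		    int reg, int len, uint32_t *val);
};

static struct model_raw_ops *legacy_ops;	/* registers below 256 */
static struct model_raw_ops *ext_ops;		/* extended config space */

/*
 * Assumed dispatch: prefer the legacy ops for registers below 256, use
 * the extended ops for everything else (or when no legacy ops exist),
 * and fail if neither can serve the request.
 */
static int model_raw_pci_read(unsigned int domain, unsigned int bus,
			      unsigned int devfn, int reg, int len,
			      uint32_t *val)
{
	assert(val != NULL);
	if (reg < 256 && legacy_ops)
		return legacy_ops->read(domain, bus, devfn, reg, len, val);
	if (ext_ops)
		return ext_ops->read(domain, bus, devfn, reg, len, val);
	return -EINVAL;
}

/* Mock accessors that only record which method served the read. */
static int mock_conf1_read(unsigned int domain, unsigned int bus,
			   unsigned int devfn, int reg, int len, uint32_t *val)
{
	(void)domain; (void)bus; (void)devfn; (void)reg; (void)len;
	*val = 1;
	return 0;
}

static int mock_mmcfg_read(unsigned int domain, unsigned int bus,
			   unsigned int devfn, int reg, int len, uint32_t *val)
{
	(void)domain; (void)bus; (void)devfn; (void)reg; (void)len;
	*val = 2;
	return 0;
}

static struct model_raw_ops mock_conf1 = { mock_conf1_read };
static struct model_raw_ops mock_mmcfg = { mock_mmcfg_read };
```

Callers such as the ACPI code then need no knowledge of which pointer is set; they call the wrapper and check the return value.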
arch/ia64/pci/pci.c | 25 ++++++++-----------------
arch/ia64/sn/pci/tioce_provider.c | 16 ++++++++--------
arch/x86/kernel/quirks.c | 2 +-
arch/x86/pci/common.c | 25 +++++++++++++++++++++++--
arch/x86/pci/direct.c | 4 ++--
arch/x86/pci/fixup.c | 6 ++++--
arch/x86/pci/legacy.c | 2 +-
arch/x86/pci/mmconfig-shared.c | 6 +++---
arch/x86/pci/mmconfig_32.c | 10 ++--------
arch/x86/pci/mmconfig_64.c | 8 +-------
arch/x86/pci/pci.h | 15 +++++++++++----
arch/x86/pci/visws.c | 3 ---
drivers/acpi/osl.c | 25 ++++++-------------------
drivers/ata/Kconfig | 3 +++
drivers/ata/Makefile | 3 +++
include/linux/pci.h | 16 ++++++++--------
16 files changed, 84 insertions(+), 85 deletions(-)
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 488e48a..8fd7e82 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -43,8 +43,7 @@
#define PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg) \
(((u64) seg << 28) | (bus << 20) | (devfn << 12) | (reg))
-static int
-pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
int reg, int len, u32 *value)
{
u64 addr, data = 0;
@@ -68,8 +67,7 @@ pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
return 0;
}
-static int
-pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
int reg, int len, u32 value)
{
u64 addr;
@@ -91,24 +89,17 @@ pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
return 0;
}
-static struct pci_raw_ops pci_sal_ops = {
- .read = pci_sal_read,
- .write = pci_sal_write
-};
-
-struct pci_raw_ops *raw_pci_ops = &pci_sal_ops;
-
-static int
-pci_read (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
+static int pci_read(struct pci_bus *bus, unsigned int devfn, int where,
+ int size, u32 *value)
{
- return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+ return raw_pci_read(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
-static int
-pci_write (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
+static int pci_write(struct pci_bus *bus, unsigned int devfn, int where,
+ int size, u32 value)
{
- return raw_pci_ops->write(pci_domain_nr(bus), bus->number,
+ return raw_pci_write(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
diff --git a/arch/ia64/sn/pci/tioce_provider.c b/arch/ia64/sn/pci/tioce_provider.c
index e1a3e19..999f14f 100644
--- a/arch/ia64/sn/pci/tioce_provider.c
+++ b/arch/ia64/sn/pci/tioce_provider.c
@@ -752,13 +752,13 @@ tioce_kern_init(struct tioce_common *tioce_common)
* Determine the secondary bus number of the port2 logical PPB.
* This is used to decide whether a given pci device resides on
* port1 or port2. Note: We don't have enough plumbing set up
- * here to use pci_read_config_xxx() so use the raw_pci_ops vector.
+ * here to use pci_read_config_xxx() so use raw_pci_read().
*/
seg = tioce_common->ce_pcibus.bs_persist_segment;
bus = tioce_common->ce_pcibus.bs_persist_busnum;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(2, 0), PCI_SECONDARY_BUS, 1,&tmp);
+ raw_pci_read(seg, bus, PCI_DEVFN(2, 0), PCI_SECONDARY_BUS, 1,&tmp);
tioce_kern->ce_port1_secondary = (u8) tmp;
/*
@@ -799,11 +799,11 @@ tioce_kern_init(struct tioce_common *tioce_common)
/* mem base/limit */
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_MEMORY_BASE, 2, &tmp);
base = (u64)tmp << 16;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_MEMORY_LIMIT, 2, &tmp);
limit = (u64)tmp << 16;
limit |= 0xfffffUL;
@@ -817,21 +817,21 @@ tioce_kern_init(struct tioce_common *tioce_common)
* attributes.
*/
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_MEMORY_BASE, 2, &tmp);
base = ((u64)tmp & PCI_PREF_RANGE_MASK) << 16;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_BASE_UPPER32, 4, &tmp);
base |= (u64)tmp << 32;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_MEMORY_LIMIT, 2, &tmp);
limit = ((u64)tmp & PCI_PREF_RANGE_MASK) << 16;
limit |= 0xfffffUL;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_LIMIT_UPPER32, 4, &tmp);
limit |= (u64)tmp << 32;
diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index fab30e1..7f73f7c 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -27,7 +27,7 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
pci_write_config_byte(dev, 0xf4, config|0x2);
/* read xTPR register */
- raw_pci_ops->read(0, 0, 0x40, 0x4c, 2, &word);
+ raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
if (!(word & (1 << 13))) {
printk(KERN_INFO "Intel E7520/7320/7525 detected. "
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 8627463..f2bd9f3 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -26,16 +26,37 @@ int pcibios_last_bus = -1;
unsigned long pirq_table_addr;
struct pci_bus *pci_root_bus;
struct pci_raw_ops *raw_pci_ops;
+struct pci_raw_ops *raw_pci_ext_ops;
+
+int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 *val)
+{
+ if (reg < 256 && raw_pci_ops)
+ return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
+ return -EINVAL;
+}
+
+int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 val)
+{
+ if (reg < 256 && raw_pci_ops)
+ return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
+ return -EINVAL;
+}
static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
{
- return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+ return raw_pci_read(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
static int pci_write(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
{
- return raw_pci_ops->write(pci_domain_nr(bus), bus->number,
+ return raw_pci_write(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
index 431c9a5..42f3e4c 100644
--- a/arch/x86/pci/direct.c
+++ b/arch/x86/pci/direct.c
@@ -14,7 +14,7 @@
#define PCI_CONF1_ADDRESS(bus, devfn, reg) \
(0x80000000 | (bus << 16) | (devfn << 8) | (reg & ~3))
-int pci_conf1_read(unsigned int seg, unsigned int bus,
+static int pci_conf1_read(unsigned int seg, unsigned int bus,
unsigned int devfn, int reg, int len, u32 *value)
{
unsigned long flags;
@@ -45,7 +45,7 @@ int pci_conf1_read(unsigned int seg, unsigned int bus,
return 0;
}
-int pci_conf1_write(unsigned int seg, unsigned int bus,
+static int pci_conf1_write(unsigned int seg, unsigned int bus,
unsigned int devfn, int reg, int len, u32 value)
{
unsigned long flags;
diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index 6cff66d..b31cd6a 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -215,7 +215,8 @@ static int quirk_aspm_offset[MAX_PCIEROOT << 3];
static int quirk_pcie_aspm_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
{
- return raw_pci_ops->read(0, bus->number, devfn, where, size, value);
+ return raw_pci_read(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
}
/*
@@ -231,7 +232,8 @@ static int quirk_pcie_aspm_write(struct pci_bus *bus, unsigned int devfn, int wh
if ((offset) && (where == offset))
value = value & 0xfffffffc;
- return raw_pci_ops->write(0, bus->number, devfn, where, size, value);
+ return raw_pci_write(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
}
static struct pci_ops quirk_pcie_aspm_ops = {
diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 5565d70..e041ced 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -22,7 +22,7 @@ static void __devinit pcibios_fixup_peer_bridges(void)
if (pci_find_bus(0, n))
continue;
for (devfn = 0; devfn < 256; devfn += 8) {
- if (!raw_pci_ops->read(0, n, devfn, PCI_VENDOR_ID, 2, &l) &&
+ if (!raw_pci_read(0, n, devfn, PCI_VENDOR_ID, 2, &l) &&
l != 0x0000 && l != 0xffff) {
DBG("Found device at %02x:%02x [%04x]\n", n, devfn, l);
printk(KERN_INFO "PCI: Discovered peer bus %02x\n", n);
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 6b521d3..8d54df4 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -28,7 +28,7 @@ static int __initdata pci_mmcfg_resources_inserted;
static const char __init *pci_mmcfg_e7520(void)
{
u32 win;
- pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
+ pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
win = win & 0xf000;
if(win == 0x0000 || win == 0xf000)
@@ -53,7 +53,7 @@ static const char __init *pci_mmcfg_intel_945(void)
pci_mmcfg_config_num = 1;
- pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
+ pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
/* Enable bit */
if (!(pciexbar & 1))
@@ -118,7 +118,7 @@ static int __init pci_mmcfg_check_hostbridge(void)
int i;
const char *name;
- pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0, 4, &l);
+ pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0, 4, &l);
vendor = l & 0xffff;
device = (l >> 16) & 0xffff;
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 7b75e65..081816a 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -68,9 +68,6 @@ err: *value = -1;
return -EINVAL;
}
- if (reg < 256)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
-
base = get_base_addr(seg, bus, devfn);
if (!base)
goto err;
@@ -104,9 +101,6 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if ((bus > 255) || (devfn > 255) || (reg > 4095))
return -EINVAL;
- if (reg < 256)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
-
base = get_base_addr(seg, bus, devfn);
if (!base)
return -EINVAL;
@@ -138,7 +132,7 @@ static struct pci_raw_ops pci_mmcfg = {
int __init pci_mmcfg_arch_init(void)
{
- printk(KERN_INFO "PCI: Using MMCONFIG\n");
- raw_pci_ops = &pci_mmcfg;
+ printk(KERN_INFO "PCI: Using MMCONFIG for extended config space\n");
+ raw_pci_ext_ops = &pci_mmcfg;
return 1;
}
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index c4cf318..9207fd4 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -58,9 +58,6 @@ err: *value = -1;
return -EINVAL;
}
- if (reg < 256)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
-
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
goto err;
@@ -89,9 +86,6 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095)))
return -EINVAL;
- if (reg < 256)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
-
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
return -EINVAL;
@@ -150,6 +144,6 @@ int __init pci_mmcfg_arch_init(void)
return 0;
}
}
- raw_pci_ops = &pci_mmcfg;
+ raw_pci_ext_ops = &pci_mmcfg;
return 1;
}
diff --git a/arch/x86/pci/pci.h b/arch/x86/pci/pci.h
index 36cb44c..3431518 100644
--- a/arch/x86/pci/pci.h
+++ b/arch/x86/pci/pci.h
@@ -85,10 +85,17 @@ extern spinlock_t pci_config_lock;
extern int (*pcibios_enable_irq)(struct pci_dev *dev);
extern void (*pcibios_disable_irq)(struct pci_dev *dev);
-extern int pci_conf1_write(unsigned int seg, unsigned int bus,
- unsigned int devfn, int reg, int len, u32 value);
-extern int pci_conf1_read(unsigned int seg, unsigned int bus,
- unsigned int devfn, int reg, int len, u32 *value);
+struct pci_raw_ops {
+ int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 *val);
+ int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 val);
+};
+
+extern struct pci_raw_ops *raw_pci_ops;
+extern struct pci_raw_ops *raw_pci_ext_ops;
+
+extern struct pci_raw_ops pci_direct_conf1;
extern int pci_direct_probe(void);
extern void pci_direct_init(int type);
diff --git a/arch/x86/pci/visws.c b/arch/x86/pci/visws.c
index 8ecb1c7..c2df4e9 100644
--- a/arch/x86/pci/visws.c
+++ b/arch/x86/pci/visws.c
@@ -13,9 +13,6 @@
#include "pci.h"
-
-extern struct pci_raw_ops pci_direct_conf1;
-
static int pci_visws_enable_irq(struct pci_dev *dev) { return 0; }
static void pci_visws_disable_irq(struct pci_dev *dev) { }
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index e3a673a..f190db9 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -139,15 +139,6 @@ acpi_status __init acpi_os_initialize(void)
acpi_status acpi_os_initialize1(void)
{
- /*
- * Initialize PCI configuration space access, as we'll need to access
- * it while walking the namespace (bus 0 and root bridges w/ _BBNs).
- */
- if (!raw_pci_ops) {
- printk(KERN_ERR PREFIX
- "Access to PCI configuration space unavailable\n");
- return AE_NULL_ENTRY;
- }
kacpid_wq = create_singlethread_workqueue("kacpid");
kacpi_notify_wq = create_singlethread_workqueue("kacpi_notify");
BUG_ON(!kacpid_wq);
@@ -498,11 +489,9 @@ acpi_os_read_pci_configuration(struct acpi_pci_id * pci_id, u32 reg,
return AE_ERROR;
}
- BUG_ON(!raw_pci_ops);
-
- result = raw_pci_ops->read(pci_id->segment, pci_id->bus,
- PCI_DEVFN(pci_id->device, pci_id->function),
- reg, size, value);
+ result = raw_pci_read(pci_id->segment, pci_id->bus,
+ PCI_DEVFN(pci_id->device, pci_id->function),
+ reg, size, value);
return (result ? AE_ERROR : AE_OK);
}
@@ -529,11 +518,9 @@ acpi_os_write_pci_configuration(struct acpi_pci_id * pci_id, u32 reg,
return AE_ERROR;
}
- BUG_ON(!raw_pci_ops);
-
- result = raw_pci_ops->write(pci_id->segment, pci_id->bus,
- PCI_DEVFN(pci_id->device, pci_id->function),
- reg, size, value);
+ result = raw_pci_write(pci_id->segment, pci_id->bus,
+ PCI_DEVFN(pci_id->device, pci_id->function),
+ reg, size, value);
return (result ? AE_ERROR : AE_OK);
}
diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
index ba63619..1e71dc0 100644
--- a/drivers/ata/Kconfig
+++ b/drivers/ata/Kconfig
@@ -40,6 +40,9 @@ config ATA_ACPI
You can disable this at kernel boot time by using the
option libata.noacpi=1
+config ATA_RAM
+ tristate "ATA RAM driver"
+
config SATA_AHCI
tristate "AHCI SATA support"
depends on PCI
diff --git a/drivers/ata/Makefile b/drivers/ata/Makefile
index b13feb2..bc2eef0 100644
--- a/drivers/ata/Makefile
+++ b/drivers/ata/Makefile
@@ -75,6 +75,9 @@ obj-$(CONFIG_ATA_GENERIC) += ata_generic.o
# Should be last libata driver
obj-$(CONFIG_PATA_LEGACY) += pata_legacy.o
+# A fake ata driver. Can it be postultimate?
+obj-$(CONFIG_ATA_RAM) += ata_ram.o
+
libata-objs := libata-core.o libata-scsi.o libata-sff.o libata-eh.o \
libata-pmp.o
libata-$(CONFIG_ATA_ACPI) += libata-acpi.o
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0dd93bb..f4f1edd 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -304,14 +304,14 @@ struct pci_ops {
int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
};
-struct pci_raw_ops {
- int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
- int reg, int len, u32 *val);
- int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
- int reg, int len, u32 val);
-};
-
-extern struct pci_raw_ops *raw_pci_ops;
+/*
+ * ACPI needs to be able to access PCI config space before we've done a
+ * PCI bus scan and created pci_bus structures.
+ */
+extern int raw_pci_read(unsigned int domain, unsigned int bus,
+ unsigned int devfn, int reg, int len, u32 *val);
+extern int raw_pci_write(unsigned int domain, unsigned int bus,
+ unsigned int devfn, int reg, int len, u32 val);
struct pci_bus_region {
unsigned long start;
--
1.5.2.5
On Mon, 28 Jan 2008 12:44:31 -0800
Greg KH <[email protected]> wrote:
> On Mon, Jan 28, 2008 at 01:32:06PM -0500, Tony Camuso wrote:
> > Greg,
> >
> > Have you given Grant's suggestion any further consideration?
> >
> > I'd like to know how the MMCONFIG issues discussed in this thread
> > are going to be handled upstream. I have a patch implemented in
> > RHEL 5.2, but I would rather have the upstream patch implemented,
> > whatever it is.
>
> Well, everyone still doesn't seem to agree on the proper way forward
> here, so for me to just "pick one" isn't very appropriate.
>
> So, can we try again?
I think there's only one fundamental disagreement; and that is:
do we think that things are now totally fixed and no new major issues
will arrive after the "fix yet another mmconfig thing" patches are merged.
If the answer is no, then imho my patch is the right approach; it will limit the damage and doesn't make
the people suffer who don't need extended config space.
If the answer is yes, then my patch is not needed.
This is a judgment call; I'm skeptical, others are more optimistic that after 2 years of messing around
they have finally found the last golden fix.
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On Mon, Jan 28, 2008 at 07:05:05PM -0800, Arjan van de Ven wrote:
> I think there's only one fundamental disagreement; and that is:
> do we think that things are now totally fixed and no new major issues
> will arrive after the "fix yet another mmconfig thing" patches are merged.
>
> If the answer is no, then imho my patch is the right approach; it will limit the damage and doesn't make
> the people suffer who don't need extended config space.
> If the answer is yes, then my patch is not needed.
>
> This is a judgment call; I'm skeptical, others are more optimistic that after 2 years of messing around
> they have finally found the last golden fix.
I'm more optimistic because we've so severely restricted the use of
mmconf after these patches that it's unlikely to cause problems. I also
hear Vista is now using mmconf, so fewer implementations are going to
be buggy at this point.
On Mon, Jan 28, 2008 at 08:18:04PM -0700, Matthew Wilcox wrote:
> On Mon, Jan 28, 2008 at 07:05:05PM -0800, Arjan van de Ven wrote:
> > I think there's only one fundamental disagreement; and that is:
> > do we think that things are now totally fixed and no new major issues
> > will arrive after the "fix yet another mmconfig thing" patches are merged.
> >
> > If the answer is no, then imho my patch is the right approach; it will limit the damage and doesn't make
> > the people suffer who don't need extended config space.
> > If the answer is yes, then my patch is not needed.
> >
> > This is a judgment call; I'm skeptical, others are more optimistic that after 2 years of messing around
> > they have finally found the last golden fix.
>
> I'm more optimistic because we've so severely restricted the use of
> mmconf after these patches that it's unlikely to cause problems. I also
> hear Vista is now using mmconf, so fewer implementations are going to
> be buggy at this point.
Hahahaha, oh, that's a good one...
But what about the thousands of implementations out there that are
buggy?
I'm with Arjan here, I'm very skeptical.
Matthew, with Arjan's patch, is anything that currently works now
broken? Why do you feel it is somehow "wrong"?
thanks,
greg k-h
On Mon, Jan 28, 2008 at 07:57:44PM -0700, Matthew Wilcox wrote:
> PCI x86: always use conf1 to access config space below 256 bytes
>
> Thanks to Loic Prylli <[email protected]>, who originally proposed
> this idea.
>
> Always using legacy configuration mechanism for the legacy config space
> and extended mechanism (mmconf) for the extended config space is
> a simple and very logical approach. It's supposed to resolve all
> known mmconf problems. It still allows per-device quirks (tweaking
> dev->cfg_size). It also allows to get rid of mmconf fallback code.
>
> Signed-off-by: Ivan Kokshaysky <[email protected]>
> Signed-off-by: Matthew Wilcox <[email protected]>
Hm, who wrote this, Ivan?
If so, Matthew, please do not strip off authorship of patches, and place
a "From:" line on the first line above the description, so it is not
lost.
thanks,
greg k-h
Greg KH wrote:
> On Mon, Jan 28, 2008 at 08:18:04PM -0700, Matthew Wilcox wrote:
>> I'm more optimistic because we've so severely restricted the use of
>> mmconf after these patches that it's unlikely to cause problems. I also
>> hear Vista is now using mmconf, so fewer implementations are going to
>> be buggy at this point.
>
> Hahahaha, oh, that's a good one...
>
> But what about the thousands of implementations out there that are
> buggy?
>
> I'm with Arjan here, I'm very skeptical.
>
> Matthew, with Arjan's patch, is anything that currently works now
> broken? Why do you feel it is somehow "wrong"?
>
> thanks,
>
> greg k-h
Greg,
The problem with Arjan's patch, if I understand it correctly, is that it
requires drivers to make a call to access extended PCI config space.
And, IIRC, Arjan's patch encumbers drivers for all arch's, even those
that have no MMCONFIG problems.
The patches proposed by Loic, Ivan, Matthew, and myself, all address the
problem in an x86-specific manner that is transparent to the drivers.
On Tue, 29 Jan 2008 09:15:02 -0500
Tony Camuso <[email protected]> wrote:
> Greg KH wrote:
> > On Mon, Jan 28, 2008 at 08:18:04PM -0700, Matthew Wilcox wrote:
> >> I'm more optimistic because we've so severely restricted the use of
> >> mmconf after these patches that it's unlikely to cause problems.
> >> I also hear Vista is now using mmconf, so fewer implementations
> >> are going to be buggy at this point.
> >
> > Hahahaha, oh, that's a good one...
> >
> > But what about the thousands of implementations out there that are
> > buggy?
> >
> > I'm with Arjan here, I'm very skeptical.
> >
> > Matthew, with Arjan's patch, is anything that currently works now
> > broken? Why do you feel it is somehow "wrong"?
> >
> > thanks,
> >
> > greg k-h
>
> Greg,
>
> The problem with Arjan's patch, if I understand it correctly, is that
> it requires drivers to make a call to access extended PCI config
> space.
>
> And, IIRC, Arjan's patch encumbers drivers for all arch's, even those
> that have no MMCONFIG problems.
>
> The patches proposed by Loic, Ivan, Matthew, and myself, all address
> the problem in an x86-specific manner that is transparent to the
> drivers.
this is not quite correct; the patches from Loic, Ivan, Matthew and you are for a different
problem statement.
Your patch problem statement is "need to fix mmconfig", my patch problem statement is "need
to not make users who don't need it suffer". These are orthogonal problems.
Arjan van de Ven wrote:
> On Tue, 29 Jan 2008 09:15:02 -0500
> Tony Camuso <[email protected]> wrote:
>
>> Greg,
>>
>> The problem with Arjan's patch, if I understand it correctly, is that
>> it requires drivers to make a call to access extended PCI config
>> space.
>>
>> And, IIRC, Arjan's patch encumbers drivers for all arch's, even those
>> that have no MMCONFIG problems.
>>
>> The patches proposed by Loic, Ivan, Matthew, and myself, all address
>> the problem in an x86-specific manner that is transparent to the
>> drivers.
>
> this is not quite correct; the patches from Loic, Ivan, Matthew and you are for a different
> problem statement.
>
> Your patch problem statement is "need to fix mmconfig", my patch problem statement is "need
> to not make users who don't need it suffer". These are orthogonal problems.
>
>
Yes, but your patch also makes users who need extended PCI config space suffer.
Right now, that isn't a lot of people in x86 land, but your patch encumbers drivers
for non-x86 archs with an additional call to access space that they've never had
a problem with.
As more PCI express drivers start to take advantage of AER and other advanced
express capabilities, the extra call to address a condition specific to legacy
x86 hardware is, IMNSHO, a kludge.
The patches submitted by the others fix the problems with MMCONFIG without
encumbering the drivers to be aware of any difference between legacy config
space and extended config space.
I have tested these patches on a number of systems exhibiting various MMCONFIG-
related pathologies, and they work.
On Tue, 29 Jan 2008 10:15:45 -0500
Tony Camuso <[email protected]> wrote:
> Arjan van de Ven wrote:
> > On Tue, 29 Jan 2008 09:15:02 -0500
> > Tony Camuso <[email protected]> wrote:
> >
> >> Greg,
> >>
> >> The problem with Arjan's patch, if I understand it correctly, is
> >> that it requires drivers to make a call to access extended PCI
> >> config space.
> >>
> >> And, IIRC, Arjan's patch encumbers drivers for all arch's, even
> >> those that have no MMCONFIG problems.
> >>
> >> The patches proposed by Loic, Ivan, Matthew, and myself, all
> >> address the problem in an x86-specific manner that is transparent
> >> to the drivers.
> >
> > this is not quite correct; the patches from Loic, Ivan, Matthew and
> > you are for a different problem statement.
> >
> > Your patch problem statement is "need to fix mmconfig", my patch
> > problem statement is "need to not make users who don't need it
> > suffer". These are orthogonal problems.
> >
> >
>
> Yes, but your patch also makes users who need extended PCI config
> space suffer.
>
> Right now, that isn't a lot of people in x86 land, but your patch
> encumbers drivers for non-x86 archs with an additional call to access
> space that they've never had a problem with.
Let's say s/x86/x86, IA64 and architectures that use Intel, AMD or VIA chipsets/
> As more PCI express drivers start to take advantage of AER and other
> advanced express capabilities, the extra call to address a condition
> specific to legacy x86 hardware is, IMNSHO, a kludge.
In addition to the pci_enable_device(), pci_enable_msi() and pci_set_master() calls they already
need to make to enable various features?
Arjan van de Ven wrote:
> On Tue, 29 Jan 2008 10:15:45 -0500
> Tony Camuso <[email protected]> wrote:
>
>> specific to legacy x86 hardware is, IMNSHO, a kludge.
>
> In addition to the pci_enable_device(), pci_enable_msi() and pci_set_master() calls they already
> need to make to enable various features?
>
These calls are related to generic aspects of the PCI* landscape itself and are
not related to any arch-specific hardware, nor were they devised to address
chipset-specific or BIOS-specific problems.
For the good of all, we should endeavor to avoid putting arch-specific fixes into
the generic code whenever possible.
And in this case, not only is it possible, it's been done and tested.
On Tue, Jan 29, 2008 at 05:21:08AM -0800, Greg KH wrote:
> Hm, who wrote this, Ivan?
>
> If so, Matthew, please do not strip off authorship of patches, and place
> a "From:" line on the first line above the description, so it is not
> lost.
Sorry, I didn't know that was the convention. I thought the first
Signed-off-by: was assumed to be the author.
On Tue, Jan 29, 2008 at 07:29:51AM -0800, Arjan van de Ven wrote:
> > Right now, that isn't a lot of people in x86 land, but your patch
> > encumbers drivers for non-x86 archs with an additional call to access
> > space that they've never had a problem with.
>
> lets say s/x86/x86, IA64 and architectures that use intel, amd or via chipsets/
Umm .. ia64 already does exactly what I'm proposing for x86. It uses
one SAL interface for bytes below 256 and a different SAL interface for
bytes 256-4095.
On Tue, 29 Jan 2008, Matthew Wilcox wrote:
>
> Sorry, I didn't know that was the convention. I thought the first
> Signed-off-by: was assumed to be the author.
There's certainly a strong correlation between "first sign-off" and
authorship, but signing off doesn't guarantee it, and while it's not the
bulk of patches, it certainly happens that people sign off on patches made
by others (either because the company has specific people who have the
right to sign off on things, or simply because the code comes from some
source that did GPL it, but perhaps didn't sign off on it - hopefully
rare, but certainly not impossible or unheard of especially for
one-liners that got picked up from mailing lists etc)
Linus
Matthew Wilcox wrote:
> On Tue, Jan 29, 2008 at 07:29:51AM -0800, Arjan van de Ven wrote:
>>> Right now, that isn't a lot of people in x86 land, but your patch
>>> encumbers drivers for non-x86 archs with an additional call to access
>>> space that they've never had a problem with.
>> lets say s/x86/x86, IA64 and architectures that use intel, amd or via chipsets/
>
> Umm .. ia64 already does exactly what I'm proposing for x86. It uses
> one SAL interface for bytes below 256 and a different SAL interface for
> bytes 256-4095.
>
Not exactly.
:)
The interface is the same, ia64_sal_pci_config_write() and ia64_sal_pci_config_read(),
but a flag bit in the mode argument is used to tell the SAL interface whether to
translate the offset component of the config address as having 8 or 12 bits
of displacement.
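To make the 8-bit versus 12-bit distinction concrete, here is a minimal user-space sketch of the two address encodings. The extended layout mirrors the PCI_SAL_EXT_ADDRESS macro in the ia64 hunk of the patch; the legacy layout, the helper names, and the 0/1 mode values are assumptions for illustration, not the SAL definition itself.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy encoding: 8 bits of register offset (assumed layout). */
static uint64_t sal_address(uint64_t seg, uint64_t bus,
                            uint64_t devfn, uint64_t reg)
{
    return (seg << 24) | (bus << 16) | (devfn << 8) | reg;
}

/* Extended encoding: 12 bits of register offset, same shifts as
 * PCI_SAL_EXT_ADDRESS in the patch. */
static uint64_t sal_ext_address(uint64_t seg, uint64_t bus,
                                uint64_t devfn, uint64_t reg)
{
    return (seg << 28) | (bus << 20) | (devfn << 12) | reg;
}

/* Hypothetical helper: choose the encoding from the register offset and
 * report the choice through *mode (0 = legacy, 1 = extended; assumed values). */
static uint64_t sal_encode(uint64_t seg, uint64_t bus, uint64_t devfn,
                           uint64_t reg, int *mode)
{
    assert(mode != 0);
    assert(bus < 256 && devfn < 256 && reg < 4096);

    if (reg < 256) {
        *mode = 0;
        return sal_address(seg, bus, devfn, reg);
    }
    *mode = 1;
    return sal_ext_address(seg, bus, devfn, reg);
}
```

The caller still goes through a single entry point; the only thing that changes with the offset is how the address word is packed and which mode flag accompanies it.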
In my estimation, Ivan's patch, in his implementation of Loic's suggestion, is even
more elegant, since there is no need to flag whether the access is for offsets below
256. Ivan's code automatically uses Port IO (or equivalent with Matthew's patch) for
offsets below 256 and MMCONFIG for offsets from 256 to 4095.
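The split described here can be sketched in plain C with mock back ends. This is only an illustration modeled on the raw_pci_read() hunk in the patch: the sketch_ and mock_ names are hypothetical, and the mocks return fixed values instead of touching hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the read half of struct pci_raw_ops in the patch. */
struct sketch_raw_ops {
    int (*read)(unsigned int seg, unsigned int bus, unsigned int devfn,
                int reg, int len, uint32_t *val);
};

/* Mock "conf1" back end: always returns the marker value 0x1111. */
static int mock_conf1_read(unsigned int seg, unsigned int bus,
                           unsigned int devfn, int reg, int len, uint32_t *val)
{
    (void)seg; (void)bus; (void)devfn; (void)reg; (void)len;
    *val = 0x1111;
    return 0;
}

/* Mock "mmconfig" back end: always returns the marker value 0x2222. */
static int mock_mmcfg_read(unsigned int seg, unsigned int bus,
                           unsigned int devfn, int reg, int len, uint32_t *val)
{
    (void)seg; (void)bus; (void)devfn; (void)reg; (void)len;
    *val = 0x2222;
    return 0;
}

static struct sketch_raw_ops mock_conf1 = { .read = mock_conf1_read };
static struct sketch_raw_ops mock_mmcfg = { .read = mock_mmcfg_read };

static struct sketch_raw_ops *std_ops = &mock_conf1; /* offsets 0..255 */
static struct sketch_raw_ops *ext_ops = &mock_mmcfg; /* offsets 256..4095 */

/* Dispatch on the register offset; fall back to the extended ops when no
 * standard ops are installed, as the patch's raw_pci_read() does. */
static int sketch_raw_read(unsigned int seg, unsigned int bus,
                           unsigned int devfn, int reg, int len, uint32_t *val)
{
    assert(val != NULL);

    if (reg < 256 && std_ops)
        return std_ops->read(seg, bus, devfn, reg, len, val);
    if (ext_ops)
        return ext_ops->read(seg, bus, devfn, reg, len, val);
    return -EINVAL;
}
```

Callers never say which mechanism they want; the offset alone decides, which is why no per-access flag is needed.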
And even better, it removes the bitmap that tracks MMCONFIG-unfriendly devices for
the first 16 buses, a solution that assumes systems with bus numbers higher than 16
will get MMCONFIG right, which turned out to be a very wrong assumption. Furthermore,
the config address is translated by the Northbridge. The delivery mechanism to
the Northbridge, whether Port IO or MMCONFIG, is utterly opaque to the devices on the
bus, since all they see is PCI config cycles, not Port IO or MMCONFIG cycles. The test
only needed to be made at the Northbridge level, not at the device level. Ivan's patch
removes all this cruft.
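One way to see why the delivery mechanism is opaque to the device: both mechanisms encode the same bus/devfn/register triple, only packed differently. The conf1 layout below follows the PCI_CONF1_ADDRESS macro in arch/x86/pci/direct.c as shown in the patch; the MMCONFIG offset layout is the standard PCI Express one (20-bit and 12-bit shifts), and the function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Value written to the conf1 address port (0xCF8) before the data access.
 * Only 8 bits of register offset fit, and the low two bits are masked. */
static uint32_t conf1_address(uint32_t bus, uint32_t devfn, uint32_t reg)
{
    assert(bus < 256 && devfn < 256 && reg < 256);
    return 0x80000000u | (bus << 16) | (devfn << 8) | (reg & ~3u);
}

/* Byte offset from the MMCONFIG base of the segment: each function gets a
 * 4 KB window, so 12 bits of register offset are available. */
static uint32_t mmconfig_offset(uint32_t bus, uint32_t devfn, uint32_t reg)
{
    assert(bus < 256 && devfn < 256 && reg < 4096);
    return (bus << 20) | (devfn << 12) | reg;
}
```

Either value is consumed by the chipset, which turns it into an ordinary configuration cycle on the bus; the device sees no difference between the two.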
On Tue, Jan 29, 2008 at 05:19:55AM -0800, Greg KH wrote:
> On Mon, Jan 28, 2008 at 08:18:04PM -0700, Matthew Wilcox wrote:
> > I'm more optimistic because we've so severely restricted the use of
> > mmconf after these patches that it's unlikely to cause problems. I also
> > hear Vista is now using mmconf, so fewer implementations are going to
> > be buggy at this point.
>
> Hahahaha, oh, that's a good one...
Thanks Greg. What happened to "Can't we all try to get along"?
> But what about the thousands of implementations out there that are
> buggy?
>
> I'm with Arjan here, I'm very skeptical.
Maybe I'm insufficiently imaginative. Can you come up with a plausible
way in which the two patches I posted will succumb to bugs? After those
patches we only use mmconf if:
1. conf1 has failed to work
OR
2. user has compiled their own kernel without support for conf1
OR
3. kernel probes config space 0x100 to see if it can access extended
config space (requires the device to be PCIe or PCI-X2)
OR
4. root attempts to lspci -xxxx or lspci -v
OR
5. device driver tries to access extended config space
With Arjan's patch, I believe only case 3 changes. In cases 4 and 5,
either lspci or the device driver will jump through the hoop to enable
access to extended config space.
> Matthew, with Arjan's patch, is anything that currently works now
> broken? Why do you feel it is somehow "wrong"?
lspci is broken. It used to be able to access extended config space, and
now can't unless it is patched to know about the sysfs flag to enable it.
If you're determined to implement something to disable extended config
space by default, it can be done in a much better way than Arjan's patch
-- less code (both source and object).
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
On Tue, Jan 29, 2008 at 08:45:55PM -0700, Matthew Wilcox wrote:
> On Tue, Jan 29, 2008 at 05:19:55AM -0800, Greg KH wrote:
> > Matthew, with Arjan's patch, is anything that currently works now
> > broken? Why do you feel it is somehow "wrong"?
>
> lspci is broken. It used to be able to access extended config space, and
> now can't unless it is patched to know about the sysfs flag to enable it.
There is also likely damage to Xorg for the very same reason.
Ivan.
On Wed, 30 Jan 2008 18:15:39 +0300
Ivan Kokshaysky <[email protected]> wrote:
> On Tue, Jan 29, 2008 at 08:45:55PM -0700, Matthew Wilcox wrote:
> > On Tue, Jan 29, 2008 at 05:19:55AM -0800, Greg KH wrote:
> > > Matthew, with Arjan's patch, is anything that currently works now
> > > broken? Why do you feel it is somehow "wrong"?
> >
> > lspci is broken. It used to be able to access extended config
> > space, and now can't unless it is patched to know about the sysfs
> > flag to enable it.
>
> There is also likely damage to Xorg for the very same reason.
>
Xorg doesn't do pci express ..
(newer ones actually have gotten out of the "do the PCI layer ourselves" business entirely)
> Ivan.
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On Wed, Jan 30, 2008 at 07:42:49AM -0800, Arjan van de Ven wrote:
> Xorg doesn't do pci express ..
Xorg core provides a set of PCI config access functions (via sysfs) for
the graphics drivers. These functions do work correctly with offsets > 256
bytes. Can you guarantee that none of PCI-E video drivers use that,
including proprietary nvidia and ati ones?
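For reference, an access of the kind in question looks roughly like the sketch
below. It is not taken from Xorg; read_config_dword() is a hypothetical helper
that reads four bytes at a given offset from a file such as a device's sysfs
"config" file, and it treats a short read as failure.

```c
#define _XOPEN_SOURCE 700	/* for pread() */
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read a 32-bit little-endian value at 'offset' from the file at 'path'.
 * Returns 0 on success, -1 if the file cannot be opened or the read comes
 * back short (for example, the offset lies beyond what the file exposes).
 */
int read_config_dword(const char *path, off_t offset, uint32_t *val)
{
	unsigned char buf[4];
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	n = pread(fd, buf, sizeof(buf), offset);
	close(fd);
	if (n != (ssize_t)sizeof(buf))
		return -1;

	/* PCI config space is little-endian. */
	*val = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
	       ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
	return 0;
}
```

Any caller passing an offset of 0x100 or more depends on the kernel being able
to reach extended config space for that device.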
> (newer ones actually have gotten out of the "do the PCI layer ourselves" business entirely)
Unfortunately, not completely true. Though it has nothing to do with
extended config space.
Ivan.
On Tuesday 29 January 2008 05:19:55 am Greg KH wrote:
> Hahahaha, oh, that's a good one...
>
> But what about the thousands of implementations out there that are
> buggy?
>
> I'm with Arjan here, I'm very skeptical.
Ugg, let's look at the actual data (again); I'm really not sure why people
are jumping to such dire conclusions about the current state of things.
AIUI we only have 3 issues so far (remember mmconfig has been enabled in -mm
for a long time):
1) host bridge decode problems (disabling decode to avoid overlaps can
cause some bridges to stop decoding RAM addrs, but we have a fix for that)
2) config space retry on ATI (I think willy already debunked this one?)
3) some FUD about SMM or other firmware interrupts coming in during BAR
sizing while decode is disabled (this one is just pure FUD; if we want to
solve it properly we need a new platform hook to disable SMM/NMI/etc.
around PCI probing)
What else was there? What reason do we have to think that things are so
disastrous?
So I really prefer willy's approach to Arjan's alternative...
Jesse
On Jan 28, 2008 7:03 PM, Matthew Wilcox <[email protected]> wrote:
>
> We want to allow different implementations of pci_raw_ops for standard
> and extended config space on x86. Rather than clutter generic code with
> knowledge of this, we make pci_raw_ops private to x86 and use it to
> implement the new raw interface -- raw_pci_read() and raw_pci_write().
>
> Signed-off-by: Matthew Wilcox <[email protected]>
> ---
> arch/ia64/pci/pci.c | 25 ++++++++-----------------
> arch/ia64/sn/pci/tioce_provider.c | 16 ++++++++--------
> arch/x86/kernel/quirks.c | 2 +-
> arch/x86/pci/common.c | 25 +++++++++++++++++++++++--
> arch/x86/pci/direct.c | 4 ++--
> arch/x86/pci/fixup.c | 6 ++++--
> arch/x86/pci/legacy.c | 2 +-
> arch/x86/pci/mmconfig-shared.c | 6 +++---
> arch/x86/pci/mmconfig_32.c | 10 ++--------
> arch/x86/pci/mmconfig_64.c | 8 +-------
> arch/x86/pci/pci.h | 15 +++++++++++----
> arch/x86/pci/visws.c | 3 ---
> drivers/acpi/osl.c | 25 ++++++-------------------
> drivers/ata/Kconfig | 3 +++
> drivers/ata/Makefile | 3 +++
> include/linux/pci.h | 16 ++++++++--------
> 16 files changed, 84 insertions(+), 85 deletions(-)
>
...
>
> diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
> index fab30e1..7f73f7c 100644
> --- a/arch/x86/kernel/quirks.c
> +++ b/arch/x86/kernel/quirks.c
> @@ -27,7 +27,7 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
> pci_write_config_byte(dev, 0xf4, config|0x2);
>
> /* read xTPR register */
> - raw_pci_ops->read(0, 0, 0x40, 0x4c, 2, &word);
> + raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
>
> if (!(word & (1 << 13))) {
> printk(KERN_INFO "Intel E7520/7320/7525 detected. "
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 8627463..f2bd9f3 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -26,16 +26,37 @@ int pcibios_last_bus = -1;
> unsigned long pirq_table_addr;
> struct pci_bus *pci_root_bus;
> struct pci_raw_ops *raw_pci_ops;
> +struct pci_raw_ops *raw_pci_ext_ops;
> +
> +int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
> + int reg, int len, u32 *val)
> +{
> + if (reg < 256 && raw_pci_ops)
> + return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
> + if (raw_pci_ext_ops)
> + return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
> + return -EINVAL;
> +}
> +
> +int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
> + int reg, int len, u32 val)
> +{
> + if (reg < 256 && raw_pci_ops)
> + return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
> + if (raw_pci_ext_ops)
> + return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
> + return -EINVAL;
> +}
>
> static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
> {
> - return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
> + return raw_pci_read(pci_domain_nr(bus), bus->number,
> devfn, where, size, value);
> }
>
> static int pci_write(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
> {
> - return raw_pci_ops->write(pci_domain_nr(bus), bus->number,
> + return raw_pci_write(pci_domain_nr(bus), bus->number,
> devfn, where, size, value);
> }
>
> diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
> index 431c9a5..42f3e4c 100644
> --- a/arch/x86/pci/direct.c
> +++ b/arch/x86/pci/direct.c
> @@ -14,7 +14,7 @@
> #define PCI_CONF1_ADDRESS(bus, devfn, reg) \
> (0x80000000 | (bus << 16) | (devfn << 8) | (reg & ~3))
>
> -int pci_conf1_read(unsigned int seg, unsigned int bus,
> +static int pci_conf1_read(unsigned int seg, unsigned int bus,
> unsigned int devfn, int reg, int len, u32 *value)
> {
> unsigned long flags;
> @@ -45,7 +45,7 @@ int pci_conf1_read(unsigned int seg, unsigned int bus,
> return 0;
> }
>
> -int pci_conf1_write(unsigned int seg, unsigned int bus,
> +static int pci_conf1_write(unsigned int seg, unsigned int bus,
> unsigned int devfn, int reg, int len, u32 value)
> {
> unsigned long flags;
any reason to change pci_conf1_read/write to static?
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index 6cff66d..b31cd6a 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -215,7 +215,8 @@ static int quirk_aspm_offset[MAX_PCIEROOT << 3];
>
> static int quirk_pcie_aspm_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
> {
> - return raw_pci_ops->read(0, bus->number, devfn, where, size, value);
> + return raw_pci_read(pci_domain_nr(bus), bus->number,
> + devfn, where, size, value);
> }
>
> /*
> @@ -231,7 +232,8 @@ static int quirk_pcie_aspm_write(struct pci_bus *bus, unsigned int devfn, int wh
> if ((offset) && (where == offset))
> value = value & 0xfffffffc;
>
> - return raw_pci_ops->write(0, bus->number, devfn, where, size, value);
> + return raw_pci_write(pci_domain_nr(bus), bus->number,
> + devfn, where, size, value);
> }
>
> static struct pci_ops quirk_pcie_aspm_ops = {
> diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
> index 5565d70..e041ced 100644
> --- a/arch/x86/pci/legacy.c
> +++ b/arch/x86/pci/legacy.c
> @@ -22,7 +22,7 @@ static void __devinit pcibios_fixup_peer_bridges(void)
> if (pci_find_bus(0, n))
> continue;
> for (devfn = 0; devfn < 256; devfn += 8) {
> - if (!raw_pci_ops->read(0, n, devfn, PCI_VENDOR_ID, 2, &l) &&
> + if (!raw_pci_read(0, n, devfn, PCI_VENDOR_ID, 2, &l) &&
> l != 0x0000 && l != 0xffff) {
> DBG("Found device at %02x:%02x [%04x]\n", n, devfn, l);
> printk(KERN_INFO "PCI: Discovered peer bus %02x\n", n);
> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index 6b521d3..8d54df4 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -28,7 +28,7 @@ static int __initdata pci_mmcfg_resources_inserted;
> static const char __init *pci_mmcfg_e7520(void)
> {
> u32 win;
> - pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
> + pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
>
> win = win & 0xf000;
> if(win == 0x0000 || win == 0xf000)
> @@ -53,7 +53,7 @@ static const char __init *pci_mmcfg_intel_945(void)
>
> pci_mmcfg_config_num = 1;
>
> - pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
> + pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
>
> /* Enable bit */
> if (!(pciexbar & 1))
> @@ -118,7 +118,7 @@ static int __init pci_mmcfg_check_hostbridge(void)
> int i;
> const char *name;
>
> - pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0, 4, &l);
> + pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0, 4, &l);
> vendor = l & 0xffff;
> device = (l >> 16) & 0xffff;
>
> diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
> index 7b75e65..081816a 100644
> --- a/arch/x86/pci/mmconfig_32.c
> +++ b/arch/x86/pci/mmconfig_32.c
> @@ -68,9 +68,6 @@ err: *value = -1;
> return -EINVAL;
> }
>
> - if (reg < 256)
> - return pci_conf1_read(seg,bus,devfn,reg,len,value);
> -
> base = get_base_addr(seg, bus, devfn);
> if (!base)
> goto err;
> @@ -104,9 +101,6 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
> if ((bus > 255) || (devfn > 255) || (reg > 4095))
> return -EINVAL;
>
> - if (reg < 256)
> - return pci_conf1_write(seg,bus,devfn,reg,len,value);
> -
> base = get_base_addr(seg, bus, devfn);
> if (!base)
> return -EINVAL;
> @@ -138,7 +132,7 @@ static struct pci_raw_ops pci_mmcfg = {
>
> int __init pci_mmcfg_arch_init(void)
> {
> - printk(KERN_INFO "PCI: Using MMCONFIG\n");
> - raw_pci_ops = &pci_mmcfg;
> + printk(KERN_INFO "PCI: Using MMCONFIG for extended config space\n");
> + raw_pci_ext_ops = &pci_mmcfg;
> return 1;
> }
> diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
> index c4cf318..9207fd4 100644
> --- a/arch/x86/pci/mmconfig_64.c
> +++ b/arch/x86/pci/mmconfig_64.c
> @@ -58,9 +58,6 @@ err: *value = -1;
> return -EINVAL;
> }
>
> - if (reg < 256)
> - return pci_conf1_read(seg,bus,devfn,reg,len,value);
> -
> addr = pci_dev_base(seg, bus, devfn);
> if (!addr)
> goto err;
> @@ -89,9 +86,6 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
> if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095)))
> return -EINVAL;
>
> - if (reg < 256)
> - return pci_conf1_write(seg,bus,devfn,reg,len,value);
> -
> addr = pci_dev_base(seg, bus, devfn);
> if (!addr)
> return -EINVAL;
> @@ -150,6 +144,6 @@ int __init pci_mmcfg_arch_init(void)
> return 0;
> }
> }
> - raw_pci_ops = &pci_mmcfg;
> + raw_pci_ext_ops = &pci_mmcfg;
> return 1;
> }
> diff --git a/arch/x86/pci/pci.h b/arch/x86/pci/pci.h
> index 36cb44c..3431518 100644
> --- a/arch/x86/pci/pci.h
> +++ b/arch/x86/pci/pci.h
> @@ -85,10 +85,17 @@ extern spinlock_t pci_config_lock;
> extern int (*pcibios_enable_irq)(struct pci_dev *dev);
> extern void (*pcibios_disable_irq)(struct pci_dev *dev);
>
> -extern int pci_conf1_write(unsigned int seg, unsigned int bus,
> - unsigned int devfn, int reg, int len, u32 value);
> -extern int pci_conf1_read(unsigned int seg, unsigned int bus,
> - unsigned int devfn, int reg, int len, u32 *value);
> +struct pci_raw_ops {
> + int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
> + int reg, int len, u32 *val);
> + int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
> + int reg, int len, u32 val);
> +};
> +
> +extern struct pci_raw_ops *raw_pci_ops;
> +extern struct pci_raw_ops *raw_pci_ext_ops;
> +
> +extern struct pci_raw_ops pci_direct_conf1;
>
> extern int pci_direct_probe(void);
> extern void pci_direct_init(int type);
> diff --git a/arch/x86/pci/visws.c b/arch/x86/pci/visws.c
> index 8ecb1c7..c2df4e9 100644
> --- a/arch/x86/pci/visws.c
> +++ b/arch/x86/pci/visws.c
> @@ -13,9 +13,6 @@
>
> #include "pci.h"
>
> -
> -extern struct pci_raw_ops pci_direct_conf1;
> -
> static int pci_visws_enable_irq(struct pci_dev *dev) { return 0; }
> static void pci_visws_disable_irq(struct pci_dev *dev) { }
>
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index e3a673a..f190db9 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -139,15 +139,6 @@ acpi_status __init acpi_os_initialize(void)
>
> acpi_status acpi_os_initialize1(void)
> {
> - /*
> - * Initialize PCI configuration space access, as we'll need to access
> - * it while walking the namespace (bus 0 and root bridges w/ _BBNs).
> - */
> - if (!raw_pci_ops) {
> - printk(KERN_ERR PREFIX
> - "Access to PCI configuration space unavailable\n");
> - return AE_NULL_ENTRY;
> - }
> kacpid_wq = create_singlethread_workqueue("kacpid");
> kacpi_notify_wq = create_singlethread_workqueue("kacpi_notify");
> BUG_ON(!kacpid_wq);
> @@ -498,11 +489,9 @@ acpi_os_read_pci_configuration(struct acpi_pci_id * pci_id, u32 reg,
> return AE_ERROR;
> }
>
> - BUG_ON(!raw_pci_ops);
> -
> - result = raw_pci_ops->read(pci_id->segment, pci_id->bus,
> - PCI_DEVFN(pci_id->device, pci_id->function),
> - reg, size, value);
> + result = raw_pci_read(pci_id->segment, pci_id->bus,
> + PCI_DEVFN(pci_id->device, pci_id->function),
> + reg, size, value);
>
> return (result ? AE_ERROR : AE_OK);
> }
> @@ -529,11 +518,9 @@ acpi_os_write_pci_configuration(struct acpi_pci_id * pci_id, u32 reg,
> return AE_ERROR;
> }
>
> - BUG_ON(!raw_pci_ops);
> -
> - result = raw_pci_ops->write(pci_id->segment, pci_id->bus,
> - PCI_DEVFN(pci_id->device, pci_id->function),
> - reg, size, value);
> + result = raw_pci_write(pci_id->segment, pci_id->bus,
> + PCI_DEVFN(pci_id->device, pci_id->function),
> + reg, size, value);
>
> return (result ? AE_ERROR : AE_OK);
> }
> diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
> index ba63619..1e71dc0 100644
> --- a/drivers/ata/Kconfig
> +++ b/drivers/ata/Kconfig
> @@ -40,6 +40,9 @@ config ATA_ACPI
> You can disable this at kernel boot time by using the
> option libata.noacpi=1
>
> +config ATA_RAM
> + tristate "ATA RAM driver"
> +
related?
YH
Matthew,
Perhaps I missed it, but did you address Yinghai's concerns?
Yinghai Lu wrote:
> On Jan 28, 2008 7:03 PM, Matthew Wilcox <[email protected]> wrote:
>>
>> -int pci_conf1_write(unsigned int seg, unsigned int bus,
>> +static int pci_conf1_write(unsigned int seg, unsigned int bus,
>> unsigned int devfn, int reg, int len, u32 value)
>
> any reason to change pci_conf1_read/write to static?
>
>>
>> +config ATA_RAM
>> + tristate "ATA RAM driver"
>> +
>
> related?
>
> YH
On Thu, 07 Feb 2008 10:54:05 -0500
Tony Camuso <[email protected]> wrote:
> Matthew,
>
> Perhaps I missed it, but did you address Yinghai's concerns?
>
> Yinghai Lu wrote:
> > On Jan 28, 2008 7:03 PM, Matthew Wilcox <[email protected]> wrote:
> >>
> >> -int pci_conf1_write(unsigned int seg, unsigned int bus,
> >> +static int pci_conf1_write(unsigned int seg, unsigned int bus,
> >> unsigned int devfn, int reg, int len,
> >> u32 value)
> >
> > any reason to change pci_conf1_read/write to static?
> >
>
nothing should use these directly. So static is the right answer ;)
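A minimal sketch of the pattern being discussed, with made-up names (raw_ops,
conf1_read, direct_conf1) and a placeholder body: the accessor itself is
static, and other files reach it only through an exported table of function
pointers, much as the patch switches callers from pci_conf1_read() to
pci_direct_conf1.read().

```c
#include <stdint.h>

struct raw_ops {
	int (*read)(unsigned int bus, unsigned int devfn, int reg, uint32_t *val);
};

/* static: not visible to the linker outside this translation unit. */
static int conf1_read(unsigned int bus, unsigned int devfn, int reg, uint32_t *val)
{
	/* Placeholder body: echo the address so the call can be observed. */
	*val = (bus << 16) | (devfn << 8) | (uint32_t)reg;
	return 0;
}

/* The only exported symbol; callers use direct_conf1.read(...). */
struct raw_ops direct_conf1 = { .read = conf1_read };
```

Keeping the function static means a new direct caller cannot appear in
another file without someone deliberately changing the linkage.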
Arjan van de Ven wrote:
> On Thu, 07 Feb 2008 10:54:05 -0500
> Tony Camuso <[email protected]> wrote:
>
>> Matthew,
>>
>> Perhaps I missed it, but did you address Yinghai's concerns?
>>
>> Yinghai Lu wrote:
>>> On Jan 28, 2008 7:03 PM, Matthew Wilcox <[email protected]> wrote:
>>>> -int pci_conf1_write(unsigned int seg, unsigned int bus,
>>>> +static int pci_conf1_write(unsigned int seg, unsigned int bus,
>>>> unsigned int devfn, int reg, int len,
>>>> u32 value)
>>> any reason to change pci_conf1_read/write to static?
>>>
>
> nothing should use these directly. So static is the right answer ;)
Agreed. Thanks, Arjan.
Matthew,
What about the ATA_RAM addition to Kconfig? Was it accidental,
or intended? If intended, how is it related?
On Thu, Feb 07, 2008 at 11:36:18AM -0500, Tony Camuso wrote:
> Arjan van de Ven wrote:
>> On Thu, 07 Feb 2008 10:54:05 -0500
>> Tony Camuso <[email protected]> wrote:
>>> Matthew,
>>>
>>> Perhaps I missed it, but did you address Yinghai's concerns?
>>>
>>> Yinghai Lu wrote:
>>>> On Jan 28, 2008 7:03 PM, Matthew Wilcox <[email protected]> wrote:
>>>>> -int pci_conf1_write(unsigned int seg, unsigned int bus,
>>>>> +static int pci_conf1_write(unsigned int seg, unsigned int bus,
>>>>> unsigned int devfn, int reg, int len,
>>>>> u32 value)
>>>> any reason to change pci_conf1_read/write to static?
>>>>
>> nothing should use these directly. So static is the right answer ;)
>
> Agreed. Thanks, Arjan.
>
> Matthew,
> What about the ATA_RAM addition to Kconfig? Was it accidental,
> or intended? If intended, how is it related?
AFAICT, it looks accidental. I can't see how it's related.
He should be back online next week and can answer for himself.
hth,
grant
On Thu, Feb 07, 2008 at 10:54:05AM -0500, Tony Camuso wrote:
> Matthew,
>
> Perhaps I missed it, but did you address Yinghai's concerns?
No, I was on holiday.
> Yinghai Lu wrote:
> >On Jan 28, 2008 7:03 PM, Matthew Wilcox <[email protected]> wrote:
> >>
> >>-int pci_conf1_write(unsigned int seg, unsigned int bus,
> >>+static int pci_conf1_write(unsigned int seg, unsigned int bus,
> >> unsigned int devfn, int reg, int len, u32
> >> value)
> >
> >any reason to change pci_conf1_read/write to static?
Yes -- it no longer needs to be called from outside this file.
> >>+config ATA_RAM
> >>+ tristate "ATA RAM driver"
> >>+
> >
> >related?
No. An unrelated patch that I didn't trim out.
On Feb 9, 2008 4:41 AM, Matthew Wilcox <[email protected]> wrote:
> On Thu, Feb 07, 2008 at 10:54:05AM -0500, Tony Camuso wrote:
> > Matthew,
> >
> > Perhaps I missed it, but did you address Yinghai's concerns?
>
> No, I was on holiday.
>
> > Yinghai Lu wrote:
> > >On Jan 28, 2008 7:03 PM, Matthew Wilcox <[email protected]> wrote:
> > >>
> > >>-int pci_conf1_write(unsigned int seg, unsigned int bus,
> > >>+static int pci_conf1_write(unsigned int seg, unsigned int bus,
> > >> unsigned int devfn, int reg, int len, u32
> > >> value)
> > >
> > >any reason to change pci_conf1_read/write to static?
>
> Yes -- it no longer needs to be called from outside this file.
>
> > >>+config ATA_RAM
> > >>+ tristate "ATA RAM driver"
> > >>+
> > >
> > >related?
>
looks good. it should get into -mm or x86/mm for some testing
YH
On Sat, Feb 09, 2008 at 10:25:23PM -0800, Yinghai Lu wrote:
> On Feb 9, 2008 4:41 AM, Matthew Wilcox <[email protected]> wrote:
> > On Thu, Feb 07, 2008 at 10:54:05AM -0500, Tony Camuso wrote:
> > > Matthew,
> > >
> > > Perhaps I missed it, but did you address Yinghai's concerns?
> >
> > No, I was on holiday.
> >
> > > Yinghai Lu wrote:
> > > >On Jan 28, 2008 7:03 PM, Matthew Wilcox <[email protected]> wrote:
> > > >>
> > > >>-int pci_conf1_write(unsigned int seg, unsigned int bus,
> > > >>+static int pci_conf1_write(unsigned int seg, unsigned int bus,
> > > >> unsigned int devfn, int reg, int len, u32
> > > >> value)
> > > >
> > > >any reason to change pci_conf1_read/write to static?
> >
> > Yes -- it no longer needs to be called from outside this file.
> >
> > > >>+config ATA_RAM
> > > >>+ tristate "ATA RAM driver"
> > > >>+
> > > >
> > > >related?
> >
>
> looks good. it should get into -mm or x86/mm for some testing
Can I get a revised version of this, without the incorrect hunk?
thanks,
greg k-h
On Sat, Feb 09, 2008 at 11:21:16PM -0800, Greg KH wrote:
> Can I get a revised version of this, without the incorrect hunk?
Sure. I've even rebased it against current HEAD. Damn whitespace
cleanup introducing unnecessary conflicts ....
I suggest Ivan's patch be merged ASAP as it actually fixes bugs.
This patch is just cleanup (and takes care of some future concerns).
From ad4c3f135cda6f5210735231d30ef8e9dbd58c7c Mon Sep 17 00:00:00 2001
From: Matthew Wilcox <[email protected]>
Date: Sun, 10 Feb 2008 09:45:28 -0500
Subject: [PATCH] Change pci_raw_ops to pci_raw_read/write
We want to allow different implementations of pci_raw_ops for standard
and extended config space on x86. Rather than clutter generic code with
knowledge of this, we make pci_raw_ops private to x86 and use it to
implement the new raw interface -- raw_pci_read() and raw_pci_write().
Signed-off-by: Matthew Wilcox <[email protected]>
---
arch/ia64/pci/pci.c | 25 ++++++++-----------------
arch/ia64/sn/pci/tioce_provider.c | 16 ++++++++--------
arch/x86/kernel/quirks.c | 2 +-
arch/x86/pci/common.c | 25 +++++++++++++++++++++++--
arch/x86/pci/direct.c | 4 ++--
arch/x86/pci/fixup.c | 6 ++++--
arch/x86/pci/legacy.c | 2 +-
arch/x86/pci/mmconfig-shared.c | 6 +++---
arch/x86/pci/mmconfig_32.c | 10 ++--------
arch/x86/pci/mmconfig_64.c | 8 +-------
arch/x86/pci/pci.h | 15 +++++++++++----
arch/x86/pci/visws.c | 3 ---
drivers/acpi/osl.c | 25 ++++++-------------------
include/linux/pci.h | 16 ++++++++--------
14 files changed, 78 insertions(+), 85 deletions(-)
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 488e48a..8fd7e82 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -43,8 +43,7 @@
#define PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg) \
(((u64) seg << 28) | (bus << 20) | (devfn << 12) | (reg))
-static int
-pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
int reg, int len, u32 *value)
{
u64 addr, data = 0;
@@ -68,8 +67,7 @@ pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
return 0;
}
-static int
-pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
int reg, int len, u32 value)
{
u64 addr;
@@ -91,24 +89,17 @@ pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
return 0;
}
-static struct pci_raw_ops pci_sal_ops = {
- .read = pci_sal_read,
- .write = pci_sal_write
-};
-
-struct pci_raw_ops *raw_pci_ops = &pci_sal_ops;
-
-static int
-pci_read (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
+static int pci_read(struct pci_bus *bus, unsigned int devfn, int where,
+ int size, u32 *value)
{
- return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+ return raw_pci_read(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
-static int
-pci_write (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
+static int pci_write(struct pci_bus *bus, unsigned int devfn, int where,
+ int size, u32 value)
{
- return raw_pci_ops->write(pci_domain_nr(bus), bus->number,
+ return raw_pci_write(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
diff --git a/arch/ia64/sn/pci/tioce_provider.c b/arch/ia64/sn/pci/tioce_provider.c
index e1a3e19..999f14f 100644
--- a/arch/ia64/sn/pci/tioce_provider.c
+++ b/arch/ia64/sn/pci/tioce_provider.c
@@ -752,13 +752,13 @@ tioce_kern_init(struct tioce_common *tioce_common)
* Determine the secondary bus number of the port2 logical PPB.
* This is used to decide whether a given pci device resides on
* port1 or port2. Note: We don't have enough plumbing set up
- * here to use pci_read_config_xxx() so use the raw_pci_ops vector.
+ * here to use pci_read_config_xxx() so use raw_pci_read().
*/
seg = tioce_common->ce_pcibus.bs_persist_segment;
bus = tioce_common->ce_pcibus.bs_persist_busnum;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(2, 0), PCI_SECONDARY_BUS, 1,&tmp);
+ raw_pci_read(seg, bus, PCI_DEVFN(2, 0), PCI_SECONDARY_BUS, 1,&tmp);
tioce_kern->ce_port1_secondary = (u8) tmp;
/*
@@ -799,11 +799,11 @@ tioce_kern_init(struct tioce_common *tioce_common)
/* mem base/limit */
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_MEMORY_BASE, 2, &tmp);
base = (u64)tmp << 16;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_MEMORY_LIMIT, 2, &tmp);
limit = (u64)tmp << 16;
limit |= 0xfffffUL;
@@ -817,21 +817,21 @@ tioce_kern_init(struct tioce_common *tioce_common)
* attributes.
*/
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_MEMORY_BASE, 2, &tmp);
base = ((u64)tmp & PCI_PREF_RANGE_MASK) << 16;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_BASE_UPPER32, 4, &tmp);
base |= (u64)tmp << 32;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_MEMORY_LIMIT, 2, &tmp);
limit = ((u64)tmp & PCI_PREF_RANGE_MASK) << 16;
limit |= 0xfffffUL;
- raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+ raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
PCI_PREF_LIMIT_UPPER32, 4, &tmp);
limit |= (u64)tmp << 32;
diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 6ba33ca..1941482 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -27,7 +27,7 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
pci_write_config_byte(dev, 0xf4, config|0x2);
/* read xTPR register */
- raw_pci_ops->read(0, 0, 0x40, 0x4c, 2, &word);
+ raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
if (!(word & (1 << 13))) {
dev_info(&dev->dev, "Intel E7520/7320/7525 detected; "
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 52deabc..b7c67a1 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -26,16 +26,37 @@ int pcibios_last_bus = -1;
unsigned long pirq_table_addr;
struct pci_bus *pci_root_bus;
struct pci_raw_ops *raw_pci_ops;
+struct pci_raw_ops *raw_pci_ext_ops;
+
+int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 *val)
+{
+ if (reg < 256 && raw_pci_ops)
+ return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
+ return -EINVAL;
+}
+
+int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 val)
+{
+ if (reg < 256 && raw_pci_ops)
+ return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
+ return -EINVAL;
+}
static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
{
- return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+ return raw_pci_read(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
static int pci_write(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
{
- return raw_pci_ops->write(pci_domain_nr(bus), bus->number,
+ return raw_pci_write(pci_domain_nr(bus), bus->number,
devfn, where, size, value);
}
diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
index 431c9a5..42f3e4c 100644
--- a/arch/x86/pci/direct.c
+++ b/arch/x86/pci/direct.c
@@ -14,7 +14,7 @@
#define PCI_CONF1_ADDRESS(bus, devfn, reg) \
(0x80000000 | (bus << 16) | (devfn << 8) | (reg & ~3))
-int pci_conf1_read(unsigned int seg, unsigned int bus,
+static int pci_conf1_read(unsigned int seg, unsigned int bus,
unsigned int devfn, int reg, int len, u32 *value)
{
unsigned long flags;
@@ -45,7 +45,7 @@ int pci_conf1_read(unsigned int seg, unsigned int bus,
return 0;
}
-int pci_conf1_write(unsigned int seg, unsigned int bus,
+static int pci_conf1_write(unsigned int seg, unsigned int bus,
unsigned int devfn, int reg, int len, u32 value)
{
unsigned long flags;
diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index 74d30ff..a5ef5f5 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -215,7 +215,8 @@ static int quirk_aspm_offset[MAX_PCIEROOT << 3];
static int quirk_pcie_aspm_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
{
- return raw_pci_ops->read(0, bus->number, devfn, where, size, value);
+ return raw_pci_read(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
}
/*
@@ -231,7 +232,8 @@ static int quirk_pcie_aspm_write(struct pci_bus *bus, unsigned int devfn, int wh
if ((offset) && (where == offset))
value = value & 0xfffffffc;
- return raw_pci_ops->write(0, bus->number, devfn, where, size, value);
+ return raw_pci_write(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
}
static struct pci_ops quirk_pcie_aspm_ops = {
diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 5565d70..e041ced 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -22,7 +22,7 @@ static void __devinit pcibios_fixup_peer_bridges(void)
if (pci_find_bus(0, n))
continue;
for (devfn = 0; devfn < 256; devfn += 8) {
- if (!raw_pci_ops->read(0, n, devfn, PCI_VENDOR_ID, 2, &l) &&
+ if (!raw_pci_read(0, n, devfn, PCI_VENDOR_ID, 2, &l) &&
l != 0x0000 && l != 0xffff) {
DBG("Found device at %02x:%02x [%04x]\n", n, devfn, l);
printk(KERN_INFO "PCI: Discovered peer bus %02x\n", n);
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 6b521d3..8d54df4 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -28,7 +28,7 @@ static int __initdata pci_mmcfg_resources_inserted;
static const char __init *pci_mmcfg_e7520(void)
{
u32 win;
- pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
+ pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0xce, 2, &win);
win = win & 0xf000;
if(win == 0x0000 || win == 0xf000)
@@ -53,7 +53,7 @@ static const char __init *pci_mmcfg_intel_945(void)
pci_mmcfg_config_num = 1;
- pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
+ pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0x48, 4, &pciexbar);
/* Enable bit */
if (!(pciexbar & 1))
@@ -118,7 +118,7 @@ static int __init pci_mmcfg_check_hostbridge(void)
int i;
const char *name;
- pci_conf1_read(0, 0, PCI_DEVFN(0,0), 0, 4, &l);
+ pci_direct_conf1.read(0, 0, PCI_DEVFN(0,0), 0, 4, &l);
vendor = l & 0xffff;
device = (l >> 16) & 0xffff;
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 7b75e65..081816a 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -68,9 +68,6 @@ err: *value = -1;
return -EINVAL;
}
- if (reg < 256)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
-
base = get_base_addr(seg, bus, devfn);
if (!base)
goto err;
@@ -104,9 +101,6 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if ((bus > 255) || (devfn > 255) || (reg > 4095))
return -EINVAL;
- if (reg < 256)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
-
base = get_base_addr(seg, bus, devfn);
if (!base)
return -EINVAL;
@@ -138,7 +132,7 @@ static struct pci_raw_ops pci_mmcfg = {
int __init pci_mmcfg_arch_init(void)
{
- printk(KERN_INFO "PCI: Using MMCONFIG\n");
- raw_pci_ops = &pci_mmcfg;
+ printk(KERN_INFO "PCI: Using MMCONFIG for extended config space\n");
+ raw_pci_ext_ops = &pci_mmcfg;
return 1;
}
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index c4cf318..9207fd4 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -58,9 +58,6 @@ err: *value = -1;
return -EINVAL;
}
- if (reg < 256)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
-
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
goto err;
@@ -89,9 +86,6 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095)))
return -EINVAL;
- if (reg < 256)
- return pci_conf1_write(seg,bus,devfn,reg,len,value);
-
addr = pci_dev_base(seg, bus, devfn);
if (!addr)
return -EINVAL;
@@ -150,6 +144,6 @@ int __init pci_mmcfg_arch_init(void)
return 0;
}
}
- raw_pci_ops = &pci_mmcfg;
+ raw_pci_ext_ops = &pci_mmcfg;
return 1;
}
diff --git a/arch/x86/pci/pci.h b/arch/x86/pci/pci.h
index 36cb44c..3431518 100644
--- a/arch/x86/pci/pci.h
+++ b/arch/x86/pci/pci.h
@@ -85,10 +85,17 @@ extern spinlock_t pci_config_lock;
extern int (*pcibios_enable_irq)(struct pci_dev *dev);
extern void (*pcibios_disable_irq)(struct pci_dev *dev);
-extern int pci_conf1_write(unsigned int seg, unsigned int bus,
- unsigned int devfn, int reg, int len, u32 value);
-extern int pci_conf1_read(unsigned int seg, unsigned int bus,
- unsigned int devfn, int reg, int len, u32 *value);
+struct pci_raw_ops {
+ int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 *val);
+ int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 val);
+};
+
+extern struct pci_raw_ops *raw_pci_ops;
+extern struct pci_raw_ops *raw_pci_ext_ops;
+
+extern struct pci_raw_ops pci_direct_conf1;
extern int pci_direct_probe(void);
extern void pci_direct_init(int type);
diff --git a/arch/x86/pci/visws.c b/arch/x86/pci/visws.c
index 8ecb1c7..c2df4e9 100644
--- a/arch/x86/pci/visws.c
+++ b/arch/x86/pci/visws.c
@@ -13,9 +13,6 @@
#include "pci.h"
-
-extern struct pci_raw_ops pci_direct_conf1;
-
static int pci_visws_enable_irq(struct pci_dev *dev) { return 0; }
static void pci_visws_disable_irq(struct pci_dev *dev) { }
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index a14501c..34b3386 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -200,15 +200,6 @@ acpi_status __init acpi_os_initialize(void)
acpi_status acpi_os_initialize1(void)
{
- /*
- * Initialize PCI configuration space access, as we'll need to access
- * it while walking the namespace (bus 0 and root bridges w/ _BBNs).
- */
- if (!raw_pci_ops) {
- printk(KERN_ERR PREFIX
- "Access to PCI configuration space unavailable\n");
- return AE_NULL_ENTRY;
- }
kacpid_wq = create_singlethread_workqueue("kacpid");
kacpi_notify_wq = create_singlethread_workqueue("kacpi_notify");
BUG_ON(!kacpid_wq);
@@ -653,11 +644,9 @@ acpi_os_read_pci_configuration(struct acpi_pci_id * pci_id, u32 reg,
return AE_ERROR;
}
- BUG_ON(!raw_pci_ops);
-
- result = raw_pci_ops->read(pci_id->segment, pci_id->bus,
- PCI_DEVFN(pci_id->device, pci_id->function),
- reg, size, value);
+ result = raw_pci_read(pci_id->segment, pci_id->bus,
+ PCI_DEVFN(pci_id->device, pci_id->function),
+ reg, size, value);
return (result ? AE_ERROR : AE_OK);
}
@@ -682,11 +671,9 @@ acpi_os_write_pci_configuration(struct acpi_pci_id * pci_id, u32 reg,
return AE_ERROR;
}
- BUG_ON(!raw_pci_ops);
-
- result = raw_pci_ops->write(pci_id->segment, pci_id->bus,
- PCI_DEVFN(pci_id->device, pci_id->function),
- reg, size, value);
+ result = raw_pci_write(pci_id->segment, pci_id->bus,
+ PCI_DEVFN(pci_id->device, pci_id->function),
+ reg, size, value);
return (result ? AE_ERROR : AE_OK);
}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7215d3b..87195b6 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -301,14 +301,14 @@ struct pci_ops {
int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
};
-struct pci_raw_ops {
- int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
- int reg, int len, u32 *val);
- int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
- int reg, int len, u32 val);
-};
-
-extern struct pci_raw_ops *raw_pci_ops;
+/*
+ * ACPI needs to be able to access PCI config space before we've done a
+ * PCI bus scan and created pci_bus structures.
+ */
+extern int raw_pci_read(unsigned int domain, unsigned int bus,
+ unsigned int devfn, int reg, int len, u32 *val);
+extern int raw_pci_write(unsigned int domain, unsigned int bus,
+ unsigned int devfn, int reg, int len, u32 val);
struct pci_bus_region {
resource_size_t start;
--
1.5.2.5
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
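For readers following the split into raw_pci_ops and raw_pci_ext_ops, the dispatch behind raw_pci_read() can be sketched roughly as below. This is a user-space model, not the kernel source: the struct layout is copied from the hunk above, while the u32 typedef, the fake_conf1/fake_mmcfg back ends and the exact fallback order are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Same shape as the struct the patch moves into arch/x86/pci/pci.h. */
struct pci_raw_ops {
	int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
		    int reg, int len, u32 *val);
	int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
		     int reg, int len, u32 val);
};

/* Hypothetical stand-in for the traditional (conf1) accessor. */
static int fake_conf1_read(unsigned int domain, unsigned int bus,
			   unsigned int devfn, int reg, int len, u32 *val)
{
	(void)domain; (void)bus; (void)devfn; (void)reg; (void)len;
	*val = 0x1;	/* marker: served by the traditional method */
	return 0;
}

/* Hypothetical stand-in for the MMCONFIG accessor. */
static int fake_mmcfg_read(unsigned int domain, unsigned int bus,
			   unsigned int devfn, int reg, int len, u32 *val)
{
	(void)domain; (void)bus; (void)devfn; (void)reg; (void)len;
	*val = 0x2;	/* marker: served by the extended method */
	return 0;
}

static struct pci_raw_ops fake_conf1 = { .read = fake_conf1_read };
static struct pci_raw_ops fake_mmcfg = { .read = fake_mmcfg_read };

struct pci_raw_ops *raw_pci_ops = &fake_conf1;
struct pci_raw_ops *raw_pci_ext_ops = &fake_mmcfg;

int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
		 int reg, int len, u32 *val)
{
	/* Registers below 256 use the traditional ops when available. */
	if (reg < 256 && raw_pci_ops)
		return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
	/* Everything else needs the extended ops. */
	if (raw_pci_ext_ops)
		return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
	return -EINVAL;
}
```

In this model a read of register 0x4c is served by the traditional back end and a read of register 0x100 by the extended one; with no extended ops installed, the latter fails with -EINVAL.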
On Sun, Feb 10, 2008 at 07:51:22AM -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <[email protected]>
> Date: Sun, 10 Feb 2008 09:45:28 -0500
> Subject: [PATCH] Change pci_raw_ops to pci_raw_read/write
...
> -static int
> -pci_read (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
> +static int pci_read(struct pci_bus *bus, unsigned int devfn, int where,
> + int size, u32 *value)
> {
> - return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
> + return raw_pci_read(pci_domain_nr(bus), bus->number,
> devfn, where, size, value);
Willy,
Just wondering...why don't we just pass "struct bus*" through to the
raw_pci* ops?
My thinking is if a PCI bus controller or bridge is discovered, then we should
always create a matching "struct bus *".
Your patch looks fine to me but if you (and others) agree with the above,
I can make a patch to change the internal interface. The pci_*_config API
needs to remain the same.
...
> --- a/arch/x86/kernel/quirks.c
> +++ b/arch/x86/kernel/quirks.c
> @@ -27,7 +27,7 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
> pci_write_config_byte(dev, 0xf4, config|0x2);
>
> /* read xTPR register */
> - raw_pci_ops->read(0, 0, 0x40, 0x4c, 2, &word);
> + raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
Why are we using raw_pci_read here instead of pci_read_config_dword()?
If the pci_write_config_byte() above works, then I expect the read
to work too.
To be clear, this is not a problem with this patch...rather a separate
problem with the original code.
hth,
grant
On Sun, Feb 10, 2008 at 12:13:13PM -0700, Grant Grundler wrote:
> Just wondering...why don't we just pass "struct bus*" through to the
> raw_pci* ops?
> My thinking is if a PCI bus controller or bridge is discovered, then we should
> always create a matching "struct bus *".
ACPI may need to access PCI config space before we've done a PCI bus
walk. There's an opregion that AML may access that is for PCI config
space, and an apparently unrelated method might happen to contain such a
piece of AML.
> > --- a/arch/x86/kernel/quirks.c
> > +++ b/arch/x86/kernel/quirks.c
> > @@ -27,7 +27,7 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
> > pci_write_config_byte(dev, 0xf4, config|0x2);
> >
> > /* read xTPR register */
> > - raw_pci_ops->read(0, 0, 0x40, 0x4c, 2, &word);
> > + raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
>
> Why are we using raw_pci_read here instead of pci_read_config_dword()?
> If the pci_write_config_byte() above works, then I expect the read
> to work too.
I have no idea. I didn't want to change the semantics in this patch.
Presumably the original author would have an idea why they needed to do
this.
On Feb 10, 2008 6:51 AM, Matthew Wilcox <[email protected]> wrote:
> On Sat, Feb 09, 2008 at 11:21:16PM -0800, Greg KH wrote:
> > Can I get a revised version of this, without the incorrect hunk?
>
> Sure. I've even rebased it against current HEAD. Damn whitespace
> cleanup introducing unnecessary conflicts ....
>
> I suggest Ivan's patch be merged ASAP as it actually fixes bugs.
> This patch is just cleanup (and takes care of some future concerns).
your patch and Ivan's patch should be merged in one...
YH
On Sun, Feb 10, 2008 at 12:16:43PM -0800, Yinghai Lu wrote:
> On Feb 10, 2008 6:51 AM, Matthew Wilcox <[email protected]> wrote:
> > On Sat, Feb 09, 2008 at 11:21:16PM -0800, Greg KH wrote:
> > > Can I get a revised version of this, without the incorrect hunk?
> >
> > Sure. I've even rebased it against current HEAD. Damn whitespace
> > cleanup introducing unnecessary conflicts ....
> >
> > I suggest Ivan's patch be merged ASAP as it actually fixes bugs.
> > This patch is just cleanup (and takes care of some future concerns).
>
> your patch and Ivan's patch should be merged in one...
Why?
On Feb 10, 2008 12:19 PM, Matthew Wilcox <[email protected]> wrote:
>
> On Sun, Feb 10, 2008 at 12:16:43PM -0800, Yinghai Lu wrote:
> > On Feb 10, 2008 6:51 AM, Matthew Wilcox <[email protected]> wrote:
> > > On Sat, Feb 09, 2008 at 11:21:16PM -0800, Greg KH wrote:
> > > > Can I get a revised version of this, without the incorrect hunk?
> > >
> > > Sure. I've even rebased it against current HEAD. Damn whitespace
> > > cleanup introducing unnecessary conflicts ....
> > >
> > > I suggest Ivan's patch be merged ASAP as it actually fixes bugs.
> > > This patch is just cleanup (and takes care of some future concerns).
> >
> > your patch and Ivan's patch should be merged in one...
>
> Why?
Even Greg didn't know yesterday that another patch needed to be applied
before this one.
he said there were some hunks..
YH
On Sun, 10 Feb 2008, Yinghai Lu wrote:
> >
> > I suggest Ivan's patch be merged ASAP as it actually fixes bugs.
> > This patch is just cleanup (and takes care of some future concerns).
>
> your patch and Ivan's patch should be merged in one...
I really don't care whether they get merged as one or separately, but I
think it should be merged _now_ (-rc1 is already delayed), and I'd like to
see the final versions of both. Does anybody have them in a final
agreed-upon format (preferably with that oddness in quirk_intel_irqbalance
also fixed?)
Linus
On Sun, Feb 10, 2008 at 12:25:02PM -0800, Yinghai Lu wrote:
> Even Greg didn't know that there was another patch need to be applied
> before this one yesterday.
I don't believe you. For example:
On Mon, Jan 28, 2008 at 02:53:34PM -0800, Greg KH wrote:
> Please send me patches, in a form that can be merged, along with a
> proper changelog entry, in the order in which you wish them to be
> applied, so I know exactly what changes you are referring to.
Which I then did.
On Sun, Feb 10, 2008 at 12:24:18PM -0800, Linus Torvalds wrote:
> On Sun, 10 Feb 2008, Yinghai Lu wrote:
> > >
> > > I suggest Ivan's patch be merged ASAP as it actually fixes bugs.
> > > This patch is just cleanup (and takes care of some future concerns).
> >
> > your patch and Ivan's patch should be merged in one...
>
> I really don't care whether they get merged as one or separately, but I
> think it should be merged _now_ (-rc1 is already delayed), and I'd like to
> see the final versions of both. Does anybody have them in a final
> agreed-upon format (preferably with that oddness in quirk_intel_irqbalance
> also fixed?)
I just looked at fixing that -- the reason seems to be that we don't
actually have the struct pci_dev at that point. I can fix it, but I
think it's actually buggy. I want to look at some chipset docs to
confirm that though.
I've attached the two patches that I believe are the ones we want. We
can (and should) fix quirk_intel_irqbalance separately.
On Feb 10, 2008 12:32 PM, Matthew Wilcox <[email protected]> wrote:
> On Sun, Feb 10, 2008 at 12:25:02PM -0800, Yinghai Lu wrote:
> > Even Greg didn't know that there was another patch need to be applied
> > before this one yesterday.
>
> I don't believe you. For example:
>
> On Mon, Jan 28, 2008 at 02:53:34PM -0800, Greg KH wrote:
> > Please send me patches, in a form that can be merged, along with a
> > proper changelog entry, in the order in which you wish them to be
> > applied, so I know exactly what changes you are referring to.
>
> Which I then did.
then you may need to send the patches to Greg as a series, so Greg or others
don't need to dig up Ivan's patch:
[PATCH 0/2]...
[PATCH 1/2]... Ivan's patch with from statement
[PATCH 2/2] ... your patch
YH
On Sun, Feb 10, 2008 at 01:45:57PM -0700, Matthew Wilcox wrote:
> I just looked at fixing that -- the reason seems to be that we don't
> actually have the struct pci_dev at that point. I can fix it, but I
> think it's actually buggy. I want to look at some chipset docs to
> confirm that though.
I don't think I fully understand what's going on here. So here's what
I've been able to glean; hopefully someone who understands this better
can help out.
I happen to have an E7525-based machine, so here's an lspci of bus 0:
00:00.0 Host bridge: Intel Corporation E7525 Memory Controller Hub (rev 0a)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 0a)
00:03.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A1 (rev 0a)
00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 0a)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
The line in question reads:
/* read xTPR register */
raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
That's domain 0, bus 0, device 8, function 0, address 0x4c, length 2.
I've checked the public E7525 and E7520 MCH datasheets, and they don't
document the xTPR registers; nor do any of the devices in the datasheet
have registers documented at 0x4c.
You can see from my lspci above that I don't _have_ a device 8 on bus 0.
The aforementioned documentation says:
"A disabled or non-existent device's configuration register space is
hidden. A disabled or non-existent device will return all ones for reads
and will drop writes just as if the cycle terminated with a Master Abort
on PCI."
Now, my E7525 isn't affected by this quirk as it has a revision greater
than 0x9. So maybe it's expected that device 8 is hidden on my machine;
that it's only present on revisions up to 0x9. But maybe device 8 is
always hidden, and that's why the author used raw_pci_ops?
We can still do better than this, though. We can do:
- raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
+ pci_bus_read_config_word(dev->bus, PCI_DEVFN(8, 0), 0x4c, &word);
Using PCI_DEVFN tells people you really did mean device 8, and it's not
a braino for device 4 or 2 (how many bits for slot and function again?)
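To answer the bit-count question, here is a small sketch of the devfn encoding. The three macros follow the kernel's PCI_DEVFN, PCI_SLOT and PCI_FUNC definitions (5 bits of slot, 3 bits of function); the decode_devfn() helper is only an illustration and does not exist in the kernel.

```c
/* 5 bits of slot (device) number, 3 bits of function number. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/* Hypothetical helper: split a devfn byte into slot and function. */
static void decode_devfn(unsigned int devfn, unsigned int *slot,
			 unsigned int *func)
{
	*slot = PCI_SLOT(devfn);
	*func = PCI_FUNC(devfn);
}
```

With this encoding PCI_DEVFN(8, 0) is 0x40, whereas devices 4 and 2 at function 0 would be 0x20 and 0x10, so the bare constant 0x40 does mean device 8.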
I'll see if I can dig up the internal documentation for the xTPR register
when I'm at work on Monday. But I've never gone looking for internal
documentation before, so I have no idea how easy it will be to find ;-)
On Feb 10, 2008 12:45 PM, Matthew Wilcox <[email protected]> wrote:
>
> On Sun, Feb 10, 2008 at 12:24:18PM -0800, Linus Torvalds wrote:
> > On Sun, 10 Feb 2008, Yinghai Lu wrote:
> > > >
> > > > I suggest Ivan's patch be merged ASAP as it actually fixes bugs.
> > > > This patch is just cleanup (and takes care of some future concerns).
> > >
> > > your patch and Ivan's patch should be merged in one...
> >
> > I really don't care whether they get merged as one or separately, but I
> > think it should be merged _now_ (-rc1 is already delayed), and I'd like to
> > see the final versions of both. Does anybody have them in a final
> > agreed-upon format (preferably with that oddness in quirk_intel_irqbalance
> > also fixed?)
>
> I just looked at fixing that -- the reason seems to be that we don't
> actually have the struct pci_dev at that point. I can fix it, but I
> think it's actually buggy. I want to look at some chipset docs to
> confirm that though.
>
> I've attached the two patches that I believe are the ones we want. We
> can (and should) fix quirk_intel_irqbalance separately.
Andrew,
those two patches just got into Linus' 2.6.25-rc1.
I assume that you will drop
gregkh-pci-pci-make-pci-extended-config-space-a-driver-opt-in.patch in
-mm.
please check some updated patches in -mm that could be affected. hope
it could save you some time
x86-validate-against-acpi-motherboard-resources.patch
x86-clear-pci_mmcfg_virt-when-mmcfg-get-rejected.patch
x86-mmconf-enable-mcfg-early.patch
x86_64-check-msr-to-get-mmconfig-for-amd-family-10h-opteron-v3.patch
YH
Yinghai Lu wrote:
> On Feb 10, 2008 12:45 PM, Matthew Wilcox <[email protected]> wrote:
>> On Sun, Feb 10, 2008 at 12:24:18PM -0800, Linus Torvalds wrote:
>>> On Sun, 10 Feb 2008, Yinghai Lu wrote:
>>>>> I suggest Ivan's patch be merged ASAP as it actually fixes bugs.
>>>>> This patch is just cleanup (and takes care of some future concerns).
>>>> your patch and Ivan's patch should be merged in one...
>>> I really don't care whether they get merged as one or separately, but I
>>> think it should be merged _now_ (-rc1 is already delayed), and I'd like to
>>> see the final versions of both. Does anybody have them in a final
>>> agreed-upon format (preferably with that oddness in quirk_intel_irqbalance
>>> also fixed?)
>> I just looked at fixing that -- the reason seems to be that we don't
>> actually have the struct pci_dev at that point. I can fix it, but I
>> think it's actually buggy. I want to look at some chipset docs to
>> confirm that though.
>>
>> I've attached the two patches that I believe are the ones we want. We
>> can (and should) fix quirk_intel_irqbalance separately.
>
> Andrew,
>
> those two patch just got into linus 2.6.25-rc1.
>
> I assume that you will drop
> gregkh-pci-pci-make-pci-extended-config-space-a-driver-opt-in.patch in
> -mm.
>
> please check some updated patches in -mm that could be affected. hope
> it could save you some time
>
> x86-validate-against-acpi-motherboard-resources.patch
> x86-clear-pci_mmcfg_virt-when-mmcfg-get-rejected.patch
> x86-mmconf-enable-mcfg-early.patch
> x86_64-check-msr-to-get-mmconfig-for-amd-family-10h-opteron-v3.patch
I don't think any of these patches are affected. They all affect whether
to use MMCONFIG globally or not, regardless of whether or not particular
accesses will use it.
On Sun, Feb 10, 2008 at 04:02:04PM -0700, Matthew Wilcox wrote:
> The line in question reads:
>
> /* read xTPR register */
> raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
>
> That's domain 0, bus 0, device 8, function 0, address 0x4c, length 2.
>
> I've checked the public E7525 and E7520 MCH datasheets, and they don't
> document the xTPR registers; nor do any of the devices in the datasheet
> have registers documented at 0x4c.
>
> You can see from my lspci above that I don't _have_ a device 8 on bus 0.
> The aforementioned documentation says:
>
> "A disabled or non-existent device's configuration register space is
> hidden. A disabled or non-existent device will return all ones for reads
> and will drop writes just as if the cycle terminated with a Master Abort
> on PCI."
I'd like to thank Grant for pointing out to me that this is exactly what
the write immediately above this is doing -- enabling device 8 to
respond to config space cycles.
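A toy model of that behaviour, as a sketch only: the struct, the function names and the idea of keeping register state in memory are all made up for illustration. The 0xf4 register, the 0x2 bit and the all-ones read come from the quirk code and the datasheet text quoted above; the save/restore step is an assumption modelled on the quirk restoring 0xf4 afterwards.

```c
#include <stdint.h>

#define DEV8_ENABLE	0x02	/* bit the quirk sets in register 0xf4 */

/* Hypothetical in-memory model of the relevant chipset state. */
struct toy_chipset {
	uint8_t reg_f4;		/* value of register 0xf4 */
	uint16_t dev8_reg_4c;	/* device 8, offset 0x4c (xTPR) */
};

/* Read the 16-bit word at offset 0x4c of device 8. */
static uint16_t toy_read_dev8_word(const struct toy_chipset *c)
{
	if (!(c->reg_f4 & DEV8_ENABLE))
		return 0xffff;	/* hidden device: reads return all ones */
	return c->dev8_reg_4c;
}

/* Same shape as the quirk: unhide device 8, read, restore 0xf4. */
static uint16_t toy_read_xtpr(struct toy_chipset *c)
{
	uint8_t saved = c->reg_f4;
	uint16_t word;

	c->reg_f4 = (uint8_t)(saved | DEV8_ENABLE);
	word = toy_read_dev8_word(c);
	c->reg_f4 = saved;
	return word;
}
```

Without the write, the read in this model sees 0xffff, which is what a config cycle to a hidden device would look like; after it, the real register contents come back.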
> Now, my E7525 isn't affected by this quirk as it has a revision greater
> than 0x9. So maybe it's expected that device 8 is hidden on my machine;
> that it's only present on revisions up to 0x9. But maybe device 8 is
> always hidden, and that's why the author used raw_pci_ops?
>
> We can still do better than this, though. We can do:
>
> - raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
> + pci_bus_read_config_word(dev->bus, PCI_DEVFN(8, 0), 0x4c, &word);
>
> Using PCI_DEVFN tells people you really did mean device 8, and it's not
> a braino for device 4 or 2 (how many bits for slot and function again?)
Here's the patch to implement the above two suggestions:
----
From f565b65591a3f90a272b1d511e4ab1728861fe77 Mon Sep 17 00:00:00 2001
From: Matthew Wilcox <[email protected]>
Date: Sun, 10 Feb 2008 23:18:15 -0500
Subject: [PATCH] Use proper abstractions in quirk_intel_irqbalance
Since we may not have a pci_dev for the device we need to access, we can't
use pci_read_config_word. But raw_pci_read is an internal implementation
detail; it's better to use the architected pci_bus_read_config_word
interface. Using PCI_DEVFN instead of a mysterious constant helps
reassure everyone that we really do intend to access device 8.
Signed-off-by: Matthew Wilcox <[email protected]>
---
arch/x86/kernel/quirks.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 1941482..c47208f 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -11,7 +11,7 @@
static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
{
u8 config, rev;
- u32 word;
+ u16 word;
/* BIOS may enable hardware IRQ balancing for
* E7520/E7320/E7525(revision ID 0x9 and below)
@@ -26,8 +26,11 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
pci_read_config_byte(dev, 0xf4, &config);
pci_write_config_byte(dev, 0xf4, config|0x2);
- /* read xTPR register */
- raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
+ /*
+ * read xTPR register. We may not have a pci_dev for device 8
+ * because it might be hidden until the above write.
+ */
+ pci_bus_read_config_word(dev->bus, PCI_DEVFN(8, 0), 0x4c, &word);
if (!(word & (1 << 13))) {
dev_info(&dev->dev, "Intel E7520/7320/7525 detected; "
--
1.5.2.5
On Feb 10, 2008 6:53 PM, Robert Hancock <[email protected]> wrote:
>
> Yinghai Lu wrote:
> > On Feb 10, 2008 12:45 PM, Matthew Wilcox <[email protected]> wrote:
..
> >> I've attached the two patches that I believe are the ones we want. We
> >> can (and should) fix quirk_intel_irqbalance separately.
> >
> > Andrew,
> >
> > those two patch just got into linus 2.6.25-rc1.
> >
> > I assume that you will drop
> > gregkh-pci-pci-make-pci-extended-config-space-a-driver-opt-in.patch in
> > -mm.
> >
> > please check some updated patches in -mm that could be affected. hope
> > it could save you some time
> >
> > x86-validate-against-acpi-motherboard-resources.patch
> > x86-clear-pci_mmcfg_virt-when-mmcfg-get-rejected.patch
> > x86-mmconf-enable-mcfg-early.patch
> > x86_64-check-msr-to-get-mmconfig-for-amd-family-10h-opteron-v3.patch
>
> I don't think any of these patches are affected. They all affect whether
> to use MMCONFIG globally or not, regardless of whether not particular
> accesses will use it.
what i mean:
gregkh-pci-pci-make-pci-extended-config-space-a-driver-opt-in.patch is
not needed.
and
> > x86-validate-against-acpi-motherboard-resources.patch
> > x86-clear-pci_mmcfg_virt-when-mmcfg-get-rejected.patch
> > x86-mmconf-enable-mcfg-early.patch
> > x86_64-check-msr-to-get-mmconfig-for-amd-family-10h-opteron-v3.patch
need some update because of changes by "Change pci_raw_ops to
pci_raw_read/write" patch.
such as pci_conf1_read became static...unreachable_devices() is gone..
YH
On Sun, Feb 10, 2008 at 10:04:16PM -0700, Matthew Wilcox wrote:
> > "A disabled or non-existent device's configuration register space is
> > hidden. A disabled or non-existent device will return all ones for reads
> > and will drop writes just as if the cycle terminated with a Master Abort
> > on PCI."
>
> I'd like to thank Grant for pointing out to me that this is exactly what
> the write immediately above this is doing -- enabling device 8 to
> respond to config space cycles.
welcome.
...
> From f565b65591a3f90a272b1d511e4ab1728861fe77 Mon Sep 17 00:00:00 2001
> From: Matthew Wilcox <[email protected]>
> Date: Sun, 10 Feb 2008 23:18:15 -0500
> Subject: [PATCH] Use proper abstractions in quirk_intel_irqbalance
>
> Since we may not have a pci_dev for the device we need to access, we can't
> use pci_read_config_word. But raw_pci_read is an internal implementation
> detail; it's better to use the architected pci_bus_read_config_word
> interface. Using PCI_DEVFN instead of a mysterious constant helps
> reassure everyone that we really do intend to access device 8.
>
> Signed-off-by: Matthew Wilcox <[email protected]>
> ---
> arch/x86/kernel/quirks.c | 9 ++++++---
> 1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
> index 1941482..c47208f 100644
> --- a/arch/x86/kernel/quirks.c
> +++ b/arch/x86/kernel/quirks.c
> @@ -11,7 +11,7 @@
> static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
> {
> u8 config, rev;
> - u32 word;
> + u16 word;
>
> /* BIOS may enable hardware IRQ balancing for
> * E7520/E7320/E7525(revision ID 0x9 and below)
> @@ -26,8 +26,11 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
> pci_read_config_byte(dev, 0xf4, &config);
> pci_write_config_byte(dev, 0xf4, config|0x2);
Can you also add a comment which points at the Intel documentation?
http://download.intel.com/design/chipsets/datashts/30300702.pdf
Page 34 documents 0xf4 register.
And I just double checked that the 0xf4 register value is restored later
in the quirk (obvious when you look at the code but not from the patch).
> - /* read xTPR register */
> - raw_pci_read(0, 0, 0x40, 0x4c, 2, &word);
> + /*
> + * read xTPR register. We may not have a pci_dev for device 8
> + * because it might be hidden until the above write.
> + */
> + pci_bus_read_config_word(dev->bus, PCI_DEVFN(8, 0), 0x4c, &word);
Yeah, this should work even though we don't have a dev for it.
Acked-by: Grant Grundler <[email protected]>
thanks,
grant
On Mon, Feb 11, 2008 at 12:49:54AM -0700, Grant Grundler wrote:
> Can you also add a comment which points at the Intel documentation?
>
> http://download.intel.com/design/chipsets/datashts/30300702.pdf
> Page 34 documents 0xf4 register.
I'm told that these URLs are not guaranteed to be stable. And
remembering the pain we had when HP decided to relocate all of their
documents, I'm really not inclined to embed a link to a URL in the
source code.
> And I just doubled checked that the 0xf4 register value is restored later
> in the quirk (obvious when you look at the code but not from the patch).
Yep, I checked that too ;-)
> Acked-by: Grant Grundler <[email protected]>
Thanks.
On Mon, 11 Feb 2008, Matthew Wilcox wrote:
>
> I'm told that these URLs are not guaranteed to be stable. And
> remembering the pain we had when HP decided to relocate all of their
> documents, I'm really not inclined to embed a link to a URL in the
> source code.
I put it in the commit message, but it wasn't on page 34 when I checked (I
changed it to 69), and I added the name for the datasheet so that if/when
it moves, maybe google can help.
Linus
On Mon, Feb 11, 2008 at 09:18:49AM -0800, Linus Torvalds wrote:
> I put it in the commit message, but it wasn't on page 34 when I checked (I
> changed it to 69),
Sorry - page 34 was just the first reference to "Extended Configuration
Registers" when I originally scrounged up the info for willy.
Page 69 is in fact what I wanted to point at ("DEVPRES1" reg).
> and I added the naem for the datasheet so that if/when
> it moves, maybe google can help.
It should. But doing a quick check now only shows one other copy
(in .es domain :) when searching for "30300702.pdf".
Searching for the full document title results in several intel.com
locations and lots of other misc references that don't look quite right.
Many of those just reference the "product brief" and not the data sheet.
yahoo.com gives similar results.
thanks,
grant
>
> Linus
On Sun, 10 Feb 2008 17:49:34 -0800
"Yinghai Lu" <[email protected]> wrote:
> On Feb 10, 2008 12:45 PM, Matthew Wilcox <[email protected]> wrote:
> >
> > On Sun, Feb 10, 2008 at 12:24:18PM -0800, Linus Torvalds wrote:
> > > On Sun, 10 Feb 2008, Yinghai Lu wrote:
> > > > >
> > > > > I suggest Ivan's patch be merged ASAP as it actually fixes bugs.
> > > > > This patch is just cleanup (and takes care of some future concerns).
> > > >
> > > > your patch and Ivan's patch should be merged in one...
> > >
> > > I really don't care whether they get merged as one or separately, but I
> > > think it should be merged _now_ (-rc1 is already delayed), and I'd like to
> > > see the final versions of both. Does anybody have them in a final
> > > agreed-upon format (preferably with that oddness in quirk_intel_irqbalance
> > > also fixed?)
> >
> > I just looked at fixing that -- the reason seems to be that we don't
> > actually have the struct pci_dev at that point. I can fix it, but I
> > think it's actually buggy. I want to look at some chipset docs to
> > confirm that though.
> >
> > I've attached the two patches that I believe are the ones we want. We
> > can (and should) fix quirk_intel_irqbalance separately.
>
> Andrew,
>
> those two patch just got into linus 2.6.25-rc1.
>
> I assume that you will drop
> gregkh-pci-pci-make-pci-extended-config-space-a-driver-opt-in.patch in
> -mm.
That's no longer in Greg's tree.
> please check some updated patches in -mm that could be affected. hope
> it could save you some time
>
> x86-validate-against-acpi-motherboard-resources.patch
> x86-clear-pci_mmcfg_virt-when-mmcfg-get-rejected.patch
> x86-mmconf-enable-mcfg-early.patch
> x86_64-check-msr-to-get-mmconfig-for-amd-family-10h-opteron-v3.patch
I have unhappy feelings here - the patches seem to be churning a bit
and when I last sent them to Greg and Ingo they received no apparent
response.
So I think I'll just drop all four. Please redo, retest and fully
resubmit, thanks.
And we need to work out who owns these patches. Are they rightly part of
the PCI tree, or of the x86 tree?
* Andrew Morton <[email protected]> wrote:
> > please check some updated patches in -mm that could be affected.
> > hope it could save you some time
> >
> > x86-validate-against-acpi-motherboard-resources.patch
> > x86-clear-pci_mmcfg_virt-when-mmcfg-get-rejected.patch
> > x86-mmconf-enable-mcfg-early.patch
> > x86_64-check-msr-to-get-mmconfig-for-amd-family-10h-opteron-v3.patch
>
> I have unhappy feelings here - the patches seem to be churning a bit
> and when I last sent them to Greg and Ingo they received no apparent
> response.
i actually carried them for a while and
validate-against-acpi-motherboard-resources.patch got a fair bit of test
time with positive results. So it has a clear ACK from me.
It's something that looks appealing:
| This patch adds validation of the MMCONFIG table against the ACPI
| reserved motherboard resources. If the MMCONFIG table is found to be
| reserved in ACPI, we don't bother checking the E820 table. The PCI
| Express firmware spec apparently tells BIOS developers that
| reservation in ACPI is required and E820 reservation is optional, so
| checking against ACPI first makes sense. Many BIOSes don't reserve
| the MMCONFIG region in E820 even though it is perfectly functional,
| the existing check needlessly disables MMCONFIG in these cases.
anything that isolates Linux from BIOS messups should be music to our
ears.
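The ordering described in that changelog can be sketched as a small decision function. This is only an illustration under stated assumptions: the struct and function names are hypothetical, and the fallback to the E820 check when ACPI has no reservation is inferred from the changelog text rather than taken from the patch itself.

```c
#include <stdbool.h>

/* Hypothetical summary of where the MMCONFIG region is reserved. */
struct mmcfg_region_info {
	bool reserved_in_acpi;	/* ACPI motherboard resources */
	bool reserved_in_e820;	/* E820 memory map */
};

/* Returns true if MMCONFIG should stay enabled for this region. */
static bool mmcfg_region_usable(const struct mmcfg_region_info *r)
{
	/* An ACPI reservation is sufficient; E820 is not consulted. */
	if (r->reserved_in_acpi)
		return true;
	/* Otherwise fall back to the existing E820 check (assumed). */
	return r->reserved_in_e820;
}
```

The case the changelog cares about is the first branch: a BIOS that reserves the region in ACPI but not in E820 no longer gets MMCONFIG disabled.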
i also think the mmconf-enable stuff for Barcelona from Yinghai,
albeit not particularly pretty, is probably good too for similar
reasons. It makes the kernel boot with noacpi which is a good sign IMO.
I have testsystems that simply do not boot with ACPI turned off - and i
have a testsystem that locks up hard if it takes an NMI in certain ACPI
AML sequences ... Just Because.
So i'd ACK them just on general principle - earlier versions of the
patches were carried in x86.git and caused no particular problems.
but ... then we got complaints from you that stuff collides and that
such patches should be carried in your or Greg's tree, so we dropped
them. And there was another 100 KLOC of x86 code to worry about ;-)
So i'd suggest to send those patches upstream, they are system enablers
and they are at fundamental enough places to be apparent if they cause
any breakage i think.
Ingo