Hi,
Linus really wants the extended (4Kb) PCI configuration space (using the ACPI MCFG table etc) to be opt-in, since there are many issues with it and most drivers don't even use/need it. The idea behind opt-in is that if you don't use it, you don't get to suffer the bugs...
Booted on my 64 bit test machine; sadly it has a defunct BIOS that doesn't have a working MCFG.
From: Arjan van de Ven <[email protected]>
Subject: Make MMCONFIG space a driver opt-in
There are many issues with using the extended PCI configuration space
(CPU, chipset and, most of all, BIOS bugs). Meanwhile, the vast majority of drivers
and devices don't even need the memory mapped access methods, since they
don't use the config space beyond the traditional 256 bytes.
This patch makes accessing the extended config space a driver choice, via the
pci_enable_ext_config(pdev)
API function; drivers that want/need the extended configuration space should call this.
(a separate patch will be posted to add this function call to the driver that uses this)
On the implementation side, the pci device structure grows a member variable that stores the
current mode (default is "off"; pci_enable_ext_config() turns it on, but quirks can force it off
so that even pci_enable_ext_config() won't enable it).
Unfortunately, this meant duplicating a bit of the PCI infrastructure, since in the PCI layer all the
config access code is based on the bus, not the device. The implementation expands the bus operations
with two new operations for reading and writing the extended space. pci_read_config_byte() and friends
now check the flag in the pdev structure and use either the traditional or the extended bus operation methods.
This led to a tiny bit of code duplication, but it's not all that bad, and imo it greatly offsets all the
pain we have with extended configuration space.
Signed-off-by: Arjan van de Ven <[email protected]>
Index: linux-2.6.24-rc5/arch/x86/pci/common.c
===================================================================
--- linux-2.6.24-rc5.orig/arch/x86/pci/common.c
+++ linux-2.6.24-rc5/arch/x86/pci/common.c
@@ -26,6 +26,7 @@ int pcibios_last_bus = -1;
unsigned long pirq_table_addr;
struct pci_bus *pci_root_bus;
struct pci_raw_ops *raw_pci_ops;
+struct pci_raw_ops *raw_pci_ops_extcfg;
static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
{
@@ -39,9 +40,31 @@ static int pci_write(struct pci_bus *bus
devfn, where, size, value);
}
+static int pci_read_ext(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
+{
+ if (raw_pci_ops_extcfg)
+ return raw_pci_ops_extcfg->read(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
+ else
+ return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
+}
+
+static int pci_write_ext(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
+{
+ if (raw_pci_ops_extcfg)
+ return raw_pci_ops_extcfg->write(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
+ else
+ return raw_pci_ops->write(pci_domain_nr(bus), bus->number,
+ devfn, where, size, value);
+}
+
struct pci_ops pci_root_ops = {
.read = pci_read,
.write = pci_write,
+ .readext = pci_read_ext,
+ .writeext = pci_write_ext,
};
/*
Index: linux-2.6.24-rc5/drivers/pci/pci.c
===================================================================
--- linux-2.6.24-rc5.orig/drivers/pci/pci.c
+++ linux-2.6.24-rc5/drivers/pci/pci.c
@@ -752,6 +752,27 @@ int pci_enable_device(struct pci_dev *de
return pci_enable_device_bars(dev, (1 << PCI_NUM_RESOURCES) - 1);
}
+/**
+ * pci_enable_ext_config - Enable extended (4K) config space accesses
+ * @dev: PCI device to be changed
+ *
+ * Enable extended (4Kb) configuration space accesses for a device.
+ * Extended config space is available for PCI-E devices and can
+ * be used for things like PCI AER and other features. However,
+ * due to various stability issues, this can only be done on demand.
+ *
+ * Returns: -1 on failure, 0 on success
+ */
+
+int pci_enable_ext_config(struct pci_dev *dev)
+{
+ if (dev->ext_cfg_space < 0)
+ return -1;
+ dev->ext_cfg_space = 1;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pci_enable_ext_config);
+
/*
* Managed PCI resources. This manages device on/off, intx/msi/msix
* on/off and BAR regions. pci_dev itself records msi/msix status, so
Index: linux-2.6.24-rc5/include/linux/pci.h
===================================================================
--- linux-2.6.24-rc5.orig/include/linux/pci.h
+++ linux-2.6.24-rc5/include/linux/pci.h
@@ -174,6 +174,15 @@ struct pci_dev {
int cfg_size; /* Size of configuration space */
/*
+ * ext_cfg_space gets set by drivers/quirks if
+ * extended (4K) config space access is desired.
+ * negative values -- hard disabled (quirk etc)
+ * zero -- disabled (default)
+ * positive values -- enabled
+ */
+ int ext_cfg_space;
+
+ /*
* Instead of touching interrupt line and base address registers
* directly, use the values stored here. They might be different!
*/
@@ -302,6 +311,8 @@ struct pci_bus {
struct pci_ops {
int (*read)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val);
int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
+ int (*readext)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val);
+ int (*writeext)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
};
struct pci_raw_ops {
@@ -521,28 +532,47 @@ int pci_bus_write_config_byte (struct pc
int pci_bus_write_config_word (struct pci_bus *bus, unsigned int devfn, int where, u16 val);
int pci_bus_write_config_dword (struct pci_bus *bus, unsigned int devfn, int where, u32 val);
+int pci_bus_read_extconfig_byte (struct pci_bus *bus, unsigned int devfn, int where, u8 *val);
+int pci_bus_read_extconfig_word (struct pci_bus *bus, unsigned int devfn, int where, u16 *val);
+int pci_bus_read_extconfig_dword (struct pci_bus *bus, unsigned int devfn, int where, u32 *val);
+int pci_bus_write_extconfig_byte (struct pci_bus *bus, unsigned int devfn, int where, u8 val);
+int pci_bus_write_extconfig_word (struct pci_bus *bus, unsigned int devfn, int where, u16 val);
+int pci_bus_write_extconfig_dword (struct pci_bus *bus, unsigned int devfn, int where, u32 val);
+
static inline int pci_read_config_byte(struct pci_dev *dev, int where, u8 *val)
{
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_read_extconfig_byte (dev->bus, dev->devfn, where, val);
return pci_bus_read_config_byte (dev->bus, dev->devfn, where, val);
}
static inline int pci_read_config_word(struct pci_dev *dev, int where, u16 *val)
{
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_read_extconfig_word (dev->bus, dev->devfn, where, val);
return pci_bus_read_config_word (dev->bus, dev->devfn, where, val);
}
static inline int pci_read_config_dword(struct pci_dev *dev, int where, u32 *val)
{
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_read_extconfig_dword (dev->bus, dev->devfn, where, val);
return pci_bus_read_config_dword (dev->bus, dev->devfn, where, val);
}
static inline int pci_write_config_byte(struct pci_dev *dev, int where, u8 val)
{
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_write_extconfig_byte (dev->bus, dev->devfn, where, val);
return pci_bus_write_config_byte (dev->bus, dev->devfn, where, val);
}
static inline int pci_write_config_word(struct pci_dev *dev, int where, u16 val)
{
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_write_extconfig_word (dev->bus, dev->devfn, where, val);
return pci_bus_write_config_word (dev->bus, dev->devfn, where, val);
}
static inline int pci_write_config_dword(struct pci_dev *dev, int where, u32 val)
{
+ if (dev->ext_cfg_space > 0)
+ return pci_bus_write_extconfig_dword (dev->bus, dev->devfn, where, val);
return pci_bus_write_config_dword (dev->bus, dev->devfn, where, val);
}
@@ -693,6 +723,9 @@ void ht_destroy_irq(unsigned int irq);
extern void pci_block_user_cfg_access(struct pci_dev *dev);
extern void pci_unblock_user_cfg_access(struct pci_dev *dev);
+extern int pci_enable_ext_config(struct pci_dev *dev);
+
+
/*
* PCI domain support. Sometimes called PCI segment (eg by ACPI),
* a PCI domain is defined to be a set of PCI busses which share
@@ -789,6 +822,8 @@ static inline struct pci_dev *pci_get_bu
unsigned int devfn)
{ return NULL; }
+static inline int pci_enable_ext_config(struct pci_dev *dev) { return -1; }
+
#endif /* CONFIG_PCI */
/* Include architecture-dependent settings and functions */
Index: linux-2.6.24-rc5/arch/x86/pci/mmconfig_32.c
===================================================================
--- linux-2.6.24-rc5.orig/arch/x86/pci/mmconfig_32.c
+++ linux-2.6.24-rc5/arch/x86/pci/mmconfig_32.c
@@ -143,6 +143,6 @@ int __init pci_mmcfg_arch_reachable(unsi
int __init pci_mmcfg_arch_init(void)
{
printk(KERN_INFO "PCI: Using MMCONFIG\n");
- raw_pci_ops = &pci_mmcfg;
+ raw_pci_ops_extcfg = &pci_mmcfg;
return 1;
}
Index: linux-2.6.24-rc5/arch/x86/pci/mmconfig_64.c
===================================================================
--- linux-2.6.24-rc5.orig/arch/x86/pci/mmconfig_64.c
+++ linux-2.6.24-rc5/arch/x86/pci/mmconfig_64.c
@@ -152,6 +152,6 @@ int __init pci_mmcfg_arch_init(void)
return 0;
}
}
- raw_pci_ops = &pci_mmcfg;
+ raw_pci_ops_extcfg = &pci_mmcfg;
return 1;
}
Index: linux-2.6.24-rc5/drivers/pci/access.c
===================================================================
--- linux-2.6.24-rc5.orig/drivers/pci/access.c
+++ linux-2.6.24-rc5/drivers/pci/access.c
@@ -51,6 +51,41 @@ int pci_bus_write_config_##size \
return res; \
}
+#define PCI_OP_READ_EXT(size,type,len) \
+int pci_bus_read_extconfig_##size \
+ (struct pci_bus *bus, unsigned int devfn, int pos, type *value) \
+{ \
+ int res; \
+ unsigned long flags; \
+ u32 data = 0; \
+ if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
+ spin_lock_irqsave(&pci_lock, flags); \
+ if (bus->ops->readext) \
+ res = bus->ops->readext(bus, devfn, pos, len, &data); \
+ else \
+ res = bus->ops->read(bus, devfn, pos, len, &data); \
+ *value = (type)data; \
+ spin_unlock_irqrestore(&pci_lock, flags); \
+ return res; \
+}
+
+#define PCI_OP_WRITE_EXT(size,type,len) \
+int pci_bus_write_extconfig_##size \
+ (struct pci_bus *bus, unsigned int devfn, int pos, type value) \
+{ \
+ int res; \
+ unsigned long flags; \
+ if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
+ spin_lock_irqsave(&pci_lock, flags); \
+ if (bus->ops->writeext) \
+ res = bus->ops->writeext(bus, devfn, pos, len, value); \
+ else \
+ res = bus->ops->write(bus, devfn, pos, len, value); \
+ spin_unlock_irqrestore(&pci_lock, flags); \
+ return res; \
+}
+
+
PCI_OP_READ(byte, u8, 1)
PCI_OP_READ(word, u16, 2)
PCI_OP_READ(dword, u32, 4)
@@ -58,12 +93,25 @@ PCI_OP_WRITE(byte, u8, 1)
PCI_OP_WRITE(word, u16, 2)
PCI_OP_WRITE(dword, u32, 4)
+PCI_OP_READ_EXT(byte, u8, 1)
+PCI_OP_READ_EXT(word, u16, 2)
+PCI_OP_READ_EXT(dword, u32, 4)
+PCI_OP_WRITE_EXT(byte, u8, 1)
+PCI_OP_WRITE_EXT(word, u16, 2)
+PCI_OP_WRITE_EXT(dword, u32, 4)
+
EXPORT_SYMBOL(pci_bus_read_config_byte);
EXPORT_SYMBOL(pci_bus_read_config_word);
EXPORT_SYMBOL(pci_bus_read_config_dword);
EXPORT_SYMBOL(pci_bus_write_config_byte);
EXPORT_SYMBOL(pci_bus_write_config_word);
EXPORT_SYMBOL(pci_bus_write_config_dword);
+EXPORT_SYMBOL(pci_bus_read_extconfig_byte);
+EXPORT_SYMBOL(pci_bus_read_extconfig_word);
+EXPORT_SYMBOL(pci_bus_read_extconfig_dword);
+EXPORT_SYMBOL(pci_bus_write_extconfig_byte);
+EXPORT_SYMBOL(pci_bus_write_extconfig_word);
+EXPORT_SYMBOL(pci_bus_write_extconfig_dword);
/*
* The following routines are to prevent the user from accessing PCI config
Index: linux-2.6.24-rc5/arch/x86/pci/pci.h
===================================================================
--- linux-2.6.24-rc5.orig/arch/x86/pci/pci.h
+++ linux-2.6.24-rc5/arch/x86/pci/pci.h
@@ -32,6 +32,8 @@
extern unsigned int pci_probe;
extern unsigned long pirq_table_addr;
+extern struct pci_raw_ops *raw_pci_ops_extcfg;
+
enum pci_bf_sort_state {
pci_bf_sort_default,
pci_force_nobf,
From: Arjan van de Ven <[email protected]>
Subject: opt the sky2 driver into using extended config space
So far, the sky2 driver is the only one I've identified (with a quick grep)
that would actually use extended configuration space; the patch below
adds the enablement call to the driver.
Signed-off-by: Arjan van de Ven <[email protected]>
CC: [email protected]
Index: linux-2.6.24-rc5/drivers/net/sky2.c
===================================================================
--- linux-2.6.24-rc5.orig/drivers/net/sky2.c
+++ linux-2.6.24-rc5/drivers/net/sky2.c
@@ -4111,6 +4111,11 @@ static int __devinit sky2_probe(struct p
goto err_out;
}
+ /*
+ * enable extended config space to enable AER
+ */
+ pci_enable_ext_config(pdev);
+
err = pci_request_regions(pdev, DRV_NAME);
if (err) {
dev_err(&pdev->dev, "cannot obtain PCI resources\n");
Arjan van de Ven wrote:
> Hi,
>
> Linus really wants the extended (4Kb) PCI configuration space (using MCFG acpi table etc) to be opt-in, since there's many issues with it and most drivers don't even use/need it. The idea behind opt-in is that if you don't use it, you don't get to suffer the bugs...
>
> Booted on my 64 bit test machine; sadly it has a defunct BIOS that doesn't have a working MCFG.
>
>
> From: Arjan van de Ven <[email protected]>
> Subject: Make MMCONFIG space a driver opt-in
>
> There are many issues with using the extended PCI configuration space
> (CPU, Chipset and most of all BIOS bugs). This while the vast majority of drivers
> and devices don't even use/need to use the memory mapped access methods since they
> don't use the config space beyond the traditional 256 bytes.
>
> This patch makes accessing the extended config space a driver choice, via the
>
> pci_enable_ext_config(pdev)
Yuck. And, Linus is just being silly. Wait a year then turn on
MMCONFIG :) It took PCI MSI a while to mature, but is finally getting
there.
Jeff
On Sat, 22 Dec 2007 09:20:06 -0500
Jeff Garzik <[email protected]> wrote:
> Arjan van de Ven wrote:
> > Hi,
> >
> > Linus really wants the extended (4Kb) PCI configuration space
> > (using MCFG acpi table etc) to be opt-in, since there's many issues
> > with it and most drivers don't even use/need it. The idea behind
> > opt-in is that if you don't use it, you don't get to suffer the
> > bugs...
> >
> > Booted on my 64 bit test machine; sadly it has a defunct BIOS that
> > doesn't have a working MCFG.
> >
> >
> > From: Arjan van de Ven <[email protected]>
> > Subject: Make MMCONFIG space a driver opt-in
> >
> > There are many issues with using the extended PCI configuration
> > space (CPU, Chipset and most of all BIOS bugs). This while the vast
> > majority of drivers and devices don't even use/need to use the
> > memory mapped access methods since they don't use the config space
> > beyond the traditional 256 bytes.
> >
> > This patch makes accessing the extended config space a driver
> > choice, via the
> >
> > pci_enable_ext_config(pdev)
>
> Yuck. And, Linus is just being silly. Wait a year then turn on
> MMCONFIG :) It took PCI MSI a while to mature, but is finally
> getting there.
>
Do you hate the name or the concept? I'm certainly open for a better name....
On Sat, Dec 22, 2007 at 06:47:19AM -0800, Arjan van de Ven wrote:
> Do you hate the name or the concept? I'm certainly open for a better name....
I hate the concept. We should just fall back to type1 accesses on
mmconfig machines for all accesses below 64 bytes, as Ivan suggested.
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
On Sat, 22 Dec 2007 08:54:37 -0700
Matthew Wilcox <[email protected]> wrote:
> On Sat, Dec 22, 2007 at 06:47:19AM -0800, Arjan van de Ven wrote:
> > Do you hate the name or the concept? I'm certainly open for a
> > better name....
>
> I hate the concept. We should just fall back to type1 accesses on
> mmconfig machines for all accesses below 64 bytes, as Ivan suggested.
I assume you meant 256, not 64; even then, it's technically not correct: you're not supposed to mix access types for the same device.
Doing opt-in also allows us to add quirks (to forbid access) as well.
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
Arjan van de Ven wrote:
> Hi,
>
> Linus really wants the extended (4Kb) PCI configuration space (using MCFG acpi table etc) to be opt-in, since there's many issues with it and most drivers don't even use/need it. The idea behind opt-in is that if you don't use it, you don't get to suffer the bugs...
>
> Booted on my 64 bit test machine; sadly it has a defunct BIOS that doesn't have a working MCFG.
>
>
> From: Arjan van de Ven <[email protected]>
> Subject: Make MMCONFIG space a driver opt-in
>
> There are many issues with using the extended PCI configuration space
> (CPU, Chipset and most of all BIOS bugs). This while the vast majority of drivers
> and devices don't even use/need to use the memory mapped access methods since they
> don't use the config space beyond the traditional 256 bytes.
>
> This patch makes accessing the extended config space a driver choice, via the
>
> pci_enable_ext_config(pdev)
>
> API function; drivers that want/need the extended configuration space should call this.
> (a separate patch will be posted to add this function call to the driver that uses this)
I don't really like this approach. Whether MMCONFIG works or not has
nothing to do with the device itself, it's an attribute of the machine,
and possibly the bus it's been plugged into. This patch might prevent
problems in some cases, but it's equally likely to just delay problems
until somebody plugs in a device that tries to use extended config
space. Neither do I really like the approach of limiting MMCONFIG
accesses to ones beyond a certain address in the config space, for a
similar reason.
The detection of whether MMCONFIG works or not has to work properly (and
I think we're pretty close, or at least we know what we need to do to
get there, like fixing the stupid MMCONFIG/PCI bar sizing overlap
problem, and likely Tony Camuso's patch or something like it, to disable
MMCONFIG accesses to devices behind certain broken host bridges). Once
that works, then this patch really serves no purpose.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
On Sat, 22 Dec 2007, Arjan van de Ven wrote:
> On Sat, 22 Dec 2007 09:20:06 -0500
> Jeff Garzik <[email protected]> wrote:
> >
> > Yuck. And, Linus is just being silly. Wait a year then turn on
> > MMCONFIG :) It took PCI MSI a while to mature, but is finally
> > getting there.
That _really_ doesn't work.
Old machines don't go away. We can't just say "wait a year and turn on
MMCONFIG", because all the broken machines WILL STILL EXIST.
> Do you hate the name or the concept? I'm certainly open for a better name....
Just make it so. The name is fine, the concept is unavoidable. The people
who complain are whiners that haven't ever had to deal with the fact that
there are broken machines around.
For example, right now Jeff never sees the problem, because when MMCONFIG
doesn't work, it's never his problem - nothing in the machine works. But
if he has to add a "pci_enable_mmconfig()" to the drivers he maintains, he
sees it as a _new_ problem, so he obviously thinks it's stupid: he was
never impacted by the issues it fixed!
So it's natural for Jeff to not like it, but that doesn't make Jeff right
in this case. It just means that Jeff never had to worry about it before,
because as long as MMCONFIG wasn't per-driver, the problems it caused were
never *his* problems. But that doesn't make them less of a problem.
Linus
On Sat, Dec 22, 2007 at 10:22:29AM -0600, Robert Hancock wrote:
> Arjan van de Ven wrote:
>> Hi,
>> Linus really wants the extended (4Kb) PCI configuration space (using MCFG
>> acpi table etc) to be opt-in, since there's many issues with it and most
>> drivers don't even use/need it. The idea behind opt-in is that if you
>> don't use it, you don't get to suffer the bugs...
>> Booted on my 64 bit test machine; sadly it has a defunct BIOS that doesn't
>> have a working MCFG.
>> From: Arjan van de Ven <[email protected]>
>> Subject: Make MMCONFIG space a driver opt-in
>> There are many issues with using the extended PCI configuration space
>> (CPU, Chipset and most of all BIOS bugs). This while the vast majority of
>> drivers
>> and devices don't even use/need to use the memory mapped access methods
>> since they
>> don't use the config space beyond the traditional 256 bytes.
>> This patch makes accessing the extended config space a driver choice, via
>> the
>> pci_enable_ext_config(pdev)
>> API function; drivers that want/need the extended configuration space
>> should call this.
>> (a separate patch will be posted to add this function call to the driver
>> that uses this)
>
> I don't really like this approach. Whether MMCONFIG works or not has
> nothing to do with the device itself, it's an attribute of the machine, and
> possibly the bus it's been plugged into. This patch might prevent problems
> in some cases, but it's equally likely to just delay problems until
> somebody plugs in a device that tries to use extended config space.
But it is that device, and the driver that controls this device that
cares about the extended config space. So it's fair to push this onto
the driver if needed, instead of forcing the pci core to just blindly
guess for all devices, and getting it wrong...
thanks,
greg k-h
Hello!
> Just make it so. The name is fine, the concept is unavoidable. The people
> who complain are whiners that haven't ever had to deal with the fact that
> there are broken machines around.
I complain as well, as the maintainer of the pciutils. Breaking all userspace
accesses to extended configuration space just because there are a couple
of chipsets which have MMCONFIG messed up is completely unacceptable.
Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
"This quote has been selected randomly. Really." -- M. Ulrichs
On Sat, 22 Dec 2007 20:30:58 +0100
Martin Mares <[email protected]> wrote:
> Hello!
>
> > Just make it so. The name is fine, the concept is unavoidable. The
> > people who complain are whiners that haven't ever had to deal with
> > the fact that there are broken machines around.
>
> I complain as well as the maintainer of the pciutils. Breaking all
> userspace accesses to extended configuration space just because there
> is a couple of chipsets
it's not "just a couple of chipsets", it's actually
* a whole lot of bioses
* at least one whole CPU generation
* ..
* ..
Do you really want to code all of that into your userspace access code as well?
On Sat, 22 Dec 2007 04:35:59 -0800
Arjan van de Ven <[email protected]> wrote:
> From: Arjan van de Ven <[email protected]>
> Subject: opt the sky2 driver into using extended config space
>
> So far, the sky2 driver is the only one I've identified (with a quick grep)
> that actually would be using extended configuration space; the patch below
> adds the enablement of this to the driver.
>
> Signed-off-by: Arjan van de Ven <[email protected]>
> CC: [email protected]
The current driver uses its own memory-mapped access to PCI config space
to avoid all the extra config space access issues. So this patch is not needed.
Hello!
> it's not "just a couple of chipsets", it's actually
> * a whole lot of bioses
> * at least one whole CPU generation
> * ..
> * ..
>
> Do you really want to code all of that into your userspace access code as well?
No, I certainly don't. But I expect this to be handled reasonably in the
kernel.
If I understand the proposed patch correctly, it's only sweeping the problem
under the carpet -- if you have one of the malfunctioning systems and
you happen to load a driver which wants to use the extended config
space, it crashes anyway.
Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
It said, "Insert disk #3," but only two will fit!
On Sat, 2007-12-22 at 04:31 -0800, Arjan van de Ven wrote:
> Hi,
>
> Linus really wants the extended (4Kb) PCI configuration space (using MCFG acpi table etc) to be opt-in, since there's many issues with it and most drivers don't even use/need it. The idea behind opt-in is that if you don't use it, you don't get to suffer the bugs...
>
> Booted on my 64 bit test machine; sadly it has a defunct BIOS that doesn't have a working MCFG.
>
>
> From: Arjan van de Ven <[email protected]>
> Subject: Make MMCONFIG space a driver opt-in
>
> There are many issues with using the extended PCI configuration space
> (CPU, Chipset and most of all BIOS bugs). This while the vast majority of drivers
> and devices don't even use/need to use the memory mapped access methods since they
> don't use the config space beyond the traditional 256 bytes.
>
> This patch makes accessing the extended config space a driver choice, via the
>
> pci_enable_ext_config(pdev)
>
> API function; drivers that want/need the extended configuration space should call this.
> (a separate patch will be posted to add this function call to the driver that uses this)
That doesn't look like the right approach to me.
The extended config space is generally used by PCI capabilities. So you
end up in a situation where part of the capabilities will be invisible
until somebody calls your new API, which might affect generic code in
some cases beyond simply what a driver is supposed to do.
I think the best approach is to have your low-level config ops switch between
indirect and MM access depending on the requested register offset.
Ben.
On Sat, Dec 22, 2007 at 08:02:49AM -0800, Arjan van de Ven wrote:
> even then.. it's technically not correct; you're not supposed to mix access types for the same device..
> Just doing opt-in also allows us to do quirks (forbid access) as well.
I think what this specification means is that you can't mix both _at the
same time_. ie, doing a type 1 cycle, then before it returns, do an
mmconfig cycle. We never try to do that because there's a spinlock to
prevent more than one config space access at a time. It *has* to work
to do them sequentially -- for example, the BIOS may do type1 config
accesses, then the OS may do mmconfig.
I've occasionally wondered if that spinlock needs to get split up, but
for the amount of pain that could ensue, I can't imagine it ever being
worthwhile.
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
Greg KH wrote:
> But it is that device, and the driver that controls this device that
> cares about the extended config space. So it's fair to push this onto
> the driver if needed, instead of forcing the pci core to just blindly
> guess for all devices, and getting it wrong...
Nothing is being forced to guess. You make sure the broken chipsets
never enable mmconfig... just like we do for plenty of other errata.
Jeff
Linus Torvalds wrote:
>
> On Sat, 22 Dec 2007, Arjan van de Ven wrote:
>
>> On Sat, 22 Dec 2007 09:20:06 -0500
>> Jeff Garzik <[email protected]> wrote:
>>> Yuck. And, Linus is just being silly. Wait a year then turn on
>>> MMCONFIG :) It took PCI MSI a while to mature, but is finally
>>> getting there.
>
> That _really_ doesn't work.
>
> Old machines don't go away. We can't just say "wait a year and turn on
> MMCONFIG", because all the broken machines WILL STILL EXIST.
You are reinforcing my point :)
The exact same thing can be said for PCI MSI machines that are broken.
Outside of Intel machines, early PCI MSI life was yucky and broken.
Time passed, we got a handle on the problem set, we quirked that
problematic set of systems, and life moved on.
And these days, we are benefitting from that -- most new hardware is
MSI-capable if not MSIX-capable, and we are turning that on.
Some day in the future MSI-only (no INTX) hardware will ship in volume,
and we will already be adequately prepared for that day.
But here the two items diverge: PCI MSI _can_ be an _option_ for most
hardware, hence pci_enable_msi(). But the same cannot be said for
MMCONFIG, for reasons outlined below...
>> Do you hate the name or the concept? I'm certainly open for a better name....
>
> Just make it so. The name is fine, the concept is unavoidable. The people
> who complain are whiners that haven't ever had to deal with the fact that
> there are broken machines around.
>
> For example, right now Jeff never sees the problem, because when MMCONFIG
> doesn't work, it's never his problem - nothing in the machine works. But
> if he has to add a "pci_enable_mmconfig()" to the drivers he maintains, he
> sees it as a _new_ problem, so he obviously thinks it's stupid: he was
> never impacted by the issues it fixed!
I forcibly turn on mmconfig on all my machines, and monitor lkml, to
make sure I'm aware of the extent of the problem -- and the extent of
people's exaggeration of this problem.
> So it's natural for Jeff to not like it, but that doesn't make Jeff right
> in this case. It just means that Jeff never had to worry about it before,
> because as long as MMCONFIG wasn't per-driver, the problems it caused were
> never *his* problems. But that doesn't make them less of a problem.
Doing it at the driver level fundamentally doesn't work:
Regardless of whether a driver is loaded or not, you may NEED to see
extended capabilities. The system may NEED to see those capabilities
just to parse them for sane operation.
And pciutils should be able to see all of config space, not just the
Linus-sanitized portion of it. This will make debugging more difficult,
for example, because "lspci -vvvxxx" will not be giving me the full
"before" and "after" snapshots I need from users, to see if anything
changes based on <some hardware poking during debugging>.
I know when you look you see all sorts of brokenness. But you and I
both know that will pass, with time. If it's buggy, turn it off. If
not, turn it on. All these hardware makers are paying attention and
fixing errata; evidence of forward progress in that regard has already
appeared even.
Finally, this introduces a new per-device model for PCI config space
accesses, something we have NEVER done before. PCI config space should
always be all or none. If MMCONFIG is fucked, just turn it off completely.
Having to deal with NEW issues brought on by this NEW per-device config
space access model is stupid-by-design. Always-off is better than
changing the access model from global to per-device.
It is even MORE unlikely that hardware makers test the "mmconfig over
here, type1 over there" mixed accesses. Linux should not go down that road.
Always-off is better than mixed access models.
Jeff
Arjan van de Ven wrote:
> On Sat, 22 Dec 2007 20:30:58 +0100
> Martin Mares <[email protected]> wrote:
>
>> Hello!
>>
>>> Just make it so. The name is fine, the concept is unavoidable. The
>>> people who complain are whiners that haven't ever had to deal with
>>> the fact that there are broken machines around.
>> I complain as well as the maintainer of the pciutils. Breaking all
>> userspace accesses to extended configuration space just because there
>> is a couple of chipsets
>
> it's not "just a couple of chipsets", it's actually
> * a whole lot of bioses
> * at least one whole CPU generation
> * ..
> * ..
>
> Do you really want to code all of that into your userspace access code as well?
That's silly. He clearly should not have to... just like he should
not have to add code to figure out if a device is MMCONFIG-active or not.
MMCONFIG should be all or none. System vendors sure as hell will not be
testing this crazy mixed-access model. System vendors DO test the
"always off" model, obviously, and the "always on" model is entering
their testing regime as Vista certification looms and as Linux starts to
find bugs.
Just Say No to entering "hw vendor never ever tests it this way" land.
Jeff
Arjan van de Ven wrote:
> On Sat, 22 Dec 2007 09:20:06 -0500
> Jeff Garzik <[email protected]> wrote:
>
>> Arjan van de Ven wrote:
>>> Hi,
>>>
>>> Linus really wants the extended (4Kb) PCI configuration space
>>> (using MCFG acpi table etc) to be opt-in, since there's many issues
>>> with it and most drivers don't even use/need it. The idea behind
>>> opt-in is that if you don't use it, you don't get to suffer the
>>> bugs...
>>>
>>> Booted on my 64 bit test machine; sadly it has a defunct BIOS that
>>> doesn't have a working MCFG.
>>>
>>>
>>> From: Arjan van de Ven <[email protected]>
>>> Subject: Make MMCONFIG space a driver opt-in
>>>
>>> There are many issues with using the extended PCI configuration
>>> space (CPU, Chipset and most of all BIOS bugs). This while the vast
>>> majority of drivers and devices don't even use/need to use the
>>> memory mapped access methods since they don't use the config space
>>> beyond the traditional 256 bytes.
>>>
>>> This patch makes accessing the extended config space a driver
>>> choice, via the
>>>
>>> pci_enable_ext_config(pdev)
>> Yuck. And, Linus is just being silly. Wait a year then turn on
>> MMCONFIG :) It took PCI MSI a while to mature, but is finally
>> getting there.
>>
>
> Do you hate the name or the concept? I'm certainly open for a better name....
Many problems:
* even if driver not loaded, you might need to access extended capabilities
* kernel hacker (me!) might request user to dump PCI config space to see
what changes, after various experiments. we need to see that extended
space, if it exists, even if driver not loaded.
* this "mixed config access" model is new to Linux, after always having
config access type be a global system attribute. It introduces new
complexity and new inconsistency all over the place.
* hardware makers will not test this weird "mixed access" model.
You thought mmconfig was poorly tested? Well, why the hell choose
something with even less testing behind it (and future likelihood of nil
testing).
Always-off is better than mixed access.
Jeff
On Sat, 22 Dec 2007, Jeff Garzik wrote:
>
> I forcibly turn on mmconfig on all my machines, and monitor lkml, to make sure
> I'm aware of the extent of the problem -- and the extent of peoples'
> exaggeration of this problem.
Bullshit.
You have how many machines? Ten?
The problem is that it isn't enough that it works on common machines with
good hardware. The problem is that we end up chasing insane bugs, wasting
people's valuable time and effort, on those *few* - out of *millions* - of
machines that then surprisingly don't work.
And "surprisingly" is definitely the watch-word. That ranges from silently
just not booting (because the first time anybody tries to do a PCI config
cycle access, the machine just locks up) to some really *odd* behaviour
(ie everything works fine *except* that reading the PCI card ID from a
few cards returns a value of 0x0001 rather than the correct one).
The fact is, we're currently turning off MMCONFIG very aggressively,
exactly because it has caused problems. So most people never even use
MMCONFIG when it's enabled, because of all our sanity checks that turn it
off again. And it still has caused odd problems.
> Regardless of whether a driver is loaded or not, you may NEED to see extended
> capabilities. The system may NEED to see those capabilities just to parse
> them for sane operation.
And that's just not true.
I don't know why you even claim so. There is absolutely *zero* need for
MMCONFIG for 100% of old hardware, and old hardware is exactly the
problem. The hardware that people will run next year isn't the issue.
There may be an absolute need for MMCONFIG for future machines and future
drivers, but there is not a single machine in existence today that really
depends on it. You're just making things up. At worst, you lose some
not-so-important error reporting, no more.
So don't go spreading obvious untruths,
Linus
Linus Torvalds wrote:
> The problem is that it isn't enough that it works on common machines with
> good hardware. The problem is that we end up chasing insane bugs, wasting
> peoples valuable time and effort, on those *few* - out of *millions* - of
> machines that then surprisingly don't work.
>
> And "surprisingly" is definitely the watch-word. That includes silently
> just not booting (because the first time anybody tries to do a PCI config
> cycle access, the machine just locks up) to some really *odd* behaviour
> (ie everything works fine *except* that reading the PCI card ID from a
> few cards returns a value of 0x0001 rather than the correct one).
> The fact is, we're currently turning off MMCONFIG very aggressively,
> exactly because it has caused problems. So most people never even use
> MMCONFIG when it's enabled, because all of our sanity checks that turn it
> off again. And it still has caused odd problems.
Yes, I know all this. I am quite aware of the situation.
My core assertion: the present situation -- turning off MMCONFIG
aggressively -- is greatly preferable to adding
pci_enable_mmconfig_accesses(pdev).
IOW, don't add a new API. Keep doing what we're doing today.
Today, we have a consistent "all or none" model for mmconfig. Any
per-device mmconfig enabling introduces pain and inconsistency, in both
the kernel and userland.
Users with devices that REQUIRE extended config accesses should buy
machines with known good mmconfig. The situation will sort itself out
from there. We don't need ugly hacks like
pci_enable_mmconfig_for_this_device().
Jeff
(second response, for other paragraphs, following in separate email. I
wanted to underscore the core API issue...)
Linus Torvalds wrote:
> On Sat, 22 Dec 2007, Jeff Garzik wrote:
>> Regardless of whether a driver is loaded or not, you may NEED to see extended
>> capabilities. The system may NEED to see those capabilities just to parse
>> them for sane operation.
>
> And that's just not true.
>
> I don't know why you even claim so. There is absolutely *zero* need for
> MMCONFIG for 100% of old hardware, and old hardware is exactly the
> problem. The hardware that people will run next year isn't the issue.
I'm not claiming anything about old hardware.
It should be self-evident that mmconfig doesn't work on old hardware, is
not needed on old hardware, should not be turned on for old hardware,
and in general should never disturb old hardware.
> There may be an absolute need for MMCONFIG for future machines and future
> drivers, but there is not a single machine in existence today that really
> depends on it. You're just making things up. At worst, you lose some
> not-so-important error reporting, no more.
>
> So don't go spreading obvious untruths,
The facts as they exist today:
1) Existing 256-byte config space devices have been known to put
capabilities in the high end (>= 0xc8) of config space.
2) It is legal for PCI-Express to put capabilities anywhere in PCI
config space, including extended config space. (I hope our PCI cap
walking code checks for overruns...)
3) Most new machines ship with PCI-Express devices with extended config
space.
Therefore it is provably /possible/, and indeed logical to conclude
that capabilities in extended config space will follow the same pattern
that existing hw designers have been following... but only once the
current OS's have stable extended-config-space support.
Maybe that day will never come, but it is nonetheless quite possible
without today's PCI Express spec for this to happen.
We must handle this case, even if mmconfig is COMPLETELY DISABLED (i.e.
normal mode today).
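The parenthetical worry in point 2 (whether the capability-walking code
checks for overruns) can be illustrated with a small userspace model of a
classic capability walk; the bounded iteration count and the all-ones
check are exactly the guards in question. The array layout and names here
are illustrative, not the kernel's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CFG_SPACE_SIZE   256   /* classic config space */
#define PCI_CAP_LIST_PTR_REG 0x34  /* pointer to first capability */

/* Walk the classic 8-bit capability chain in a fake 256-byte config
 * space, guarding against loops and pointers that run off the end. */
static int find_capability(const uint8_t cfg[PCI_CFG_SPACE_SIZE], uint8_t want)
{
    int pos = cfg[PCI_CAP_LIST_PTR_REG] & ~3;
    int ttl = (PCI_CFG_SPACE_SIZE - 0x40) / 4;  /* max possible caps */

    while (pos >= 0x40 && pos < PCI_CFG_SPACE_SIZE - 1 && ttl-- > 0) {
        uint8_t id = cfg[pos];
        if (id == 0xff)
            break;                 /* all-ones: device not responding */
        if (id == want)
            return pos;
        pos = cfg[pos + 1] & ~3;   /* next pointer, dword-aligned */
    }
    return 0;                      /* not found; 0 is never a valid pos */
}
```

With such a bound, a chain whose next pointers form a cycle terminates
after at most `ttl` steps instead of hanging a config dump.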
Jeff
Jeff Garzik wrote:
> Maybe that day will never come, but it is nonetheless quite possible
> without today's PCI Express spec for this to happen.
er, s/without/within/
On Sat, 22 Dec 2007, Jeff Garzik wrote:
> My core assertion: the present situation -- turning off MMCONFIG aggressively
> -- is greatly preferable to adding pci_enable_mmconfig_accesses(pdev).
Well, you do realize that right now we have to have _users_ just doing
"pci=nommconf" on the kernel command line, because we apparently don't do
it aggressively enough?
The fact is, we simply don't know what the breakage is. The only safe
situation is to just assume things are broken, and enable it only when
necessary.
And if there isn't a driver that needs it, then it sure as *hell* isn't
necessary.
Linus
On Sat, 22 Dec 2007, Jeff Garzik wrote:
>
> It should be self-evident that mmconfig doesn't work on old hardware, is not
> needed on old hardware, should not be turned on for old hardware, and in
> general should never disturb old hardware.
.. but it does. How do you figure out when to turn it off?
By "old hardware" I don't mean stuff from the last century, that generally
doesn't even *have* MMCONFIG. I mean the stuff you can buy *today*, which
will be old by the time people really start _needing_ MMCONFIG.
The fact is, 99% of the hardware you buy *today* has absolutely zero need
for extended PCI config access. In fact, I would not be surprised at all
if most hardware sold today generally doesn't have *any* devices that even
have config registers in the 0x100+ range.
So those are the kinds of machines we want to protect from blowing up.
Stuff that isn't sold with Vista, and has never been tested (or where
Vista does some work-arounds we don't even know about - somebody was
mentioning things like looking at the BIOS date etc).
Linus
On Sat, 22 Dec 2007, Jeff Garzik wrote:
>
> Jeff Garzik wrote:
> > Maybe that day will never come, but it is nonetheless quite possible without
> > today's PCI Express spec for this to happen.
>
> er, s/without/within/
You're talking specs. I'm talking machines.
I agree with you 100% that as per specs, you need to support MMCONFIG,
because anything can happen.
As per *reality*, though, machines sold today simply don't need it. So
there is no upside, and there is definitely downside.
I want to limit that downside. Right now, the easiest way to limit it
seems to be to say that those (very very few) drivers that actually care
could enable it. That way, we automatically limit it to only those
machines that have hardware that cares.
And yes, if you want the capability following to notice automatically when
capabilities really do go into the 0x100+ range, that's fine. I suspect
that there aren't really very many cards that do that (because they
wouldn't work with WinXP etc), so I doubt it will actually enable MMCONFIG
for any noticeable amount of hardware sold today. Which is exactly what I
want. I do *not* want to enable MMCONFIG unless it is shown to actually be
needed.
And no, "specs" do not show it is needed. MMCONFIG is needed depending on
the actual *hardware* the kernel runs on, not based on some external
specs.
Linus
Linus Torvalds wrote:
>
> On Sat, 22 Dec 2007, Jeff Garzik wrote:
>
>> My core assertion: the present situation -- turning off MMCONFIG aggressively
>> -- is greatly preferable to adding pci_enable_mmconfig_accesses(pdev).
>
> Well, you do realize that right now we have to have _users_ just doing
> "pci=nommconf" on the kernel command line, because we apparently don't do
> it aggressively enough?
>
> The fact is, we simply don't know what the breakage is. The only safe
> situation is to just assume things are broken, and enable it only when
> necessary.
Absolutely.
But regardless of problems, enabling should be done globally, not per
device... You have three possibilities for an mmconfig-maybe-capable
machine:
1) mmconfig disabled globally (for a single computer)
Widely tested by users and hw vendors
Consistent (all devices use same type of config access)
2) mmconfig enabled globally (for a single computer)
Poorly tested by hw vendors, apparently
Consistent (all devices use same type of config access)
3) mmconfig might or might not be enabled, depending on which driver is
loaded, whether it called an API or not.
Even LESS testing by hw vendors than #2. Maybe even "never"
Inconsistent (config access depends on device)
Choosing solution #3 is to choose the /least/ tested, /least/ likely to
get hw vendor testing, /most/ inconsistent solution available. Doing so
is not maximizing good engineering attributes :)
AFAICS, solution #3 has a much higher chance of breaking userspace
(pciutils, X.org, distro installers that read PCI config space, ...) as
a result of the inconsistencies introduced.
I agree 100% with your summary of the problem.
But the per-device enabling solution introduces a "mixed model" for
config space accesses that is worse than any whitelist or other global
[for a single computer] solution.
The per-device enabling solution is IMO worse than simply deleting all
mmconfig code from the kernel. At least the "kill mmconfig" solution
would maintain the "tested" and "consistent" properties :)
Jeff
On Sat, 22 Dec 2007, Jeff Garzik wrote:
>
> But regardless of problems, enabling should be done globally, not per
> device...
I'm ok with trying the "globally" idea, but it has to be "globally but
only if absolutely required".
And quite frankly, how do you tell whether it's absolutely required or
not?
I have an idea: the drivers that really need it will do a "please enable
MMCONFIG, because I will need it" thing?
Ok?
And then, since we *need* such a "pci_enable_mmconfig()" call anyway, why
not let the driver give which device it controls too, so that we can print
out the information (in case the machine then hangs immediately
afterwards), and perhaps - if it is shown to help - only do the MMCONFIG
cycles for that particular device?
Sounds like a plan?
Linus
Linus Torvalds wrote:
> I want to limit that downside. Right now, the easiest way to limit it
> seems to be to say that those (very very few) drivers that actually care
> could enable it. That way, we automatically limit it to only those
> machines that have hardware that cares.
Then let's do it right: disable mmconfig by default on x86, and enable
it when passed "pci=mmconfig".
For the rare -- you and I agree it's very rare -- case where it is
REQUIRED, the user can pass pci=mmconfig as instructed by driver
documentation somewhere.
Let's not bend over backwards and introduce an API for these
presently-theoretical cases. Given the complete lack of hw vendor
testing and potential to confuse userspace, the two choices for a
computer should be "mmconfig off" or "mmconfig on."
Kernel hackers developing drivers and code for new machines will know
enough to pass pci=mmconfig if they NEED it. That practice will only
become annoying when x86 hardware actually starts to NEED extended
config space -- at which future time we can revisit, as you describe.
> And yes, if you want the capability following to notice automatically when
> capabilities really do go into the 0x100+ range, that's fine. I suspect
Yes, we /must/ do this checking, if we don't already.
Jeff
Linus Torvalds wrote:
>
> On Sat, 22 Dec 2007, Jeff Garzik wrote:
>> But regardless of problems, enabling should be done globally, not per
>> device...
>
> I'm ok with trying the "globally" idea, but it has to be "globally but
> only if absolutely required".
>
> And quite frankly, how do you tell whether it's absolutely required or
> not?
>
> I have an idea: the drivers that really need it will do a "please enable
> MMCONFIG, because I will need it" thing?
>
> Ok?
>
> And then, since we *need* such a "pci_enable_mmconfig()" call anyway, why
> not let the driver give which device it controls too, so that we can print
> out the information (in case the machine then hangs immediately
> afterwards), and perhaps - if it is shown to help - only do the MMCONFIG
> cycles for that particular device?
>
> Sounds like a plan?
As long as pci_enable_ext_cfg_space(pdev) enables extended accesses for
-all- devices, the plan is mostly sound.
That largely eliminates the inconsistency issue.
The only thing I would worry about is whether "config space suddenly
grew larger" condition will confuse userspace -- but that is NOT an
objection, just a worry.
Jeff
On 12/22/2007 11:13 PM, Jeff Garzik wrote:
>
> The facts as they exist today:
>
> 1) Existing 256-byte config space devices have been known put
> capabilities in the high end (>= 0xc8) of config space.
>
> 2) It is legal for PCI-Express to put capabilities anywhere in PCI
> config space, including extended config space. (I hope our PCI cap
> walking code checks for overruns...)
You make it sound almost as if the capability list that starts in
regular conf-space could cross into extended conf-space (>= 256). That's
not true: the capability lists in the regular conf-space and the
extended conf-space are really separate. They use a different structure
for linking (a different number of bits to define the capability IDs), a
different starting point, and different capability ID definition tables.
The regular conf-space and the extended conf-space are really independent.
>
> 3) Most new machines ship with PCI-Express devices with extended
> config space.
>
> Therefore it is provable /possible/, and is indeed logical to conclude
> that capabilities in extended config space will follow the same
> pattern that existing hw designers have been following... but only
> once the current OS's have stable extended-config-space support.
>
> Maybe that day will never come, but it is nonetheless quite possible
> within today's PCI Express spec for this to happen.
I agree with that statement. In fact it is already quite useful today. I
am doing a lot of support activities where extended-conf-space is a must
for troubleshooting. It was important enough for us that we have
user-tools that allow us to access mmconfig-space for pci-express even
on systems that don't advertise an MCFG table (as long as the chipset
supports it; we have reverse-engineered the location of the "mmconfig
bar" for a few chipsets, including nvidia chipsets; for Intel it is well
documented; and there are a couple of others).
Loic
On Sun, 23 Dec 2007, Jeff Garzik wrote:
>
> Then let's do it right: disable mmconfig by default on x86, and enable it
> when passed "pci=mmconfig".
I'm certainly ok with that, but then you say:
> > And yes, if you want the capability following to notice automatically when
> > capabilities really do go into the 0x100+ range, that's fine. I suspect
>
> Yes, we /must/ do this checking, if we don't already.
Hell no. If the user asked for mmconfig to be disabled, it needs to be
disabled, because it may well be broken and non-working. It's better to
not see some capabilities than to lock up the machine.
So what is it? Is it "some machines do it automatically when needed" or is
it "always disabled"?
You seem to now say that "always disabled by default" isn't good either.
What do I need to do to convince you that the *right* thing to do is:
- disabled by default, unless user *explicitly* asks for it with
"pci=mmconfig"
- but certain events will enable it automatically, unless the user
*explicitly* said (or we had a quirk that explicitly disabled it)
something like pci=mmconfig
See what I'm saying? You don't really seem to be disagreeing. It seems to
be, in fact, exactly what you want.
I'm saying that yes, the PCI capability code might be one such "enable
mmconfig if not explicitly disabled" user.
But I also suspect that some drivers might want to do it, and I'm almost
certain that some DMI quirks will want to do it (ie "I recognize this
machine, and this machine is from 2011, and wants MMCONFIG"). So we need
to have an interface for those quirks or other events to also do it, it
cannot be just a "PCI capabilities do it".
Quite frankly, the fewer drivers that do it, the happier I am. Maybe *no*
drivers will do it. If so, "Hallelujah, brother, give me an Amen!". But we
clearly want an interface for _some_ things to say "we might want to have
mmconfig". Maybe it's only PCI capabilities.
Regardless. Even if *no* driver ever does it, the logical interface will
clearly be "pci_enable_mmio(pdev)".
You can then argue all you want against individual drivers actually ever
using it, and I'll support you on that. I think MMCONFIG was/is a total
waste of everybody's time.
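The default-off-with-overrides policy Linus describes above can be
modeled as a small state machine: a forced-off setting (from
"pci=nommconf" or a quirk) always wins, a forced-on setting
("pci=mmconfig") enables immediately, and in the default state the first
qualifying event (driver call, capability walk, DMI quirk) flips it on.
This is only a sketch of the proposed policy; all names here are
hypothetical, not existing kernel code.

```c
#include <assert.h>

enum mmconf_policy { MMCONF_AUTO, MMCONF_FORCE_ON, MMCONF_FORCE_OFF };

struct mmconf_state {
    enum mmconf_policy policy;  /* from command line / quirks */
    int enabled;                /* is mmconfig in use right now? */
};

static void mmconf_init(struct mmconf_state *s, enum mmconf_policy p)
{
    s->policy  = p;
    s->enabled = (p == MMCONF_FORCE_ON);  /* only pci=mmconfig starts on */
}

/* An event (driver, capability walk, DMI quirk) asks for mmconfig. */
static int mmconf_request_enable(struct mmconf_state *s)
{
    if (s->policy == MMCONF_FORCE_OFF)
        return -1;              /* user/quirk said no: never override */
    s->enabled = 1;             /* AUTO or FORCE_ON: grant the request */
    return 0;
}
```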
Linus
Linus Torvalds wrote:
> On Sun, 23 Dec 2007, Jeff Garzik wrote:
>>> And yes, if you want the capability following to notice automatically when
>>> capabilities really do go into the 0x100+ range, that's fine. I suspect
>> Yes, we /must/ do this checking, if we don't already.
>
> Hell no. If the user asked for mmconfig to be disabled, it needs to be
> disabled, because it may well be broken and non-working. It's better to
> not see some capabilities than to lock up the machine.
I think my short response created confusion. I was only saying that we
should check for capability overruns, if we don't already[1].
I wasn't saying anything about mmconfig in that sentence, sorry.
Let's pop this sub-thread off the stack, and jump to your "Sounds like a
plan?" train of thought.
Jeff
[1] Loic's response seems to imply my worry about overruns is unfounded.
On 12/22/2007 11:52 PM, Jeff Garzik wrote:
>
> Absolutely.
>
> But regardless of problems, enabling should be done globally, not per
> device...
The "enabling globally" requirement (i.e. not per-device, nor depending
on reg >= 256) seems a very debatable assumption (IMHO a big mistake),
and one that appears responsible for many of the problems seen in the
past.
There might be, for a very long time, AMD architectures where
extended-conf-space access for pci-express devices works and is
beneficial (and where mmconf is not supported by the hardware on
non-pci-express devices). You are basically saying you don't ever want
to support extended-conf-space globally for those systems, when it
would be so easy to use mmconf *only* when attempting an
*extended-conf-space* access (>= 256) to some device (which provides a
strong guarantee that you will never break anything unless somebody
actually tries to use the extended-conf-space).
Supporting extended-conf-space is independent of the issue of using
mmconf for legacy conf-space. There is no real reason to use the same
method to access both. I have seen several arguments implying that, and
they all seem really bogus to me. Not only are the two ranges (< 256
and >= 256) structurally independent (you have totally independent
capability lists that are independently organized in each of them);
even if they were not, there is no consistency issue that cannot be
dealt with by a memory barrier, and the concern about taking an extra
branch for each pci-conf-space access is also bogus once you look at
the numbers.
By using different implementations for the two ranges you avoid
introducing a new API, and you avoid taking risks with mmconf when you
don't need it. That doesn't preclude using mmconf for everything
either, if the user requests it or based on enough knowledge of the
system to be sure nothing will break.
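The "different implementations for the two ranges" Loic argues for
amounts to a one-branch dispatch in the config accessor. A userspace
sketch of that routing follows; the method names and the availability
flag are made up for illustration and merely stand in for the real
conf1/mmconf backends.

```c
#include <assert.h>

/* Route a config-space access by register offset: conf1 (type1) cycles
 * for the legacy < 256 range, mmconf only for the extended range. */
enum { METHOD_TYPE1, METHOD_MMCONF, METHOD_NONE };

static int mmconf_available;    /* set if a usable MCFG window exists */

static int pick_method(int reg)
{
    if (reg < 256)
        return METHOD_TYPE1;    /* legacy space: always safe conf1 */
    if (reg < 4096 && mmconf_available)
        return METHOD_MMCONF;   /* extended space requires mmconf */
    return METHOD_NONE;         /* extended space unreachable */
}
```

The design point is the guarantee Loic states: mmconf is touched only
when somebody actually asks for an extended-space register.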
Loic
Loic Prylli wrote:
> Supporting extended-conf-space is independant of the issue of using
> mmconf for legacy conf-space.
True.
> There is no real reason to use the same
> method to access both. I have seen several arguments used that were
> implying that, and they all seem really bogus to me. Not only are the
> two ranges (<= 256 and >= 256) structurally independant (you have
> totally independant capabilities lists that are independantly organized
> in each of them), even if they were not there is no consistency issue
> that cannot be dealt with a memory barrier, and the concern about taking
> an extra branch for each pci-conf-space access is also bogus once you
> look at the numbers.
>
> By possibly using different implementations for the two ranges you avoid
> introducing a new API, you avoid taking risks with mmconf when you don't
> need it. That doesn't preclude using mmconf for everything either if the
> user requests it or based on enough knowledge of the system to be sure
> nothing will break.
Are you prepared to guarantee that freely mixing mmconfig and type1
config accesses at runtime will always work, on all chipsets? I'm
talking about silicon here, not kernel software.
Furthermore, is it best for our users to find problems with mixed config
accesses -- not at boot time, not at driver load time, but at some
random time when some random driver does its first extended config space
access? IMO, no. If you are going to fail, failing in a predictable,
visible way is best.
Failures are more predictable and more consistent with an all-or-none
scenario. The all-or-none solutions are the least complex on the
software side, and far more widely tested than any mixed config access
scheme.
Jeff
Jeff Garzik wrote:
>
> 2) It is legal for PCI-Express to put capabilities anywhere in PCI
> config space, including extended config space. (I hope our PCI cap
> walking code checks for overruns...)
>
Uh, not really. The classical capability format only has 8-bit
addresses, and the spec requires that all extended config space that is
used be claimed by extended capabilities -- a different format chain.
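The "different format chain" hpa refers to is the PCIe extended
capability header: one 32-bit header per capability, with a 16-bit ID,
a 4-bit version, and a 12-bit dword-aligned next pointer (versus the
classic chain's 8-bit ID and 8-bit next pointer). A minimal decoder
following that spec layout, as a hedged sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Decode a PCIe extended capability header (found at offset 0x100 and
 * up): bits 15:0 = capability ID, 19:16 = version, 31:20 = next
 * capability offset (dword aligned; 0 terminates the chain). */
struct ext_cap {
    uint16_t id;
    uint8_t  version;
    uint16_t next;
};

static struct ext_cap decode_ext_cap(uint32_t header)
{
    struct ext_cap c;
    c.id      = header & 0xffff;
    c.version = (header >> 16) & 0xf;
    c.next    = (header >> 20) & 0xffc;  /* force dword alignment */
    return c;
}
```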
-hpa
On 12/23/2007 12:44 AM, Jeff Garzik wrote:
>>
>> By possibly using different implementations for the two ranges you avoid
>> introducing a new API, you avoid taking risks with mmconf when you don't
>> need it. That doesn't preclude using mmconf for everything either if the
>> user requests it or based on enough knowledge of the system to be sure
>> nothing will break.
>
> Are you prepared to guarantee that freely mixing mmconfig and type1
> config accesses at runtime will always work, on all chipsets? I'm
> talking about silicon here, not kernel software.
Your question is whether I expect a mix of type1 (for legacy conf space)
and mmconfig (for >= 256) to trigger some bug in silicon, causing
flakiness (or why not even random corruption?) that would not happen
when using only type1 or only mmconfig.
Obviously, I cannot prove or guarantee the absence of subtle silicon
bugs. But I can say:
- the documentation says mixing them is OK; the pci-express docs insist
on paying attention to ordering when mixing (which is easy on x86),
thereby acknowledging that one is able to mix them.
- All current kernels have been using a mix of "type1" and "mmconf"
accesses to the same device even if that's only used during mmconfig
initialization.
- On AMD platforms with the typical combination of mmconf-aware and
non-mmconf-aware chipsets, you always have a mix of "type1" and "mmconf"
accesses all the time with standard kernels (though not on the same bus)
- Mixing such accesses on the same device at runtime has happened on
thousands of systems for Myricom device/software users.
Are you prepared to guarantee no processor will ever have a weird
silicon problem that's triggered by any Linux change?
Mixing mmconf/type-1 is an approach that any mmconf-able hardware is
supposed to support; it has been tried on at least most
server/workstation chipsets typically used in the last two years. So the
burden of proof is on actually finding some silicon that cannot do that
properly before rejecting it based on hardware fears (otherwise you end
up on a slippery slope).
>
> Furthermore, is it best for our users to find problems with mixed
> config accesses -- not at boot time, not at driver load time, but at
> some random time when some random driver does its first extended
> config space access? IMO, no. If you are going to fail, failing in a
> predictable, visible way is best.
A failure happening during driver initialization or some very specific
identifiable driver event (the first ext-conf-space access is not
something that pops up randomly in the life of a driver) is predictable
and visible. I am not sure what kind of problem you are afraid of anyway
(ext-conf-space not available?), so it's hard to talk generally.
Needless to say, I never said not to test mmconf on all pci-express
devices advertised in MCFG at initialization time (preferably after
all PCI-memory-space initialization is done, since there is uncertainty
about whether mmconf might conflict with it in some corner cases).
>
> Failures are more predictable and more consistent with an all-or-none
> scenario. The all-or-none solutions are the least complex on the
> software side, and far more widely tested than any mixed config access
> scheme.
Disabling ext-conf-space access altogether by default (that's the
default you propose, right?) is definitely the safest approach. But
it's the least useful too.
To caricature, I am the one advocating the real all-or-none scenario:
always use type1 for legacy conf-space (never mmconf). You are saying
that for legacy-conf-space access, some people should use mmconf and
other people should use type1 based on their kernel configuration or
command line, and that choice depends on something unrelated (whether
they feel they might need extended-conf-space access at some point, be
it only for lspci/setpci).
Loic
Hi!
> The fact is, 99% of the hardware you buy *today* has absolutely zero need
> for extended PCI config access. In fact, I would not be surprised at all
> if most hardware sold today generally doesn't have *any* devices that even
> have config registers in the 0x100+ range.
Most PCI Express bridges have root complex control block capabilities
which have to reside in the extended range.
However I agree that hardware _requiring_ extended config register access
is rare.
Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
"Dijkstra probably hates me." -- /usr/src/linux/kernel/sched.c
>
> 3) mmconfig might or might not be enabled, depending on which driver
> is loaded, whether it called an API or not.
>
> Even LESS testing by hw vendors than #2. Maybe even "never"
>
> Inconsistent (config access depends on device)
The "depends on device" part is even true for Linux today. For us, MMCONFIG isn't always used; it's used on a per-device basis already, except that the per-device choice is defined both by the BIOS and by our quirks....
(the mmconfig code already falls back to conf1 cycles in various cases)
So I'm not entirely buying your argument. In fact I'm not buying your "mixed is not tested at all" argument; while the statement may be true, it's no different from Linux today... which is mixed. Just differently mixed, I suppose.
On Sun, Dec 23, 2007 at 12:44:30AM -0500, Jeff Garzik wrote:
> Failures are more predictable and more consistent with an all-or-none
> scenario. The all-or-none solutions are the least complex on the software
> side, and far more widely tested than any mixed config access scheme.
Nope. The vast majority of mmconfig problems that happen at boot time
actually have nothing to do with "broken" or "working" mmconfig.
Typically these are
- mmconf area overlapped during BAR sizing;
- BIOS (or kernel) placed some MMIO in the middle of mmconfig area,
which looks like some random device "doesn't like mmconfig".
This is a result of all-or-none approach, as mmconfig is fundamentally
unsafe until after PCI init is done.
The mixed access that Loic proposes allows us to avoid these boot
problems essentially for free. Systems that have both non-mmconf PCI(-X)
and mmconf PCI-E domains could be handled almost for free as well.
And probably most important thing is that the x86 pci_conf implementation
would be so much simpler and cleaner that I honestly don't understand
why are we still discussing it ;-)
At the same time, making drivers request extended config space
still makes sense. I think of pci_request_ext_conf(dev, drv_name) which
doesn't set any per-device flag, but
- returns success or failure depending on mmconf availability;
- can be logged by kernel to make it easier to debug if something
goes wrong.
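Ivan's suggestion can be sketched as a stateless query plus a log line,
so that a hang immediately afterwards is attributable to the requesting
driver. Everything below (the availability flag, the exact behavior of
the function, the example device/driver names) is an assumption based
on his description, not existing kernel code.

```c
#include <assert.h>
#include <stdio.h>

static int mmconf_available;   /* would be set by MCFG probing at boot */

/* Sketch of pci_request_ext_conf(dev, drv_name): sets no per-device
 * flag; just reports availability and logs who asked, for debugging. */
static int pci_request_ext_conf(const char *dev, const char *drv_name)
{
    printf("PCI: %s requests extended config space for %s: %s\n",
           drv_name, dev, mmconf_available ? "granted" : "unavailable");
    return mmconf_available ? 0 : -1;
}
```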
Ivan.
On Sun, 23 Dec 2007, Ivan Kokshaysky wrote:
>
> This is a result of all-or-none approach, as mmconfig is fundamentally
> unsafe until after PCI init is done.
Yes. One of the things I want to have happen (and which
"pci_enable_mmconf()" would do automatically) is that we always probe
using conf1 cycles in any machine where conf1 works at all.
Then, some late PCI quirk or some per-device quirk (or driver) can decide
to enable mmconf later, and that avoids the current nightmare with the
whole resource ordering.
Of course, if there really are machines that have somehow disabled conf1
accesses, we'd have to use mmconfig early, but that should automatically
be handled by the PCI config probing stuff (ie we already test whether
conf1 accesses seem to work, and would fall back on alternate config
cycle accesses if conf1 looks broken).
Right now, the check that MMCONF space has to be in e820-reserved memory
space protects us from *most* of the probe-time problems, but that's also
obviously the check that actually means that MMCONF isn't actually used on
the vast majority of machines out there.
(For example: I have three machines that I know have working MMCONF. On
only one of those does Linux actually even enable MMCONF accesses,
because on the two other ones the BIOSes do the crazy "put it in some
space that is reserved by PnP crap later", so we actually refuse to touch
it. So at least in my limited environment, we hardly get any MMCONFIG test
coverage, exactly because we have to be so totally anal about not enabling
it early, because we cannot guarantee that it's not clashing with anything
else).
Linus
On Sat, 2007-12-22 at 21:00 -0800, Linus Torvalds wrote:
>
> On Sat, 22 Dec 2007, Jeff Garzik wrote:
> >
> > But regardless of problems, enabling should be done globally, not per
> > device...
>
> I'm ok with trying the "globally" idea, but it has to be "globally but
> only if absolutely required".
>
> And quite frankly, how do you tell whether it's absolutely required or
> not?
>
> I have an idea: the drivers that really need it will do a "please enable
> MMCONFIG, because I will need it" thing?
What about this approach:
- At boot, MMCONFIG is disabled, you perform the initial probe of all
devices.
- During that probe, you set a flag if any device has capabilities that
extend beyond 0xff.
- Once the main probe pass is done (before drivers are hooked up, which
on PCI is a separate phase), if that flag has been set, you print a fat
warning (on peecee's, which are our main concern, this will generally be
visible on vga console), that you're about to enable MMCONFIG for this
device and if the machine crashes, send a bug report and boot with
nommconfig. Then, for each of those devices, attempt to re-read the
config space header with MMCONFIG, and compare to what is supposed to be
there (+/- irrelevant status bits). If that fails, warn and keep
MMCONFIG disabled.
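The re-read-and-compare step above can be sketched as a simple header check (illustrative only; masking exactly the status register is one plausible reading of "+/- irrelevant status bits"):

```c
#include <stdint.h>

/* Read the 64-byte header twice (conf1, then MMCONFIG) and compare,
 * masking the volatile status register (upper 16 bits of dword 1). */
int headers_match(const uint32_t conf1[16], const uint32_t mmcfg[16])
{
    for (int i = 0; i < 16; i++) {
        uint32_t a = conf1[i], b = mmcfg[i];

        if (i == 1) {             /* dword 1 = command | status << 16 */
            a &= 0x0000ffff;
            b &= 0x0000ffff;
        }
        if (a != b)
            return 0;             /* mismatch: keep MMCONFIG disabled */
    }
    return 1;
}
```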
At this point, you can decide to either keep MMCONFIG on/off on a
per-device basis (if a device has capabilities in the ext space, use
MMCONFIG for it). In that case, your proposed API is useful for
force-enabling MMCONFIG for a given device if there are no such
capabilities but the driver wants to force-enable. That case should be
extremely rare, if it exists at all.
Or you can decide to keep the enable/disable global (if a single device
fails, don't enable MMCONFIG globally).
Ben.
> Are you prepared to guarantee that freely mixing mmconfig and type1
> config accesses at runtime will always work, on all chipsets? I'm
> talking about silicon here, not kernel software.
We spinlock config space accesses. So the silicon will never see
simultaneous accesses, which is what I strongly suspect would be broken
in chipsets (asking for livelocks here).
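The serialization Ben describes looks roughly like this (a user-space sketch with pthreads; the access-method stubs and names are illustrative, not the kernel's):

```c
#include <pthread.h>
#include <stdint.h>

/* Stand-ins for the two access methods; in the kernel these poke
 * 0xCF8/0xCFC or the mmconfig MMIO window. */
static uint32_t conf1_read(int bus, int dev, int fn, int off)
{
    return 0x11110000u | (uint32_t)off;
}

static uint32_t mmconfig_read(int bus, int dev, int fn, int off)
{
    return 0x22220000u | (uint32_t)off;
}

/* One lock around every config access: the silicon never sees a conf1
 * cycle and an mmconfig cycle in flight at the same time. */
static pthread_mutex_t pci_config_lock = PTHREAD_MUTEX_INITIALIZER;

uint32_t config_read(int use_mmconfig, int bus, int dev, int fn, int off)
{
    uint32_t val;

    pthread_mutex_lock(&pci_config_lock);
    val = use_mmconfig ? mmconfig_read(bus, dev, fn, off)
                       : conf1_read(bus, dev, fn, off);
    pthread_mutex_unlock(&pci_config_lock);
    return val;
}
```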
Now there is the question of whether the silicon implements MMCONFIG
writes in some asynchronous way, in which case, there might be an issue
if you go bang the port right away but I very much doubt it.
Ben.
Hello!
> - During that probe, you set a flag if any device has capabilities that
> extend beyond 0xff.
Can this work? The extended capabilities are not linked to the normal
ones in any way.
Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
A bug in the code is worth two in the documentation.
On Sun, Dec 23, 2007 at 10:15:10PM +0100, Martin Mares wrote:
> > - During that probe, you set a flag if any device has capabilities that
> > extend beyond 0xff.
>
> Can this work? The extended capabilities are not linked to the normal
> ones in any way.
Good point.
OTOH, we *do* have a flag for the extended capabilities - dev->cfg_size.
Obviously, the pci probe *without* mmconfig will set it to 256 for *all*
devices.
So Linus' idea of enabling mmconfig per-device makes a lot of sense in the
end - if mmconfig works, it just sets dev->cfg_size to 4096.
Without bloating the struct pci_dev or screwing up innocent arches...
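Ivan's cfg_size idea reduces to something like the following sketch (the constants match the kernel's names; the struct and helper are illustrative):

```c
#define PCI_CFG_SPACE_SIZE     256
#define PCI_CFG_SPACE_EXP_SIZE 4096

struct pci_dev_model {            /* simplified stand-in for struct pci_dev */
    int cfg_size;
};

/* The probe without mmconfig defaults every device to 256 bytes; a
 * later per-device decision bumps cfg_size once mmconfig is known to
 * work for that device. */
void pci_set_cfg_size(struct pci_dev_model *dev, int mmconfig_works)
{
    dev->cfg_size = mmconfig_works ? PCI_CFG_SPACE_EXP_SIZE
                                   : PCI_CFG_SPACE_SIZE;
}
```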
Ivan.
On Sun, 2007-12-23 at 22:15 +0100, Martin Mares wrote:
> Hello!
>
> > - During that probe, you set a flag if any device has capabilities that
> > extend beyond 0xff.
>
> Can this work? The extended capabilities are not linked to the normal
> ones in any way.
Yeah, well, you set a flag if you have extended capabilities. I don't
have my spec at hand (or the code) but can't we know if there are any
based on some non-ext field?
Ben.
On Mon, 2007-12-24 at 10:06 +1100, Benjamin Herrenschmidt wrote:
> On Sun, 2007-12-23 at 22:15 +0100, Martin Mares wrote:
> > Hello!
> >
> > > - During that probe, you set a flag if any device has capabilities that
> > > extend beyond 0xff.
> >
> > Can this work? The extended capabilities are not linked to the normal
> > ones in any way.
>
> Yeah, well, you set a flag if you have extended capabilities. I don't
> have my spec at hand (or the code) but can't we know if there are any
> based on some non-ext field ?
Alright, I downloaded the spec and you are right...
Yet another massive Intel f*ckup in the definition of PCI... Argh.
So that means you can't automatically detect that there's potentially
stuff in the extended caps. You still can do the initial probe solely
with type1, that would work around the problem with MMCONFIG overlapping
sized BARs (ah, BAR sizing, yet another case where somebody at Intel
should rather have broken an arm that day).
I still don't believe much in the idea of drivers enabling MMCONFIG
though.
Ben.
> I've occasionally wondered if that spinlock needs to get split up, but
> for the amount of pain that could ensue, I can't imagine it ever being
> worthwhile.
Wondered the same thing and decided against it. I do have every now and
then some really crazy cases to deal with. One of them is the need to
use something like fixmap/kmap_atomic because a vendor makes 32-bit
CPUs that need 512MB of address space to map config space :-)
Ben.
Arjan van de Ven wrote:
>> 3) mmconfig might or might not be enabled, depending on which driver
>> is loaded, whether it called an API or not.
>>
>> Even LESS testing by hw vendors than #2. Maybe even "never"
>>
>> Inconsistent (config access depends on device)
>
> the "depends on device" is even true for Linux today. For us today, MMCONFIG isn't always used, it's used on a per device basis already; except that the per-device is both defined by the bios and our quirks....
> (the mmconfig code already falls back to conf1 cycles in various cases)
>
> So I'm not entirely buying your argument. In fact I'm not buying your "mixed is not tested at all" argument; while the statement may be true, it's no different from Linux today... which is mixed. Just differently mixed I suppose.
/If/ mixed use is truly well tested, then that eliminates one of my two
objections.
But I don't see that in the code now... I see we choose [get_virt() in
mmconfig_{32,64}.c] strictly on a per-bus basis, based on the allowed
bus list provided by the BIOS.
In other words, if the BIOS says "use this segment/bus range for
mmconfig", the code does that. There is no mixing of accesses on a
per-device basis AFAICS in the current code. (feel free to prove me
wrong! :)) It looks like per-bus/domain to me, which is a more
reasonable and expected arrangement than per-device mmconfig|typeN conf
accesses.
At boot time, we use type1 accesses until a "flag day", at which time an
entire bus is switched to mmconfig all at once. After that point there
is no mixing of accesses on that bus -- and nor were there mixed
accesses on that bus /before/ that point.
And -- please forgive me, I mean no offense here -- we have to make sure
whatever rule we come up with makes AMD and other non-Intel folks happy.
Like with PCI MSI, mmconfig (+ its Linux support) has a history of
being written first for Intel, working great on select Intel platforms,
and not working so great initially for other vendors even when said
technology is supposedly compatible. So having AMD sign-off on such an
approach would greatly increase comfort level.
The second objection was regarding the inconsistencies introduced to
userland (and the kernel?) whereby the existing states:
* 256b config space
* on per-bus basis, ext cfg space is available
if device provides && mmconfig active
were joined by the additional state
* on a per-device basis, ext cfg space is available
if device provides && driver requests && mmconfig active
which has the potential to confuse code that makes assumptions on a
whole-system (mmconfig or not) or per-bus (ext cfg space capable or
not) basis.
So, if my two worries here are needless, then those objections are
eliminated.
Finally, I wonder if anything can be done about the diagnostic angle:
it is not __only__ the driver that may want extended config space access.
It is quite reasonable and logical for an admin or developer to want to
_view_ the entire extended cfg space (with lspci -vvvxxx) of a device,
even if the device has not called pci_enable_ext_cfg_space(); indeed,
even if a driver is not loaded at all.
How to address that need?
Jeff
Ivan Kokshaysky wrote:
> On Sun, Dec 23, 2007 at 12:44:30AM -0500, Jeff Garzik wrote:
>> Failures are more predictable and more consistent with an all-or-none
>> scenario. The all-or-none solutions are the least complex on the software
>> side, and far more widely tested than any mixed config access scheme.
>
> Nope. The vast majority of mmconfig problems that happen at boot time
> actually have nothing to do with "broken" or "working" mmconfig.
> Typically these are
> - mmconf area overlapped during BAR sizing;
> - BIOS (or kernel) placed some MMIO in the middle of mmconfig area,
> which looks like some random device "doesn't like mmconfig".
>
> This is a result of all-or-none approach, as mmconfig is fundamentally
> unsafe until after PCI init is done.
The phrase "all or none" specifically describes the current practice in
arch/x86/pci/mmconfig_{32,64}.c whereby a PCI bus always has one, and
only one, access method.
So the problems you describe are unrelated to "all or none" as I was
describing it.
> The mixed access that Loic proposes allows to avoid these boot problems
> just for free.
False. If you have overlapping areas, then clearly it is
board-dependent (undefined) what happens -- you might trigger an
mmconfig access by writel() to your PCI device's MMIO BAR space.
Consider, too, what the current code in arch/x86/pci/mmconfig_{32,64}.c
does: it uses the range BIOS told to use for mmconfig, no more no less.
Clearly many early mmconfig BIOSes have buggy resource allocation...
Thus if mmconfig is not known working on a system, turn it off 100% for
all buses. End of story.
> Systems that have both non-mmconf PCI(X) and mmconf PCI-E
> domains could be handled almost for free as well.
The existing code in arch/x86/pci/mmconfig_{32,64}.c today handles
mixing of PCI-E and PCI-X just fine. As noted above, it's done on a
per-bus and per-segment basis today.
> And probably most important thing is that the x86 pci_conf implementation
> would be so much simpler and cleaner that I honestly don't understand
> why are we still discussing it ;-)
The proposed API adds code, so I don't see how it simplifies things.
The current approach is quite clean anyway:
1) "raw_pci_ops" points to a single set of PCI config accessors.
2) For mmconfig, if the BIOS did not tell us to use mmconfig with a
given PCI bus, fall back to type1 accesses.
> At the same time, making drivers to request extended config space
> still makes sense. I think of pci_request_ext_conf(dev, drv_name) which
> doesn't set any per-device flag, but
> - returns success or failure depending on mmconf availability;
> - can be logged by kernel to make it easier to debug if something
> goes wrong.
I agree with the function as noted in response to Linus's "Sound ok with
this plan?" email. But note -- users and developers also need to access
extended config space, even if driver did not request it. Even if there
is no driver at all.
Otherwise the value of "lspci -vvvxxx" debugging reports from users is
diminished.
Jeff
Linus Torvalds wrote:
> (For example: I have three machines that I know have working MMCONF. On
> only one of those does Linux actually even enable MMCONF accesses,
> because on the two other ones the BIOSes do the crazy "put it in some
> space that is reserved by PnP crap later", so we actually refuse to touch
> it. So at least in my limited environment, we hardly get any MMCONFIG test
> coverage, exactly because we have to be so totally anal about not enabling
> it early, because we cannot guarantee that it's not clashing with anything
> else).
Definitely. So, two questions:
What's the preferred way to deal with the desire to view extended config
space with "lspci -vvvxxx"?
Right now, we enable mmconfig for the bus/domain range requested by the
BIOS [heh, so by implication many early BIOSen stomp themselves].
Is there a path for hw vendors, after passing 1,001 anal checks, to
maintain the current behavior as it exists today in
arch/x86/pci/mmconfig_{32,64}.c?
Jeff
On Mon, 24 Dec 2007 02:04:41 -0500
Jeff Garzik <[email protected]> wrote:
> Arjan van de Ven wrote:
> >> 3) mmconfig might or might not be enabled, depending on which
> >> driver is loaded, whether it called an API or not.
> >>
> >> Even LESS testing by hw vendors than #2. Maybe even
> >> "never"
> >>
> >> Inconsistent (config access depends on device)
> >
> > the "depends on device" is even true for Linux today. For us today,
> > MMCONFIG isn't always used, it's used on a per device basis
> > already; except that the per-device is both defined by the bios and
> > our quirks.... (the mmconfig code already falls back to conf1
> > cycles in various cases)
> >
> > So I'm not entirely buying your argument. In fact I'm not buying
> > your "mixed is not tested at all" argument; while the statement may
> > be true, it's no different from Linux today... which is
> > mixed. Just differently mixed I suppose.
>
> /If/ mixed use is truly well tested, then that eliminates one of my
> two objections.
>
> But I don't see that in the code now... I see we choose [get_virt()
> in mmconfig_{32,64}.c] strictly on a per-bus basis, based on the
> allowed bus list provided by the BIOS.
and this is PCI Express only; afaik (but I could be wrong) it's pretty much a bus per device there, in a 1:1 way
> And -- please forgive me, I mean no offense here -- we have to make
> sure whatever rule we come up with makes AMD and other non-Intel
> folks happy. Like with PCI MSI, mmconfig (+ its Linux support) has a
> history of being written first for Intel, working great on select
> Intel platforms, and not working so great initially for other vendors
> even when said technology is supposedly compatible. So having AMD
> sign-off on such an approach would greatly increase comfort level.
they're very welcome to chime in.
At this point the majority of the mmconfig nightmares are for non-Intel (not that Intel
gets it right, but the current diagnostics work there pretty ok), so these patches are
aimed for non-Intel boxes in the first place.
> The second objection was regarding the inconsistencies introduced to
> userland (and the kernel?) whereby the existing states:
>
> * 256b config space
> * on per-bus basis, ext cfg space is available
> if device provides && mmconfig active
>
> were joined by the additional state
>
> * on a per-device basis, ext cfg space is available
> if device provides && driver requests && mmconfig active
>
> which has the potential to confuse the codebase that makes
> assumptions based upon whole system (mmconfig or not) and per-bus
> (ext cfg space capable or not) basis.
right now they very very very rarely see the extended space in the first place.
(since on a LOT of the machines it just is turned off due to our very very strict
checks, which are MUCH more strict than the standards allow for). In addition, it's not even unlikely
that there will be per-device quirks where we have to disable mmconfig for a specific device
(see recent mails about that), at which point this is game over anyway; unless you want that kind of
thing to just disable mmconfig for the entire bus, which would be WAY overkill.
I can see the point of having a sysfs attribute to enable MMCONF from userspace, so
that userland diagnostics tools can turn it on if they really really want to.
(I'd make that printk a nice warning "application XYZ is enabling extended config space for device ABC" so
that if the box then crashes and burns, people know who/why and where to direct their emails ;-)
We did something similar for "enable", it's maybe 10 lines of code or so.
I would assume lspci and friends would then only turn that on at explicit admin request
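A user-space model of what such a sysfs store handler might do, assuming the warning format Arjan suggests (the attribute name, struct, and `comm` parameter are all hypothetical):

```c
#include <stdio.h>

struct pci_dev_model {            /* simplified stand-in for struct pci_dev */
    const char *slot_name;        /* e.g. "0000:00:19.0" */
    int ext_cfg_enabled;
};

/* Writing '1' to a hypothetical per-device attribute enables extended
 * config space and logs which application did it, so that a crash
 * afterwards has an obvious culprit in the logs. */
int store_enable_ext_cfg(struct pci_dev_model *dev, const char *buf,
                         const char *comm /* name of the writing process */)
{
    int enable = (buf[0] == '1');

    if (enable && !dev->ext_cfg_enabled)
        printf("application %s is enabling extended config space for device %s\n",
               comm, dev->slot_name);
    dev->ext_cfg_enabled = enable;
    return 0;
}
```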
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
Arjan van de Ven wrote:
> I can see the point of having a sysfs attribute to enable MMCONF from userspace, so
> that userland diagnostics tools can turn it on if they really really want to.
> (I'd make that printk a nice warning "application XYZ is enabling extended config space for device ABC" so
> that if the box then crashes and burns, people know who/why and where to direct their emails ;-)
>
> We did something similar for "enable", it's maybe 10 lines of code or so.
>
> I would assume lspci and friends would then only turn that on at explicit admin request
Absolutely... I'm not asking to default it on, just asking for it to be
possible :)
Jeff
On Mon, 24 Dec 2007 06:54:30 -0500
Jeff Garzik <[email protected]> wrote:
> Arjan van de Ven wrote:
> > I can see the point of having a sysfs attribute to enable MMCONF
> > from userspace, so that userland diagnostics tools can turn it on
> > if they really really want to. (I'd make that printk a nice warning
> > "application XYZ is enabling extended config space for device ABC"
> > so that if the box then crashes and burns, people know who/why and
> > where to direct their emails ;-)
> >
> > We did something similar for "enable", it's maybe 10 lines of code
> > or so.
> >
> > I would assume lspci and friends would then only turn that on at
> > explicit admin request
>
> Absolutely... I'm not asking to default it on, just asking for it to
> be possible :)
>
ok so to summarize things a bit (I'll admit bias here but still ;)
1) having a per driver function to say "I'd like extended config space" is ok
(it's the driver that knows what is needed after all)
2) we need a way for userspace to do the same for a given device
(which then will print a nice warning who does what to whom)
3) we need to have the "no extended config space unless someone wants it" behavior
4) It's inevitable that this will end up being per device given that we'll end up with
per device "this one is b0rked" quirks over time (even shortly)
5) architectures that have sane extended config space access should just be able to provide
it always. This could even be on x86 based on BIOS date (say 2009 :)
the patch I posted does 1) 3) 4) and the first half of 5)
I'll update the patch to do 2) and the rest of 5)
Is there anything I skipped in the summary above?
(and yes I realize this needs lspci to be expanded some to set the flag if the admin really asks for it,
but such is life)
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On Mon, Dec 24, 2007 at 02:22:10AM -0500, Jeff Garzik wrote:
> The phrase "all or none" specifically describes the current practice in
> arch/x86/pci/mmconfig_{32,64}.c whereby a PCI bus always has one, and only
> one, access method.
>
> So the problems you describe are unrelated to "all or none" as I was
> describing it.
Ok. So a correct description would be "one or another but not both".
>> The mixed access that Loic proposes allows to avoid these boot problems
>> just for free.
>
> False. If you have overlapping areas, then clearly it is board-dependent
> (undefined) what happens -- you might trigger an mmconfig access by
> writel() to your PCI device's MMIO BAR space.
You're not allowed to access MMIO defined by BARs before the PCI setup
process is finished because a) decode of the BAR might be disabled for
a number of reasons and b) the BAR may contain any random value.
So you never know *what* you're accessing until after
pci_assign_unassigned_resources() and without pci_enable_device().
> Consider, too, what the current code in arch/x86/pci/mmconfig_{32,64}.c
> does: it uses the range BIOS told to use for mmconfig, no more no less.
Incorrect. It also reads mmconfig info directly from hardware for two Intel
bridges (it should be many more - the mmconfig BAR is well documented for all
Intel (and AMD, I guess) chipsets).
> Clearly many early mmconfig BIOSes have buggy resource allocation... Thus
> if mmconfig is not known working on a system, turn it off 100% for all
> buses. End of story.
Direct hardware support for more chipsets would certainly help here.
>> Systems that have both non-mmconf PCI(X) and mmconf PCI-E
>> domains could be handled almost for free as well.
>
> The existing code in arch/x86/pci/mmconfig_{32,64}.c today handles mixing
> of PCI-E and PCI-X just fine. As noted above, its done on a per-bus and
> per-segment basis today.
With mixed access, unreachable_devices() and related stuff could go away.
> The proposed API adds code, so I don't see how it simplifies things.
See above.
> The current approach is quite clean anyway:
>
> 1) "raw_pci_ops" points to a single set of PCI config accessors.
> 2) For mmconfig, if the BIOS did not tell us to use mmconfig with a given
> PCI bus, fall back to type1 accesses.
But it doesn't work on systems where [basically working] mmconfig causes
boot problems because it's enabled too early.
> I agree with the function as noted in response to Linus's "Sound ok with
> this plan?" email. But note -- users and developers also need to access
> extended config space, even if driver did not request it. Even if there is
> no driver at all.
There *is* a driver - pci-sysfs.c.
> Otherwise the value of "lspci -vvvxxx" debugging reports from users is
> diminished.
Not at all. I don't see any reason to change the userspace API -
just make pci-sysfs.c:pci_{read,write}_config request extended space
if (off + count > 256).
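The decision Ivan describes is a one-line predicate; a sketch (the function name is illustrative, the boundary is the traditional 256-byte config space):

```c
/* The sysfs config accessor would request extended space only when a
 * read/write actually crosses offset 256, so the userspace API stays
 * exactly as it is today. */
int access_needs_ext_space(long off, long count)
{
    return off + count > 256;
}
```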
Ivan.
On Mon, 24 Dec 2007, Jeff Garzik wrote:
>
> Definitely. So, two questions:
>
> What's the preferred way to deal with the desire to view extended config space
> with "lspci -vvvxxx"?
Well, there's two issues right now with MMCONFIG
- we've hit various bugs in it. The bugs are admittedly very rare, but
they are really painful when they happen.
This is the more "fundamental" of the problems, and this is the one
that means that on some machines, the answer to the above question will
simply *always* be that we simply will never *ever* show the extended
config space - because even though it might work, we are going to
decide that it's simply too dangerous.
(Hypothetical example: we might, for example, end up saying that we
will simply never enable mmconfig at all unless the BIOS DMI date says
that the motherboard was built in 2008 or later)
- the (currently more common) problem that our initial probing is totally
screwed up with mmconfig.
This is the thing that causes *most* of our current problems, and the
fact is, we absolutely cannot do the initial kernel PCI probing using
mmconfig accesses. Not only do we not have enough information about the
resources yet at that stage to decide sanely whether mmconfig can
really work, but it is my sincere hope that some day the mmconfig MMIO
region itself will be defined by some standard BAR etc, so trying to
probe the BARs using mmconfig would be a chicken-and-egg problem.
There's also the issue that we want to often *validate* the mmconfig
address using config space accesses, and right now we have some really
ugly code that actually uses "pci_conf1_read()" _explicitly_ to avoid
using mmconfig for this (see arch/x86/pci/mmconfig-shared.c).
The *second* problem is entirely a kernel internal issue. It's the one
that causes us the biggest issues right now, but it's also the one that
will not impact user space at all once it is fixed. So once we do the
*early* probing using anything but mmconfig accesses, we can then much
more easily enable mmconfig later, and by the time the user does anything
like "lspci -vvvxxxx", we could do those mmconfig accesses.
I also suspect that we *may* want to use a separate file for the extended
config. Right now, things like lspci read the config space by accessing
a file like
/sys/bus/pci/devices/0000:00:00.0/config
and I'm not at all sure we want to extend that one past the first 256
bytes of config space. Why? Because I don't want old programs that may not
know how dangerous the rest of the space is to read extended config space
by mistake when they don't know how to parse it anyway.
So I would *suggest* (but this may be overly cautious) that we at least
consider forcing people who actually want to read extended config space to
have to use a separate file for it ("/sys/.../extended-config"), because
that would then also be a sign to the kernel that "ok, the user really
asked for us to use mmconfig cycles here".
> Is there a path for hw vendors, after passing 1,001 anal checks, to maintain
> the current behavior as it exists today in arch/x86/pci/mmconfig_{32,64}.c?
Well, the *current* behaviour as far as setup is concerned is
unacceptable. But yes, longer term, we should be able to just have quirk
entries for saying "enable mmconfig because I know it's safe", except we
should not enable them until after the core PCI probing has completed.
Linus
On Mon, Dec 24, 2007 at 10:51:22AM -0800, Linus Torvalds wrote:
> The *second* problem is entirely a kernel internal issue. It's the one
> that causes us the biggest issues right now, but it's also the one that
> will not impact user space at all once if is fixed. So once we do the
> *early* probing using anything but mmconfig accesses, we can then much
> more easily enable mmconfig later, and by the time the user does anything
> like "lspci -vvvxxxx", we could do those mmconfig accesses.
I had a nice idea to fix this ... will post a patch to do that later.
> I also suspect that we *may* want to use a separate file for the extended
> config. Right now, things like lspci read the config space by accessing
> a file like
>
> /sys/bus/pci/devices/0000:00:00.0/config
>
> and I'm not at all sure we want to extend that one past the first 256
> bytes of config space. Why? Because I don't want old programs that may not
> know how dangerous the rest of the space is to read extended config space
> by mistake when they don't know how to parse it anyway.
Unless we're talking about crazy, crazy programs that blindly open and
read every file in sysfs as root (yes, they exist, and they already cause
problems simply by reading past the first 64 bytes of config space which
causes problems for, eg, sym53c875 cards), non-root accesses are already
restricted to the first 64 bytes, so it's no more of a problem than it
currently is.
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
Hello!
> (and yes I realize this needs lspci to be expanded some to set the
> flag if the admin really asks for it, but such is life)
Reading past the first 256 bytes of the sysfs file should be enough
-- only root can do that (other users get only 64 bytes anyway) and
problems with reading random registers have hopefully taught programs
not to read blindly a long time ago.
Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Never make any mistaeks.
> Reading past the first 256 bytes of the sysfs file should be enough
> -- only root can do that (other users get only 64 bytes anyway) and
> problems with reading random registers have hopefully taught programs
> not to read blindly a long time ago.
... except for lspci itself which reads the first few bytes of the
extended space to see if there are any capabilities, which is generally
safe, but it would always enable MMCONFIG.
So an extra switch will be really needed.
Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
The best way to accelerate Windows is at 9.8 m / sec^2
Linus Torvalds wrote:
> Well, the *current* behaviour as far as setup is concerned is
> unacceptable. But yes, longer term, we should be able to just have quirk
> entries for saying "enable mmconfig because I know it's safe", except we
> should not enable them until after the core PCI probing has completed.
IMO that should be an arch decision, buried somewhere in arch/x86.
If other arches implement extended config space sanely -- and possibly
via some arch-specific means that is /not/ mmconfig -- then they should
be able to make an arch decision that extended PCI config space accesses
Just Work(tm).
For such arches, pci_enable_ext_cfg_space(pdev) would be a no-op, always
returning success.
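One way to sketch that arch decision (the config symbol, struct, and validation hook here are all hypothetical; the point is only that sane arches compile the call down to "always succeed"):

```c
struct pci_dev_model {            /* stand-in for struct pci_dev */
    int ext_cfg_ok;               /* what the arch probe recorded */
};

/* Hypothetical arch hook: x86 would do its careful per-device
 * validation here; stubbed to report the probe's verdict. */
static int arch_validate_ext_cfg(struct pci_dev_model *dev)
{
    return dev->ext_cfg_ok ? 0 : -1;
}

#ifdef ARCH_HAS_SANE_EXT_CFG
/* Sane arches: extended config space always works, nothing to enable. */
static inline int pci_enable_ext_cfg_space(struct pci_dev_model *dev)
{
    (void)dev;
    return 0;
}
#else
int pci_enable_ext_cfg_space(struct pci_dev_model *dev)
{
    return arch_validate_ext_cfg(dev);
}
#endif
```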
Jeff
On Thu, 27 Dec 2007 06:46:22 -0500
Jeff Garzik <[email protected]> wrote:
> Linus Torvalds wrote:
> > Well, the *current* behaviour as far as setup is concerned is
> > unacceptable. But yes, longer term, we should be able to just have
> > quirk entries for saying "enable mmconfig because I know it's
> > safe", except we should not enable them until after the core PCI
> > probing has completed.
>
>
> IMO that should be an arch decision, buried somewhere in arch/x86.
>
> If other arches implement extended config space sanely -- and
> possibly via some arch-specific means that is /not/ mmconfig -- then
> they should be able to make an arch decision that extended PCI config
> space accesses Just Work(tm).
>
> For such arches, pci_enable_ext_cfg_space(pdev) would be a no-op,
> always returning success.
>
both the first and the second patch already have this behavior.
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org