2007-10-25 23:18:50

by Robert Hancock

[permalink] [raw]
Subject: Re: - mmconfig-validate-against-acpi-motherboard-resources.patch removed from -mm tree

Where did this patch go? I didn't get notified that anyone dropped it,
but I don't see it in current -git..

[email protected] wrote:
> The patch titled
> MMCONFIG: validate against ACPI motherboard resources
> has been removed from the -mm tree. Its filename was
> mmconfig-validate-against-acpi-motherboard-resources.patch
>
> This patch was dropped because it was merged into mainline or a subsystem tree
>
> ------------------------------------------------------
> Subject: MMCONFIG: validate against ACPI motherboard resources
> From: Robert Hancock <[email protected]>
>
> This path adds validation of the MMCONFIG table against the ACPI reserved
> motherboard resources. If the MMCONFIG table is found to be reserved in
> ACPI, we don't bother checking the E820 table. The PCI Express firmware
> spec apparently tells BIOS developers that reservation in ACPI is required
> and E820 reservation is optional, so checking against ACPI first makes
> sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though
> it is perfectly functional, the existing check needlessly disables MMCONFIG
> in these cases.
>
> In order to do this, MMCONFIG setup has been split into two phases. If PCI
> configuration type 1 is not available then MMCONFIG is enabled early as
> before. Otherwise, it is enabled later after the ACPI interpreter is
> enabled, since we need to be able to execute control methods in order to
> check the ACPI reserved resources. Presently this is just triggered off
> the end of ACPI interpreter initialization.
>
> There are a few other behavioral changes here:
>
> - Validate all MMCONFIG configurations provided, not just the first one.
>
> - Validate the entire required length of each configuration according to
> the provided ending bus number is reserved, not just the minimum required
> allocation.
>
> - Validate that the area is reserved even if we read it from the chipset
> directly and not from the MCFG table. This catches the case where the
> BIOS didn't set the location properly in the chipset and has mapped it
> over other things it shouldn't have.
>
> This also cleans up the MMCONFIG initialization functions so that they
> simply do nothing if MMCONFIG is not compiled in.
>
> Based on an original patch by Rajesh Shah from Intel.
>
> [[email protected]: many fixes and cleanups]
> Signed-off-by: Robert Hancock <[email protected]>
> Cc: Rajesh Shah <[email protected]>
> Cc: Jesse Barnes <[email protected]>
> Acked-by: Linus Torvalds <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Greg KH <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> arch/i386/pci/init.c | 4
> arch/i386/pci/mmconfig-shared.c | 151 ++++++++++++++++++++++++++----
> arch/i386/pci/pci.h | 1
> drivers/acpi/bus.c | 2
> include/linux/pci.h | 8 +
> 5 files changed, 144 insertions(+), 22 deletions(-)
>
> diff -puN arch/i386/pci/init.c~mmconfig-validate-against-acpi-motherboard-resources arch/i386/pci/init.c
> --- a/arch/i386/pci/init.c~mmconfig-validate-against-acpi-motherboard-resources
> +++ a/arch/i386/pci/init.c
> @@ -11,9 +11,7 @@ static __init int pci_access_init(void)
> #ifdef CONFIG_PCI_DIRECT
> type = pci_direct_probe();
> #endif
> -#ifdef CONFIG_PCI_MMCONFIG
> - pci_mmcfg_init(type);
> -#endif
> + pci_mmcfg_early_init(type);
> if (raw_pci_ops)
> return 0;
> #ifdef CONFIG_PCI_BIOS
> diff -puN arch/i386/pci/mmconfig-shared.c~mmconfig-validate-against-acpi-motherboard-resources arch/i386/pci/mmconfig-shared.c
> --- a/arch/i386/pci/mmconfig-shared.c~mmconfig-validate-against-acpi-motherboard-resources
> +++ a/arch/i386/pci/mmconfig-shared.c
> @@ -206,9 +206,78 @@ static void __init pci_mmcfg_insert_reso
> pci_mmcfg_resources_inserted = 1;
> }
>
> -static void __init pci_mmcfg_reject_broken(int type)
> +static acpi_status __init check_mcfg_resource(struct acpi_resource *res,
> + void *data)
> +{
> + struct resource *mcfg_res = data;
> + struct acpi_resource_address64 address;
> + acpi_status status;
> +
> + if (res->type == ACPI_RESOURCE_TYPE_FIXED_MEMORY32) {
> + struct acpi_resource_fixed_memory32 *fixmem32 =
> + &res->data.fixed_memory32;
> + if (!fixmem32)
> + return AE_OK;
> + if ((mcfg_res->start >= fixmem32->address) &&
> + (mcfg_res->end < (fixmem32->address +
> + fixmem32->address_length))) {
> + mcfg_res->flags = 1;
> + return AE_CTRL_TERMINATE;
> + }
> + }
> + if ((res->type != ACPI_RESOURCE_TYPE_ADDRESS32) &&
> + (res->type != ACPI_RESOURCE_TYPE_ADDRESS64))
> + return AE_OK;
> +
> + status = acpi_resource_to_address64(res, &address);
> + if (ACPI_FAILURE(status) ||
> + (address.address_length <= 0) ||
> + (address.resource_type != ACPI_MEMORY_RANGE))
> + return AE_OK;
> +
> + if ((mcfg_res->start >= address.minimum) &&
> + (mcfg_res->end < (address.minimum + address.address_length))) {
> + mcfg_res->flags = 1;
> + return AE_CTRL_TERMINATE;
> + }
> + return AE_OK;
> +}
> +
> +static acpi_status __init find_mboard_resource(acpi_handle handle, u32 lvl,
> + void *context, void **rv)
> +{
> + struct resource *mcfg_res = context;
> +
> + acpi_walk_resources(handle, METHOD_NAME__CRS,
> + check_mcfg_resource, context);
> +
> + if (mcfg_res->flags)
> + return AE_CTRL_TERMINATE;
> +
> + return AE_OK;
> +}
> +
> +static int __init is_acpi_reserved(unsigned long start, unsigned long end)
> +{
> + struct resource mcfg_res;
> +
> + mcfg_res.start = start;
> + mcfg_res.end = end;
> + mcfg_res.flags = 0;
> +
> + acpi_get_devices("PNP0C01", find_mboard_resource, &mcfg_res, NULL);
> +
> + if (!mcfg_res.flags)
> + acpi_get_devices("PNP0C02", find_mboard_resource, &mcfg_res,
> + NULL);
> +
> + return mcfg_res.flags;
> +}
> +
> +static void __init pci_mmcfg_reject_broken(void)
> {
> typeof(pci_mmcfg_config[0]) *cfg;
> + int i;
>
> if ((pci_mmcfg_config_num == 0) ||
> (pci_mmcfg_config == NULL) ||
> @@ -229,17 +298,37 @@ static void __init pci_mmcfg_reject_brok
> goto reject;
> }
>
> - /*
> - * Only do this check when type 1 works. If it doesn't work
> - * assume we run on a Mac and always use MCFG
> - */
> - if (type == 1 && !e820_all_mapped(cfg->address,
> - cfg->address + MMCONFIG_APER_MIN,
> - E820_RESERVED)) {
> - printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %Lx is not"
> - " E820-reserved\n", cfg->address);
> - goto reject;
> + for (i = 0; i < pci_mmcfg_config_num; i++) {
> + u32 size = (cfg->end_bus_number + 1) << 20;
> + cfg = &pci_mmcfg_config[i];
> + printk(KERN_NOTICE "PCI: MCFG configuration %d: base %lu "
> + "segment %hu buses %u - %u\n",
> + i, (unsigned long)cfg->address, cfg->pci_segment,
> + (unsigned int)cfg->start_bus_number,
> + (unsigned int)cfg->end_bus_number);
> + if (is_acpi_reserved(cfg->address, cfg->address + size - 1)) {
> + printk(KERN_NOTICE "PCI: MCFG area at %Lx reserved "
> + "in ACPI motherboard resources\n",
> + cfg->address);
> + } else {
> + printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %Lx is not"
> + " reserved in ACPI motherboard resources\n",
> + cfg->address);
> + /* Don't try to do this check unless configuration
> + type 1 is available. */
> + if ((pci_probe & PCI_PROBE_CONF1) &&
> + e820_all_mapped(cfg->address,
> + cfg->address + size - 1,
> + E820_RESERVED))
> + printk(KERN_NOTICE
> + "PCI: MCFG area at %Lx reserved in "
> + "E820\n",
> + cfg->address);
> + else
> + goto reject;
> + }
> }
> +
> return;
>
> reject:
> @@ -249,20 +338,46 @@ reject:
> pci_mmcfg_config_num = 0;
> }
>
> -void __init pci_mmcfg_init(int type)
> +void __init pci_mmcfg_early_init(int type)
> +{
> + if ((pci_probe & PCI_PROBE_MMCONF) == 0)
> + return;
> +
> + /* If type 1 access is available, no need to enable MMCONFIG yet, we can
> + defer until later when the ACPI interpreter is available to better
> + validate things. */
> + if (type == 1)
> + return;
> +
> + acpi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg);
> +
> + if ((pci_mmcfg_config_num == 0) ||
> + (pci_mmcfg_config == NULL) ||
> + (pci_mmcfg_config[0].address == 0))
> + return;
> +
> + if (pci_mmcfg_arch_init())
> + pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
> +}
> +
> +void __init pci_mmcfg_late_init(void)
> {
> int known_bridge = 0;
>
> + /* MMCONFIG disabled */
> if ((pci_probe & PCI_PROBE_MMCONF) == 0)
> return;
>
> - if (type == 1 && pci_mmcfg_check_hostbridge())
> - known_bridge = 1;
> + /* MMCONFIG already enabled */
> + if (!(pci_probe & PCI_PROBE_MASK & ~PCI_PROBE_MMCONF))
> + return;
>
> - if (!known_bridge) {
> + if ((pci_probe & PCI_PROBE_CONF1) && pci_mmcfg_check_hostbridge())
> + known_bridge = 1;
> + else
> acpi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg);
> - pci_mmcfg_reject_broken(type);
> - }
> +
> + pci_mmcfg_reject_broken();
>
> if ((pci_mmcfg_config_num == 0) ||
> (pci_mmcfg_config == NULL) ||
> @@ -270,7 +385,7 @@ void __init pci_mmcfg_init(int type)
> return;
>
> if (pci_mmcfg_arch_init()) {
> - if (type == 1)
> + if (pci_probe & PCI_PROBE_CONF1)
> unreachable_devices();
> if (known_bridge)
> pci_mmcfg_insert_resources(IORESOURCE_BUSY);
> diff -puN arch/i386/pci/pci.h~mmconfig-validate-against-acpi-motherboard-resources arch/i386/pci/pci.h
> --- a/arch/i386/pci/pci.h~mmconfig-validate-against-acpi-motherboard-resources
> +++ a/arch/i386/pci/pci.h
> @@ -91,7 +91,6 @@ extern int pci_conf1_read(unsigned int s
> extern int pci_direct_probe(void);
> extern void pci_direct_init(int type);
> extern void pci_pcbios_init(void);
> -extern void pci_mmcfg_init(int type);
> extern void pcibios_sort(void);
>
> /* pci-mmconfig.c */
> diff -puN drivers/acpi/bus.c~mmconfig-validate-against-acpi-motherboard-resources drivers/acpi/bus.c
> --- a/drivers/acpi/bus.c~mmconfig-validate-against-acpi-motherboard-resources
> +++ a/drivers/acpi/bus.c
> @@ -35,6 +35,7 @@
> #ifdef CONFIG_X86
> #include <asm/mpspec.h>
> #endif
> +#include <linux/pci.h>
> #include <acpi/acpi_bus.h>
> #include <acpi/acpi_drivers.h>
>
> @@ -759,6 +760,7 @@ static int __init acpi_init(void)
> result = acpi_bus_init();
>
> if (!result) {
> + pci_mmcfg_late_init();
> #ifdef CONFIG_PM_LEGACY
> if (!PM_IS_ACTIVE())
> pm_active = 1;
> diff -puN include/linux/pci.h~mmconfig-validate-against-acpi-motherboard-resources include/linux/pci.h
> --- a/include/linux/pci.h~mmconfig-validate-against-acpi-motherboard-resources
> +++ a/include/linux/pci.h
> @@ -889,5 +889,13 @@ extern unsigned long pci_cardbus_mem_siz
>
> extern int pcibios_add_platform_entries(struct pci_dev *dev);
>
> +#ifdef CONFIG_PCI_MMCONFIG
> +extern void __init pci_mmcfg_early_init(int type);
> +extern void __init pci_mmcfg_late_init(void);
> +#else
> +static inline void pci_mmcfg_early_init(int type) { }
> +static inline void pci_mmcfg_late_init(void) { }
> +#endif
> +
> #endif /* __KERNEL__ */
> #endif /* LINUX_PCI_H */
> _
>
> Patches currently in -mm which might be from [email protected] are
>
> mmconfig-validate-against-acpi-motherboard-resources.patch
> sata_nv-allow-changing-queue-depth.patch
> libata-add-human-readable-error-value-decoding-v3.patch
> pci-disable-decode-of-io-memory-during-bar-sizing.patch
>
>


2007-10-25 23:24:34

by Jesse Barnes

[permalink] [raw]
Subject: Re: - mmconfig-validate-against-acpi-motherboard-resources.patch removed from -mm tree

I think Greg doesn't like it, even though we don't have an alternative
at this point...

Jesse

On Thursday, October 25, 2007 4:20 pm Robert Hancock wrote:
> Where did this patch go? I didn't get notified that anyone dropped
> it, but I don't see it in current -git..
>
> [email protected] wrote:
> > The patch titled
> > MMCONFIG: validate against ACPI motherboard resources
> > has been removed from the -mm tree. Its filename was
> > mmconfig-validate-against-acpi-motherboard-resources.patch
> >
> > This patch was dropped because it was merged into mainline or a
> > subsystem tree
> >
> > ------------------------------------------------------
> > Subject: MMCONFIG: validate against ACPI motherboard resources
> > From: Robert Hancock <[email protected]>
> >
> > This path adds validation of the MMCONFIG table against the ACPI
> > reserved motherboard resources. If the MMCONFIG table is found to
> > be reserved in ACPI, we don't bother checking the E820 table. The
> > PCI Express firmware spec apparently tells BIOS developers that
> > reservation in ACPI is required and E820 reservation is optional,
> > so checking against ACPI first makes sense. Many BIOSes don't
> > reserve the MMCONFIG region in E820 even though it is perfectly
> > functional, the existing check needlessly disables MMCONFIG in
> > these cases.
> >
> > In order to do this, MMCONFIG setup has been split into two phases.
> > If PCI configuration type 1 is not available then MMCONFIG is
> > enabled early as before. Otherwise, it is enabled later after the
> > ACPI interpreter is enabled, since we need to be able to execute
> > control methods in order to check the ACPI reserved resources.
> > Presently this is just triggered off the end of ACPI interpreter
> > initialization.
> >
> > There are a few other behavioral changes here:
> >
> > - Validate all MMCONFIG configurations provided, not just the first
> > one.
> >
> > - Validate the entire required length of each configuration
> > according to the provided ending bus number is reserved, not just
> > the minimum required allocation.
> >
> > - Validate that the area is reserved even if we read it from the
> > chipset directly and not from the MCFG table. This catches the
> > case where the BIOS didn't set the location properly in the chipset
> > and has mapped it over other things it shouldn't have.
> >
> > This also cleans up the MMCONFIG initialization functions so that
> > they simply do nothing if MMCONFIG is not compiled in.
> >
> > Based on an original patch by Rajesh Shah from Intel.
> >
> > [[email protected]: many fixes and cleanups]
> > Signed-off-by: Robert Hancock <[email protected]>
> > Cc: Rajesh Shah <[email protected]>
> > Cc: Jesse Barnes <[email protected]>
> > Acked-by: Linus Torvalds <[email protected]>
> > Cc: Andi Kleen <[email protected]>
> > Cc: Greg KH <[email protected]>
> > Signed-off-by: Andrew Morton <[email protected]>
> > ---
> >
> > arch/i386/pci/init.c | 4
> > arch/i386/pci/mmconfig-shared.c | 151
> > ++++++++++++++++++++++++++---- arch/i386/pci/pci.h |
> > 1
> > drivers/acpi/bus.c | 2
> > include/linux/pci.h | 8 +
> > 5 files changed, 144 insertions(+), 22 deletions(-)
> >
> > diff -puN
> > arch/i386/pci/init.c~mmconfig-validate-against-acpi-motherboard-res
> >ources arch/i386/pci/init.c ---
> > a/arch/i386/pci/init.c~mmconfig-validate-against-acpi-motherboard-r
> >esources +++ a/arch/i386/pci/init.c
> > @@ -11,9 +11,7 @@ static __init int pci_access_init(void)
> > #ifdef CONFIG_PCI_DIRECT
> > type = pci_direct_probe();
> > #endif
> > -#ifdef CONFIG_PCI_MMCONFIG
> > - pci_mmcfg_init(type);
> > -#endif
> > + pci_mmcfg_early_init(type);
> > if (raw_pci_ops)
> > return 0;
> > #ifdef CONFIG_PCI_BIOS
> > diff -puN
> > arch/i386/pci/mmconfig-shared.c~mmconfig-validate-against-acpi-moth
> >erboard-resources arch/i386/pci/mmconfig-shared.c ---
> > a/arch/i386/pci/mmconfig-shared.c~mmconfig-validate-against-acpi-mo
> >therboard-resources +++ a/arch/i386/pci/mmconfig-shared.c
> > @@ -206,9 +206,78 @@ static void __init pci_mmcfg_insert_reso
> > pci_mmcfg_resources_inserted = 1;
> > }
> >
> > -static void __init pci_mmcfg_reject_broken(int type)
> > +static acpi_status __init check_mcfg_resource(struct acpi_resource
> > *res, + void *data)
> > +{
> > + struct resource *mcfg_res = data;
> > + struct acpi_resource_address64 address;
> > + acpi_status status;
> > +
> > + if (res->type == ACPI_RESOURCE_TYPE_FIXED_MEMORY32) {
> > + struct acpi_resource_fixed_memory32 *fixmem32 =
> > + &res->data.fixed_memory32;
> > + if (!fixmem32)
> > + return AE_OK;
> > + if ((mcfg_res->start >= fixmem32->address) &&
> > + (mcfg_res->end < (fixmem32->address +
> > + fixmem32->address_length))) {
> > + mcfg_res->flags = 1;
> > + return AE_CTRL_TERMINATE;
> > + }
> > + }
> > + if ((res->type != ACPI_RESOURCE_TYPE_ADDRESS32) &&
> > + (res->type != ACPI_RESOURCE_TYPE_ADDRESS64))
> > + return AE_OK;
> > +
> > + status = acpi_resource_to_address64(res, &address);
> > + if (ACPI_FAILURE(status) ||
> > + (address.address_length <= 0) ||
> > + (address.resource_type != ACPI_MEMORY_RANGE))
> > + return AE_OK;
> > +
> > + if ((mcfg_res->start >= address.minimum) &&
> > + (mcfg_res->end < (address.minimum + address.address_length)))
> > { + mcfg_res->flags = 1;
> > + return AE_CTRL_TERMINATE;
> > + }
> > + return AE_OK;
> > +}
> > +
> > +static acpi_status __init find_mboard_resource(acpi_handle handle,
> > u32 lvl, + void *context, void **rv)
> > +{
> > + struct resource *mcfg_res = context;
> > +
> > + acpi_walk_resources(handle, METHOD_NAME__CRS,
> > + check_mcfg_resource, context);
> > +
> > + if (mcfg_res->flags)
> > + return AE_CTRL_TERMINATE;
> > +
> > + return AE_OK;
> > +}
> > +
> > +static int __init is_acpi_reserved(unsigned long start, unsigned
> > long end) +{
> > + struct resource mcfg_res;
> > +
> > + mcfg_res.start = start;
> > + mcfg_res.end = end;
> > + mcfg_res.flags = 0;
> > +
> > + acpi_get_devices("PNP0C01", find_mboard_resource, &mcfg_res,
> > NULL); +
> > + if (!mcfg_res.flags)
> > + acpi_get_devices("PNP0C02", find_mboard_resource, &mcfg_res,
> > + NULL);
> > +
> > + return mcfg_res.flags;
> > +}
> > +
> > +static void __init pci_mmcfg_reject_broken(void)
> > {
> > typeof(pci_mmcfg_config[0]) *cfg;
> > + int i;
> >
> > if ((pci_mmcfg_config_num == 0) ||
> > (pci_mmcfg_config == NULL) ||
> > @@ -229,17 +298,37 @@ static void __init pci_mmcfg_reject_brok
> > goto reject;
> > }
> >
> > - /*
> > - * Only do this check when type 1 works. If it doesn't work
> > - * assume we run on a Mac and always use MCFG
> > - */
> > - if (type == 1 && !e820_all_mapped(cfg->address,
> > - cfg->address + MMCONFIG_APER_MIN,
> > - E820_RESERVED)) {
> > - printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %Lx is not"
> > - " E820-reserved\n", cfg->address);
> > - goto reject;
> > + for (i = 0; i < pci_mmcfg_config_num; i++) {
> > + u32 size = (cfg->end_bus_number + 1) << 20;
> > + cfg = &pci_mmcfg_config[i];
> > + printk(KERN_NOTICE "PCI: MCFG configuration %d: base %lu "
> > + "segment %hu buses %u - %u\n",
> > + i, (unsigned long)cfg->address, cfg->pci_segment,
> > + (unsigned int)cfg->start_bus_number,
> > + (unsigned int)cfg->end_bus_number);
> > + if (is_acpi_reserved(cfg->address, cfg->address + size - 1)) {
> > + printk(KERN_NOTICE "PCI: MCFG area at %Lx reserved "
> > + "in ACPI motherboard resources\n",
> > + cfg->address);
> > + } else {
> > + printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %Lx is not"
> > + " reserved in ACPI motherboard resources\n",
> > + cfg->address);
> > + /* Don't try to do this check unless configuration
> > + type 1 is available. */
> > + if ((pci_probe & PCI_PROBE_CONF1) &&
> > + e820_all_mapped(cfg->address,
> > + cfg->address + size - 1,
> > + E820_RESERVED))
> > + printk(KERN_NOTICE
> > + "PCI: MCFG area at %Lx reserved in "
> > + "E820\n",
> > + cfg->address);
> > + else
> > + goto reject;
> > + }
> > }
> > +
> > return;
> >
> > reject:
> > @@ -249,20 +338,46 @@ reject:
> > pci_mmcfg_config_num = 0;
> > }
> >
> > -void __init pci_mmcfg_init(int type)
> > +void __init pci_mmcfg_early_init(int type)
> > +{
> > + if ((pci_probe & PCI_PROBE_MMCONF) == 0)
> > + return;
> > +
> > + /* If type 1 access is available, no need to enable MMCONFIG yet,
> > we can + defer until later when the ACPI interpreter is
> > available to better + validate things. */
> > + if (type == 1)
> > + return;
> > +
> > + acpi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg);
> > +
> > + if ((pci_mmcfg_config_num == 0) ||
> > + (pci_mmcfg_config == NULL) ||
> > + (pci_mmcfg_config[0].address == 0))
> > + return;
> > +
> > + if (pci_mmcfg_arch_init())
> > + pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
> > +}
> > +
> > +void __init pci_mmcfg_late_init(void)
> > {
> > int known_bridge = 0;
> >
> > + /* MMCONFIG disabled */
> > if ((pci_probe & PCI_PROBE_MMCONF) == 0)
> > return;
> >
> > - if (type == 1 && pci_mmcfg_check_hostbridge())
> > - known_bridge = 1;
> > + /* MMCONFIG already enabled */
> > + if (!(pci_probe & PCI_PROBE_MASK & ~PCI_PROBE_MMCONF))
> > + return;
> >
> > - if (!known_bridge) {
> > + if ((pci_probe & PCI_PROBE_CONF1) &&
> > pci_mmcfg_check_hostbridge()) + known_bridge = 1;
> > + else
> > acpi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg);
> > - pci_mmcfg_reject_broken(type);
> > - }
> > +
> > + pci_mmcfg_reject_broken();
> >
> > if ((pci_mmcfg_config_num == 0) ||
> > (pci_mmcfg_config == NULL) ||
> > @@ -270,7 +385,7 @@ void __init pci_mmcfg_init(int type)
> > return;
> >
> > if (pci_mmcfg_arch_init()) {
> > - if (type == 1)
> > + if (pci_probe & PCI_PROBE_CONF1)
> > unreachable_devices();
> > if (known_bridge)
> > pci_mmcfg_insert_resources(IORESOURCE_BUSY);
> > diff -puN
> > arch/i386/pci/pci.h~mmconfig-validate-against-acpi-motherboard-reso
> >urces arch/i386/pci/pci.h ---
> > a/arch/i386/pci/pci.h~mmconfig-validate-against-acpi-motherboard-re
> >sources +++ a/arch/i386/pci/pci.h
> > @@ -91,7 +91,6 @@ extern int pci_conf1_read(unsigned int s
> > extern int pci_direct_probe(void);
> > extern void pci_direct_init(int type);
> > extern void pci_pcbios_init(void);
> > -extern void pci_mmcfg_init(int type);
> > extern void pcibios_sort(void);
> >
> > /* pci-mmconfig.c */
> > diff -puN
> > drivers/acpi/bus.c~mmconfig-validate-against-acpi-motherboard-resou
> >rces drivers/acpi/bus.c ---
> > a/drivers/acpi/bus.c~mmconfig-validate-against-acpi-motherboard-res
> >ources +++ a/drivers/acpi/bus.c
> > @@ -35,6 +35,7 @@
> > #ifdef CONFIG_X86
> > #include <asm/mpspec.h>
> > #endif
> > +#include <linux/pci.h>
> > #include <acpi/acpi_bus.h>
> > #include <acpi/acpi_drivers.h>
> >
> > @@ -759,6 +760,7 @@ static int __init acpi_init(void)
> > result = acpi_bus_init();
> >
> > if (!result) {
> > + pci_mmcfg_late_init();
> > #ifdef CONFIG_PM_LEGACY
> > if (!PM_IS_ACTIVE())
> > pm_active = 1;
> > diff -puN
> > include/linux/pci.h~mmconfig-validate-against-acpi-motherboard-reso
> >urces include/linux/pci.h ---
> > a/include/linux/pci.h~mmconfig-validate-against-acpi-motherboard-re
> >sources +++ a/include/linux/pci.h
> > @@ -889,5 +889,13 @@ extern unsigned long pci_cardbus_mem_siz
> >
> > extern int pcibios_add_platform_entries(struct pci_dev *dev);
> >
> > +#ifdef CONFIG_PCI_MMCONFIG
> > +extern void __init pci_mmcfg_early_init(int type);
> > +extern void __init pci_mmcfg_late_init(void);
> > +#else
> > +static inline void pci_mmcfg_early_init(int type) { }
> > +static inline void pci_mmcfg_late_init(void) { }
> > +#endif
> > +
> > #endif /* __KERNEL__ */
> > #endif /* LINUX_PCI_H */
> > _
> >
> > Patches currently in -mm which might be from [email protected] are
> >
> > mmconfig-validate-against-acpi-motherboard-resources.patch
> > sata_nv-allow-changing-queue-depth.patch
> > libata-add-human-readable-error-value-decoding-v3.patch
> > pci-disable-decode-of-io-memory-during-bar-sizing.patch


2007-10-26 03:29:45

by Greg KH

[permalink] [raw]
Subject: Re: - mmconfig-validate-against-acpi-motherboard-resources.patch removed from -mm tree

On Thu, Oct 25, 2007 at 04:22:35PM -0700, Jesse Barnes wrote:
> I think Greg doesn't like it, even though we don't have an alternative
> at this point...

Yes, I didn't like it, Ivan didn't like it, and I got reports that it
wasn't even needed at all once you upgraded your BIOS to the latest
version.

So, is this still needed? And if so, can you try to implement what Ivan
suggested to do here instead?

thanks,

greg k-h

2007-10-26 05:02:57

by Robert Hancock

[permalink] [raw]
Subject: Re: - mmconfig-validate-against-acpi-motherboard-resources.patch removed from -mm tree

Greg KH wrote:
> On Thu, Oct 25, 2007 at 04:22:35PM -0700, Jesse Barnes wrote:
>> I think Greg doesn't like it, even though we don't have an alternative
>> at this point...
>
> Yes, I didn't like it, Ivan didn't like it, and I got reports that it
> wasn't even needed at all once you upgraded your BIOS to the latest
> version.
>
> So, is this still needed? And if so, can you try to implement what Ivan
> suggested to do here instead?

Aren't you guys referring to
pci-disable-decode-of-io-memory-during-bar-sizing.patch? This is another
one entirely, though related.

2007-10-26 05:46:43

by Greg KH

[permalink] [raw]
Subject: Re: - mmconfig-validate-against-acpi-motherboard-resources.patch removed from -mm tree

On Thu, Oct 25, 2007 at 11:03:51PM -0600, Robert Hancock wrote:
> Greg KH wrote:
>> On Thu, Oct 25, 2007 at 04:22:35PM -0700, Jesse Barnes wrote:
>>> I think Greg doesn't like it, even though we don't have an alternative at
>>> this point...
>> Yes, I didn't like it, Ivan didn't like it, and I got reports that it
>> wasn't even needed at all once you upgraded your BIOS to the latest
>> version.
>> So, is this still needed? And if so, can you try to implement what Ivan
>> suggested to do here instead?
>
> Aren't you guys referring to
> pci-disable-decode-of-io-memory-during-bar-sizing.patch? This is another
> one entirely, though related.

I have no idea, there have been a lot of conflicting patches in this
area...

2007-10-26 17:00:46

by Jesse Barnes

[permalink] [raw]
Subject: Re: - mmconfig-validate-against-acpi-motherboard-resources.patch removed from -mm tree

On Thursday, October 25, 2007 10:27 pm Greg KH wrote:
> On Thu, Oct 25, 2007 at 11:03:51PM -0600, Robert Hancock wrote:
> > Greg KH wrote:
> >> On Thu, Oct 25, 2007 at 04:22:35PM -0700, Jesse Barnes wrote:
> >>> I think Greg doesn't like it, even though we don't have an
> >>> alternative at this point...
> >>
> >> Yes, I didn't like it, Ivan didn't like it, and I got reports that
> >> it wasn't even needed at all once you upgraded your BIOS to the
> >> latest version.
> >> So, is this still needed? And if so, can you try to implement
> >> what Ivan suggested to do here instead?
> >
> > Aren't you guys referring to
> > pci-disable-decode-of-io-memory-during-bar-sizing.patch? This is
> > another one entirely, though related.
>
> I have no idea, there have been a lot of conflicting patches in this
> area...

Oh sorry, my confusion.

I don't know what happened to the mmconfig vs. ACPI resources patch.

Jesse

2007-10-26 17:01:56

by Jesse Barnes

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

On Thursday, October 25, 2007 7:54 pm Greg KH wrote:
> On Thu, Oct 25, 2007 at 04:22:35PM -0700, Jesse Barnes wrote:
> > I think Greg doesn't like it, even though we don't have an
> > alternative at this point...
>
> Yes, I didn't like it, Ivan didn't like it, and I got reports that it
> wasn't even needed at all once you upgraded your BIOS to the latest
> version.
>
> So, is this still needed? And if so, can you try to implement what
> Ivan suggested to do here instead?

Yes, it's still needed. Auke rescinded his "BIOS upgrade makes it work"
message, so something like this is still necessary.

Jesse

2007-10-27 03:06:17

by Greg KH

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

On Fri, Oct 26, 2007 at 09:59:45AM -0700, Jesse Barnes wrote:
> On Thursday, October 25, 2007 7:54 pm Greg KH wrote:
> > On Thu, Oct 25, 2007 at 04:22:35PM -0700, Jesse Barnes wrote:
> > > I think Greg doesn't like it, even though we don't have an
> > > alternative at this point...
> >
> > Yes, I didn't like it, Ivan didn't like it, and I got reports that it
> > wasn't even needed at all once you upgraded your BIOS to the latest
> > version.
> >
> > So, is this still needed? And if so, can you try to implement what
> > Ivan suggested to do here instead?
>
> Yes, it's still needed. Auke rescinded his "BIOS upgrade makes it work"
> message, so something like this is still necessary.

He did? Ugh, I can't keep these all straight, sorry.

Can someone just send what they think is still needed, and explain why
Ivan will not object to it? :)

thanks,

greg k-h

2007-10-29 23:51:18

by Robert Hancock

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

Greg KH wrote:
> On Fri, Oct 26, 2007 at 09:59:45AM -0700, Jesse Barnes wrote:
>> On Thursday, October 25, 2007 7:54 pm Greg KH wrote:
>>> On Thu, Oct 25, 2007 at 04:22:35PM -0700, Jesse Barnes wrote:
>>>> I think Greg doesn't like it, even though we don't have an
>>>> alternative at this point...
>>> Yes, I didn't like it, Ivan didn't like it, and I got reports that it
>>> wasn't even needed at all once you upgraded your BIOS to the latest
>>> version.
>>>
>>> So, is this still needed? And if so, can you try to implement what
>>> Ivan suggested to do here instead?
>> Yes, it's still needed. Auke rescinded his "BIOS upgrade makes it work"
>> message, so something like this is still necessary.
>
> He did? Ugh, I can't keep these all straight, sorry.
>
> Can someone just send what they think is still needed, and explain why
> Ivan will not object to it? :)

Here's a recap of the whole issue just for people's information:

Right now we disable MMCONFIG on machines where the MCFG area is not
reserved in the E820 memory map since we figure it's not valid. This is
a broken heuristic because the PCI Express firmware spec doesn't require
that it be so reserved, it only needs to be reserved as an ACPI
motherboard resource, and so many times it's not reserved in E820
despite being completely valid and working. The
mmconfig-validate-against-acpi-motherboard-resources.patch changes this
to validate against the ACPI motherboard resources instead.

The second problem is that on some machines, when we are doing BAR
sizing on PCI devices, and write all ones to a BAR in order to determine
how many bits "stick", the BAR ends up overlapping with the MCFG area.
On some chipsets, this causes writes to the MCFG area (like, to restore
the original BAR contents) to get decoded by the device instead of by
the MCFG mechanism, which means the BAR stays disabled and configuration
access stops working, wreaking havoc. Usually on these machines the
MMCONFIG is located near the top of 32-bit memory and the PCI device
causing problems is a PCI Express graphics card.
pci-disable-decode-of-io-memory-during-bar-sizing.patch, and its
successors, switch off the device's decoding during sizing so that it
won't absorb the accesses to the MCFG table.

The concern raised was that this might affect some devices negatively.
We do avoid disabling decode on host bridges, as it's known that some of
them disable RAM access when you turn decode off, stupidly. I've yet to
hear of any other conclusive case where disabling the decode is harmful.
In general, if disabling the decode causes issues, the mere fact of
doing the BAR sizing could cause the same issues, and that is unavoidable.

The other possible workaround would be to avoid using MMCONFIG until the
BAR sizing is done. However, this seems like a poor solution. First of
all, in the future there may come machines where MMCONFIG is the only
config mechanism (or, perhaps more likely, it becomes the only tested
one, so the old methods get broken). Secondly, what happens with
hot-plug devices that need to be sized after MMCONFIG gets turned on?

The only way these two patches are related is that the E820 check
happens to wrongly disable MMCONFIG on some of the machines where the
memory areas could overlap during sizing, so removing that check alone
without fixing the overlap issue could cause breakage on some machines.
However, this is purely by chance, and it doesn't prevent the breakage
on many other machines - as well as the one mentioned in the earlier
thread, there's this one:

https://bugzilla.redhat.com/show_bug.cgi?id=251493

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-10-30 15:16:03

by Linus Torvalds

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch



On Mon, 29 Oct 2007, Robert Hancock wrote:
>
> The other possible workaround would be to avoid using MMCONFIG until the BAR
> sizing is done.

The only sane solution is the one that people constantly seem to ignore:

- only use MMCONFIG if absolutely required by the access itself

In other words, make the MMCONFIG code fall back on old-style accesses for
any register access to a word with reg+size <= 256.

At that point, almost all the issues with mmconfig just go away, BECAUSE
NOBODY USES IT, so it doesn't matter if it's broken?

The fact is, CONF1 style accesses are just safer, and *work*.

Can we please just do that?

Linus

2007-10-30 16:52:17

by Arjan van de Ven

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

On Tue, 30 Oct 2007 08:15:46 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

>
>
> On Mon, 29 Oct 2007, Robert Hancock wrote:
> >
> > The other possible workaround would be to avoid using MMCONFIG
> > until the BAR sizing is done.
>
> The only sane solution is the one that people constantly seem to
> ignore:
>
> - only use MMCONFIG if absolutely required by the access itself
>
> In other words, make the MMCONFIG code fall back on old-style
> accesses for any register access to a word with reg+size <= 256.

the problem is... you're not supposed to mix both types of accesses.

> At that point, almost all the issues with mmconfig just go away,
> BECAUSE NOBODY USES IT, so it doesn't matter if it's broken?
>
> The fact is, CONF1 style accesses are just safer, and *work*.

I would suggest a slight twist then: use CONF1 *until* you're using
something above 256, and then and only then switch to MMCONFIG from
then on for all accesses.

That should solve the spec issue (yes I know about specs, but this is
one of those things that will bite us in the future if we're not
careful)

2007-10-30 17:08:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch



On Tue, 30 Oct 2007, Arjan van de Ven wrote:
>
> the problem is... you're not supposed to mix both types of accesses.

You have to, anyway. Even now the MMCONFIG stuff uses CONF1 cycles for
startup.

Also, there's reason to believe that mixing things up _has_ to work
anyway, and if the issue is between "works in practice" and "theory says
that you shouldn't mix", I'll take practice every time.

Especially since we *know* that the theory is broken. Right now MMCONFIG
is effectively disabled very aggressively because it's simply unusably
flaky. So the choice is between:

- don't use MMCONFIG at all, because it has so many problems
- use MMCONFIG sparingly enough to hide the problems

and what "you're supposed to do" is simply trumped by Real Life(tm).
Because Intel screwed up so badly when they designed that piece of shit.

(Where "screwed up badly" is the usual "left it to firmware people" thing,
of course. Dammit, Intel *could* have just made it a real PCI BAR in the
Northbridge, and specified it as such, and we wouldn't have these
problems! But no, it had to be another idiotic "firmware tells where it
is" thing)

> > The fact is, CONF1 style accesses are just safer, and *work*.
>
> I would suggest a slight twist then: use CONF1 *until* you're using
> something above 256, and then and only then switch to MMCONFIG from
> then on for all accesses.

No.

Maybe if you do it per-device, and only *after* probing (ie we have seen
multiple, and successful, accesses), but globally, absolutely not. That
would be useless. The bugs we have had in this area have been exactly the
kinds of things like "we don't know the real size of the MMCONFIG areas"
etc.

I could easily see device driver writers probing to see if something
works, and I absolutely don't think we should just automatically enable
MMCONFIG from then on.

But maybe we could have a per-device flag that a driver *can* set. Ie have
the logic be:

- use MMCONFIG if we have to (reg >= 256)

OR

- use MMCONFIG if the driver specifically asked us to

and then drivers that absolutely need it, and know they do, can set that
flag. Preferably after they actually verified that it works.

That way you _can_ get the "this is how you're supposed to do it"
behaviour, but you get it when there is a reasonable chance that it
actually works.

And quite frankly, if you're not supposed to mix these things even across
devices, then I think we are better off just doing what we effectively do
now: mostly ignore the damn thing because it's too broken to use.

Maybe somebody inside Intel could just clarify the documentation, and
change it from "you're not supposed to mix" to "mix all you want".

Linus

2007-10-30 17:32:44

by Arjan van de Ven

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

On Tue, 30 Oct 2007 10:07:48 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:
>
> > > The fact is, CONF1 style accesses are just safer, and *work*.
> >
> > I would suggest a slight twist then: use CONF1 *until* you're using
> > something above 256, and then and only then switch to MMCONFIG from
> > then on for all accesses.
>
> No.
>
> Maybe if you do it per-device, and only *after* probing (ie we have
> seen multiple, and successful, accesses), but globally, absolutely
> not. That would be useless. The bugs we have had in this area have
> been exactly the kinds of things like "we don't know the real size of
> the MMCONFIG areas" etc.

sorry I wasn't very clear, I meant "per device".

>
> I could easily see device driver writers probing to see if something
> works, and I absolutely don't think we should just automatically
> enable MMCONFIG from then on.
>
> But maybe we could have a per-device flag that a driver *can* set. Ie
> have the logic be:
>
> - use MMCONFIG if we have to (reg >= 256)
>
> OR
>
> - use MMCONFIG if the driver specifically asked us to

something like
int pci_enable_mmconfig(struct pci_dev *pdev) ?
sounds like a very solid plan to me...


> Maybe somebody inside Intel could just clarify the documentation, and
> change it from "you're not supposed to mix" to "mix all you want".

I'll see what I can do ;)


--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2007-10-30 17:44:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch



On Tue, 30 Oct 2007, Arjan van de Ven wrote:
>
> something like
> int pci_enable_mmconfig(struct pci_dev *pdev) ?

Yes, that looks fine. It also matches the kinds of things drivers already
have to do (ie enable DMA, MSI etc), both conceptually and from a purely
syntactic/practical standpoint.

I think mmconfig and MSI have a lot in common, in that both are relatively
new features and both have had issues that made them not work well.

> sounds like a very solid plan to me...
>
> > Maybe somebody inside Intel could just clarify the documentation, and
> > change it from "you're not supposed to mix" to "mix all you want".
>
> I'll see what I can do ;)

Well, let's see if "pci_enable_mmconfig()" works out first. There probably
aren't more than a few drivers that would be interested in this anyway, so
it's probably fine.

And by the time mmconfig is commonly used, probably all the teething
problems are long gone, so we could plan on a "in five years, maybe we
could even enable it by default" thing?

Linus

2007-10-30 18:51:19

by Andi Kleen

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch


> At that point, almost all the issues with mmconfig just go away, BECAUSE
> NOBODY USES IT, so it doesn't matter if it's broken?

We seem to slowly grow more uses of registers > 256, but yes it's still mostly
non critical stuff.

> The fact is, CONF1 style accesses are just safer, and *work*.
>
> Can we please just do that?

At least for the AMD GART IOMMU I would prefer if that was not done.
It has a pci config space access in a time critical path (flush).
And Family10h supports accessing the northbridge using mmconfig.

I believe the tigon3 driver also does config space access frequently
on some chips.

Also there are still the old x86 Macs where conf1 doesn't work.

-Andi

2007-10-30 19:06:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch



On Tue, 30 Oct 2007, Andi Kleen wrote:
>
> Also there are still the old x86 Macs where conf1 doesn't work.

I don't think that was ever true. That was a made-up rumor by the EFI
people who were just confused.

There's no way to disable conf1 accesses on any known chipset, afaik.

Linus

2007-10-30 19:37:30

by Andi Kleen

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

On Tuesday 30 October 2007 20:06:12 Linus Torvalds wrote:
>
> On Tue, 30 Oct 2007, Andi Kleen wrote:
> >
> > Also there are still the old x86 Macs where conf1 doesn't work.
>
> I don't think that was ever true. That was a made-up rumor by the EFI
> people who were just confused.
> There's no way to disable conf1 accesses on any known chipset, afaik.

There was a tester with such a machine and he complained until
the heuristic handled this case. So there is evidence it
really was like this.

-Andi

2007-10-30 21:42:18

by David Miller

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

From: Andi Kleen <[email protected]>
Date: Tue, 30 Oct 2007 19:50:58 +0100

> I believe the tigon3 driver also does config space access frequently
> on some chips.

This case is not worth optimizing for at all.

2007-10-30 22:07:44

by Jesse Barnes

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

On Tuesday, October 30, 2007 10:07 am Linus Torvalds wrote:
> (Where "screwed up badly" is the usual "left it to firmware people"
> thing, of course. Dammit, Intel *could* have just made it a real PCI
> BAR in the Northbridge, and specified it as such, and we wouldn't
> have these problems! But no, it had to be another idiotic "firmware
> tells where it is" thing)

The per-device flag is fine with me, but I should make something clear:

MMCONFIG IS NOT BROKEN!

Nor is finding mmconfig space. We know how to do that too.

What's broken is our PCI probing with certain address space layouts that
include MMCONFIG space. Since MMCONFIG is in MMIO space (by
definition) there will always be the potential for problems when we use
MMCONFIG and don't disable decode while sizing BARs (probably a latent
bug on many non-PC Linux platforms).

So we can either use I/O ports for sizing and only switch to MMCONFIG
later, or we can just use it on an as-needed basis, or we can make our
PCI probing safe if MMCONFIG is in use.

Jesse

2007-10-30 22:23:13

by Linus Torvalds

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch



On Tue, 30 Oct 2007, Jesse Barnes wrote:
>
> The per-device flag is fine with me, but I should make something clear:
>
> MMCONFIG IS NOT BROKEN!

Trust me, it is.

The particular problem _you_ had with it is only a small small part of the
bugs we have had.

> What's broken is our PCI probing with certain address space layouts that
> include MMCONFIG space.

No. You really don't see the big picture. There's been tons of problems
with MMCONFIG. Like the fact that other devices have their IO regions
registered on top of it, because the MMCONFIG thing was done as a hidden
resource. Or the fact that the area claimed was too small. Or too large.
Or not listed at all.

The whole thing is a total disaster. I told Intel engineers literally
*years* ago to not do that idiotic "hidden IO resources that are described
by firmware that then inevitably gets things wrong", and yet what happens?
Every single time.

Linus

2007-10-30 22:35:16

by Jesse Barnes

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

On Tuesday, October 30, 2007 3:22 pm Linus Torvalds wrote:
> On Tue, 30 Oct 2007, Jesse Barnes wrote:
> > The per-device flag is fine with me, but I should make something
> > clear:
> >
> > MMCONFIG IS NOT BROKEN!
>
> Trust me, it is.
>
> The particular problem _you_ had with it is only a small small part
> of the bugs we have had.
>
> > What's broken is our PCI probing with certain address space layouts
> > that include MMCONFIG space.
>
> No. You really don't see the big picture. There's been tons of
> problems with MMCONFIG. Like the fact that other devices have their
> IO regions registered on top of it, because the MMCONFIG thing was
> done as a hidden resource. Or the fact that the area claimed was too
> small. Or too large. Or not listed at all.

Yeah, that's definitely a problem, and would be a firmware bug. There's
no doubt that firmwares have had trouble with this in the past, but
given that Vista now relies on this stuff working, it's a lot more
likely to be reliable in current and future systems.

> The whole thing is a total disaster. I told Intel engineers literally
> *years* ago to not do that idiotic "hidden IO resources that are
> described by firmware that then inevitably gets things wrong", and
> yet what happens? Every single time.

I don't disagree there. I'm just saying the actual mechanism is fine
(as illustrated by the numerous non-PC ports of Linux), and this
particular problem, at least, isn't really specific to how MMCONFIG is
described or configured by the firmware and OS, it's simply a Linux
problem.

But like I said, the per-device flag Arjan suggested is fine with me...

Jesse

2007-10-30 22:38:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch



On Tue, 30 Oct 2007, Linus Torvalds wrote:
>
> No. You really don't see the big picture. There's been tons of problems
> with MMCONFIG. Like the fact that other devices have their IO regions
> registered on top of it, because the MMCONFIG thing was done as a hidden
> resource. Or the fact that the area claimed was too small. Or too large.
> Or not listed at all.

Actually, I guess the bad case wasn't "not listed at all", but incorrectly
listed - so the probing would go to the wrong address, not find any
devices, and then promptly result in an unusable machine with no hardware
attached.

I _think_ (and hope) those machines were never released. But even now, on
my main machine, I get "MCFG area at f0000000 is not E820-reserved", and
probably the only reason the PCI layer doesn't overwrite it is because it
does show up as a PnP region, and I have pnp support enabled.

Basically, the resource allocation for mmconf has always been broken
(largely by *design* by Intel!). And by being broken, it has been
unreliable.

Linus

2007-10-30 22:42:27

by Jesse Barnes

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

On Tuesday, October 30, 2007 3:38 pm Linus Torvalds wrote:
> On Tue, 30 Oct 2007, Linus Torvalds wrote:
> > No. You really don't see the big picture. There's been tons of
> > problems with MMCONFIG. Like the fact that other devices have their
> > IO regions registered on top of it, because the MMCONFIG thing was
> > done as a hidden resource. Or the fact that the area claimed was
> > too small. Or too large. Or not listed at all.
>
> Actually, I guess the bad case wasn't "not listed at all", but
> incorrectly listed - so the probing would go to the wrong address,
> not find any devices, and then promptly result in an unusable machine
> with no hardware attached.

Yeah, busted BIOS (and I agree a poor design decision).

> I _think_ (and hope) those machines were never released. But even
> now, on my main machine, I get "MCFG area at f0000000 is not
> E820-reserved", and probably the only reason the PCI layer doesn't
> overwrite it is because it does show up as a PnP region, and I have
> pnp support enabled.

Unfortunately I think some such machines *were* released, only because
the release engineers figured no one actually uses MMCONFIG yet
(Windows == whole world of users), so why worry about that particular
bug?

The "not E820-reserved" message is actually bogus. I'll bet MMCONFIG
works fine on your machine with Robert's patch and the disable decode
stuff applied.

Jesse

2007-10-30 22:45:20

by David Miller

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

From: Jesse Barnes <[email protected]>
Date: Tue, 30 Oct 2007 15:31:57 -0700

> I don't disagree there. I'm just saying the actual mechanism is fine
> (as illustrated by the numerous non-PC ports of Linux), and this
> particular problem, at least, isn't really specific to how MMCONFIG is
> described or configured by the firmware and OS, it's simply a Linux
> problem.

I totally disagree, it is not a Linux problem.

The non-PC ports don't have this issue because the PCI config space
MMIO area of the PCI controller does not overlap PCI MMIO space at
all. So the problem simply cannot happen.

They choose to put this MMCONFIG in an area overlapping with PCI MMIO
space, which breaks both existing practice and code.

2007-10-30 22:48:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch



On Tue, 30 Oct 2007, Jesse Barnes wrote:
>
> Yeah, that's definitely a problem, and would be a firmware bug. There's
> no doubt that firmwares have had trouble with this in the past, but
> given that Vista now relies on this stuff working, it's a lot more
> likely to be reliable in current and future systems.

I agree. I think most of the big problems we had were basically with
unreleased or very early systems.

But the per-device flag should "just fix it" regardless, and we can go
forward assuming that things work, but without breaking those borderline
systems.

Linus

2007-10-30 23:34:08

by Andi Kleen

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch


> Actually, I guess the bad case wasn't "not listed at all", but incorrectly
> listed - so the probing would go to the wrong address, not find any
> devices, and then promptly result in an unusable machine with no hardware
> attached.

iirc they tended to hang for whatever reason when the mcfg
area was accessed, not just not detect anything.

>
> I _think_ (and hope) those machines were never released.

They were :/ The bug flowed from a Intel reference BIOS to lots of
production boards.

> But even now, on
> my main machine, I get "MCFG area at f0000000 is not E820-reserved", and

That's a different issue. The spec does not require it to be e820-reserved.
That was just a (bogus) heuristic originally added to handle the early BIOS
bug. Even BIOS where the mcfg is fine do not reserve it there.

There were some patches to check it against ACPI resources (which
was presumably better specification conforming and more importantly
similar to what M$ does). I'm not sure that fixed all problems and
made the e820 check obsolete though.

> Basically, the resource allocation for mmconf has always been broken
> (largely by *design* by Intel!). And by being broken, it has been
> unreliable.

Well the overlapping to MMIO would have been fine as design if BIOS had
gotten it right. Trying to design things that BIOS can't get it wrong
is probably futile -- the BIOSes would just find more subtle ways to break.
The real problem I guess was that Linux was trying to use it before
Windows which is usually a bad idea regarding BIOS support at least
in the Desktop/Laptop space.

-Andi

2007-10-30 23:39:46

by Robert Hancock

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

Linus Torvalds wrote:
>
> On Tue, 30 Oct 2007, Arjan van de Ven wrote:
>> the problem is... you're not supposed to mix both types of accesses.
>
> You have to, anyway. Even now the MMCONFIG stuff uses CONF1 cycles for
> startup.

If it does, it's not by necessity. As soon as you read the table
location out of the ACPI tables you can start using it, and that
shouldn't require any config space accesses.

>
> Also, there's reason to believe that mixing things up _has_ to work
> anyway, and if the issue is between "works in practice" and "theory says
> that you shouldn't mix", I'll take practice every time.
>
> Especially since we *know* that the theory is broken. Right now MMCONFIG
> is effectively disabled very aggressively because it's simply unusably
> flaky. So the choice is between:
>
> - don't use MMCONFIG at all, because it has so many problems
> - use MMCONFIG sparingly enough to hide the problems

Fact is, we don't really know how many of these systems with supposedly
"broken" MMCONFIG were really just suffering from the overlapping
PCI/MMCONFIG address space problem, which is entirely the fault of the
way we do PCI probing. I would bet quite a few of them.

>
> and what "you're supposed to do" is simply trumped by Real Life(tm).
> Because Intel screwed up so badly when they designed that piece of shit.
>
> (Where "screwed up badly" is the usual "left it to firmware people" thing,
> of course. Dammit, Intel *could* have just made it a real PCI BAR in the
> Northbridge, and specified it as such, and we wouldn't have these
> problems! But no, it had to be another idiotic "firmware tells where it
> is" thing)

This wouldn't have helped anything with the problem in question.

>
>>> The fact is, CONF1 style accesses are just safer, and *work*.
>> I would suggest a slight twist then: use CONF1 *until* you're using
>> something above 256, and then and only then switch to MMCONFIG from
>> then on for all accesses.
>
> No.
>
> Maybe if you do it per-device, and only *after* probing (ie we have seen
> multiple, and successful, accesses), but globally, absolutely not. That
> would be useless. The bugs we have had in this area have been exactly the
> kinds of things like "we don't know the real size of the MMCONFIG areas"
> etc.
>
> I could easily see device driver writers probing to see if something
> works, and I absolutely don't think we should just automatically enable
> MMCONFIG from then on.

Why per device? It's not like the MSI case where both the platform and
the device are potentially busted. Whether or not MMCONFIG works has
nothing to do with the device, all that matters is whether it works on
the platform. It shouldn't be the driver's responsibility to know this.

>
> But maybe we could have a per-device flag that a driver *can* set. Ie have
> the logic be:
>
> - use MMCONFIG if we have to (reg >= 256)
>
> OR
>
> - use MMCONFIG if the driver specifically asked us to
>
> and then drivers that absolutely need it, and know they do, can set that
> flag. Preferably after they actually verified that it works.

How will they verify that it works? If it works, then verifying it works
is all well and good. If it doesn't work, trying to verify if it does
could very well blow up the machine.

I've made the point before that if we're going to allow using it at all,
we'd better find out if it works or not early on, not after we've been
running and somebody decides it's a good idea to try using it and
causing a lockup or something.

>
> That way you _can_ get the "this is how you're supposed to do it"
> behaviour, but you get it when there is a reasonable chance that it
> actually works.
>
> And quite frankly, if you're not supposed to mix these things even across
> devices, then I think we are better off just doing what we effectively do
> now: mostly ignore the damn thing because it's too broken to use.
>
> Maybe somebody inside Intel could just clarify the documentation, and
> change it from "you're not supposed to mix" to "mix all you want".

Intel could say what they want on the subject.. but that doesn't
necessarily reflect what happens with anyone else's chipset implementations.

2007-10-30 23:56:33

by Arjan van de Ven

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

On Tue, 30 Oct 2007 17:41:26 -0600
Robert Hancock <[email protected]> wrote:
> > I could easily see device driver writers probing to see if
> > something works, and I absolutely don't think we should just
> > automatically enable MMCONFIG from then on.
>
> Why per device? It's not like the MSI case where both the platform
> and the device are potentially busted. Whether or not MMCONFIG works
> has nothing to do with the device, all that matters is whether it
> works on the platform. It shouldn't be the driver's responsibility to
> know this.

it's per device so that if you have have no users for this stuff,
you'll never ever get bitten by bugs (be it linux or the bios).
It's just avoiding the problem for a large class of cases... doesn't
mean it's solved for the other class, just in practice it turns it into
a much smaller problem.

2007-10-31 00:00:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch



On Tue, 30 Oct 2007, Robert Hancock wrote:
> >
> > You have to, anyway. Even now the MMCONFIG stuff uses CONF1 cycles for
> > startup.
>
> If it does, it's not by necessity. As soon as you read the table location out
> of the ACPI tables you can start using it, and that shouldn't require any
> config space accesses.

Don't be silly. Exactly _BECAUSE_ we cannot trust the firmware, we have to
use conf1 (which we can trust) to verify it and/or fix things up.

Also, there are several devices that don't show up in the MMCFG things, or
just otherwise get it wrong.

So just take a look at arch/x86/pci/mmconfig-shared.c and look for
"conf1".

Really. Damn, I'm nervous taking any MMCFG patches that has you as an
author, if you aren't even aware of these kinds of fundamnetal issues. You
probably read the standards about how things are "supposed" to work, and
then just believed them?

Rule #1 in kernel programming: don't *ever* think that things actually
work the way they are documented to work. The documentation is a starting
point, nothing else.

And please be defensive in programming. We *know* conf1 cycles work. The
hardware has been extensively tested, and there are no firmware
interactions. There is *zero* reasons to use MMCONF cycles for normal
devices. Ergo: switching over to MMCONF when not needed is stupid and
fragile.

Linus

2007-10-31 00:02:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch



On Tue, 30 Oct 2007, Arjan van de Ven wrote:
>
> it's per device so that if you have have no users for this stuff,
> you'll never ever get bitten by bugs (be it linux or the bios).

Exactly. It's not that we care about trying to protect a system that
really needs to use MMCFG - once you really *really* need it, you're
screwed if it doesn't work.

But currently 99% of all systems will never need it, and for the broken
platforms, that's a nice round 100%. So if we make code not use MMCFG by
default, and you have to enable it for just those device drivers that
really want it and care, you automatically avoid all the broken hardware,
with no real downsides.

Linus

2007-10-31 00:23:52

by Robert Hancock

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

Linus Torvalds wrote:
>
> On Tue, 30 Oct 2007, Robert Hancock wrote:
>>> You have to, anyway. Even now the MMCONFIG stuff uses CONF1 cycles for
>>> startup.
>> If it does, it's not by necessity. As soon as you read the table location out
>> of the ACPI tables you can start using it, and that shouldn't require any
>> config space accesses.
>
> Don't be silly. Exactly _BECAUSE_ we cannot trust the firmware, we have to
> use conf1 (which we can trust) to verify it and/or fix things up.

My point was, it's not inherently necessary in order to use MMCONFIG.
I'm not saying the checks (unreachable_devices and
pci_mmcfg_check_hostbridge) aren't useful or needed with many real
machines. However, in the event that type1 access isn't available we
just skip all those checks because we have no other option. It would
indeed be a pretty broken spec if there was no way to bootstrap with it
even under ideal conditions..

>
> Also, there are several devices that don't show up in the MMCFG things, or
> just otherwise get it wrong.
>
> So just take a look at arch/x86/pci/mmconfig-shared.c and look for
> "conf1".
>
> Really. Damn, I'm nervous taking any MMCFG patches that has you as an
> author, if you aren't even aware of these kinds of fundamnetal issues. You
> probably read the standards about how things are "supposed" to work, and
> then just believed them?
>
> Rule #1 in kernel programming: don't *ever* think that things actually
> work the way they are documented to work. The documentation is a starting
> point, nothing else.
>
> And please be defensive in programming. We *know* conf1 cycles work. The
> hardware has been extensively tested, and there are no firmware
> interactions. There is *zero* reasons to use MMCONF cycles for normal
> devices. Ergo: switching over to MMCONF when not needed is stupid and
> fragile.

I can't really disagree that MMCONFIG doesn't have great advantages for
most devices (though it likely is faster on a lot of platforms, which
may be significant if the device does lots of config space accesses). So
for the moment, avoiding using it except where necessary will likely
work out (except if some system does indeed puke on mixing type1 and
MMCONFIG).

However, what Microsoft is doing with Vista may eventually make a
difference in the future. Many hardware vendors seem to use the testing
strategy of "test with latest Windows version. Works OK? Ship it." If
Vista decides that MMCONFIG is good to use all the time, then type1
access support is likely going to a) end up less tested and b) probably
deleted entirely in time. We've seen it before - it used to be that not
using ACPI was the safe option on most hardware with Linux. Now you
pretty much have to use it because the manufacturers only test with it
enabled. I've seen at least one board where the interrupt routing was
completely broken with ACPI off, because they obviously only tested in
Windows..

2007-10-31 01:30:43

by Jesse Barnes

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

On Tuesday, October 30, 2007 4:59 pm Linus Torvalds wrote:
> Also, there are several devices that don't show up in the MMCFG
> things, or just otherwise get it wrong.
>
> So just take a look at arch/x86/pci/mmconfig-shared.c and look for
> "conf1".
>
> Really. Damn, I'm nervous taking any MMCFG patches that has you as an
> author, if you aren't even aware of these kinds of fundamnetal
> issues. You probably read the standards about how things are
> "supposed" to work, and then just believed them?

We have a few bits of code in there to look at the actual MMCONFIG base
register, which is good since we can sanity check it against the
firmware. However, by itself it's not enough since we may end up using
it even though the BIOS didn't tell us about an overlap that really
does exist. Robert recognized this and added checks to make sure the
values read from the base regs actually matched what the firmware was
telling us in the ACPI tables (at least iirc, it's been awhile since I
looked at the patch).

So don't be too nervous. :)

Jesse

2007-11-01 09:46:52

by Martin Mares

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

Hello!

> something like
> int pci_enable_mmconfig(struct pci_dev *pdev) ?
> sounds like a very solid plan to me...

Please remember that the driver is not the sole user of the PCI config
space -- user-space programs (e.g., lspci) can access it via sysfs, too.
Should we force users of such programs to add a magic kernel parameter
to enable MMCONFIG? Does not make much sense.

Maybe we should do all bus scanning with conf1 and then try if MMCONFIG
returns the same values?

Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
"Beware of bugs in the above code; I have only proved it correct, not tried it." -- D.E.K.

2007-11-01 14:09:32

by Arjan van de Ven

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

On Thu, 1 Nov 2007 09:31:40 +0100
Martin Mares <[email protected]> wrote:

> Hello!
>
> > something like
> > int pci_enable_mmconfig(struct pci_dev *pdev) ?
> > sounds like a very solid plan to me...
>
> Please remember that the driver is not the sole user of the PCI config
> space -- user-space programs (e.g., lspci) can access it via sysfs,
> too. Should we force users of such programs to add a magic kernel
> parameter to enable MMCONFIG? Does not make much sense.
>
> Maybe we should do all bus scanning with conf1 and then try if
> MMCONFIG returns the same values?

that is already in the code today but not nearly enough; there's a ton
of cases where it's "touch mmconfig and the box is dead"...

2007-11-08 13:50:53

by Martin Mares

[permalink] [raw]
Subject: Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

Hello!

> > Maybe we should do all bus scanning with conf1 and then try if
> > MMCONFIG returns the same values?
>
> that is already in the code today but not nearly enough; there's a ton
> of cases where it's "touch mmconfig and the box is dead"...

Argh. However I still do not think it is an acceptable reason for breaking
all userspace access to the extended config space.

At the very least, there should be a sysfs toggle to enable MMCONFIG if
root really wishes that (maybe for a particular device, as the drivers
would do).

Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Do not believe in miracles -- rely on them.