2021-10-14 23:09:27

by Hans de Goede

[permalink] [raw]
Subject: [PATCH v5 0/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

Hi All,

Here is v5 of my patch to address the E820 reservations vs PCI host bridge
windows issue which is causing touchpad and/or thunderbolt issues on many
different laptop models.

Changes in v5:
- Drop mention of Windows behavior from the commit msg, replace with a
reference to the specs
- Improve documentation in Documentation/admin-guide/kernel-parameters.txt
- Reword the big comment added, use "PCI host bridge window" in it and drop
all references to Windows
- Add a second patch moving the arch/x86/pci/acpi.c printk-s to pr_info/warn

Changes in v4:
- Rewrap the big comment block to fit in 80 columns

I believe that this is ready for merging now.

Bjorn, can you review/ack this please ?

x86/tip folks it would be ideal if you can pick this up and send at
least the first patch as a fix to Linus for 5.15. This fixes a bug which
has been plaguing a lot of users (see all the bug links in the commit msg).

Regards,

Hans


Hans de Goede (2):
x86/PCI: Ignore E820 reservations for bridge windows on newer systems
x86/PCI/ACPI: Replace printk calls with pr_info/pr_warn calls

.../admin-guide/kernel-parameters.txt | 9 ++++
arch/x86/include/asm/pci_x86.h | 10 ++++
arch/x86/kernel/resource.c | 4 ++
arch/x86/pci/acpi.c | 49 +++++++++++++++----
arch/x86/pci/common.c | 6 +++
5 files changed, 68 insertions(+), 10 deletions(-)

--
2.31.1


2021-10-14 23:18:59

by Hans de Goede

[permalink] [raw]
Subject: [PATCH v5 2/2] x86/PCI/ACPI: Replace printk calls with pr_info/pr_warn calls

The direct use of printk is deprecated, replace the printk calls
in arch/x86/pci/acpi.c with pr_info/pr_warn calls.

Acked-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
---
Changes in v5:
- Rebase on v5 of:
"x86/PCI: Ignore E820 reservations for bridge windows on newer systems"
and send it out as part of a series instead of as a stand-alone patch
- Add Rafael's Acked-by
---
arch/x86/pci/acpi.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 72d473054262..f357dac92610 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -1,4 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
+
+#define pr_fmt(fmt) "PCI: " fmt
+
#include <linux/pci.h>
#include <linux/acpi.h>
#include <linux/init.h>
@@ -38,7 +41,7 @@ static int __init set_nouse_crs(const struct dmi_system_id *id)

static int __init set_ignore_seg(const struct dmi_system_id *id)
{
- printk(KERN_INFO "PCI: %s detected: ignoring ACPI _SEG\n", id->ident);
+ pr_info("%s detected: ignoring ACPI _SEG\n", id->ident);
pci_ignore_seg = true;
return 0;
}
@@ -158,10 +161,9 @@ void __init pci_acpi_crs_quirks(void)
else if (pci_probe & PCI_USE__CRS)
pci_use_crs = true;

- printk(KERN_INFO "PCI: %s host bridge windows from ACPI; "
- "if necessary, use \"pci=%s\" and report a bug\n",
- pci_use_crs ? "Using" : "Ignoring",
- pci_use_crs ? "nocrs" : "use_crs");
+ pr_info("%s host bridge windows from ACPI; if necessary, use \"pci=%s\" and report a bug\n",
+ pci_use_crs ? "Using" : "Ignoring",
+ pci_use_crs ? "nocrs" : "use_crs");

/*
* Some BIOS-es contain a bug where they add addresses which map to
@@ -186,8 +188,8 @@ void __init pci_acpi_crs_quirks(void)
else if (pci_probe & PCI_USE_E820)
pci_use_e820 = true;

- printk(KERN_INFO "PCI: %s E820 reservations for host bridge windows\n",
- pci_use_e820 ? "Using" : "Ignoring");
+ pr_info("%s E820 reservations for host bridge windows\n",
+ pci_use_e820 ? "Using" : "Ignoring");
}

#ifdef CONFIG_PCI_MMCONFIG
@@ -362,9 +364,8 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
root->segment = domain = 0;

if (domain && !pci_domains_supported) {
- printk(KERN_WARNING "pci_bus %04x:%02x: "
- "ignored (multiple domains not supported)\n",
- domain, busnum);
+ pr_warn("pci_bus %04x:%02x: ignored (multiple domains not supported)\n",
+ domain, busnum);
return NULL;
}

@@ -432,7 +433,7 @@ int __init pci_acpi_init(void)
if (acpi_noirq)
return -ENODEV;

- printk(KERN_INFO "PCI: Using ACPI for IRQ routing\n");
+ pr_info("Using ACPI for IRQ routing\n");
acpi_irq_penalty_init();
pcibios_enable_irq = acpi_pci_irq_enable;
pcibios_disable_irq = acpi_pci_irq_disable;
@@ -444,7 +445,7 @@ int __init pci_acpi_init(void)
* also do it here in case there are still broken drivers that
* don't use pci_enable_device().
*/
- printk(KERN_INFO "PCI: Routing PCI interrupts for all devices because \"pci=routeirq\" specified\n");
+ pr_info("Routing PCI interrupts for all devices because \"pci=routeirq\" specified\n");
for_each_pci_dev(dev)
acpi_pci_irq_enable(dev);
}
--
2.31.1

2021-10-14 23:19:45

by Hans de Goede

[permalink] [raw]
Subject: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

Some BIOS-es contain a bug where they add addresses which map to system
RAM in the PCI host bridge window returned by the ACPI _CRS method, see
commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
space").

To work around this bug Linux excludes E820 reserved addresses when
allocating addresses from the PCI host bridge window since 2010.

Recently (2020) some systems have shown-up with E820 reservations which
cover the entire _CRS returned PCI bridge memory window, causing all
attempts to assign memory to PCI BARs which have not been setup by the
BIOS to fail. For example here are the relevant dmesg bits from a
Lenovo IdeaPad 3 15IIL 81WE:

[mem 0x000000004bc50000-0x00000000cfffffff] reserved
pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]

The ACPI specifications appear to allow this new behavior:

The relationship between E820 and ACPI _CRS is not really very clear.
ACPI v6.3, sec 15, table 15-374, says AddressRangeReserved means:

This range of addresses is in use or reserved by the system and is
not to be included in the allocatable memory pool of the operating
system's memory manager.

and it may be used when:

The address range is in use by a memory-mapped system device.

Furthermore, sec 15.2 says:

Address ranges defined for baseboard memory-mapped I/O devices, such
as APICs, are returned as reserved.

A PCI host bridge qualifies as a baseboard memory-mapped I/O device,
and its apertures are in use and certainly should not be included in
the general allocatable pool, so the fact that some BIOS-es reports
the PCI aperture as "reserved" in E820 doesn't seem like a BIOS bug.

So it seems that the excluding of E820 reserved addresses is a mistake.

Ideally Linux would fully stop excluding E820 reserved addresses,
but then the old systems this was added for will regress.
Instead keep the old behavior for old systems, while ignoring
the E820 reservations for any systems from now on.

Old systems are defined here as BIOS year < 2018, this was chosen to
make sure that pci_use_e820 will not be set on the currently affected
systems, while at the same time also taking into account that the
systems for which the E820 checking was originally added may have
received BIOS updates for quite a while (esp. CVE related ones),
giving them a more recent BIOS year then 2010.

Also add pci=no_e820 and pci=use_e820 options to allow overriding
the BIOS year heuristic.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
BugLink: https://bugs.launchpad.net/bugs/1878279
BugLink: https://bugs.launchpad.net/bugs/1931715
BugLink: https://bugs.launchpad.net/bugs/1932069
BugLink: https://bugs.launchpad.net/bugs/1921649
Cc: Benoit GrĂ©goire <[email protected]>
Cc: Hui Wang <[email protected]>
Cc: [email protected]
Reviewed-by: Mika Westerberg <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
---
Changes in v5:
- Drop mention of Windows behavior from the commit msg, replace with a
reference to the specs
- Improve documentation in Documentation/admin-guide/kernel-parameters.txt
- Reword the big comment added, use "PCI host bridge window" in it and drop
all refences to Windows

Changes in v4:
- Rewrap the big comment block to fit in 80 columns
- Add Rafael's Acked-by
- Add Cc: [email protected]

Changes in v3:
- Commit msg tweaks (drop dmesg timestamps, typo fix)
- Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
- Add Mika's Reviewed-by

Changes in v2:
- Replace the per model DMI quirk approach with disabling E820 reservations
checking for all systems with a BIOS year >= 2018
- Add documentation for the new kernel-parameters to
Documentation/admin-guide/kernel-parameters.txt
---
Other patches trying to address the same issue:
https://lore.kernel.org/r/[email protected]
https://lore.kernel.org/r/[email protected]
V1 patch:
https://lore.kernel.org/r/[email protected]
---
.../admin-guide/kernel-parameters.txt | 9 ++++++
arch/x86/include/asm/pci_x86.h | 10 +++++++
arch/x86/kernel/resource.c | 4 +++
arch/x86/pci/acpi.c | 28 +++++++++++++++++++
arch/x86/pci/common.c | 6 ++++
5 files changed, 57 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 43dc35fe5bc0..07f1615206d4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3949,6 +3949,15 @@
please report a bug.
nocrs [X86] Ignore PCI host bridge windows from ACPI.
If you need to use this, please report a bug.
+ use_e820 [X86] Use E820 reservations to exclude parts of
+ PCI host bridge windows. This is a workaround
+ for BIOS defects in host bridge _CRS methods.
+ If you need to use this, please report a bug to
+ <[email protected]>.
+ no_e820 [X86] Ignore E820 reservations for PCI host
+ bridge windows. This is the default on modern
+ hardware. If you need to use this, please report
+ a bug to <[email protected]>.
routeirq Do IRQ routing for all PCI devices.
This is normally done in pci_enable_device(),
so this option is a temporary workaround
diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index 490411dba438..0bb4e7dd0ffc 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -39,6 +39,8 @@ do { \
#define PCI_ROOT_NO_CRS 0x100000
#define PCI_NOASSIGN_BARS 0x200000
#define PCI_BIG_ROOT_WINDOW 0x400000
+#define PCI_USE_E820 0x800000
+#define PCI_NO_E820 0x1000000

extern unsigned int pci_probe;
extern unsigned long pirq_table_addr;
@@ -64,6 +66,8 @@ void pcibios_scan_specific_bus(int busn);

/* pci-irq.c */

+struct pci_dev;
+
struct irq_info {
u8 bus, devfn; /* Bus, device and function */
struct {
@@ -232,3 +236,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
# define x86_default_pci_init_irq NULL
# define x86_default_pci_fixup_irqs NULL
#endif
+
+#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
+extern bool pci_use_e820;
+#else
+#define pci_use_e820 false
+#endif
diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
index 9b9fb7882c20..e8dc9bc327bd 100644
--- a/arch/x86/kernel/resource.c
+++ b/arch/x86/kernel/resource.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/ioport.h>
#include <asm/e820/api.h>
+#include <asm/pci_x86.h>

static void resource_clip(struct resource *res, resource_size_t start,
resource_size_t end)
@@ -28,6 +29,9 @@ static void remove_e820_regions(struct resource *avail)
int i;
struct e820_entry *entry;

+ if (!pci_use_e820)
+ return;
+
for (i = 0; i < e820_table->nr_entries; i++) {
entry = &e820_table->entries[i];

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 948656069cdd..72d473054262 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -21,6 +21,8 @@ struct pci_root_info {

static bool pci_use_crs = true;
static bool pci_ignore_seg = false;
+/* Consumed in arch/x86/kernel/resource.c */
+bool pci_use_e820 = false;

static int __init set_use_crs(const struct dmi_system_id *id)
{
@@ -160,6 +162,32 @@ void __init pci_acpi_crs_quirks(void)
"if necessary, use \"pci=%s\" and report a bug\n",
pci_use_crs ? "Using" : "Ignoring",
pci_use_crs ? "nocrs" : "use_crs");
+
+ /*
+ * Some BIOS-es contain a bug where they add addresses which map to
+ * system RAM in the PCI host bridge window returned by the ACPI _CRS
+ * method, see commit 4dc2287c1805 ("x86: avoid E820 regions when
+ * allocating address space"). To avoid this Linux by default excludes
+ * E820 reservations when allocating addresses since 2010.
+ * In 2020 some systems have shown-up with E820 reservations which cover
+ * the entire _CRS returned PCI host bridge window, causing all attempts
+ * to assign memory to PCI BARs to fail if Linux uses E820 reservations.
+ *
+ * Ideally Linux would fully stop using E820 reservations, but then
+ * the old systems this was added for will regress.
+ * Instead keep the old behavior for old systems, while ignoring the
+ * E820 reservations for any systems from now on.
+ */
+ if (year >= 0 && year < 2018)
+ pci_use_e820 = true;
+
+ if (pci_probe & PCI_NO_E820)
+ pci_use_e820 = false;
+ else if (pci_probe & PCI_USE_E820)
+ pci_use_e820 = true;
+
+ printk(KERN_INFO "PCI: %s E820 reservations for host bridge windows\n",
+ pci_use_e820 ? "Using" : "Ignoring");
}

#ifdef CONFIG_PCI_MMCONFIG
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 3507f456fcd0..091ec7e94fcb 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -595,6 +595,12 @@ char *__init pcibios_setup(char *str)
} else if (!strcmp(str, "nocrs")) {
pci_probe |= PCI_ROOT_NO_CRS;
return NULL;
+ } else if (!strcmp(str, "use_e820")) {
+ pci_probe |= PCI_USE_E820;
+ return NULL;
+ } else if (!strcmp(str, "no_e820")) {
+ pci_probe |= PCI_NO_E820;
+ return NULL;
#ifdef CONFIG_PHYS_ADDR_T_64BIT
} else if (!strcmp(str, "big_root_window")) {
pci_probe |= PCI_BIG_ROOT_WINDOW;
--
2.31.1

2021-10-18 01:07:43

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
> Some BIOS-es contain a bug where they add addresses which map to system
> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
> space").
>
> To work around this bug Linux excludes E820 reserved addresses when
> allocating addresses from the PCI host bridge window since 2010.
>
> Recently (2020) some systems have shown-up with E820 reservations which
> cover the entire _CRS returned PCI bridge memory window, causing all
> attempts to assign memory to PCI BARs which have not been setup by the
> BIOS to fail. For example here are the relevant dmesg bits from a
> Lenovo IdeaPad 3 15IIL 81WE:
>
> [mem 0x000000004bc50000-0x00000000cfffffff] reserved
> pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
>
> The ACPI specifications appear to allow this new behavior:
>
> The relationship between E820 and ACPI _CRS is not really very clear.
> ACPI v6.3, sec 15, table 15-374, says AddressRangeReserved means:
>
> This range of addresses is in use or reserved by the system and is
> not to be included in the allocatable memory pool of the operating
> system's memory manager.
>
> and it may be used when:
>
> The address range is in use by a memory-mapped system device.
>
> Furthermore, sec 15.2 says:
>
> Address ranges defined for baseboard memory-mapped I/O devices, such
> as APICs, are returned as reserved.
>
> A PCI host bridge qualifies as a baseboard memory-mapped I/O device,
> and its apertures are in use and certainly should not be included in
> the general allocatable pool, so the fact that some BIOS-es reports
> the PCI aperture as "reserved" in E820 doesn't seem like a BIOS bug.
>
> So it seems that the excluding of E820 reserved addresses is a mistake.
>
> Ideally Linux would fully stop excluding E820 reserved addresses,
> but then the old systems this was added for will regress.
> Instead keep the old behavior for old systems, while ignoring
> the E820 reservations for any systems from now on.
>
> Old systems are defined here as BIOS year < 2018, this was chosen to
> make sure that pci_use_e820 will not be set on the currently affected
> systems, while at the same time also taking into account that the
> systems for which the E820 checking was originally added may have
> received BIOS updates for quite a while (esp. CVE related ones),
> giving them a more recent BIOS year then 2010.
>
> Also add pci=no_e820 and pci=use_e820 options to allow overriding
> the BIOS year heuristic.
>
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
> BugLink: https://bugs.launchpad.net/bugs/1878279
> BugLink: https://bugs.launchpad.net/bugs/1931715
> BugLink: https://bugs.launchpad.net/bugs/1932069
> BugLink: https://bugs.launchpad.net/bugs/1921649
> Cc: Benoit Gr?goire <[email protected]>
> Cc: Hui Wang <[email protected]>
> Cc: [email protected]
> Reviewed-by: Mika Westerberg <[email protected]>
> Acked-by: Rafael J. Wysocki <[email protected]>
> Signed-off-by: Hans de Goede <[email protected]>

Acked-by: Bjorn Helgaas <[email protected]>

I'm sorry we have to add more ugliness because of my mistake in
4dc2287c1805, but I don't see a better option.

> ---
> Changes in v5:
> - Drop mention of Windows behavior from the commit msg, replace with a
> reference to the specs
> - Improve documentation in Documentation/admin-guide/kernel-parameters.txt
> - Reword the big comment added, use "PCI host bridge window" in it and drop
> all refences to Windows
>
> Changes in v4:
> - Rewrap the big comment block to fit in 80 columns
> - Add Rafael's Acked-by
> - Add Cc: [email protected]
>
> Changes in v3:
> - Commit msg tweaks (drop dmesg timestamps, typo fix)
> - Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
> - Add Mika's Reviewed-by
>
> Changes in v2:
> - Replace the per model DMI quirk approach with disabling E820 reservations
> checking for all systems with a BIOS year >= 2018
> - Add documentation for the new kernel-parameters to
> Documentation/admin-guide/kernel-parameters.txt
> ---
> Other patches trying to address the same issue:
> https://lore.kernel.org/r/[email protected]
> https://lore.kernel.org/r/[email protected]
> V1 patch:
> https://lore.kernel.org/r/[email protected]
> ---
> .../admin-guide/kernel-parameters.txt | 9 ++++++
> arch/x86/include/asm/pci_x86.h | 10 +++++++
> arch/x86/kernel/resource.c | 4 +++
> arch/x86/pci/acpi.c | 28 +++++++++++++++++++
> arch/x86/pci/common.c | 6 ++++
> 5 files changed, 57 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 43dc35fe5bc0..07f1615206d4 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3949,6 +3949,15 @@
> please report a bug.
> nocrs [X86] Ignore PCI host bridge windows from ACPI.
> If you need to use this, please report a bug.
> + use_e820 [X86] Use E820 reservations to exclude parts of
> + PCI host bridge windows. This is a workaround
> + for BIOS defects in host bridge _CRS methods.
> + If you need to use this, please report a bug to
> + <[email protected]>.
> + no_e820 [X86] Ignore E820 reservations for PCI host
> + bridge windows. This is the default on modern
> + hardware. If you need to use this, please report
> + a bug to <[email protected]>.
> routeirq Do IRQ routing for all PCI devices.
> This is normally done in pci_enable_device(),
> so this option is a temporary workaround
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 490411dba438..0bb4e7dd0ffc 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -39,6 +39,8 @@ do { \
> #define PCI_ROOT_NO_CRS 0x100000
> #define PCI_NOASSIGN_BARS 0x200000
> #define PCI_BIG_ROOT_WINDOW 0x400000
> +#define PCI_USE_E820 0x800000
> +#define PCI_NO_E820 0x1000000
>
> extern unsigned int pci_probe;
> extern unsigned long pirq_table_addr;
> @@ -64,6 +66,8 @@ void pcibios_scan_specific_bus(int busn);
>
> /* pci-irq.c */
>
> +struct pci_dev;
> +
> struct irq_info {
> u8 bus, devfn; /* Bus, device and function */
> struct {
> @@ -232,3 +236,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
> # define x86_default_pci_init_irq NULL
> # define x86_default_pci_fixup_irqs NULL
> #endif
> +
> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
> +extern bool pci_use_e820;
> +#else
> +#define pci_use_e820 false
> +#endif
> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> index 9b9fb7882c20..e8dc9bc327bd 100644
> --- a/arch/x86/kernel/resource.c
> +++ b/arch/x86/kernel/resource.c
> @@ -1,6 +1,7 @@
> // SPDX-License-Identifier: GPL-2.0
> #include <linux/ioport.h>
> #include <asm/e820/api.h>
> +#include <asm/pci_x86.h>
>
> static void resource_clip(struct resource *res, resource_size_t start,
> resource_size_t end)
> @@ -28,6 +29,9 @@ static void remove_e820_regions(struct resource *avail)
> int i;
> struct e820_entry *entry;
>
> + if (!pci_use_e820)
> + return;
> +
> for (i = 0; i < e820_table->nr_entries; i++) {
> entry = &e820_table->entries[i];
>
> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
> index 948656069cdd..72d473054262 100644
> --- a/arch/x86/pci/acpi.c
> +++ b/arch/x86/pci/acpi.c
> @@ -21,6 +21,8 @@ struct pci_root_info {
>
> static bool pci_use_crs = true;
> static bool pci_ignore_seg = false;
> +/* Consumed in arch/x86/kernel/resource.c */
> +bool pci_use_e820 = false;
>
> static int __init set_use_crs(const struct dmi_system_id *id)
> {
> @@ -160,6 +162,32 @@ void __init pci_acpi_crs_quirks(void)
> "if necessary, use \"pci=%s\" and report a bug\n",
> pci_use_crs ? "Using" : "Ignoring",
> pci_use_crs ? "nocrs" : "use_crs");
> +
> + /*
> + * Some BIOS-es contain a bug where they add addresses which map to
> + * system RAM in the PCI host bridge window returned by the ACPI _CRS
> + * method, see commit 4dc2287c1805 ("x86: avoid E820 regions when
> + * allocating address space"). To avoid this Linux by default excludes
> + * E820 reservations when allocating addresses since 2010.
> + * In 2020 some systems have shown-up with E820 reservations which cover
> + * the entire _CRS returned PCI host bridge window, causing all attempts
> + * to assign memory to PCI BARs to fail if Linux uses E820 reservations.
> + *
> + * Ideally Linux would fully stop using E820 reservations, but then
> + * the old systems this was added for will regress.
> + * Instead keep the old behavior for old systems, while ignoring the
> + * E820 reservations for any systems from now on.
> + */
> + if (year >= 0 && year < 2018)
> + pci_use_e820 = true;
> +
> + if (pci_probe & PCI_NO_E820)
> + pci_use_e820 = false;
> + else if (pci_probe & PCI_USE_E820)
> + pci_use_e820 = true;
> +
> + printk(KERN_INFO "PCI: %s E820 reservations for host bridge windows\n",
> + pci_use_e820 ? "Using" : "Ignoring");
> }
>
> #ifdef CONFIG_PCI_MMCONFIG
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 3507f456fcd0..091ec7e94fcb 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -595,6 +595,12 @@ char *__init pcibios_setup(char *str)
> } else if (!strcmp(str, "nocrs")) {
> pci_probe |= PCI_ROOT_NO_CRS;
> return NULL;
> + } else if (!strcmp(str, "use_e820")) {
> + pci_probe |= PCI_USE_E820;
> + return NULL;
> + } else if (!strcmp(str, "no_e820")) {
> + pci_probe |= PCI_NO_E820;
> + return NULL;
> #ifdef CONFIG_PHYS_ADDR_T_64BIT
> } else if (!strcmp(str, "big_root_window")) {
> pci_probe |= PCI_BIG_ROOT_WINDOW;
> --
> 2.31.1
>

2021-10-18 01:55:45

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

Hi,

On 10/15/21 10:03 PM, Bjorn Helgaas wrote:
> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
>> Some BIOS-es contain a bug where they add addresses which map to system
>> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>> space").
>>
>> To work around this bug Linux excludes E820 reserved addresses when
>> allocating addresses from the PCI host bridge window since 2010.
>>
>> Recently (2020) some systems have shown-up with E820 reservations which
>> cover the entire _CRS returned PCI bridge memory window, causing all
>> attempts to assign memory to PCI BARs which have not been setup by the
>> BIOS to fail. For example here are the relevant dmesg bits from a
>> Lenovo IdeaPad 3 15IIL 81WE:
>>
>> [mem 0x000000004bc50000-0x00000000cfffffff] reserved
>> pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
>>
>> The ACPI specifications appear to allow this new behavior:
>>
>> The relationship between E820 and ACPI _CRS is not really very clear.
>> ACPI v6.3, sec 15, table 15-374, says AddressRangeReserved means:
>>
>> This range of addresses is in use or reserved by the system and is
>> not to be included in the allocatable memory pool of the operating
>> system's memory manager.
>>
>> and it may be used when:
>>
>> The address range is in use by a memory-mapped system device.
>>
>> Furthermore, sec 15.2 says:
>>
>> Address ranges defined for baseboard memory-mapped I/O devices, such
>> as APICs, are returned as reserved.
>>
>> A PCI host bridge qualifies as a baseboard memory-mapped I/O device,
>> and its apertures are in use and certainly should not be included in
>> the general allocatable pool, so the fact that some BIOS-es reports
>> the PCI aperture as "reserved" in E820 doesn't seem like a BIOS bug.
>>
>> So it seems that the excluding of E820 reserved addresses is a mistake.
>>
>> Ideally Linux would fully stop excluding E820 reserved addresses,
>> but then the old systems this was added for will regress.
>> Instead keep the old behavior for old systems, while ignoring
>> the E820 reservations for any systems from now on.
>>
>> Old systems are defined here as BIOS year < 2018, this was chosen to
>> make sure that pci_use_e820 will not be set on the currently affected
>> systems, while at the same time also taking into account that the
>> systems for which the E820 checking was originally added may have
>> received BIOS updates for quite a while (esp. CVE related ones),
>> giving them a more recent BIOS year then 2010.
>>
>> Also add pci=no_e820 and pci=use_e820 options to allow overriding
>> the BIOS year heuristic.
>>
>> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
>> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
>> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
>> BugLink: https://bugs.launchpad.net/bugs/1878279
>> BugLink: https://bugs.launchpad.net/bugs/1931715
>> BugLink: https://bugs.launchpad.net/bugs/1932069
>> BugLink: https://bugs.launchpad.net/bugs/1921649
>> Cc: Benoit GrĂ©goire <[email protected]>
>> Cc: Hui Wang <[email protected]>
>> Cc: [email protected]
>> Reviewed-by: Mika Westerberg <[email protected]>
>> Acked-by: Rafael J. Wysocki <[email protected]>
>> Signed-off-by: Hans de Goede <[email protected]>
>
> Acked-by: Bjorn Helgaas <[email protected]>

Thank you.

Regards,

Hans



>
> I'm sorry we have to add more ugliness because of my mistake in
> 4dc2287c1805, but I don't see a better option.
>
>> ---
>> Changes in v5:
>> - Drop mention of Windows behavior from the commit msg, replace with a
>> reference to the specs
>> - Improve documentation in Documentation/admin-guide/kernel-parameters.txt
>> - Reword the big comment added, use "PCI host bridge window" in it and drop
>> all refences to Windows
>>
>> Changes in v4:
>> - Rewrap the big comment block to fit in 80 columns
>> - Add Rafael's Acked-by
>> - Add Cc: [email protected]
>>
>> Changes in v3:
>> - Commit msg tweaks (drop dmesg timestamps, typo fix)
>> - Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
>> - Add Mika's Reviewed-by
>>
>> Changes in v2:
>> - Replace the per model DMI quirk approach with disabling E820 reservations
>> checking for all systems with a BIOS year >= 2018
>> - Add documentation for the new kernel-parameters to
>> Documentation/admin-guide/kernel-parameters.txt
>> ---
>> Other patches trying to address the same issue:
>> https://lore.kernel.org/r/[email protected]
>> https://lore.kernel.org/r/[email protected]
>> V1 patch:
>> https://lore.kernel.org/r/[email protected]
>> ---
>> .../admin-guide/kernel-parameters.txt | 9 ++++++
>> arch/x86/include/asm/pci_x86.h | 10 +++++++
>> arch/x86/kernel/resource.c | 4 +++
>> arch/x86/pci/acpi.c | 28 +++++++++++++++++++
>> arch/x86/pci/common.c | 6 ++++
>> 5 files changed, 57 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 43dc35fe5bc0..07f1615206d4 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -3949,6 +3949,15 @@
>> please report a bug.
>> nocrs [X86] Ignore PCI host bridge windows from ACPI.
>> If you need to use this, please report a bug.
>> + use_e820 [X86] Use E820 reservations to exclude parts of
>> + PCI host bridge windows. This is a workaround
>> + for BIOS defects in host bridge _CRS methods.
>> + If you need to use this, please report a bug to
>> + <[email protected]>.
>> + no_e820 [X86] Ignore E820 reservations for PCI host
>> + bridge windows. This is the default on modern
>> + hardware. If you need to use this, please report
>> + a bug to <[email protected]>.
>> routeirq Do IRQ routing for all PCI devices.
>> This is normally done in pci_enable_device(),
>> so this option is a temporary workaround
>> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
>> index 490411dba438..0bb4e7dd0ffc 100644
>> --- a/arch/x86/include/asm/pci_x86.h
>> +++ b/arch/x86/include/asm/pci_x86.h
>> @@ -39,6 +39,8 @@ do { \
>> #define PCI_ROOT_NO_CRS 0x100000
>> #define PCI_NOASSIGN_BARS 0x200000
>> #define PCI_BIG_ROOT_WINDOW 0x400000
>> +#define PCI_USE_E820 0x800000
>> +#define PCI_NO_E820 0x1000000
>>
>> extern unsigned int pci_probe;
>> extern unsigned long pirq_table_addr;
>> @@ -64,6 +66,8 @@ void pcibios_scan_specific_bus(int busn);
>>
>> /* pci-irq.c */
>>
>> +struct pci_dev;
>> +
>> struct irq_info {
>> u8 bus, devfn; /* Bus, device and function */
>> struct {
>> @@ -232,3 +236,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
>> # define x86_default_pci_init_irq NULL
>> # define x86_default_pci_fixup_irqs NULL
>> #endif
>> +
>> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
>> +extern bool pci_use_e820;
>> +#else
>> +#define pci_use_e820 false
>> +#endif
>> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
>> index 9b9fb7882c20..e8dc9bc327bd 100644
>> --- a/arch/x86/kernel/resource.c
>> +++ b/arch/x86/kernel/resource.c
>> @@ -1,6 +1,7 @@
>> // SPDX-License-Identifier: GPL-2.0
>> #include <linux/ioport.h>
>> #include <asm/e820/api.h>
>> +#include <asm/pci_x86.h>
>>
>> static void resource_clip(struct resource *res, resource_size_t start,
>> resource_size_t end)
>> @@ -28,6 +29,9 @@ static void remove_e820_regions(struct resource *avail)
>> int i;
>> struct e820_entry *entry;
>>
>> + if (!pci_use_e820)
>> + return;
>> +
>> for (i = 0; i < e820_table->nr_entries; i++) {
>> entry = &e820_table->entries[i];
>>
>> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
>> index 948656069cdd..72d473054262 100644
>> --- a/arch/x86/pci/acpi.c
>> +++ b/arch/x86/pci/acpi.c
>> @@ -21,6 +21,8 @@ struct pci_root_info {
>>
>> static bool pci_use_crs = true;
>> static bool pci_ignore_seg = false;
>> +/* Consumed in arch/x86/kernel/resource.c */
>> +bool pci_use_e820 = false;
>>
>> static int __init set_use_crs(const struct dmi_system_id *id)
>> {
>> @@ -160,6 +162,32 @@ void __init pci_acpi_crs_quirks(void)
>> "if necessary, use \"pci=%s\" and report a bug\n",
>> pci_use_crs ? "Using" : "Ignoring",
>> pci_use_crs ? "nocrs" : "use_crs");
>> +
>> + /*
>> + * Some BIOS-es contain a bug where they add addresses which map to
>> + * system RAM in the PCI host bridge window returned by the ACPI _CRS
>> + * method, see commit 4dc2287c1805 ("x86: avoid E820 regions when
>> + * allocating address space"). To avoid this Linux by default excludes
>> + * E820 reservations when allocating addresses since 2010.
>> + * In 2020 some systems have shown-up with E820 reservations which cover
>> + * the entire _CRS returned PCI host bridge window, causing all attempts
>> + * to assign memory to PCI BARs to fail if Linux uses E820 reservations.
>> + *
>> + * Ideally Linux would fully stop using E820 reservations, but then
>> + * the old systems this was added for will regress.
>> + * Instead keep the old behavior for old systems, while ignoring the
>> + * E820 reservations for any systems from now on.
>> + */
>> + if (year >= 0 && year < 2018)
>> + pci_use_e820 = true;
>> +
>> + if (pci_probe & PCI_NO_E820)
>> + pci_use_e820 = false;
>> + else if (pci_probe & PCI_USE_E820)
>> + pci_use_e820 = true;
>> +
>> + printk(KERN_INFO "PCI: %s E820 reservations for host bridge windows\n",
>> + pci_use_e820 ? "Using" : "Ignoring");
>> }
>>
>> #ifdef CONFIG_PCI_MMCONFIG
>> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
>> index 3507f456fcd0..091ec7e94fcb 100644
>> --- a/arch/x86/pci/common.c
>> +++ b/arch/x86/pci/common.c
>> @@ -595,6 +595,12 @@ char *__init pcibios_setup(char *str)
>> } else if (!strcmp(str, "nocrs")) {
>> pci_probe |= PCI_ROOT_NO_CRS;
>> return NULL;
>> + } else if (!strcmp(str, "use_e820")) {
>> + pci_probe |= PCI_USE_E820;
>> + return NULL;
>> + } else if (!strcmp(str, "no_e820")) {
>> + pci_probe |= PCI_NO_E820;
>> + return NULL;
>> #ifdef CONFIG_PHYS_ADDR_T_64BIT
>> } else if (!strcmp(str, "big_root_window")) {
>> pci_probe |= PCI_BIG_ROOT_WINDOW;
>> --
>> 2.31.1
>>
>

2021-10-19 21:54:07

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
> Some BIOS-es contain a bug where they add addresses which map to system
> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
> space").
>
> To work around this bug Linux excludes E820 reserved addresses when
> allocating addresses from the PCI host bridge window since 2010.
>
> Recently (2020) some systems have shown-up with E820 reservations which
> cover the entire _CRS returned PCI bridge memory window, causing all
> attempts to assign memory to PCI BARs which have not been setup by the
> BIOS to fail. For example here are the relevant dmesg bits from a
> Lenovo IdeaPad 3 15IIL 81WE:
>
> [mem 0x000000004bc50000-0x00000000cfffffff] reserved
> pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
>
> The ACPI specifications appear to allow this new behavior:
>
> The relationship between E820 and ACPI _CRS is not really very clear.
> ACPI v6.3, sec 15, table 15-374, says AddressRangeReserved means:
>
> This range of addresses is in use or reserved by the system and is
> not to be included in the allocatable memory pool of the operating
> system's memory manager.
>
> and it may be used when:
>
> The address range is in use by a memory-mapped system device.
>
> Furthermore, sec 15.2 says:
>
> Address ranges defined for baseboard memory-mapped I/O devices, such
> as APICs, are returned as reserved.
>
> A PCI host bridge qualifies as a baseboard memory-mapped I/O device,
> and its apertures are in use and certainly should not be included in
> the general allocatable pool, so the fact that some BIOS-es reports
> the PCI aperture as "reserved" in E820 doesn't seem like a BIOS bug.
>
> So it seems that the excluding of E820 reserved addresses is a mistake.
>
> Ideally Linux would fully stop excluding E820 reserved addresses,
> but then the old systems this was added for will regress.
> Instead keep the old behavior for old systems, while ignoring
> the E820 reservations for any systems from now on.
>
> Old systems are defined here as BIOS year < 2018, this was chosen to
> make sure that pci_use_e820 will not be set on the currently affected
> systems, while at the same time also taking into account that the
> systems for which the E820 checking was originally added may have
> received BIOS updates for quite a while (esp. CVE related ones),
> giving them a more recent BIOS year then 2010.
>
> Also add pci=no_e820 and pci=use_e820 options to allow overriding
> the BIOS year heuristic.
>
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
> BugLink: https://bugs.launchpad.net/bugs/1878279
> BugLink: https://bugs.launchpad.net/bugs/1931715
> BugLink: https://bugs.launchpad.net/bugs/1932069
> BugLink: https://bugs.launchpad.net/bugs/1921649
> Cc: Benoit Gr?goire <[email protected]>
> Cc: Hui Wang <[email protected]>
> Cc: [email protected]
> Reviewed-by: Mika Westerberg <[email protected]>
> Acked-by: Rafael J. Wysocki <[email protected]>
> Signed-off-by: Hans de Goede <[email protected]>

I haven't seen anybody else eager to merge this, so I guess I'll stick
my neck out here.

I applied this to my for-linus branch for v5.15.

> ---
> Changes in v5:
> - Drop mention of Windows behavior from the commit msg, replace with a
> reference to the specs
> - Improve documentation in Documentation/admin-guide/kernel-parameters.txt
> - Reword the big comment added, use "PCI host bridge window" in it and drop
> all refences to Windows
>
> Changes in v4:
> - Rewrap the big comment block to fit in 80 columns
> - Add Rafael's Acked-by
> - Add Cc: [email protected]
>
> Changes in v3:
> - Commit msg tweaks (drop dmesg timestamps, typo fix)
> - Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
> - Add Mika's Reviewed-by
>
> Changes in v2:
> - Replace the per model DMI quirk approach with disabling E820 reservations
> checking for all systems with a BIOS year >= 2018
> - Add documentation for the new kernel-parameters to
> Documentation/admin-guide/kernel-parameters.txt
> ---
> Other patches trying to address the same issue:
> https://lore.kernel.org/r/[email protected]
> https://lore.kernel.org/r/[email protected]
> V1 patch:
> https://lore.kernel.org/r/[email protected]
> ---
> .../admin-guide/kernel-parameters.txt | 9 ++++++
> arch/x86/include/asm/pci_x86.h | 10 +++++++
> arch/x86/kernel/resource.c | 4 +++
> arch/x86/pci/acpi.c | 28 +++++++++++++++++++
> arch/x86/pci/common.c | 6 ++++
> 5 files changed, 57 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 43dc35fe5bc0..07f1615206d4 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3949,6 +3949,15 @@
> please report a bug.
> nocrs [X86] Ignore PCI host bridge windows from ACPI.
> If you need to use this, please report a bug.
> + use_e820 [X86] Use E820 reservations to exclude parts of
> + PCI host bridge windows. This is a workaround
> + for BIOS defects in host bridge _CRS methods.
> + If you need to use this, please report a bug to
> + <[email protected]>.
> + no_e820 [X86] Ignore E820 reservations for PCI host
> + bridge windows. This is the default on modern
> + hardware. If you need to use this, please report
> + a bug to <[email protected]>.
> routeirq Do IRQ routing for all PCI devices.
> This is normally done in pci_enable_device(),
> so this option is a temporary workaround
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 490411dba438..0bb4e7dd0ffc 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -39,6 +39,8 @@ do { \
> #define PCI_ROOT_NO_CRS 0x100000
> #define PCI_NOASSIGN_BARS 0x200000
> #define PCI_BIG_ROOT_WINDOW 0x400000
> +#define PCI_USE_E820 0x800000
> +#define PCI_NO_E820 0x1000000
>
> extern unsigned int pci_probe;
> extern unsigned long pirq_table_addr;
> @@ -64,6 +66,8 @@ void pcibios_scan_specific_bus(int busn);
>
> /* pci-irq.c */
>
> +struct pci_dev;
> +
> struct irq_info {
> u8 bus, devfn; /* Bus, device and function */
> struct {
> @@ -232,3 +236,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
> # define x86_default_pci_init_irq NULL
> # define x86_default_pci_fixup_irqs NULL
> #endif
> +
> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
> +extern bool pci_use_e820;
> +#else
> +#define pci_use_e820 false
> +#endif
> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> index 9b9fb7882c20..e8dc9bc327bd 100644
> --- a/arch/x86/kernel/resource.c
> +++ b/arch/x86/kernel/resource.c
> @@ -1,6 +1,7 @@
> // SPDX-License-Identifier: GPL-2.0
> #include <linux/ioport.h>
> #include <asm/e820/api.h>
> +#include <asm/pci_x86.h>
>
> static void resource_clip(struct resource *res, resource_size_t start,
> resource_size_t end)
> @@ -28,6 +29,9 @@ static void remove_e820_regions(struct resource *avail)
> int i;
> struct e820_entry *entry;
>
> + if (!pci_use_e820)
> + return;
> +
> for (i = 0; i < e820_table->nr_entries; i++) {
> entry = &e820_table->entries[i];
>
> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
> index 948656069cdd..72d473054262 100644
> --- a/arch/x86/pci/acpi.c
> +++ b/arch/x86/pci/acpi.c
> @@ -21,6 +21,8 @@ struct pci_root_info {
>
> static bool pci_use_crs = true;
> static bool pci_ignore_seg = false;
> +/* Consumed in arch/x86/kernel/resource.c */
> +bool pci_use_e820 = false;
>
> static int __init set_use_crs(const struct dmi_system_id *id)
> {
> @@ -160,6 +162,32 @@ void __init pci_acpi_crs_quirks(void)
> "if necessary, use \"pci=%s\" and report a bug\n",
> pci_use_crs ? "Using" : "Ignoring",
> pci_use_crs ? "nocrs" : "use_crs");
> +
> + /*
> + * Some BIOS-es contain a bug where they add addresses which map to
> + * system RAM in the PCI host bridge window returned by the ACPI _CRS
> + * method, see commit 4dc2287c1805 ("x86: avoid E820 regions when
> + * allocating address space"). To avoid this Linux by default excludes
> + * E820 reservations when allocating addresses since 2010.
> + * In 2020 some systems have shown-up with E820 reservations which cover
> + * the entire _CRS returned PCI host bridge window, causing all attempts
> + * to assign memory to PCI BARs to fail if Linux uses E820 reservations.
> + *
> + * Ideally Linux would fully stop using E820 reservations, but then
> + * the old systems this was added for will regress.
> + * Instead keep the old behavior for old systems, while ignoring the
> + * E820 reservations for any systems from now on.
> + */
> + if (year >= 0 && year < 2018)
> + pci_use_e820 = true;
> +
> + if (pci_probe & PCI_NO_E820)
> + pci_use_e820 = false;
> + else if (pci_probe & PCI_USE_E820)
> + pci_use_e820 = true;
> +
> + printk(KERN_INFO "PCI: %s E820 reservations for host bridge windows\n",
> + pci_use_e820 ? "Using" : "Ignoring");
> }
>
> #ifdef CONFIG_PCI_MMCONFIG
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 3507f456fcd0..091ec7e94fcb 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -595,6 +595,12 @@ char *__init pcibios_setup(char *str)
> } else if (!strcmp(str, "nocrs")) {
> pci_probe |= PCI_ROOT_NO_CRS;
> return NULL;
> + } else if (!strcmp(str, "use_e820")) {
> + pci_probe |= PCI_USE_E820;
> + return NULL;
> + } else if (!strcmp(str, "no_e820")) {
> + pci_probe |= PCI_NO_E820;
> + return NULL;
> #ifdef CONFIG_PHYS_ADDR_T_64BIT
> } else if (!strcmp(str, "big_root_window")) {
> pci_probe |= PCI_BIG_ROOT_WINDOW;
> --
> 2.31.1
>

2021-10-19 21:56:27

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

On Tue, Oct 19, 2021 at 04:52:42PM -0500, Bjorn Helgaas wrote:
> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
> > Some BIOS-es contain a bug where they add addresses which map to system
> > RAM in the PCI host bridge window returned by the ACPI _CRS method, see
> > commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
> > space").
> >
> > To work around this bug Linux excludes E820 reserved addresses when
> > allocating addresses from the PCI host bridge window since 2010.
> >
> > Recently (2020) some systems have shown-up with E820 reservations which
> > cover the entire _CRS returned PCI bridge memory window, causing all
> > attempts to assign memory to PCI BARs which have not been setup by the
> > BIOS to fail. For example here are the relevant dmesg bits from a
> > Lenovo IdeaPad 3 15IIL 81WE:
> >
> > [mem 0x000000004bc50000-0x00000000cfffffff] reserved
> > pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
> >
> > The ACPI specifications appear to allow this new behavior:
> >
> > The relationship between E820 and ACPI _CRS is not really very clear.
> > ACPI v6.3, sec 15, table 15-374, says AddressRangeReserved means:
> >
> > This range of addresses is in use or reserved by the system and is
> > not to be included in the allocatable memory pool of the operating
> > system's memory manager.
> >
> > and it may be used when:
> >
> > The address range is in use by a memory-mapped system device.
> >
> > Furthermore, sec 15.2 says:
> >
> > Address ranges defined for baseboard memory-mapped I/O devices, such
> > as APICs, are returned as reserved.
> >
> > A PCI host bridge qualifies as a baseboard memory-mapped I/O device,
> > and its apertures are in use and certainly should not be included in
> > the general allocatable pool, so the fact that some BIOS-es reports
> > the PCI aperture as "reserved" in E820 doesn't seem like a BIOS bug.
> >
> > So it seems that the excluding of E820 reserved addresses is a mistake.
> >
> > Ideally Linux would fully stop excluding E820 reserved addresses,
> > but then the old systems this was added for will regress.
> > Instead keep the old behavior for old systems, while ignoring
> > the E820 reservations for any systems from now on.
> >
> > Old systems are defined here as BIOS year < 2018, this was chosen to
> > make sure that pci_use_e820 will not be set on the currently affected
> > systems, while at the same time also taking into account that the
> > systems for which the E820 checking was originally added may have
> > received BIOS updates for quite a while (esp. CVE related ones),
> > giving them a more recent BIOS year then 2010.
> >
> > Also add pci=no_e820 and pci=use_e820 options to allow overriding
> > the BIOS year heuristic.
> >
> > BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
> > BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
> > BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
> > BugLink: https://bugs.launchpad.net/bugs/1878279
> > BugLink: https://bugs.launchpad.net/bugs/1931715
> > BugLink: https://bugs.launchpad.net/bugs/1932069
> > BugLink: https://bugs.launchpad.net/bugs/1921649
> > Cc: Benoit Gr?goire <[email protected]>
> > Cc: Hui Wang <[email protected]>
> > Cc: [email protected]
> > Reviewed-by: Mika Westerberg <[email protected]>
> > Acked-by: Rafael J. Wysocki <[email protected]>
> > Signed-off-by: Hans de Goede <[email protected]>
>
> I haven't seen anybody else eager to merge this, so I guess I'll stick
> my neck out here.
>
> I applied this to my for-linus branch for v5.15.

(I only applied patch 1/2, to fix the PCI BAR assignments. The 2/2
patch to convert printk to pr_info might be nice, but definitely not
-rc7 material. I'm hesitant enough about 1/2.)

2021-10-20 21:17:31

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
> On 10/19/21 23:52, Bjorn Helgaas wrote:
> > On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
> >> Some BIOS-es contain a bug where they add addresses which map to system
> >> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
> >> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
> >> space").
> >>
> >> To work around this bug Linux excludes E820 reserved addresses when
> >> allocating addresses from the PCI host bridge window since 2010.
> >> ...

> > I haven't seen anybody else eager to merge this, so I guess I'll stick
> > my neck out here.
> >
> > I applied this to my for-linus branch for v5.15.
>
> Thank you, and sorry about the build-errors which the lkp
> kernel-test-robot found.
>
> I've just send out a patch which fixes these build-errors
> (verified with both .config-s from the lkp reports).
> Feel free to squash this into the original patch (or keep
> them separate, whatever works for you).

Thanks, I squashed the fix in.

HOWEVER, I think it would be fairly risky to push this into v5.15.
We would be relying on the assumption that current machines have all
fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
evidence for that.

I'm not sure there's significant benefit to having this in v5.15.
Yes, the mainline v5.15 kernel would work on the affected machines,
but I suspect most people with those machines are running distro
kernels, not mainline kernels.

This issue has been around a long time, so it's not like a regression
that we just introduced. If we fixed these machines and regressed
*other* machines, we'd be worse off than we are now.

Convince me otherwise if you see this differently :)

In the meantime, here's another possibility for working around this.
What if we discarded remove_e820_regions() completely, but aligned the
problem _CRS windows a little more? The 4dc2287c1805 case was this:

BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]

where the _CRS window was of size 0x20100000, i.e., 512M + 1M. At
least in this particular case, we could avoid the problem by throwing
away that first 1M and aligning the window to a nice 3G boundary.
Maybe it would be worth giving up a small fraction (less than 0.2% in
this case) of questionable windows like this?

Bjorn

2021-10-21 17:18:50

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

Hi Bjorn,

On 10/20/21 23:14, Bjorn Helgaas wrote:
> On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
>> On 10/19/21 23:52, Bjorn Helgaas wrote:
>>> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
>>>> Some BIOS-es contain a bug where they add addresses which map to system
>>>> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
>>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>>>> space").
>>>>
>>>> To work around this bug Linux excludes E820 reserved addresses when
>>>> allocating addresses from the PCI host bridge window since 2010.
>>>> ...
>
>>> I haven't seen anybody else eager to merge this, so I guess I'll stick
>>> my neck out here.
>>>
>>> I applied this to my for-linus branch for v5.15.
>>
>> Thank you, and sorry about the build-errors which the lkp
>> kernel-test-robot found.
>>
>> I've just send out a patch which fixes these build-errors
>> (verified with both .config-s from the lkp reports).
>> Feel free to squash this into the original patch (or keep
>> them separate, whatever works for you).
>
> Thanks, I squashed the fix in.
>
> HOWEVER, I think it would be fairly risky to push this into v5.15.
> We would be relying on the assumption that current machines have all
> fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
> evidence for that.

It is a 10 year old BIOS defect, so hopefully anything from 2018
or later will not have it.

> I'm not sure there's significant benefit to having this in v5.15.
> Yes, the mainline v5.15 kernel would work on the affected machines,
> but I suspect most people with those machines are running distro
> kernels, not mainline kernels.

Fedora and Arch do follow mainline pretty closely and a lot of
users are affected by this (see the large number of BugLinks in
the commit).

I completely understand why you are reluctant to push this out, but
your argument about most distros not running mainline kernels also
applies to chances of people where this may cause a regression
running mainline kernels also being quite small.

> This issue has been around a long time, so it's not like a regression
> that we just introduced. If we fixed these machines and regressed
> *other* machines, we'd be worse off than we are now.

If we break one machine model and fix a whole bunch of other machines
then in my book that is a win. Ideally we would not break anything,
but we can only find out if we actually break anything if we ship
the change.

> Convince me otherwise if you see this differently :)

See above :)

> In the meantime, here's another possibility for working around this.
> What if we discarded remove_e820_regions() completely, but aligned the
> problem _CRS windows a little more? The 4dc2287c1805 case was this:
>
> BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
> pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]
>
> where the _CRS window was of size 0x20100000, i.e., 512M + 1M. At
> least in this particular case, we could avoid the problem by throwing
> away that first 1M and aligning the window to a nice 3G boundary.
> Maybe it would be worth giving up a small fraction (less than 0.2% in
> this case) of questionable windows like this?

The PCI BAR allocation code tries to fall back to the BIOS assigned
resource if the allocation fails. That BIOS assigned resource might
fall outside of the host bridge window after we round the address.

My initial gut instinct here is that this has a bigger chance
of breaking things then my change.

In the beginning of the thread you said that ideally we would
completely stop using the E820 reservations for PCI host bridge
windows. Because in hindsight messing with the windows on all
machines just to work around a clear BIOS bug in some was not a
good idea.

This address-rounding/-aligning you now suggest, is again
messing with the windows on all machines just to work around
a clear BIOS bug in some. At least that is how I see this.

I can understand that you're not entirely happy with my patch,
but it does get rid of the use of E820 reservations for
any current and future machines, removing any messing with
the _CRS returned windows which we are doing.

I also understand that you're not entirely comfortable with
my "fix" not causing regressions else where. If you want to
delay my fix till 5.16-rc1 that is fine (1).

Regards,

Hans



1) The stable series will likely pick it up soon after
5.16-rc1 though, so not sure how much that actually helps
with getting more testing time.

2021-10-22 01:23:46

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

On Thu, Oct 21, 2021 at 07:15:57PM +0200, Hans de Goede wrote:
> On 10/20/21 23:14, Bjorn Helgaas wrote:
> > On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
> >> On 10/19/21 23:52, Bjorn Helgaas wrote:
> >>> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
> >>>> Some BIOS-es contain a bug where they add addresses which map to system
> >>>> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
> >>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
> >>>> space").
> >>>>
> >>>> To work around this bug Linux excludes E820 reserved addresses when
> >>>> allocating addresses from the PCI host bridge window since 2010.
> >>>> ...
> >
> >>> I haven't seen anybody else eager to merge this, so I guess I'll stick
> >>> my neck out here.
> >>>
> >>> I applied this to my for-linus branch for v5.15.
> >>
> >> Thank you, and sorry about the build-errors which the lkp
> >> kernel-test-robot found.
> >>
> >> I've just send out a patch which fixes these build-errors
> >> (verified with both .config-s from the lkp reports).
> >> Feel free to squash this into the original patch (or keep
> >> them separate, whatever works for you).
> >
> > Thanks, I squashed the fix in.
> >
> > HOWEVER, I think it would be fairly risky to push this into v5.15.
> > We would be relying on the assumption that current machines have all
> > fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
> > evidence for that.
>
> It is a 10 year old BIOS defect, so hopefully anything from 2018
> or later will not have it.

We can hope. AFAIK, Windows allocates space top-down, while Linux
allocates bottom-up, so I think it's quite possible these defects
would never be discovered or fixed. In any event, I don't think we
have much evidence either way.

> > I'm not sure there's significant benefit to having this in v5.15.
> > Yes, the mainline v5.15 kernel would work on the affected machines,
> > but I suspect most people with those machines are running distro
> > kernels, not mainline kernels.
>
> Fedora and Arch do follow mainline pretty closely and a lot of
> users are affected by this (see the large number of BugLinks in
> the commit).
>
> I completely understand why you are reluctant to push this out, but
> your argument about most distros not running mainline kernels also
> applies to chances of people where this may cause a regression
> running mainline kernels also being quite small.

True.

> > This issue has been around a long time, so it's not like a regression
> > that we just introduced. If we fixed these machines and regressed
> > *other* machines, we'd be worse off than we are now.
>
> If we break one machine model and fix a whole bunch of other machines
> then in my book that is a win. Ideally we would not break anything,
> but we can only find out if we actually break anything if we ship
> the change.

I'm definitely not going to try the "fix many, break one" argument on
Linus. Of course we want to fix systems, but IMO it's far better to
leave a system broken than it is to break one that used to work.

> > In the meantime, here's another possibility for working around this.
> > What if we discarded remove_e820_regions() completely, but aligned the
> > problem _CRS windows a little more? The 4dc2287c1805 case was this:
> >
> > BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
> > pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]
> >
> > where the _CRS window was of size 0x20100000, i.e., 512M + 1M. At
> > least in this particular case, we could avoid the problem by throwing
> > away that first 1M and aligning the window to a nice 3G boundary.
> > Maybe it would be worth giving up a small fraction (less than 0.2% in
> > this case) of questionable windows like this?
>
> The PCI BAR allocation code tries to fall back to the BIOS assigned
> resource if the allocation fails. That BIOS assigned resource might
> fall outside of the host bridge window after we round the address.
>
> My initial gut instinct here is that this has a bigger chance
> of breaking things then my change.
>
> In the beginning of the thread you said that ideally we would
> completely stop using the E820 reservations for PCI host bridge
> windows. Because in hindsight messing with the windows on all
> machines just to work around a clear BIOS bug in some was not a
> good idea.
>
> This address-rounding/-aligning you now suggest, is again
> messing with the windows on all machines just to work around
> a clear BIOS bug in some. At least that is how I see this.

That's true. I assume Red Hat has a bunch of machines and hopefully
an archive of dmesg logs from them. Those logs should contain good
E820 and _CRS information, so with a little scripting, maybe we could
get some idea of what's out there.

Bjorn

2021-10-22 09:56:29

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

Hi Bjorn,

On 10/22/21 03:20, Bjorn Helgaas wrote:
> On Thu, Oct 21, 2021 at 07:15:57PM +0200, Hans de Goede wrote:
>> On 10/20/21 23:14, Bjorn Helgaas wrote:
>>> On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
>>>> On 10/19/21 23:52, Bjorn Helgaas wrote:
>>>>> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
>>>>>> Some BIOS-es contain a bug where they add addresses which map to system
>>>>>> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
>>>>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>>>>>> space").
>>>>>>
>>>>>> To work around this bug Linux excludes E820 reserved addresses when
>>>>>> allocating addresses from the PCI host bridge window since 2010.
>>>>>> ...
>>>
>>>>> I haven't seen anybody else eager to merge this, so I guess I'll stick
>>>>> my neck out here.
>>>>>
>>>>> I applied this to my for-linus branch for v5.15.
>>>>
>>>> Thank you, and sorry about the build-errors which the lkp
>>>> kernel-test-robot found.
>>>>
>>>> I've just send out a patch which fixes these build-errors
>>>> (verified with both .config-s from the lkp reports).
>>>> Feel free to squash this into the original patch (or keep
>>>> them separate, whatever works for you).
>>>
>>> Thanks, I squashed the fix in.
>>>
>>> HOWEVER, I think it would be fairly risky to push this into v5.15.
>>> We would be relying on the assumption that current machines have all
>>> fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
>>> evidence for that.
>>
>> It is a 10 year old BIOS defect, so hopefully anything from 2018
>> or later will not have it.
>
> We can hope. AFAIK, Windows allocates space top-down, while Linux
> allocates bottom-up, so I think it's quite possible these defects
> would never be discovered or fixed. In any event, I don't think we
> have much evidence either way.

Ack.

>>> I'm not sure there's significant benefit to having this in v5.15.
>>> Yes, the mainline v5.15 kernel would work on the affected machines,
>>> but I suspect most people with those machines are running distro
>>> kernels, not mainline kernels.
>>
>> Fedora and Arch do follow mainline pretty closely and a lot of
>> users are affected by this (see the large number of BugLinks in
>> the commit).
>>
>> I completely understand why you are reluctant to push this out, but
>> your argument about most distros not running mainline kernels also
>> applies to chances of people where this may cause a regression
>> running mainline kernels also being quite small.
>
> True.
>
>>> This issue has been around a long time, so it's not like a regression
>>> that we just introduced. If we fixed these machines and regressed
>>> *other* machines, we'd be worse off than we are now.
>>
>> If we break one machine model and fix a whole bunch of other machines
>> then in my book that is a win. Ideally we would not break anything,
>> but we can only find out if we actually break anything if we ship
>> the change.
>
> I'm definitely not going to try the "fix many, break one" argument on
> Linus. Of course we want to fix systems, but IMO it's far better to
> leave a system broken than it is to break one that used to work.

Right, what I meant to say with "a win" is a step in the right direction,
we definitely must address any regressions coming from this change as
soon as we learn about them.

>>> In the meantime, here's another possibility for working around this.
>>> What if we discarded remove_e820_regions() completely, but aligned the
>>> problem _CRS windows a little more? The 4dc2287c1805 case was this:
>>>
>>> BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
>>> pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]
>>>
>>> where the _CRS window was of size 0x20100000, i.e., 512M + 1M. At
>>> least in this particular case, we could avoid the problem by throwing
>>> away that first 1M and aligning the window to a nice 3G boundary.
>>> Maybe it would be worth giving up a small fraction (less than 0.2% in
>>> this case) of questionable windows like this?
>>
>> The PCI BAR allocation code tries to fall back to the BIOS assigned
>> resource if the allocation fails. That BIOS assigned resource might
>> fall outside of the host bridge window after we round the address.
>>
>> My initial gut instinct here is that this has a bigger chance
>> of breaking things then my change.
>>
>> In the beginning of the thread you said that ideally we would
>> completely stop using the E820 reservations for PCI host bridge
>> windows. Because in hindsight messing with the windows on all
>> machines just to work around a clear BIOS bug in some was not a
>> good idea.
>>
>> This address-rounding/-aligning you now suggest, is again
>> messing with the windows on all machines just to work around
>> a clear BIOS bug in some. At least that is how I see this.
>
> That's true. I assume Red Hat has a bunch of machines and hopefully
> an archive of dmesg logs from them. Those logs should contain good
> E820 and _CRS information, so with a little scripting, maybe we could
> get some idea of what's out there.

We do have a (large-ish) test-lab, but that contains almost exclusively
servers, where as the original problem was on Dell Precision laptops.

Also I'm not sure if I can get aggregate data from the lab's machines.
I can reserve time on any model we have to debug specific problems,
but that is targeting one specific model. I'll ask around about this.

Regards,

Hans

2021-10-29 08:13:12

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

Hi Bjorn,

On 10/22/21 11:53, Hans de Goede wrote:
> Hi Bjorn,
>
> On 10/22/21 03:20, Bjorn Helgaas wrote:
>> On Thu, Oct 21, 2021 at 07:15:57PM +0200, Hans de Goede wrote:
>>> On 10/20/21 23:14, Bjorn Helgaas wrote:
>>>> On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
>>>>> On 10/19/21 23:52, Bjorn Helgaas wrote:
>>>>>> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
>>>>>>> Some BIOS-es contain a bug where they add addresses which map to system
>>>>>>> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
>>>>>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>>>>>>> space").
>>>>>>>
>>>>>>> To work around this bug Linux excludes E820 reserved addresses when
>>>>>>> allocating addresses from the PCI host bridge window since 2010.
>>>>>>> ...
>>>>
>>>>>> I haven't seen anybody else eager to merge this, so I guess I'll stick
>>>>>> my neck out here.
>>>>>>
>>>>>> I applied this to my for-linus branch for v5.15.
>>>>>
>>>>> Thank you, and sorry about the build-errors which the lkp
>>>>> kernel-test-robot found.
>>>>>
>>>>> I've just send out a patch which fixes these build-errors
>>>>> (verified with both .config-s from the lkp reports).
>>>>> Feel free to squash this into the original patch (or keep
>>>>> them separate, whatever works for you).
>>>>
>>>> Thanks, I squashed the fix in.
>>>>
>>>> HOWEVER, I think it would be fairly risky to push this into v5.15.
>>>> We would be relying on the assumption that current machines have all
>>>> fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
>>>> evidence for that.
>>>
>>> It is a 10 year old BIOS defect, so hopefully anything from 2018
>>> or later will not have it.
>>
>> We can hope. AFAIK, Windows allocates space top-down, while Linux
>> allocates bottom-up, so I think it's quite possible these defects
>> would never be discovered or fixed. In any event, I don't think we
>> have much evidence either way.
>
> Ack.
>
>>>> I'm not sure there's significant benefit to having this in v5.15.
>>>> Yes, the mainline v5.15 kernel would work on the affected machines,
>>>> but I suspect most people with those machines are running distro
>>>> kernels, not mainline kernels.
>>>
>>> Fedora and Arch do follow mainline pretty closely and a lot of
>>> users are affected by this (see the large number of BugLinks in
>>> the commit).
>>>
>>> I completely understand why you are reluctant to push this out, but
>>> your argument about most distros not running mainline kernels also
>>> applies to chances of people where this may cause a regression
>>> running mainline kernels also being quite small.
>>
>> True.
>>
>>>> This issue has been around a long time, so it's not like a regression
>>>> that we just introduced. If we fixed these machines and regressed
>>>> *other* machines, we'd be worse off than we are now.
>>>
>>> If we break one machine model and fix a whole bunch of other machines
>>> then in my book that is a win. Ideally we would not break anything,
>>> but we can only find out if we actually break anything if we ship
>>> the change.
>>
>> I'm definitely not going to try the "fix many, break one" argument on
>> Linus. Of course we want to fix systems, but IMO it's far better to
>> leave a system broken than it is to break one that used to work.
>
> Right, what I meant to say with "a win" is a step in the right direction,
> we definitely must address any regressions coming from this change as
> soon as we learn about them.
>
>>>> In the meantime, here's another possibility for working around this.
>>>> What if we discarded remove_e820_regions() completely, but aligned the
>>>> problem _CRS windows a little more? The 4dc2287c1805 case was this:
>>>>
>>>> BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
>>>> pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]
>>>>
>>>> where the _CRS window was of size 0x20100000, i.e., 512M + 1M. At
>>>> least in this particular case, we could avoid the problem by throwing
>>>> away that first 1M and aligning the window to a nice 3G boundary.
>>>> Maybe it would be worth giving up a small fraction (less than 0.2% in
>>>> this case) of questionable windows like this?
>>>
>>> The PCI BAR allocation code tries to fall back to the BIOS assigned
>>> resource if the allocation fails. That BIOS assigned resource might
>>> fall outside of the host bridge window after we round the address.
>>>
>>> My initial gut instinct here is that this has a bigger chance
>>> of breaking things then my change.
>>>
>>> In the beginning of the thread you said that ideally we would
>>> completely stop using the E820 reservations for PCI host bridge
>>> windows. Because in hindsight messing with the windows on all
>>> machines just to work around a clear BIOS bug in some was not a
>>> good idea.
>>>
>>> This address-rounding/-aligning you now suggest, is again
>>> messing with the windows on all machines just to work around
>>> a clear BIOS bug in some. At least that is how I see this.
>>
>> That's true. I assume Red Hat has a bunch of machines and hopefully
>> an archive of dmesg logs from them. Those logs should contain good
>> E820 and _CRS information, so with a little scripting, maybe we could
>> get some idea of what's out there.
>
> We do have a (large-ish) test-lab, but that contains almost exclusively
> servers, where as the original problem was on Dell Precision laptops.
>
> Also I'm not sure if I can get aggregate data from the lab's machines.
> I can reserve time on any model we have to debug specific problems,
> but that is targeting one specific model. I'll ask around about this.

So I had another idea to get us a whole bunch of dmesg outputs and that
is to use the database collected by linux-hardware.org . The dmesg
were already individually accessible by selecting a specific model machine,
but I asked them if they could do a dump and I just got an email that a
dmesg dump is now available here:

https://github.com/linuxhw/Dmesg

Note be careful with the size of the repository - it will take ~3 gigabytes
of network traffic and ~20 gigabytes of space on the drive to checkout it.

So if you want dmesg outputs to grep through for e820 / host-bridge-window
info, here you go.

Regards,

Hans

2021-11-06 16:49:54

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

Hi Bjorn,

On 10/20/21 23:14, Bjorn Helgaas wrote:
> On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
>> On 10/19/21 23:52, Bjorn Helgaas wrote:
>>> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
>>>> Some BIOS-es contain a bug where they add addresses which map to system
>>>> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
>>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>>>> space").
>>>>
>>>> To work around this bug Linux excludes E820 reserved addresses when
>>>> allocating addresses from the PCI host bridge window since 2010.
>>>> ...
>
>>> I haven't seen anybody else eager to merge this, so I guess I'll stick
>>> my neck out here.
>>>
>>> I applied this to my for-linus branch for v5.15.
>>
>> Thank you, and sorry about the build-errors which the lkp
>> kernel-test-robot found.
>>
>> I've just send out a patch which fixes these build-errors
>> (verified with both .config-s from the lkp reports).
>> Feel free to squash this into the original patch (or keep
>> them separate, whatever works for you).
>
> Thanks, I squashed the fix in.
>
> HOWEVER, I think it would be fairly risky to push this into v5.15.
> We would be relying on the assumption that current machines have all
> fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
> evidence for that.
>
> I'm not sure there's significant benefit to having this in v5.15.
> Yes, the mainline v5.15 kernel would work on the affected machines,
> but I suspect most people with those machines are running distro
> kernels, not mainline kernels.

I understand that you were reluctant to add this to 5.15 so close
near the end of the 5.15 cycle, but can we please get this into
5.16 now ?

I know you ultimately want to see if there is a better fix,
but this is hitting a *lot* of users right now and if we come up
with a better fix we can always use that to replace this one
later.

So cam we please just go with this fix now, so that we can
fix the issues a lot of users are seeing caused by the current
*wrong* behavior of taking the e820 reservations into account ?

Regards,

Hans

2021-11-10 00:24:25

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

On Sat, Nov 06, 2021 at 11:15:07AM +0100, Hans de Goede wrote:
> On 10/20/21 23:14, Bjorn Helgaas wrote:
> > On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
> >> On 10/19/21 23:52, Bjorn Helgaas wrote:
> >>> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
> >>>> Some BIOS-es contain a bug where they add addresses which map to system
> >>>> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
> >>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
> >>>> space").
> >>>>
> >>>> To work around this bug Linux excludes E820 reserved addresses when
> >>>> allocating addresses from the PCI host bridge window since 2010.
> >>>> ...
> >
> >>> I haven't seen anybody else eager to merge this, so I guess I'll stick
> >>> my neck out here.
> >>>
> >>> I applied this to my for-linus branch for v5.15.
> >>
> >> Thank you, and sorry about the build-errors which the lkp
> >> kernel-test-robot found.
> >>
> >> I've just send out a patch which fixes these build-errors
> >> (verified with both .config-s from the lkp reports).
> >> Feel free to squash this into the original patch (or keep
> >> them separate, whatever works for you).
> >
> > Thanks, I squashed the fix in.
> >
> > HOWEVER, I think it would be fairly risky to push this into v5.15.
> > We would be relying on the assumption that current machines have all
> > fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
> > evidence for that.
> >
> > I'm not sure there's significant benefit to having this in v5.15.
> > Yes, the mainline v5.15 kernel would work on the affected machines,
> > but I suspect most people with those machines are running distro
> > kernels, not mainline kernels.
>
> I understand that you were reluctant to add this to 5.15 so close
> near the end of the 5.15 cycle, but can we please get this into
> 5.16 now ?
>
> I know you ultimately want to see if there is a better fix,
> but this is hitting a *lot* of users right now and if we come up
> with a better fix we can always use that to replace this one
> later.

I don't know whether there's a "better" fix, but I do know that if we
merge what we have right now, nobody will be looking for a better
one.

We're in the middle of the merge window, so the v5.16 development
cycle is over. The v5.17 cycle is just starting, so we have time to
hit that. Obviously a fix can be backported to older kernels as
needed.

> So can we please just go with this fix now, so that we can
> fix the issues a lot of users are seeing caused by the current
> *wrong* behavior of taking the e820 reservations into account ?

I think the fix on the table is "ignore E820 for BIOS date >= 2018"
plus the obvious parameters to force it both ways.

The thing I don't like is that this isn't connected at all to the
actual BIOS defect. We have no indication that current BIOSes have
fixed the defect, and we have no assurance that future ones will not
have the defect. It would be better if we had some algorithmic way of
figuring out what to do.

Thank you very much for chasing down the dmesg log archive
(https://github.com/linuxhw/Dmesg; see
https://lore.kernel.org/r/[email protected]).
Unfortunately I haven't had time to look through it myself, and I
haven't heard of anybody else doing it either.

Bjorn

2021-11-10 08:47:00

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

Hi Bjorn,

On 11/9/21 23:07, Bjorn Helgaas wrote:
> On Sat, Nov 06, 2021 at 11:15:07AM +0100, Hans de Goede wrote:
>> On 10/20/21 23:14, Bjorn Helgaas wrote:
>>> On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
>>>> On 10/19/21 23:52, Bjorn Helgaas wrote:
>>>>> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
>>>>>> Some BIOS-es contain a bug where they add addresses which map to system
>>>>>> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
>>>>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>>>>>> space").
>>>>>>
>>>>>> To work around this bug Linux excludes E820 reserved addresses when
>>>>>> allocating addresses from the PCI host bridge window since 2010.
>>>>>> ...
>>>
>>>>> I haven't seen anybody else eager to merge this, so I guess I'll stick
>>>>> my neck out here.
>>>>>
>>>>> I applied this to my for-linus branch for v5.15.
>>>>
>>>> Thank you, and sorry about the build-errors which the lkp
>>>> kernel-test-robot found.
>>>>
>>>> I've just send out a patch which fixes these build-errors
>>>> (verified with both .config-s from the lkp reports).
>>>> Feel free to squash this into the original patch (or keep
>>>> them separate, whatever works for you).
>>>
>>> Thanks, I squashed the fix in.
>>>
>>> HOWEVER, I think it would be fairly risky to push this into v5.15.
>>> We would be relying on the assumption that current machines have all
>>> fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
>>> evidence for that.
>>>
>>> I'm not sure there's significant benefit to having this in v5.15.
>>> Yes, the mainline v5.15 kernel would work on the affected machines,
>>> but I suspect most people with those machines are running distro
>>> kernels, not mainline kernels.
>>
>> I understand that you were reluctant to add this to 5.15 so close
>> near the end of the 5.15 cycle, but can we please get this into
>> 5.16 now ?
>>
>> I know you ultimately want to see if there is a better fix,
>> but this is hitting a *lot* of users right now and if we come up
>> with a better fix we can always use that to replace this one
>> later.
>
> I don't know whether there's a "better" fix, but I do know that if we
> merge what we have right now, nobody will be looking for a better
> one.
>
> We're in the middle of the merge window, so the v5.16 development
> cycle is over. The v5.17 cycle is just starting, so we have time to
> hit that. Obviously a fix can be backported to older kernels as
> needed.
>
>> So can we please just go with this fix now, so that we can
>> fix the issues a lot of users are seeing caused by the current
>> *wrong* behavior of taking the e820 reservations into account ?
>
> I think the fix on the table is "ignore E820 for BIOS date >= 2018"
> plus the obvious parameters to force it both ways.

Correct.

> The thing I don't like is that this isn't connected at all to the
> actual BIOS defect. We have no indication that current BIOSes have
> fixed the defect,

We also have no indication that that defect from 10 years ago, from
pre UEFI firmware is still present in modern day UEFI firmware which
is basically an entire different code-base.

And even 10 years ago the problem was only happening to a single
family of laptop models (Dell Precision laptops) so this clearly
was a bug in that specific implementation and not some generic
issue which is likely to be carried forward.

> and we have no assurance that future ones will not
> have the defect. It would be better if we had some algorithmic way of
> figuring out what to do.

You yourself have said that in hindsight taking E820 reservations
into account for PCI bridge host windows was a mistake. So what
the "ignore E820 for BIOS date >= 2018" is doing is letting the
past be the past (without regressing on older models) while fixing
that mistake on any hardware going forward.

In the unlikely case that we hit that BIOS bug again on 1 or 2 models,
we can simply DMI quirk those models, as we do for countless other
BIOS issues.

> Thank you very much for chasing down the dmesg log archive
> (https://github.com/linuxhw/Dmesg; see
> https://lore.kernel.org/r/[email protected]).
> Unfortunately I haven't had time to look through it myself, and I
> haven't heard of anybody else doing it either.

Right, I'm afraid that I already have spend way too much time on this
myself. Note that I've been working with users on this bug on and off
for over a year now.

This is hitting many users and now that we have a viable fix, this
really needs to be fixed now.

I believe that the "ignore E820 for BIOS date >= 2018" fix is good
enough and that you are letting perfect be the enemy of good here.

As an upstream kernel maintainer myself, I'm sorry to say this,
but if we don't get some fix for this merged soon you are leaving
my no choice but to add my fix to the Fedora kernels as a downstream
patch (and to advise other distros to do the same).

Note that if you are still afraid of regressions going the downstream
route is also an opportunity, Fedora will start testing moving users
to 5.15.y soon, so I could add the patch to Fedora's 5.15.y builds and
see how that goes ?

Regards,

Hans

2021-11-10 13:08:40

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

Hi Bjorn,

On 11/10/21 09:45, Hans de Goede wrote:
> Hi Bjorn,
>
> On 11/9/21 23:07, Bjorn Helgaas wrote:
>> On Sat, Nov 06, 2021 at 11:15:07AM +0100, Hans de Goede wrote:
>>> On 10/20/21 23:14, Bjorn Helgaas wrote:
>>>> On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
>>>>> On 10/19/21 23:52, Bjorn Helgaas wrote:
>>>>>> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
>>>>>>> Some BIOS-es contain a bug where they add addresses which map to system
>>>>>>> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
>>>>>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>>>>>>> space").
>>>>>>>
>>>>>>> To work around this bug Linux excludes E820 reserved addresses when
>>>>>>> allocating addresses from the PCI host bridge window since 2010.
>>>>>>> ...
>>>>
>>>>>> I haven't seen anybody else eager to merge this, so I guess I'll stick
>>>>>> my neck out here.
>>>>>>
>>>>>> I applied this to my for-linus branch for v5.15.
>>>>>
>>>>> Thank you, and sorry about the build-errors which the lkp
>>>>> kernel-test-robot found.
>>>>>
>>>>> I've just send out a patch which fixes these build-errors
>>>>> (verified with both .config-s from the lkp reports).
>>>>> Feel free to squash this into the original patch (or keep
>>>>> them separate, whatever works for you).
>>>>
>>>> Thanks, I squashed the fix in.
>>>>
>>>> HOWEVER, I think it would be fairly risky to push this into v5.15.
>>>> We would be relying on the assumption that current machines have all
>>>> fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
>>>> evidence for that.
>>>>
>>>> I'm not sure there's significant benefit to having this in v5.15.
>>>> Yes, the mainline v5.15 kernel would work on the affected machines,
>>>> but I suspect most people with those machines are running distro
>>>> kernels, not mainline kernels.
>>>
>>> I understand that you were reluctant to add this to 5.15 so close
>>> near the end of the 5.15 cycle, but can we please get this into
>>> 5.16 now ?
>>>
>>> I know you ultimately want to see if there is a better fix,
>>> but this is hitting a *lot* of users right now and if we come up
>>> with a better fix we can always use that to replace this one
>>> later.
>>
>> I don't know whether there's a "better" fix, but I do know that if we
>> merge what we have right now, nobody will be looking for a better
>> one.
>>
>> We're in the middle of the merge window, so the v5.16 development
>> cycle is over. The v5.17 cycle is just starting, so we have time to
>> hit that. Obviously a fix can be backported to older kernels as
>> needed.
>>
>>> So can we please just go with this fix now, so that we can
>>> fix the issues a lot of users are seeing caused by the current
>>> *wrong* behavior of taking the e820 reservations into account ?
>>
>> I think the fix on the table is "ignore E820 for BIOS date >= 2018"
>> plus the obvious parameters to force it both ways.
>
> Correct.
>
>> The thing I don't like is that this isn't connected at all to the
>> actual BIOS defect. We have no indication that current BIOSes have
>> fixed the defect,
>
> We also have no indication that that defect from 10 years ago, from
> pre UEFI firmware is still present in modern day UEFI firmware which
> is basically an entire different code-base.
>
> And even 10 years ago the problem was only happening to a single
> family of laptop models (Dell Precision laptops) so this clearly
> was a bug in that specific implementation and not some generic
> issue which is likely to be carried forward.
>
>> and we have no assurance that future ones will not
>> have the defect. It would be better if we had some algorithmic way of
>> figuring out what to do.
>
> You yourself have said that in hindsight taking E820 reservations
> into account for PCI bridge host windows was a mistake. So what
> the "ignore E820 for BIOS date >= 2018" is doing is letting the
> past be the past (without regressing on older models) while fixing
> that mistake on any hardware going forward.
>
> In the unlikely case that we hit that BIOS bug again on 1 or 2 models,
> we can simply DMI quirk those models, as we do for countless other
> BIOS issues.
>
>> Thank you very much for chasing down the dmesg log archive
>> (https://github.com/linuxhw/Dmesg; see
>> https://lore.kernel.org/r/[email protected]).
>> Unfortunately I haven't had time to look through it myself, and I
>> haven't heard of anybody else doing it either.
>
> Right, I'm afraid that I already have spend way too much time on this
> myself. Note that I've been working with users on this bug on and off
> for over a year now.
>
> This is hitting many users and now that we have a viable fix, this
> really needs to be fixed now.
>
> I believe that the "ignore E820 for BIOS date >= 2018" fix is good
> enough and that you are letting perfect be the enemy of good here.
>
> As an upstream kernel maintainer myself, I'm sorry to say this,
> but if we don't get some fix for this merged soon you are leaving
> my no choice but to add my fix to the Fedora kernels as a downstream
> patch (and to advise other distros to do the same).
>
> Note that if you are still afraid of regressions going the downstream
> route is also an opportunity, Fedora will start testing moving users
> to 5.15.y soon, so I could add the patch to Fedora's 5.15.y builds and
> see how that goes ?

So I've discussed this with the Fedora kernel maintainers and they have
agreed to add the patch to the Fedora 5.15 kernels, which we will ask
our users to start testing soon (we first run some voluntary testing
before eventually moving all users over).

This will provide us with valuable feedback wrt this patch causing
regressions as you are worried about, or not.

Assuming no regressions show up I hope that this will give you
some assurance that there the patch causes no regressions and that
you will then be willing to pick this up later during the 5.16
cycle so that Fedora only deviates from upstream for 1 cycle.

Regards,

Hans

2021-11-10 21:51:40

by Thomas Backlund

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

Den 2021-11-10 kl. 15:05, skrev Hans de Goede:

>
> So I've discussed this with the Fedora kernel maintainers and they have
> agreed to add the patch to the Fedora 5.15 kernels, which we will ask
> our users to start testing soon (we first run some voluntary testing
> before eventually moving all users over).
>
> This will provide us with valuable feedback wrt this patch causing
> regressions as you are worried about, or not.
>
> Assuming no regressions show up I hope that this will give you
> some assurance that there the patch causes no regressions and that
> you will then be willing to pick this up later during the 5.16
> cycle so that Fedora only deviates from upstream for 1 cycle.
>

FWIW... As an extra data point...

I've backported this one on top of 5.14.14 in Mageia Cauldron and Mageia
8 backports where it has been in active use for ~3 weeks now, and so far
no reports of systems breaking ...

--
Thomas

2021-12-07 16:52:50

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

Hi Bjorn,

On 11/10/21 14:05, Hans de Goede wrote:
> Hi Bjorn,
>
> On 11/10/21 09:45, Hans de Goede wrote:
>> Hi Bjorn,
>>
>> On 11/9/21 23:07, Bjorn Helgaas wrote:
>>> On Sat, Nov 06, 2021 at 11:15:07AM +0100, Hans de Goede wrote:
>>>> On 10/20/21 23:14, Bjorn Helgaas wrote:
>>>>> On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
>>>>>> On 10/19/21 23:52, Bjorn Helgaas wrote:
>>>>>>> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
>>>>>>>> Some BIOS-es contain a bug where they add addresses which map to system
>>>>>>>> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
>>>>>>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>>>>>>>> space").
>>>>>>>>
>>>>>>>> To work around this bug Linux excludes E820 reserved addresses when
>>>>>>>> allocating addresses from the PCI host bridge window since 2010.
>>>>>>>> ...
>>>>>
>>>>>>> I haven't seen anybody else eager to merge this, so I guess I'll stick
>>>>>>> my neck out here.
>>>>>>>
>>>>>>> I applied this to my for-linus branch for v5.15.
>>>>>>
>>>>>> Thank you, and sorry about the build-errors which the lkp
>>>>>> kernel-test-robot found.
>>>>>>
>>>>>> I've just send out a patch which fixes these build-errors
>>>>>> (verified with both .config-s from the lkp reports).
>>>>>> Feel free to squash this into the original patch (or keep
>>>>>> them separate, whatever works for you).
>>>>>
>>>>> Thanks, I squashed the fix in.
>>>>>
>>>>> HOWEVER, I think it would be fairly risky to push this into v5.15.
>>>>> We would be relying on the assumption that current machines have all
>>>>> fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
>>>>> evidence for that.
>>>>>
>>>>> I'm not sure there's significant benefit to having this in v5.15.
>>>>> Yes, the mainline v5.15 kernel would work on the affected machines,
>>>>> but I suspect most people with those machines are running distro
>>>>> kernels, not mainline kernels.
>>>>
>>>> I understand that you were reluctant to add this to 5.15 so close
>>>> near the end of the 5.15 cycle, but can we please get this into
>>>> 5.16 now ?
>>>>
>>>> I know you ultimately want to see if there is a better fix,
>>>> but this is hitting a *lot* of users right now and if we come up
>>>> with a better fix we can always use that to replace this one
>>>> later.
>>>
>>> I don't know whether there's a "better" fix, but I do know that if we
>>> merge what we have right now, nobody will be looking for a better
>>> one.
>>>
>>> We're in the middle of the merge window, so the v5.16 development
>>> cycle is over. The v5.17 cycle is just starting, so we have time to
>>> hit that. Obviously a fix can be backported to older kernels as
>>> needed.
>>>
>>>> So can we please just go with this fix now, so that we can
>>>> fix the issues a lot of users are seeing caused by the current
>>>> *wrong* behavior of taking the e820 reservations into account ?
>>>
>>> I think the fix on the table is "ignore E820 for BIOS date >= 2018"
>>> plus the obvious parameters to force it both ways.
>>
>> Correct.
>>
>>> The thing I don't like is that this isn't connected at all to the
>>> actual BIOS defect. We have no indication that current BIOSes have
>>> fixed the defect,
>>
>> We also have no indication that that defect from 10 years ago, from
>> pre UEFI firmware is still present in modern day UEFI firmware which
>> is basically an entire different code-base.
>>
>> And even 10 years ago the problem was only happening to a single
>> family of laptop models (Dell Precision laptops) so this clearly
>> was a bug in that specific implementation and not some generic
>> issue which is likely to be carried forward.
>>
>>> and we have no assurance that future ones will not
>>> have the defect. It would be better if we had some algorithmic way of
>>> figuring out what to do.
>>
>> You yourself have said that in hindsight taking E820 reservations
>> into account for PCI bridge host windows was a mistake. So what
>> the "ignore E820 for BIOS date >= 2018" is doing is letting the
>> past be the past (without regressing on older models) while fixing
>> that mistake on any hardware going forward.
>>
>> In the unlikely case that we hit that BIOS bug again on 1 or 2 models,
>> we can simply DMI quirk those models, as we do for countless other
>> BIOS issues.
>>
>>> Thank you very much for chasing down the dmesg log archive
>>> (https://github.com/linuxhw/Dmesg; see
>>> https://lore.kernel.org/r/[email protected]).
>>> Unfortunately I haven't had time to look through it myself, and I
>>> haven't heard of anybody else doing it either.
>>
>> Right, I'm afraid that I already have spend way too much time on this
>> myself. Note that I've been working with users on this bug on and off
>> for over a year now.
>>
>> This is hitting many users and now that we have a viable fix, this
>> really needs to be fixed now.
>>
>> I believe that the "ignore E820 for BIOS date >= 2018" fix is good
>> enough and that you are letting perfect be the enemy of good here.
>>
>> As an upstream kernel maintainer myself, I'm sorry to say this,
>> but if we don't get some fix for this merged soon you are leaving
>> my no choice but to add my fix to the Fedora kernels as a downstream
>> patch (and to advise other distros to do the same).
>>
>> Note that if you are still afraid of regressions going the downstream
>> route is also an opportunity, Fedora will start testing moving users
>> to 5.15.y soon, so I could add the patch to Fedora's 5.15.y builds and
>> see how that goes ?
>
> So I've discussed this with the Fedora kernel maintainers and they have
> agreed to add the patch to the Fedora 5.15 kernels, which we will ask
> our users to start testing soon (we first run some voluntary testing
> before eventually moving all users over).
>
> This will provide us with valuable feedback wrt this patch causing
> regressions as you are worried about, or not.
>
> Assuming no regressions show up I hope that this will give you
> some assurance that there the patch causes no regressions and that
> you will then be willing to pick this up later during the 5.16
> cycle so that Fedora only deviates from upstream for 1 cycle.

5.15.y kernels with this patch added have been in Fedora's
stable updates repo for a while now without any reports of the
regressions you feared this may cause.

Bjorn, I hope that you are willing to merge this patch now that it has
seen some more wide spread testing ?

Regards,

Hans



2021-12-15 16:02:01

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

On Tue, Dec 07, 2021 at 05:52:40PM +0100, Hans de Goede wrote:
> On 11/10/21 14:05, Hans de Goede wrote:
> > On 11/10/21 09:45, Hans de Goede wrote:
> >> On 11/9/21 23:07, Bjorn Helgaas wrote:
> >>> On Sat, Nov 06, 2021 at 11:15:07AM +0100, Hans de Goede wrote:
> >>>> On 10/20/21 23:14, Bjorn Helgaas wrote:
> >>>>> On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
> >>>>>> On 10/19/21 23:52, Bjorn Helgaas wrote:
> >>>>>>> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
> >>>>>>>> Some BIOS-es contain a bug where they add addresses which map to system
> >>>>>>>> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
> >>>>>>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
> >>>>>>>> space").
> >>>>>>>>
> >>>>>>>> To work around this bug Linux excludes E820 reserved addresses when
> >>>>>>>> allocating addresses from the PCI host bridge window since 2010.
> >>>>>>>> ...
> >>>>>
> >>>>>>> I haven't seen anybody else eager to merge this, so I guess I'll stick
> >>>>>>> my neck out here.
> >>>>>>>
> >>>>>>> I applied this to my for-linus branch for v5.15.
> >>>>>>
> >>>>>> Thank you, and sorry about the build-errors which the lkp
> >>>>>> kernel-test-robot found.
> >>>>>>
> >>>>>> I've just send out a patch which fixes these build-errors
> >>>>>> (verified with both .config-s from the lkp reports).
> >>>>>> Feel free to squash this into the original patch (or keep
> >>>>>> them separate, whatever works for you).
> >>>>>
> >>>>> Thanks, I squashed the fix in.
> >>>>>
> >>>>> HOWEVER, I think it would be fairly risky to push this into v5.15.
> >>>>> We would be relying on the assumption that current machines have all
> >>>>> fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
> >>>>> evidence for that.
> >>>>>
> >>>>> I'm not sure there's significant benefit to having this in v5.15.
> >>>>> Yes, the mainline v5.15 kernel would work on the affected machines,
> >>>>> but I suspect most people with those machines are running distro
> >>>>> kernels, not mainline kernels.
> >>>>
> >>>> I understand that you were reluctant to add this to 5.15 so close
> >>>> near the end of the 5.15 cycle, but can we please get this into
> >>>> 5.16 now ?
> >>>>
> >>>> I know you ultimately want to see if there is a better fix,
> >>>> but this is hitting a *lot* of users right now and if we come up
> >>>> with a better fix we can always use that to replace this one
> >>>> later.
> >>>
> >>> I don't know whether there's a "better" fix, but I do know that if we
> >>> merge what we have right now, nobody will be looking for a better
> >>> one.
> >>>
> >>> We're in the middle of the merge window, so the v5.16 development
> >>> cycle is over. The v5.17 cycle is just starting, so we have time to
> >>> hit that. Obviously a fix can be backported to older kernels as
> >>> needed.
> >>>
> >>>> So can we please just go with this fix now, so that we can
> >>>> fix the issues a lot of users are seeing caused by the current
> >>>> *wrong* behavior of taking the e820 reservations into account ?
> >>>
> >>> I think the fix on the table is "ignore E820 for BIOS date >= 2018"
> >>> plus the obvious parameters to force it both ways.
> >>
> >> Correct.
> >>
> >>> The thing I don't like is that this isn't connected at all to the
> >>> actual BIOS defect. We have no indication that current BIOSes have
> >>> fixed the defect,
> >>
> >> We also have no indication that that defect from 10 years ago, from
> >> pre UEFI firmware is still present in modern day UEFI firmware which
> >> is basically an entire different code-base.
> >>
> >> And even 10 years ago the problem was only happening to a single
> >> family of laptop models (Dell Precision laptops) so this clearly
> >> was a bug in that specific implementation and not some generic
> >> issue which is likely to be carried forward.
> >>
> >>> and we have no assurance that future ones will not
> >>> have the defect. It would be better if we had some algorithmic way of
> >>> figuring out what to do.
> >>
> >> You yourself have said that in hindsight taking E820 reservations
> >> into account for PCI bridge host windows was a mistake. So what
> >> the "ignore E820 for BIOS date >= 2018" is doing is letting the
> >> past be the past (without regressing on older models) while fixing
> >> that mistake on any hardware going forward.
> >>
> >> In the unlikely case that we hit that BIOS bug again on 1 or 2 models,
> >> we can simply DMI quirk those models, as we do for countless other
> >> BIOS issues.
> >>
> >>> Thank you very much for chasing down the dmesg log archive
> >>> (https://github.com/linuxhw/Dmesg; see
> >>> https://lore.kernel.org/r/[email protected]).
> >>> Unfortunately I haven't had time to look through it myself, and I
> >>> haven't heard of anybody else doing it either.
> >>
> >> Right, I'm afraid that I already have spend way too much time on this
> >> myself. Note that I've been working with users on this bug on and off
> >> for over a year now.
> >>
> >> This is hitting many users and now that we have a viable fix, this
> >> really needs to be fixed now.
> >>
> >> I believe that the "ignore E820 for BIOS date >= 2018" fix is good
> >> enough and that you are letting perfect be the enemy of good here.
> >>
> >> As an upstream kernel maintainer myself, I'm sorry to say this,
> >> but if we don't get some fix for this merged soon you are leaving
> >> my no choice but to add my fix to the Fedora kernels as a downstream
> >> patch (and to advise other distros to do the same).
> >>
> >> Note that if you are still afraid of regressions going the downstream
> >> route is also an opportunity, Fedora will start testing moving users
> >> to 5.15.y soon, so I could add the patch to Fedora's 5.15.y builds and
> >> see how that goes ?
> >
> > So I've discussed this with the Fedora kernel maintainers and they have
> > agreed to add the patch to the Fedora 5.15 kernels, which we will ask
> > our users to start testing soon (we first run some voluntary testing
> > before eventually moving all users over).
> >
> > This will provide us with valuable feedback wrt this patch causing
> > regressions as you are worried about, or not.
> >
> > Assuming no regressions show up I hope that this will give you
> > some assurance that there the patch causes no regressions and that
> > you will then be willing to pick this up later during the 5.16
> > cycle so that Fedora only deviates from upstream for 1 cycle.
>
> 5.15.y kernels with this patch added have been in Fedora's
> stable updates repo for a while now without any reports of the
> regressions you feared this may cause.
>
> Bjorn, I hope that you are willing to merge this patch now that it has
> seen some more wide spread testing ?

I'm still not happy about the idea of basing this on BIOS dates. I
did this with 7bc5e3f2be32 ("x86/PCI: use host bridge _CRS info by
default on 2008 and newer machines"), and it was a mistake.

Because of that mistake, we now have the use_crs/nocrs kernel
parameters, which confuse users and lead to them being passed around
as "fixes" on random bulletin boards.

Adding another BIOS date check and use_e820/no_e820 kernel parameters
feels like it's layering on more complexity to cover up another major
mistake I made, 4dc2287c1805 ("x86: avoid E820 regions when allocating
address space").

I think it would be better for the code to recognize the situation
addressed by 4dc2287c1805 and deal with it directly. Is that
possible? I dunno; I don't think we've really tried.

Bjorn

2021-12-15 16:33:54

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

Hi Bjorn,

On 12/15/21 17:01, Bjorn Helgaas wrote:
> On Tue, Dec 07, 2021 at 05:52:40PM +0100, Hans de Goede wrote:
>> On 11/10/21 14:05, Hans de Goede wrote:
>>> On 11/10/21 09:45, Hans de Goede wrote:
>>>> On 11/9/21 23:07, Bjorn Helgaas wrote:
>>>>> On Sat, Nov 06, 2021 at 11:15:07AM +0100, Hans de Goede wrote:
>>>>>> On 10/20/21 23:14, Bjorn Helgaas wrote:
>>>>>>> On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
>>>>>>>> On 10/19/21 23:52, Bjorn Helgaas wrote:
>>>>>>>>> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
>>>>>>>>>> Some BIOS-es contain a bug where they add addresses which map to system
>>>>>>>>>> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
>>>>>>>>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>>>>>>>>>> space").
>>>>>>>>>>
>>>>>>>>>> To work around this bug Linux excludes E820 reserved addresses when
>>>>>>>>>> allocating addresses from the PCI host bridge window since 2010.
>>>>>>>>>> ...
>>>>>>>
>>>>>>>>> I haven't seen anybody else eager to merge this, so I guess I'll stick
>>>>>>>>> my neck out here.
>>>>>>>>>
>>>>>>>>> I applied this to my for-linus branch for v5.15.
>>>>>>>>
>>>>>>>> Thank you, and sorry about the build-errors which the lkp
>>>>>>>> kernel-test-robot found.
>>>>>>>>
>>>>>>>> I've just send out a patch which fixes these build-errors
>>>>>>>> (verified with both .config-s from the lkp reports).
>>>>>>>> Feel free to squash this into the original patch (or keep
>>>>>>>> them separate, whatever works for you).
>>>>>>>
>>>>>>> Thanks, I squashed the fix in.
>>>>>>>
>>>>>>> HOWEVER, I think it would be fairly risky to push this into v5.15.
>>>>>>> We would be relying on the assumption that current machines have all
>>>>>>> fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
>>>>>>> evidence for that.
>>>>>>>
>>>>>>> I'm not sure there's significant benefit to having this in v5.15.
>>>>>>> Yes, the mainline v5.15 kernel would work on the affected machines,
>>>>>>> but I suspect most people with those machines are running distro
>>>>>>> kernels, not mainline kernels.
>>>>>>
>>>>>> I understand that you were reluctant to add this to 5.15 so close
>>>>>> near the end of the 5.15 cycle, but can we please get this into
>>>>>> 5.16 now ?
>>>>>>
>>>>>> I know you ultimately want to see if there is a better fix,
>>>>>> but this is hitting a *lot* of users right now and if we come up
>>>>>> with a better fix we can always use that to replace this one
>>>>>> later.
>>>>>
>>>>> I don't know whether there's a "better" fix, but I do know that if we
>>>>> merge what we have right now, nobody will be looking for a better
>>>>> one.
>>>>>
>>>>> We're in the middle of the merge window, so the v5.16 development
>>>>> cycle is over. The v5.17 cycle is just starting, so we have time to
>>>>> hit that. Obviously a fix can be backported to older kernels as
>>>>> needed.
>>>>>
>>>>>> So can we please just go with this fix now, so that we can
>>>>>> fix the issues a lot of users are seeing caused by the current
>>>>>> *wrong* behavior of taking the e820 reservations into account ?
>>>>>
>>>>> I think the fix on the table is "ignore E820 for BIOS date >= 2018"
>>>>> plus the obvious parameters to force it both ways.
>>>>
>>>> Correct.
>>>>
>>>>> The thing I don't like is that this isn't connected at all to the
>>>>> actual BIOS defect. We have no indication that current BIOSes have
>>>>> fixed the defect,
>>>>
>>>> We also have no indication that that defect from 10 years ago, from
>>>> pre UEFI firmware is still present in modern day UEFI firmware which
>>>> is basically an entire different code-base.
>>>>
>>>> And even 10 years ago the problem was only happening to a single
>>>> family of laptop models (Dell Precision laptops) so this clearly
>>>> was a bug in that specific implementation and not some generic
>>>> issue which is likely to be carried forward.
>>>>
>>>>> and we have no assurance that future ones will not
>>>>> have the defect. It would be better if we had some algorithmic way of
>>>>> figuring out what to do.
>>>>
>>>> You yourself have said that in hindsight taking E820 reservations
>>>> into account for PCI bridge host windows was a mistake. So what
>>>> the "ignore E820 for BIOS date >= 2018" is doing is letting the
>>>> past be the past (without regressing on older models) while fixing
>>>> that mistake on any hardware going forward.
>>>>
>>>> In the unlikely case that we hit that BIOS bug again on 1 or 2 models,
>>>> we can simply DMI quirk those models, as we do for countless other
>>>> BIOS issues.
>>>>
>>>>> Thank you very much for chasing down the dmesg log archive
>>>>> (https://github.com/linuxhw/Dmesg; see
>>>>> https://lore.kernel.org/r/[email protected]).
>>>>> Unfortunately I haven't had time to look through it myself, and I
>>>>> haven't heard of anybody else doing it either.
>>>>
>>>> Right, I'm afraid that I already have spend way too much time on this
>>>> myself. Note that I've been working with users on this bug on and off
>>>> for over a year now.
>>>>
>>>> This is hitting many users and now that we have a viable fix, this
>>>> really needs to be fixed now.
>>>>
>>>> I believe that the "ignore E820 for BIOS date >= 2018" fix is good
>>>> enough and that you are letting perfect be the enemy of good here.
>>>>
>>>> As an upstream kernel maintainer myself, I'm sorry to say this,
>>>> but if we don't get some fix for this merged soon you are leaving
>>>> my no choice but to add my fix to the Fedora kernels as a downstream
>>>> patch (and to advise other distros to do the same).
>>>>
>>>> Note that if you are still afraid of regressions going the downstream
>>>> route is also an opportunity, Fedora will start testing moving users
>>>> to 5.15.y soon, so I could add the patch to Fedora's 5.15.y builds and
>>>> see how that goes ?
>>>
>>> So I've discussed this with the Fedora kernel maintainers and they have
>>> agreed to add the patch to the Fedora 5.15 kernels, which we will ask
>>> our users to start testing soon (we first run some voluntary testing
>>> before eventually moving all users over).
>>>
>>> This will provide us with valuable feedback wrt this patch causing
>>> regressions as you are worried about, or not.
>>>
>>> Assuming no regressions show up I hope that this will give you
>>> some assurance that there the patch causes no regressions and that
>>> you will then be willing to pick this up later during the 5.16
>>> cycle so that Fedora only deviates from upstream for 1 cycle.
>>
>> 5.15.y kernels with this patch added have been in Fedora's
>> stable updates repo for a while now without any reports of the
>> regressions you feared this may cause.
>>
>> Bjorn, I hope that you are willing to merge this patch now that it has
>> seen some more wide spread testing ?
>
> I'm still not happy about the idea of basing this on BIOS dates. I
> did this with 7bc5e3f2be32 ("x86/PCI: use host bridge _CRS info by
> default on 2008 and newer machines"), and it was a mistake.
>
> Because of that mistake, we now have the use_crs/nocrs kernel
> parameters, which confuse users and lead to them being passed around
> as "fixes" on random bulletin boards.
>
> Adding another BIOS date check and use_e820/no_e820 kernel parameters
> feels like it's layering on more complexity to cover up another major
> mistake I made, 4dc2287c1805 ("x86: avoid E820 regions when allocating
> address space").
>
> I think it would be better for the code to recognize the situation
> addressed by 4dc2287c1805 and deal with it directly. Is that
> possible? I dunno; I don't think we've really tried.

So we are just going to leave a ton of users systems broken *for years*
until someone has the time to try ? I've not seen anyone step up to
try and address the issue worked around by 4dc2287c1805 (and no I'm not
volunteering).

Also how are we going to come up with another fix for that without any
of the hardware which was affected by the issue back then to test on?

AFAIK we agree that:

1. In hindsight commit 4dc2287c1805 was not a good idea.
2. We cannot just revert it without causing regressions

So given these 2 things, disabling the problematic behavior introduced
by commit 4dc2287c1805 on newer machines, to avoid the older machines
which need it from regressions really seems like the obvious fix to me ?

Especially since replacing commit 4dc2287c1805 seems impossible to me
without access to the originally affected hardware to verify any fix.

AFAIK there are a number of other places in the kernel where
BIOS date checks are used, to e.g. not use ACPI on really early
buggy ACPI implementations, so this is not unheard of.

You seem to mainly be concerned about users cargo-culting
the use_e820/no_e820 kernel parameters as workaround for issues
which have a completely different root cause.

Would my solution to disable the troublesome workaround from
4dc2287c1805 be acceptable if I drop the new commandline options?

I added those just in case, but so far no Fedora users have
needed them, so I would be happy to drop them ?

Regards,

Hans