2023-01-10 18:08:40

by Bjorn Helgaas

Subject: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

From: Bjorn Helgaas <[email protected]>

Normally we reject ECAM space unless it is reported as reserved in the E820
table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
means PCI extended config space (offsets 0x100-0xfff) may not be accessible.

Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
normally converted to an E820 entry by a bootloader or EFI stub.

07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
E820 entries that correspond to EfiMemoryMappedIO regions because some
other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
E820 entries prevent Linux from allocating BAR space for hot-added devices.

Allow use of ECAM for extended config space when the region is covered by
an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
_CRS.

Reported by Kan Liang, Tony Luck, and Giovanni Cabiddu.

Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Kan Liang <[email protected]>
Reported-by: Tony Luck <[email protected]>
Reported-by: Giovanni Cabiddu <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
---
arch/x86/pci/mmconfig-shared.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)

diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index cd16bef5f2d9..da4b6e8e9df0 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -12,6 +12,7 @@
*/

#include <linux/acpi.h>
+#include <linux/efi.h>
#include <linux/pci.h>
#include <linux/init.h>
#include <linux/bitmap.h>
@@ -442,6 +443,32 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
return mcfg_res.flags;
}

+static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
+{
+#ifdef CONFIG_EFI
+	efi_memory_desc_t *md;
+	u64 size, mmio_start, mmio_end;
+
+	for_each_efi_memory_desc(md) {
+		if (md->type == EFI_MEMORY_MAPPED_IO) {
+			size = md->num_pages << EFI_PAGE_SHIFT;
+			mmio_start = md->phys_addr;
+			mmio_end = mmio_start + size;
+
+			/*
+			 * N.B. Caller supplies (start, start + size),
+			 * so to match, mmio_end is the first address
+			 * *past* the EFI_MEMORY_MAPPED_IO area.
+			 */
+			if (mmio_start <= start && end <= mmio_end)
+				return true;
+		}
+	}
+#endif
+
+	return false;
+}
+
typedef bool (*check_reserved_t)(u64 start, u64 end, enum e820_type type);

static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
@@ -513,6 +540,10 @@ pci_mmcfg_check_reserved(struct device *dev, struct pci_mmcfg_region *cfg, int e
"MMCONFIG at %pR not reserved in "
"ACPI motherboard resources\n",
&cfg->res);
+
+		if (is_mmconf_reserved(is_efi_mmio, cfg, dev,
+				       "EfiMemoryMappedIO"))
+			return true;
}

/*
--
2.25.1
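
The containment test in is_efi_mmio() can be tried outside the kernel with a small userspace sketch. This is an illustration only, not the kernel code: struct mmio_desc, region_covers() and PAGE_SHIFT_4K are hypothetical stand-ins for efi_memory_desc_t, the loop in the patch, and EFI_PAGE_SHIFT. As in the patch, "end" is treated as the first address past the range being checked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12	/* stand-in for EFI_PAGE_SHIFT (4 KiB pages) */

/* Hypothetical, simplified stand-in for efi_memory_desc_t. */
struct mmio_desc {
	uint64_t phys_addr;
	uint64_t num_pages;
};

/*
 * Return true if the half-open range [start, end) lies entirely inside
 * one of the descriptors. mmio_end is the first address past the
 * descriptor, so "end <= mmio_end" is the correct comparison.
 */
static bool region_covers(const struct mmio_desc *descs, size_t n,
			  uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t mmio_start = descs[i].phys_addr;
		uint64_t mmio_end = mmio_start +
				    (descs[i].num_pages << PAGE_SHIFT_4K);

		if (mmio_start <= start && end <= mmio_end)
			return true;
	}
	return false;
}
```

A descriptor at 0x80000000 with 0x10000 pages covers [0x80000000, 0x90000000) exactly, so a 256MB ECAM region starting there passes, while a range that extends one byte further does not.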


2023-01-10 18:33:26

by Rafael J. Wysocki

Subject: Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

On Tuesday, January 10, 2023 7:02:43 PM CET Bjorn Helgaas wrote:
> From: Bjorn Helgaas <[email protected]>
>
> Normally we reject ECAM space unless it is reported as reserved in the E820
> table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
> means PCI extended config space (offsets 0x100-0xfff) may not be accessible.
>
> Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
> mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
> normally converted to an E820 entry by a bootloader or EFI stub.
>
> 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
> E820 entries that correspond to EfiMemoryMappedIO regions because some
> other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
> E820 entries prevent Linux from allocating BAR space for hot-added devices.
>
> Allow use of ECAM for extended config space when the region is covered by
> an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
> _CRS.
>
> Reported by Kan Liang, Tony Luck, and Giovanni Cabiddu.
>
> Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
> Link: https://lore.kernel.org/r/[email protected]
> Reported-by: Kan Liang <[email protected]>
> Reported-by: Tony Luck <[email protected]>
> Reported-by: Giovanni Cabiddu <[email protected]>
> Signed-off-by: Bjorn Helgaas <[email protected]>

Reviewed-by: Rafael J. Wysocki <[email protected]>

> ---
> arch/x86/pci/mmconfig-shared.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index cd16bef5f2d9..da4b6e8e9df0 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -12,6 +12,7 @@
> */
>
> #include <linux/acpi.h>
> +#include <linux/efi.h>
> #include <linux/pci.h>
> #include <linux/init.h>
> #include <linux/bitmap.h>
> @@ -442,6 +443,32 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
> return mcfg_res.flags;
> }
>
> +static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
> +{
> +#ifdef CONFIG_EFI
> + efi_memory_desc_t *md;
> + u64 size, mmio_start, mmio_end;
> +
> + for_each_efi_memory_desc(md) {
> + if (md->type == EFI_MEMORY_MAPPED_IO) {
> + size = md->num_pages << EFI_PAGE_SHIFT;
> + mmio_start = md->phys_addr;
> + mmio_end = mmio_start + size;
> +
> + /*
> + * N.B. Caller supplies (start, start + size),
> + * so to match, mmio_end is the first address
> + * *past* the EFI_MEMORY_MAPPED_IO area.
> + */
> + if (mmio_start <= start && end <= mmio_end)
> + return true;
> + }
> + }
> +#endif
> +
> + return false;
> +}
> +
> typedef bool (*check_reserved_t)(u64 start, u64 end, enum e820_type type);
>
> static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
> @@ -513,6 +540,10 @@ pci_mmcfg_check_reserved(struct device *dev, struct pci_mmcfg_region *cfg, int e
> "MMCONFIG at %pR not reserved in "
> "ACPI motherboard resources\n",
> &cfg->res);
> +
> + if (is_mmconf_reserved(is_efi_mmio, cfg, dev,
> + "EfiMemoryMappedIO"))
> + return true;
> }
>
> /*
>




2023-01-10 18:42:25

by Dan Williams

Subject: RE: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

Bjorn Helgaas wrote:
> From: Bjorn Helgaas <[email protected]>
>
> Normally we reject ECAM space unless it is reported as reserved in the E820
> table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
> means PCI extended config space (offsets 0x100-0xfff) may not be accessible.
>
> Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
> mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
> normally converted to an E820 entry by a bootloader or EFI stub.
>
> 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
> E820 entries that correspond to EfiMemoryMappedIO regions because some
> other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
> E820 entries prevent Linux from allocating BAR space for hot-added devices.
>
> Allow use of ECAM for extended config space when the region is covered by
> an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
> _CRS.
>
> Reported by Kan Liang, Tony Luck, and Giovanni Cabiddu.
>
> Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
> Link: https://lore.kernel.org/r/[email protected]
> Reported-by: Kan Liang <[email protected]>
> Reported-by: Tony Luck <[email protected]>
> Reported-by: Giovanni Cabiddu <[email protected]>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> ---
> arch/x86/pci/mmconfig-shared.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index cd16bef5f2d9..da4b6e8e9df0 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -12,6 +12,7 @@
> */
>
> #include <linux/acpi.h>
> +#include <linux/efi.h>
> #include <linux/pci.h>
> #include <linux/init.h>
> #include <linux/bitmap.h>
> @@ -442,6 +443,32 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
> return mcfg_res.flags;
> }
>
> +static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
> +{
> +#ifdef CONFIG_EFI
> + efi_memory_desc_t *md;
> + u64 size, mmio_start, mmio_end;
> +
> + for_each_efi_memory_desc(md) {
> + if (md->type == EFI_MEMORY_MAPPED_IO) {
> + size = md->num_pages << EFI_PAGE_SHIFT;
> + mmio_start = md->phys_addr;
> + mmio_end = mmio_start + size;
> +
> + /*
> + * N.B. Caller supplies (start, start + size),
> + * so to match, mmio_end is the first address
> + * *past* the EFI_MEMORY_MAPPED_IO area.
> + */
> + if (mmio_start <= start && end <= mmio_end)
> + return true;
> + }
> + }
> +#endif

Perhaps the following trick (compile tested), but either way:

Reviewed-by: Dan Williams <[email protected]>


diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index da4b6e8e9df0..ae95d1b073c6 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -445,7 +445,6 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)

 static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
 {
-#ifdef CONFIG_EFI
 	efi_memory_desc_t *md;
 	u64 size, mmio_start, mmio_end;
 
@@ -464,7 +463,6 @@ static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
 				return true;
 		}
 	}
-#endif
 
 	return false;
 }
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 4b27519143f5..3ab0c255b791 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -790,8 +790,12 @@ extern int efi_memattr_apply_permissions(struct mm_struct *mm,
*
* Once the loop finishes @md must not be accessed.
*/
+#ifdef CONFIG_EFI
 #define for_each_efi_memory_desc(md) \
 	for_each_efi_memory_desc_in_map(&efi.memmap, md)
+#else
+#define for_each_efi_memory_desc(md) for (; 0;)
+#endif

/*
* Format an EFI memory descriptor's type and attributes to a user-provided
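
The effect of the proposed "for (; 0;)" definition can be seen in a userspace sketch. Everything here is hypothetical (struct desc, table, and both macros are made up for illustration), and the two variants are defined side by side instead of being selected by #ifdef CONFIG_EFI, so that both can be exercised in one build. The point is that the loop body still compiles and still references the iterator, but never runs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor table standing in for efi.memmap. */
struct desc {
	int type;
};

static struct desc table[] = { { 7 }, { 11 } };
#define TABLE_LEN (sizeof(table) / sizeof(table[0]))

/* "Enabled" variant: walk every entry of the table. */
#define for_each_desc(d) \
	for ((d) = table; (d) < table + TABLE_LEN; (d)++)

/* "Disabled" variant: a loop whose body is compiled but never executed. */
#define for_each_desc_stub(d) for (; 0;)

static bool has_type(int type)
{
	struct desc *d;

	for_each_desc(d) {
		if (d->type == type)
			return true;
	}
	return false;
}

/* Same body as has_type(); no #ifdef is needed around it. */
static bool has_type_stub(int type)
{
	struct desc *d;

	for_each_desc_stub(d) {
		if (d->type == type)
			return true;
	}
	return false;
}
```

With the stub, has_type_stub() always falls through to "return false", which is the behavior the #ifdef/#endif pair in is_efi_mmio() provides today.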

2023-01-10 19:39:32

by Bjorn Helgaas

Subject: Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

On Tue, Jan 10, 2023 at 10:29:06AM -0800, Dan Williams wrote:
> Bjorn Helgaas wrote:
> > From: Bjorn Helgaas <[email protected]>
> >
> > Normally we reject ECAM space unless it is reported as reserved in the E820
> > table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
> > means PCI extended config space (offsets 0x100-0xfff) may not be accessible.
> >
> > Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
> > mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
> > normally converted to an E820 entry by a bootloader or EFI stub.
> >
> > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
> > E820 entries that correspond to EfiMemoryMappedIO regions because some
> > other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
> > E820 entries prevent Linux from allocating BAR space for hot-added devices.
> >
> > Allow use of ECAM for extended config space when the region is covered by
> > an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
> > _CRS.
> >
> > Reported by Kan Liang, Tony Luck, and Giovanni Cabiddu.
> >
> > Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
> > Link: https://lore.kernel.org/r/[email protected]
> > Reported-by: Kan Liang <[email protected]>
> > Reported-by: Tony Luck <[email protected]>
> > Reported-by: Giovanni Cabiddu <[email protected]>
> > Signed-off-by: Bjorn Helgaas <[email protected]>
> > ---
> > arch/x86/pci/mmconfig-shared.c | 31 +++++++++++++++++++++++++++++++
> > 1 file changed, 31 insertions(+)
> >
> > diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> > index cd16bef5f2d9..da4b6e8e9df0 100644
> > --- a/arch/x86/pci/mmconfig-shared.c
> > +++ b/arch/x86/pci/mmconfig-shared.c
> > @@ -12,6 +12,7 @@
> > */
> >
> > #include <linux/acpi.h>
> > +#include <linux/efi.h>
> > #include <linux/pci.h>
> > #include <linux/init.h>
> > #include <linux/bitmap.h>
> > @@ -442,6 +443,32 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
> > return mcfg_res.flags;
> > }
> >
> > +static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
> > +{
> > +#ifdef CONFIG_EFI
> > + efi_memory_desc_t *md;
> > + u64 size, mmio_start, mmio_end;
> > +
> > + for_each_efi_memory_desc(md) {
> > + if (md->type == EFI_MEMORY_MAPPED_IO) {
> > + size = md->num_pages << EFI_PAGE_SHIFT;
> > + mmio_start = md->phys_addr;
> > + mmio_end = mmio_start + size;
> > +
> > + /*
> > + * N.B. Caller supplies (start, start + size),
> > + * so to match, mmio_end is the first address
> > + * *past* the EFI_MEMORY_MAPPED_IO area.
> > + */
> > + if (mmio_start <= start && end <= mmio_end)
> > + return true;
> > + }
> > + }
> > +#endif
>
> Perhaps the following trick (compile tested), but either way:
>
> Reviewed-by: Dan Williams <[email protected]>

That's a great trick, and I wish I'd thought of it. I have some
follow-on patches I'm considering for v6.3, so in the interest of
streamlining the path of this one to v6.2-rc4, I think I'll wait on
this until v6.3.

> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index da4b6e8e9df0..ae95d1b073c6 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -445,7 +445,6 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
>
> static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
> {
> -#ifdef CONFIG_EFI
> efi_memory_desc_t *md;
> u64 size, mmio_start, mmio_end;
>
> @@ -464,7 +463,6 @@ static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
> return true;
> }
> }
> -#endif
>
> return false;
> }
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index 4b27519143f5..3ab0c255b791 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -790,8 +790,12 @@ extern int efi_memattr_apply_permissions(struct mm_struct *mm,
> *
> * Once the loop finishes @md must not be accessed.
> */
> +#ifdef CONFIG_EFI
> #define for_each_efi_memory_desc(md) \
> for_each_efi_memory_desc_in_map(&efi.memmap, md)
> +#else
> +#define for_each_efi_memory_desc(md) for (; 0;)
> +#endif
>
> /*
> * Format an EFI memory descriptor's type and attributes to a user-provided

2023-10-12 16:04:38

by Tomasz Pala

Subject: Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

Hello,

On Tue, Jan 10, 2023 at 12:02:43 -0600, Bjorn Helgaas wrote:

> Normally we reject ECAM space unless it is reported as reserved in the E820
> table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
> means PCI extended config space (offsets 0x100-0xfff) may not be accessible.
>
> Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
> mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
> normally converted to an E820 entry by a bootloader or EFI stub.
>
> 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
> E820 entries that correspond to EfiMemoryMappedIO regions because some
> other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
> E820 entries prevent Linux from allocating BAR space for hot-added devices.
>
> Allow use of ECAM for extended config space when the region is covered by
> an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
> _CRS.

I'm still having a problem initializing ixgbe NICs with pristine 6.5.7 kernel.

efi: Remove mem63: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
[...]
[mem 0x7f800000-0xfed1bfff] available for PCI devices
[...]
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as EfiMemoryMappedIO
[...]
ixgbe 0000:02:00.0: enabling device (0140 -> 0142)
ixgbe 0000:02:00.0: BAR 0: can't reserve [mem 0x80000000-0x8007ffff 64bit]
ixgbe 0000:02:00.0: pci_request_selected_regions failed 0xfffffff0
ixgbe: probe of 0000:02:00.0 failed with error -16


After disabling the code that causes this, by making the condition always false:
if (size >= 256*1024 && 0) {
in the chunk from:

https://lore.kernel.org/lkml/[email protected]/

the BAR starts at 0x90000000 (not 0x80000000):

efi: Not removing mem63: MMIO range=[0x80000000-0x8fffffff] (262144KB) from e820 map
[...]
[mem 0x90000000-0xfed1bfff] available for PCI devices
[...]
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as E820 entry

and everything seems to work again.


I have full boot logs from both the upstream kernel and the worked-around one,
but I'm not sure whether it is OK to attach them (the CC list is long).

Also, this is my test machine so I can run some experiments.

best regards,
--
Tomasz Pala <[email protected]>
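
For reference, the 0xfffffff0 printed by pci_request_selected_regions and the "error -16" on the following log line are the same value: 0xfffffff0 read as a signed 32-bit integer is -16, and 16 is EBUSY on Linux. A minimal sketch, where decode_ret() is a hypothetical helper and not a kernel function:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Reinterpret a return value that was printed as unsigned hex
 * (e.g. "failed 0xfffffff0") as the signed errno-style value.
 * Assumes the usual two's complement representation.
 */
static int decode_ret(uint32_t raw)
{
	return (int32_t)raw;
}
```

decode_ret(0xfffffff0) yields -16, i.e. -EBUSY, which is consistent with the BAR 0 range already being claimed by something else.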

2023-10-16 17:32:02

by Tomasz Pala

Subject: Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

On Thu, Oct 12, 2023 at 17:33:47 +0200, Tomasz Pala wrote:

> I'm still having a problem initializing ixgbe NICs with pristine 6.5.7 kernel.
>
> efi: Remove mem63: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
> [...]
> [mem 0x7f800000-0xfed1bfff] available for PCI devices
> [...]
> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as EfiMemoryMappedIO
> [...]
> ixgbe 0000:02:00.0: enabling device (0140 -> 0142)
> ixgbe 0000:02:00.0: BAR 0: can't reserve [mem 0x80000000-0x8007ffff 64bit]
> ixgbe 0000:02:00.0: pci_request_selected_regions failed 0xfffffff0
> ixgbe: probe of 0000:02:00.0 failed with error -16

FWIW, as I got no response - there were other people facing the issue as
well:

https://forum.proxmox.com/threads/proxmox-8-kernel-6-2-16-4-pve-ixgbe-driver-fails-to-load-due-to-pci-device-probing-failure.131203/


This might be a hardware quirk, so I'm not sure whether the
EfiMemoryMappedIO reservation logic should be reviewed, some quirk
handling added, or perhaps a CONFIG_ option introduced.

Could anyone take a look, please?

--
Tomasz Pala <[email protected]>

2023-10-26 20:53:38

by Bjorn Helgaas

Subject: Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

On Thu, Oct 12, 2023 at 05:33:47PM +0200, Tomasz Pala wrote:
> On Tue, Jan 10, 2023 at 12:02:43 -0600, Bjorn Helgaas wrote:
> > Normally we reject ECAM space unless it is reported as reserved in the E820
> > table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
> > means PCI extended config space (offsets 0x100-0xfff) may not be accessible.
> >
> > Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
> > mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
> > normally converted to an E820 entry by a bootloader or EFI stub.
> >
> > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
> > E820 entries that correspond to EfiMemoryMappedIO regions because some
> > other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
> > E820 entries prevent Linux from allocating BAR space for hot-added devices.
> >
> > Allow use of ECAM for extended config space when the region is covered by
> > an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
> > _CRS.
>
> I'm still having a problem initializing ixgbe NICs with pristine 6.5.7 kernel.

Thanks very much for the report, and sorry for the inconvenience and
my delay in looking at it.

> efi: Remove mem63: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
> [mem 0x7f800000-0xfed1bfff] available for PCI devices
> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as EfiMemoryMappedIO
> ixgbe 0000:02:00.0: enabling device (0140 -> 0142)
> ixgbe 0000:02:00.0: BAR 0: can't reserve [mem 0x80000000-0x8007ffff 64bit]
> ixgbe 0000:02:00.0: pci_request_selected_regions failed 0xfffffff0
> ixgbe: probe of 0000:02:00.0 failed with error -16

Something is wrong with our allocation scheme. Both the MMCONFIG
region and the ixgbe BAR 0 are at 0x80000000, which obviously cannot
work. Maybe the full dmesg log will have a clue about why we didn't
move ixgbe out of the way.

> After disabling the code causing this (using always-false condition:
> if (size >= 256*1024 && 0) {
> ) in the chunk:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> the BAR starts at 0x90000000 (not 0x80000000):
>
> efi: Not removing mem63: MMIO range=[0x80000000-0x8fffffff] (262144KB) from e820 map
> [...]
> [mem 0x90000000-0xfed1bfff] available for PCI devices
> [...]
> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as E820 entry
>
> and everything seems to work again.
>
>
> I've got full system bootup logs from the upstream and worked around,
> but I'm not sure if this is OK to attach them (the CC list is long).

Would you mind opening a new report at https://bugzilla.kernel.org,
attaching those logs, and responding here with the URL?

I looked at the proxmox thread you mentioned, but sometimes people
strip out parts of the log they think are irrelevant, and in this
case, the stripped-out parts *are* relevant.

Bjorn
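
The 256MB size of the MMCONFIG region in these logs follows from the ECAM layout: each function gets 4KB of config space, with 8 functions per device, 32 devices per bus, and here 256 buses (00-ff). The sketch below shows that arithmetic in userspace C. The helper names ecam_size() and ecam_addr() are made up for illustration and are not the kernel's own helpers; the base is assumed to correspond to bus 0, as in the "(base 0x80000000)" log line.

```c
#include <stdint.h>

/* Size of the ECAM window for an inclusive bus range: 1MB per bus. */
static uint64_t ecam_size(unsigned int bus_start, unsigned int bus_end)
{
	return (uint64_t)(bus_end - bus_start + 1) << 20;
}

/*
 * Address of a config register: bus in bits 27:20, device in bits
 * 19:15, function in bits 14:12, register offset in bits 11:0.
 */
static uint64_t ecam_addr(uint64_t base, unsigned int bus, unsigned int dev,
			  unsigned int fn, unsigned int reg)
{
	return base + (((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
		       ((uint64_t)fn << 12) | (reg & 0xfff));
}
```

With base 0x80000000, the first extended config register (offset 0x100) of 0000:02:00.0 would be at 0x80200100, and the last byte of the window for bus ff is 0x8fffffff, matching [mem 0x80000000-0x8fffffff].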

2023-10-27 15:24:48

by Tomasz Pala

Subject: Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

On Thu, Oct 26, 2023 at 15:53:19 -0500, Bjorn Helgaas wrote:

> Something is wrong with our allocation scheme. Both the MMCONFIG
> region and the ixgbe BAR 0 are at 0x80000000, which obviously cannot
> work. Maybe the full dmesg log will have a clue about why we didn't
> move ixgbe out of the way.
>
> Would you mind opening a new report at https://bugzilla.kernel.org,
> attaching those logs, and responding here with the URL?

Sure, no problem: https://bugzilla.kernel.org/show_bug.cgi?id=218050

I've attached the failing one and the one working with my workaround in
place ("if (size >= 256*1024 && 0) {").

regards,
--
Tomasz Pala <[email protected]>

2023-11-03 19:19:43

by Bjorn Helgaas

Subject: Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

On Thu, Oct 26, 2023 at 03:53:19PM -0500, Bjorn Helgaas wrote:
> On Thu, Oct 12, 2023 at 05:33:47PM +0200, Tomasz Pala wrote:
> > On Tue, Jan 10, 2023 at 12:02:43 -0600, Bjorn Helgaas wrote:
> > > Normally we reject ECAM space unless it is reported as reserved in the E820
> > > table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
> > > means PCI extended config space (offsets 0x100-0xfff) may not be accessible.
> > >
> > > Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
> > > mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
> > > normally converted to an E820 entry by a bootloader or EFI stub.
> > >
> > > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
> > > E820 entries that correspond to EfiMemoryMappedIO regions because some
> > > other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
> > > E820 entries prevent Linux from allocating BAR space for hot-added devices.
> > >
> > > Allow use of ECAM for extended config space when the region is covered by
> > > an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
> > > _CRS.
> >
> > I'm still having a problem initializing ixgbe NICs with pristine 6.5.7 kernel.
>
> Thanks very much for the report, and sorry for the inconvenience and
> my delay in looking at it.
>
> > efi: Remove mem63: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
> > [mem 0x7f800000-0xfed1bfff] available for PCI devices
> > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> > [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> > PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as EfiMemoryMappedIO
> > ixgbe 0000:02:00.0: enabling device (0140 -> 0142)
> > ixgbe 0000:02:00.0: BAR 0: can't reserve [mem 0x80000000-0x8007ffff 64bit]
> > ixgbe 0000:02:00.0: pci_request_selected_regions failed 0xfffffff0
> > ixgbe: probe of 0000:02:00.0 failed with error -16
>
> Something is wrong with our allocation scheme. Both the MMCONFIG
> region and the ixgbe BAR 0 are at 0x80000000, which obviously cannot
> work. Maybe the full dmesg log will have a clue about why we didn't
> move ixgbe out of the way.
>
> > After disabling the code causing this (using always-false condition:
> > if (size >= 256*1024 && 0) {
> > ) in the chunk:
> >
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > the BAR starts at 0x90000000 (not 0x80000000):
> >
> > efi: Not removing mem63: MMIO range=[0x80000000-0x8fffffff] (262144KB) from e820 map
> > [...]
> > [mem 0x90000000-0xfed1bfff] available for PCI devices
> > [...]
> > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> > PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as E820 entry
> >
> > and everything seems to work again.
> >
> >
> > I've got full system bootup logs from the upstream and worked around,
> > but I'm not sure if this is OK to attach them (the CC list is long).
>
> Would you mind opening a new report at https://bugzilla.kernel.org,
> attaching those logs, and responding here with the URL?

Thanks for the report and the logs, which are attached at
https://bugzilla.kernel.org/show_bug.cgi?id=218050

I think the problem is that the MMCONFIG region is at
[mem 0x80000000-0x8fffffff], and that is *also* included in one of the
host bridge windows reported via _CRS:

PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
pci_bus 0000:00: root bus resource [mem 0x80000000-0xfbffffff window]

I'll try to figure out how to deal with that. In the meantime, would
you mind attaching the contents of /proc/iomem to the bugzilla? I
think you have to cat it as root to get the actual values included.

Bjorn
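
The conflict described above can be checked mechanically with the ranges from the logs. This is a userspace sketch under stated assumptions: struct range here is a hypothetical type using inclusive bounds as printed in dmesg, and ranges_overlap() and range_contains() are illustrative helpers, not the kernel's resource API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, as printed in dmesg: [mem start-end]. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* True if the two inclusive ranges share at least one address. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.start <= b.end && b.start <= a.end;
}

/* True if "inner" lies entirely within "outer". */
static bool range_contains(struct range outer, struct range inner)
{
	return outer.start <= inner.start && inner.end <= outer.end;
}
```

With MMCONFIG at [0x80000000-0x8fffffff] and the root bus window at [0x80000000-0xfbffffff], range_contains(window, mmconfig) is true, so nothing in the window itself keeps a BAR such as [0x80000000-0x8007ffff] from being placed on top of the MMCONFIG region.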

2023-11-18 14:22:06

by Tomasz Pala

Subject: Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

On Thu, Nov 09, 2023 at 12:44:05 -0600, Bjorn Helgaas wrote:

>> https://bugzilla.kernel.org/show_bug.cgi?id=218050
>>
>> I think the problem is that the MMCONFIG region is at
>> [mem 0x80000000-0x8fffffff], and that is *also* included in one of the
>> host bridge windows reported via _CRS:
>>
>> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>> pci_bus 0000:00: root bus resource [mem 0x80000000-0xfbffffff window]
>>
>> I'll try to figure out how to deal with that. In the meantime, would
>> you mind attaching the contents of /proc/iomem to the bugzilla? I
>
> I attached a debug patch to both bugzilla entries. If you could
> attach the "acpidump" output and (if practical) boot a kernel with the
> debug patch and attach the dmesg logs, that would be great.

I've posted the files. There are signs of a buggy BIOS, but I don't expect
any firmware update to be released for this hardware anymore.

DMI: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.4 11/20/2019

.text .data .bss are not marked as E820_TYPE_RAM!
tboot: non-0 tboot_addr but it is not of type E820_TYPE_RESERVED

DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000df243000-0x00000000df251fff], contact BIOS vendor for fixes
DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000df243000-0x00000000df251fff]



BTW is there a reason for this logging discrepancy?

efi: Remove mem173: MMIO range=[0xe0000000-0xefffffff] (256MB) from e820 map
efi: Not removing mem71: MMIO range=[0xe0000000-0xefffffff] (262144KB) from e820 map

efi: Not removing mem74: MMIO range=[0xff000000-0xffffffff] (16384KB) from e820 map
efi: Remove mem176: MMIO range=[0xff000000-0xffffffff] (16MB) from e820 map

This is arch/x86/platform/efi/efi.c:
static void __init efi_remove_e820_mmio(void)

Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) ... size >> 20
Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) ... size >> 10

--
Tomasz Pala <[email protected]>

2023-11-20 16:29:46

by Bjorn Helgaas

Subject: Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

On Sat, Nov 18, 2023 at 03:21:43PM +0100, Tomasz Pala wrote:
> On Thu, Nov 09, 2023 at 12:44:05 -0600, Bjorn Helgaas wrote:
>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=218050
> >>
> >> I think the problem is that the MMCONFIG region is at
> >> [mem 0x80000000-0x8fffffff], and that is *also* included in one of the
> >> host bridge windows reported via _CRS:
> >>
> >> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >> pci_bus 0000:00: root bus resource [mem 0x80000000-0xfbffffff window]
> >>
> >> I'll try to figure out how to deal with that. In the meantime, would
> >> you mind attaching the contents of /proc/iomem to the bugzilla? I
> >
> > I attached a debug patch to both bugzilla entries. If you could
> > attach the "acpidump" output and (if practical) boot a kernel with the
> > debug patch and attach the dmesg logs, that would be great.
>
> I've posted the files. There are signs of buggy BIOS, but I don't expect
> any firmware update to be released for this hw anymore.

Thank you! A BIOS update is almost never the answer because even if
an update exists, we have to assume that most users in the field will
never install the update.

I want to look at the BIOS info in case we can learn about something
*Linux* is doing wrong. This most likely works fine with Windows, so
I assume Linux is doing something wrong or at least differently than
Windows.

> DMI: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.4 11/20/2019
>
> .text .data .bss are not marked as E820_TYPE_RAM!

Added by 4eea6aa581ab ("x86, mm: if kernel .text .data .bss are not
marked as E820_RAM, complain and fix"). No idea. A shame we didn't
include the .text/.data values in the message.

> tboot: non-0 tboot_addr but it is not of type E820_TYPE_RESERVED

Added by 316253406959 ("x86, intel_txt: Intel TXT boot support"). No
idea about this either.

> DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000df243000-0x00000000df251fff], contact BIOS vendor for fixes
> DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000df243000-0x00000000df251fff]

Both related to arch_rmrr_sanity_check(), added by f036c7fa0ab6
("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved")
and f5a68bb0752e ("iommu/vt-d: Mark firmware tainted if RMRR fails
sanity check").

No idea about this one either. The VT-d spec (r1.3, sec 8.4) says
"BIOS must report the RMRR reported memory addresses as reserved in
the system memory map returned through methods such as INT15, EFI
GetMemoryMap etc."

arch_rmrr_sanity_check() only looks at your e820 map, which only has
this:

BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
BIOS-e820: [mem 0x0000000000100000-0x00000000d1f36fff] usable

I think Linux basically converts the info from EFI GetMemoryMap
to an e820 format; I think booting with "efi=debug" would show more
details of this.
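
To illustrate the kind of check involved, here is a rough sketch. It is not the kernel's arch_rmrr_sanity_check(); struct map_entry, enum map_type and range_covered() are names made up for this example, and it assumes the map is sorted and non-overlapping. A range only counts as reserved if every byte of it falls inside entries of that type:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the e820 types; not the kernel definitions. */
enum map_type { MAP_USABLE = 1, MAP_RESERVED = 2 };

struct map_entry {
	uint64_t start;		/* first byte of the region */
	uint64_t end;		/* last byte of the region (inclusive) */
	enum map_type type;
};

/*
 * Return true if every byte of [start, end] lies in entries of @type.
 * Assumes entries are sorted by start address and do not overlap.
 */
static bool range_covered(const struct map_entry *map, size_t n,
			  uint64_t start, uint64_t end, enum map_type type)
{
	assert(start <= end);

	for (size_t i = 0; i < n; i++) {
		if (map[i].type != type)
			continue;
		if (map[i].end < start || map[i].start > start)
			continue;	/* entry does not contain current start */
		if (map[i].end >= end)
			return true;	/* rest of the range is covered */
		start = map[i].end + 1;	/* keep going with the uncovered tail */
	}
	return false;
}
```

With only the two usable entries quoted above and no reserved entry, range_covered() returns false for the RMRR at 0xdf243000-0xdf251fff, which is consistent with the "No firmware reserved region can cover this RMRR" complaint.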

Anyway, this is all a tangent.

> BTW is there a reason for this logging discrepancy?
>
> efi: Remove mem173: MMIO range=[0xe0000000-0xefffffff] (256MB) from e820 map
> efi: Not removing mem71: MMIO range=[0xe0000000-0xefffffff] (262144KB) from e820 map
>
> efi: Not removing mem74: MMIO range=[0xff000000-0xffffffff] (16384KB) from e820 map
> efi: Remove mem176: MMIO range=[0xff000000-0xffffffff] (16MB) from e820 map
>
> This is arch/x86/platform/efi/efi.c:
> static void __init efi_remove_e820_mmio(void)
>
> Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) ... size >> 20
> Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) ... size >> 10

You mean the MB vs KB difference? That's my fault. I guess I used KB
for the "Not removing" message because those are smaller (< 256KB) so
the size in MB wouldn't be useful there. We could use KB for both,
but I guess I used MB for the "Remove" case because it's a little
easier to read and I expected "Not removing" to be a relatively
unusual case.
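
For reference, the two messages describe the same sizes, just shifted by different amounts. Below is a small userspace sketch of what using KB for both would print. It is not the code in efi_remove_e820_mmio(); range_size() and format_mmio_range() are hypothetical helpers for illustration only:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Size in bytes of the inclusive range [start, end]. */
static uint64_t range_size(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return end - start + 1;
}

/* Format a range with its size in KB, the unit both messages could share. */
static int format_mmio_range(char *buf, size_t len,
			     uint64_t start, uint64_t end)
{
	uint64_t size = range_size(start, end);

	return snprintf(buf, len,
			"MMIO range=[0x%08" PRIx64 "-0x%08" PRIx64 "] (%" PRIu64 "KB)",
			start, end, size >> 10);
}
```

For [0xe0000000-0xefffffff], size >> 20 gives 256 and size >> 10 gives 262144, so "256MB" and "262144KB" in the log above are the same region size.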

Bjorn

2023-11-21 15:24:29

by Tomasz Pala

Subject: Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

On Mon, Nov 20, 2023 at 10:29:33 -0600, Bjorn Helgaas wrote:

> Thank you! A BIOS update is almost never the answer because even if
> an update exists, we have to assume that most users in the field will
> never install the update.

Not to mention enabling 64-bit BARs, which is even more cumbersome
ixgbe-specific magic that requires entirely dedicated tools...

>> .text .data .bss are not marked as E820_TYPE_RAM!
and
>> DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000df243000-0x00000000df251fff], contact BIOS vendor for fixes
>> DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000df243000-0x00000000df251fff]
[...]
> I think Linux basically converts the info from EFI GetMemoryMap
> to an e820 format; I think booting with "efi=debug" would show more
> details of this.

The dmesg I've attached today is with efi=debug, but the weird thing is
that both of the above warnings manifested themselves only once, with the
first (verbose debugging: "MCFG debug") patch applied... Anyway.

The "memremap attempted on mixed range 0x0000000000000000 size: 0x8000
WARNING: CPU: 0 PID: 1 at kernel/iomem.c:78 memremap+0x154/0x170" also
seems to be triggered by "efi=debug", so my guess is that it's unrelated.

--
Tomasz Pala <[email protected]>

2023-11-21 18:19:59

by Bjorn Helgaas

Subject: Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

On Tue, Nov 21, 2023 at 04:24:07PM +0100, Tomasz Pala wrote:
> On Mon, Nov 20, 2023 at 10:29:33 -0600, Bjorn Helgaas wrote:
>
> > Thank you! A BIOS update is almost never the answer because even if
> > an update exists, we have to assume that most users in the field will
> > never install the update.
>
> Not to mention enabling 64-bit BARs, which is even more cumbersome
> ixgbe-specific magic that requires entirely dedicated tools...
>
> >> .text .data .bss are not marked as E820_TYPE_RAM!
> and
> >> DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000df243000-0x00000000df251fff], contact BIOS vendor for fixes
> >> DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000df243000-0x00000000df251fff]
> [...]
> > I think Linux basically converts the info from EFI GetMemoryMap
> > to an e820 format; I think booting with "efi=debug" would show more
> > details of this.
>
> The dmesg I've attached today is with efi=debug, but the weird thing is
> that both of the above warnings manifested themselves only once, with the
> first (verbose debugging: "MCFG debug") patch applied... Anyway.

OK. I don't know what (if anything) to do about the above.

> The "memremap attempted on mixed range 0x0000000000000000 size: 0x8000
> WARNING: CPU: 0 PID: 1 at kernel/iomem.c:78 memremap+0x154/0x170" also
> seems to be triggered by "efi=debug", so my guess is that it's unrelated.

Yes, I think so. This is from efi_debugfs_init(), which we only run
when "efi=debug", and I think it comes from memremapping this area:

efi: mem00: [Boot Code | | | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000000000-0x0000000000007fff] (0MB)

Bjorn

2023-12-06 11:55:02

by Thorsten Leemhuis

Subject: Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 12.10.23 17:33, Tomasz Pala wrote:
> Hello,
>
> On Tue, Jan 10, 2023 at 12:02:43 -0600, Bjorn Helgaas wrote:
>
>> Normally we reject ECAM space unless it is reported as reserved in the E820
>> table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
>> means PCI extended config space (offsets 0x100-0xfff) may not be accessible.
>
> I'm still having a problem initializing ixgbe NICs with pristine 6.5.7 kernel.

#regzbot fix: x86/pci: Reserve ECAM if BIOS didn't include it in PNP0C02 _CRS
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.