Hi Bjorn,
Happy new year!
We found some PCI issues with the latest 6.2-rc2.
- Using the lspci -xxxx, the extended PCI config space of all PCI
devices are missed with the latest 6.2-rc2. The system we used had 932
PCI devices, at least 800 which have extended space as seen when booted
into a 5.15 kernel. But none of them appeared in 6.2-rc2.
- The drivers which rely on the information in the extended PCI config
space don't work anymore. We have confirmed that the perf uncore driver
(uncore performance monitoring) and Intel VSEC driver (telemetry) don't
work in 6.2-rc2. There could be more drivers which are impacted.
After a bisect, we found the regression is caused by the below commit
07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map").
After reverting the commit, the issues are gone.
Could you please take a look at the issues?
Thanks,
Kan
On Wed, Jan 04, 2023 at 09:39:56AM -0500, Liang, Kan wrote:
> Hi Bjorn,
>
> Happy new year!
>
> We found some PCI issues with the latest 6.2-rc2.
>
> - Using the lspci -xxxx, the extended PCI config space of all PCI
> devices are missed with the latest 6.2-rc2. The system we used had 932
> PCI devices, at least 800 which have extended space as seen when booted
> into a 5.15 kernel. But none of them appeared in 6.2-rc2.
> - The drivers which rely on the information in the extended PCI config
> space don't work anymore. We have confirmed that the perf uncore driver
> (uncore performance monitoring) and Intel VSEC driver (telemetry) don't
> work in 6.2-rc2. There could be more drivers which are impacted.
>
> After a bisect, we found the regression is caused by the below commit
> 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map").
> After reverting the commit, the issues are gone.
>
> Could you please take a look at the issues?
Certainly. Can you capture the complete dmesg log, please?
Bjorn
On Wed, Jan 04, 2023 at 08:50:32AM -0600, Bjorn Helgaas wrote:
> On Wed, Jan 04, 2023 at 09:39:56AM -0500, Liang, Kan wrote:
> > Hi Bjorn,
> >
> > Happy new year!
> >
> > We found some PCI issues with the latest 6.2-rc2.
> >
> > - Using the lspci -xxxx, the extended PCI config space of all PCI
> > devices are missed with the latest 6.2-rc2. The system we used had 932
> > PCI devices, at least 800 which have extended space as seen when booted
> > into a 5.15 kernel. But none of them appeared in 6.2-rc2.
> > - The drivers which rely on the information in the extended PCI config
> > space don't work anymore. We have confirmed that the perf uncore driver
> > (uncore performance monitoring) and Intel VSEC driver (telemetry) don't
> > work in 6.2-rc2. There could be more drivers which are impacted.
> >
> > After a bisect, we found the regression is caused by the below commit
> > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map").
> > After reverting the commit, the issues are gone.
> >
> > Could you please take a look at the issues?
>
> Certainly. Can you capture the complete dmesg log, please?
Thanks! Comparing v5.19 and v6.2-rc2, I see these:
--- v5.19
+++ v6.2-rc2
+efi: Remove mem458: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
+e820: remove [mem 0x80000000-0x8fffffff] reserved
-PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
+PCI: not using MMCONFIG
+PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
+[Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
+PCI: not using MMCONFIG
system 00:01: [mem 0xff000000-0xffffffff] has been reserved
system 00:02: [mem 0xfd000000-0xfd69ffff] could not be reserved
system 00:02: [mem 0xfd6c0000-0xfd6cffff] has been reserved
system 00:02: [mem 0xfd6f0000-0xfdffffff] has been reserved
system 00:02: [mem 0xfe000000-0xfe01ffff] could not be reserved
system 00:02: [mem 0xfe200000-0xfe7fffff] has been reserved
system 00:02: [mem 0xff000000-0xffffffff] has been reserved
I think this is a firmware defect. MCFG says the ECAM space is at
[mem 0x80000000-0x8fffffff]. Per the PCI Firmware Spec, r3.3, Note 2
of Table 4-2, this space should be reserved by a motherboard resource,
i.e., a PNP0C02 device (which would appear as "system 00:01" or
similar above) with _CRS that includes [mem 0x80000000-0x8fffffff].
This firmware supplies an EfiMemoryMappedIO region
[0x80000000-0x8fffffff] for the ECAM space (this could be confirmed by
adding "efi=debug"), and the bootloader or EFI stub converted that to
an E820 entry that Linux consumes.
On v5.19, Linux treated that EfiMemoryMappedIO region as a reservation
of the ECAM space, but starting with v6.2-rc1, Linux removes
EfiMemoryMappedIO regions from E820.
My understanding is that EfiMemoryMappedIO tells the OS to map the
area for use by runtime services, but is not intended to prevent the
OS from using the area. Some platforms use EfiMemoryMappedIO for PCI
host bridge apertures, and of course the OS needs to use those.
If your firmware folks disagree and think Linux should be able to
figure this out differently, I would love to have a conversation about
how to do this.
Bjorn
On Thu, Jan 05, 2023 at 09:42:10AM -0800, Tony Luck wrote:
> On Wed, Jan 04, 2023 at 09:45:11AM -0600, Bjorn Helgaas wrote:
> > My understanding is that EfiMemoryMappedIO tells the OS to map the
> > area for use by runtime services, but is not intended to prevent the
> > OS from using the area. Some platforms use EfiMemoryMappedIO for PCI
> > host bridge apertures, and of course the OS needs to use those.
> >
> > If your firmware folks disagree and think Linux should be able to
> > figure this out differently, I would love to have a conversation about
> > how to do this.
>
> It seems that 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
> is also the cause of breakage for drivers/edac/sb_edac.c. It is broken
> in v6.2-rc2 and reverting this commit makes it work again.
>
> This ancient driver probably plays fast and loose with how it ought to
> access extended PCIe config space ... but it needs to do this to read various
> memory controller configuration registers to do address translation from
> a system physical address to a DIMM address.
Hi Tony, can you share a dmesg log? Does it look like the same thing
Kan reported, where the ECAM space is reported only via an
EfiMemoryMappedIO region and is not otherwise reserved by firmware?
Bjorn
On Wed, Jan 04, 2023 at 09:45:11AM -0600, Bjorn Helgaas wrote:
> My understanding is that EfiMemoryMappedIO tells the OS to map the
> area for use by runtime services, but is not intended to prevent the
> OS from using the area. Some platforms use EfiMemoryMappedIO for PCI
> host bridge apertures, and of course the OS needs to use those.
>
> If your firmware folks disagree and think Linux should be able to
> figure this out differently, I would love to have a conversation about
> how to do this.
It seems that 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
is also the cause of breakage for drivers/edac/sb_edac.c. It is broken
in v6.2-rc2 and reverting this commit makes it work again.
This ancient driver probably plays fast and loose with how it ought to
access extended PCIe config space ... but it needs to do this to read various
memory controller configuration registers to do address translation from
a system physical address to a DIMM address.
-Tony
> Hi Tony, can you share a dmesg log? Does it look like the same thing
> Kan reported, where the ECAM space is reported only via an
> EfiMemoryMappedIO region and is not otherwise reserved by firmware?
Bjorn,
Attached are serial logs. "broken" is the one from v6.2-rc2, "revert" is the
one with your commit reverted.
I don't see the string "ECAM" in either of them.
-Tony
On Thu, Jan 05, 2023 at 06:04:52PM +0000, Luck, Tony wrote:
> > Hi Tony, can you share a dmesg log? Does it look like the same thing
> > Kan reported, where the ECAM space is reported only via an
> > EfiMemoryMappedIO region and is not otherwise reserved by firmware?
>
> Attached are serial logs. "broken" is the one from v6.2-rc2, "revert" is the
> one with your commit reverted.
>
> I don't see the string "ECAM" in either of them.
Yeah, "ECAM" is what the PCIe spec calls it, but Linux logging uses
"MMCONFIG". Probably should change that.
Anyway, your dmesg log shows the same problem:
DMI: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
efi: Remove mem48: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
PCI: not using MMCONFIG
acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended configuration space under this bridge
Apparently the only mention of [mem 0x80000000-0x8fffffff] in the
firmware/kernel interface is as an EfiMemoryMappedIO region.
I think this is a firmware bug, but obviously we're going to have to
figure out a way around it.
Bjorn
On 2023-01-05 1:29 p.m., Bjorn Helgaas wrote:
> On Thu, Jan 05, 2023 at 06:04:52PM +0000, Luck, Tony wrote:
>>> Hi Tony, can you share a dmesg log? Does it look like the same thing
>>> Kan reported, where the ECAM space is reported only via an
>>> EfiMemoryMappedIO region and is not otherwise reserved by firmware?
>>
>> Attached are serial logs. "broken" is the one from v6.2-rc2, "revert" is the
>> one with your commit reverted.
>>
>> I don't see the string "ECAM" in either of them.
>
> Yeah, "ECAM" is what the PCIe spec calls it, but Linux logging uses
> "MMCONFIG". Probably should change that.
>
> Anyway, your dmesg log shows the same problem:
>
> DMI: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
> efi: Remove mem48: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> PCI: not using MMCONFIG
> acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended configuration space under this bridge
>
> Apparently the only mention of [mem 0x80000000-0x8fffffff] in the
> firmware/kernel interface is as an EfiMemoryMappedIO region.
>
> I think this is a firmware bug, but obviously we're going to have to
> figure out a way around it.
>
I just want to share that I did more tests on an Ice Lake server (a
different generation from my original report and Tony's machine).
The same problem can be found as well.
[ 0.000000] DMI: Intel Corporation M50CYP2SB2U/M50CYP2SB2U, BIOS
SE5C6200.86B.4018.D65.2010201151 10/20/2020
[ 0.000000] efi: Remove mem375: MMIO range=[0x80000000-0x8fffffff]
(256MB) from e820 map
[ 0.000000] e820: remove [mem 0x80000000-0x8fffffff] reserved
[ 1.528341] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
0x80000000-0x8fffffff] (base 0x80000000)
[ 1.566605] [Firmware Info]: PCI: MMCONFIG at [mem
0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
[ 1.566611] PCI: not using MMCONFIG
This firmware implementation should exist on the existing platforms for
a long time.
Thanks,
Kan
On Thu, Jan 05, 2023 at 02:23:45PM -0500, Liang, Kan wrote:
> On 2023-01-05 1:29 p.m., Bjorn Helgaas wrote:
> > On Thu, Jan 05, 2023 at 06:04:52PM +0000, Luck, Tony wrote:
> >>> Hi Tony, can you share a dmesg log? Does it look like the same thing
> >>> Kan reported, where the ECAM space is reported only via an
> >>> EfiMemoryMappedIO region and is not otherwise reserved by firmware?
> >>
> >> Attached are serial logs. "broken" is the one from v6.2-rc2, "revert" is the
> >> one with your commit reverted.
> >>
> >> I don't see the string "ECAM" in either of them.
> >
> > Yeah, "ECAM" is what the PCIe spec calls it, but Linux logging uses
> > "MMCONFIG". Probably should change that.
> >
> > Anyway, your dmesg log shows the same problem:
> >
> > DMI: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
> > efi: Remove mem48: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
> > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> > [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> > PCI: not using MMCONFIG
> > acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended configuration space under this bridge
> >
> > Apparently the only mention of [mem 0x80000000-0x8fffffff] in the
> > firmware/kernel interface is as an EfiMemoryMappedIO region.
> >
> > I think this is a firmware bug, but obviously we're going to have to
> > figure out a way around it.
>
> I just want to share that I did more tests on an Ice Lake server (a
> different generation from my original report and Tony's machine).
>
> The same problem can be found as well.
>
> [ 0.000000] DMI: Intel Corporation M50CYP2SB2U/M50CYP2SB2U, BIOS
> SE5C6200.86B.4018.D65.2010201151 10/20/2020
> [ 0.000000] efi: Remove mem375: MMIO range=[0x80000000-0x8fffffff]
> (256MB) from e820 map
> [ 0.000000] e820: remove [mem 0x80000000-0x8fffffff] reserved
>
> [ 1.528341] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
> 0x80000000-0x8fffffff] (base 0x80000000)
> [ 1.566605] [Firmware Info]: PCI: MMCONFIG at [mem
> 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> [ 1.566611] PCI: not using MMCONFIG
>
> This firmware implementation should exist on the existing platforms for
> a long time.
Yes. Frustrating, but I think we have no choice but to make Linux
work with the firmware as it is, whether it is buggy or not. From
your first report, I hoped it was isolated to unreleased firmware that
had a chance of being fixed, but obviously that's not the case.
Bjorn
> Definitely an ambiguity / conflict, but not sure it is a bug when you
> look at from the perspective of how would an EFI runtime service use
> ECAM/MMCONFIG space?
>
> Would it be enough to add this clarification in "EFI 2.9 Table 7-6
> Memory Type Usage after ExitBootServices()"?
>
> s/This memory is not used by the OS./This memory is not used by the OS,
> unless ACPI declares it for another purpose./
In the case of the EDAC driver there isn't any ACPI declaration. It just does
pci_get_device() to find the devices it needs to use with a lookup based on
vendor-id and device-id.
and then uses pci_read_config_dword() to read various offsets from config
space, some of which are in PCIe extended config space.
-Tony
Bjorn Helgaas wrote:
> On Thu, Jan 05, 2023 at 06:04:52PM +0000, Luck, Tony wrote:
> > > Hi Tony, can you share a dmesg log? Does it look like the same thing
> > > Kan reported, where the ECAM space is reported only via an
> > > EfiMemoryMappedIO region and is not otherwise reserved by firmware?
> >
> > Attached are serial logs. "broken" is the one from v6.2-rc2, "revert" is the
> > one with your commit reverted.
> >
> > I don't see the string "ECAM" in either of them.
>
> Yeah, "ECAM" is what the PCIe spec calls it, but Linux logging uses
> "MMCONFIG". Probably should change that.
>
> Anyway, your dmesg log shows the same problem:
>
> DMI: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
> efi: Remove mem48: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> PCI: not using MMCONFIG
> acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended configuration space under this bridge
>
> Apparently the only mention of [mem 0x80000000-0x8fffffff] in the
> firmware/kernel interface is as an EfiMemoryMappedIO region.
>
> I think this is a firmware bug, but obviously we're going to have to
> figure out a way around it.
Definitely an ambiguity / conflict, but not sure it is a bug when you
look at from the perspective of how would an EFI runtime service use
ECAM/MMCONFIG space?
Would it be enough to add this clarification in "EFI 2.9 Table 7-6
Memory Type Usage after ExitBootServices()"?
s/This memory is not used by the OS./This memory is not used by the OS,
unless ACPI declares it for another puporse./
On Thu, Jan 05, 2023 at 11:44:28AM -0800, Dan Williams wrote:
> Bjorn Helgaas wrote:
> > Apparently the only mention of [mem 0x80000000-0x8fffffff] in the
> > firmware/kernel interface is as an EfiMemoryMappedIO region.
> >
> > I think this is a firmware bug, but obviously we're going to have to
> > figure out a way around it.
>
> Definitely an ambiguity / conflict, but not sure it is a bug when you
> look at from the perspective of how would an EFI runtime service use
> ECAM/MMCONFIG space?
I think it's perfectly fine for firmware to advertise ECAM space as an
EfiMemoryMappedIO region via EFI GetMemoryMap() because it certainly
makes sense that EFI runtime services would use config space.
My understanding is that the OS should learn about device address
space via ACPI _CRS, not GetMemoryMap(). The MCFG spec (PCI Firmware
Spec, r3.3, sec 4.1.2) requires ECAM space to be reserved via a
PNP0C02 motherboard device _CRS.
So what I think *is* a bug is that this firmware doesn't report the
ECAM space via PNP0C02 _CRS.
If somebody thinks the lack of this reservation is not a bug, I would
love to hear ideas about how Linux *should* be handling this. There
are many variations on how firmware does things like this, and it's
been a nightmare trying to figure out something that works with all of
them.
> Would it be enough to add this clarification in "EFI 2.9 Table 7-6
> Memory Type Usage after ExitBootServices()"?
>
> s/This memory is not used by the OS./This memory is not used by the OS,
> unless ACPI declares it for another purpose./
I guess the idea is that MCFG is a form of "ACPI declaring it"? I
don't have an explicit citation for it, but I infer at [1] that ACPI
static tables are second-class citizens and not intended as a way of
reserving address space because that would lead to problems booting
old OSes on firmware that provides new tables unknown to the OS.
Bjorn
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/acpi-info.rst?id=v6.1#n32
Bjorn Helgaas wrote:
> On Thu, Jan 05, 2023 at 11:44:28AM -0800, Dan Williams wrote:
> > Bjorn Helgaas wrote:
>
> > > Apparently the only mention of [mem 0x80000000-0x8fffffff] in the
> > > firmware/kernel interface is as an EfiMemoryMappedIO region.
> > >
> > > I think this is a firmware bug, but obviously we're going to have to
> > > figure out a way around it.
> >
> > Definitely an ambiguity / conflict, but not sure it is a bug when you
> > look at from the perspective of how would an EFI runtime service use
> > ECAM/MMCONFIG space?
>
> I think it's perfectly fine for firmware to advertise ECAM space as an
> EfiMemoryMappedIO region via EFI GetMemoryMap() because it certainly
> makes sense that EFI runtime services would use config space.
>
> My understanding is that the OS should learn about device address
> space via ACPI _CRS, not GetMemoryMap(). The MCFG spec (PCI Firmware
> Spec, r3.3, sec 4.1.2) requires ECAM space to be reserved via a
> PNP0C02 motherboard device _CRS.
>
> So what I think *is* a bug is that this firmware doesn't report the
> ECAM space via PNP0C02 _CRS.
>
> If somebody thinks the lack of this reservation is not a bug, I would
> love to hear ideas about how Linux *should* be handling this. There
> are many variations on how firmware does things like this, and it's
> been a nightmare trying to figure out something that works with all of
> them.
I am trying to get a statement from a BIOS person, but in the meantime I
am confused by this lead in sentence of Note 2 in "PCI Firmware Spec
v3.2 Table 4-2: MCFG Table to Support Enhanced Configuration Space
Access":
If the operating system does not natively comprehend reserving the MMCFG
region, the MMCFG region must be reserved by firmware. The address range
reported in the MCFG table or by _CBA method (see Section 4.1.3) must be
reserved by declaring a motherboard resource...
Which seems to say it is ok for the OS to treat MMCFG space as reserved
by default. It certainly fails the Robustness Principle for the BIOS to
*assume* that the OS can natively comprehend that reservation, but it
seems Linux is in its rights to make that assumption.
>
> > Would it be enough to add this clarification in "EFI 2.9 Table 7-6
> > Memory Type Usage after ExitBootServices()"?
> >
> > s/This memory is not used by the OS./This memory is not used by the OS,
> > unless ACPI declares it for another purpose./
>
> I guess the idea is that MCFG is a form of "ACPI declaring it"? I
> don't have an explicit citation for it, but I infer at [1] that ACPI
> static tables are second-class citizens and not intended as a way of
> reserving address space because that would lead to problems booting
> old OSes on firmware that provides new tables unknown to the OS.
Ah, true, certainly for new stuff, but what about MCFG specifically?
What harm is there an assuming that MMCONFIG intersecting with
EfiMemoryMappedIO shall be treated as reserved for MMCONFIG usage.
>
> Bjorn
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/acpi-info.rst?id=v6.1#n32
On Thu, Jan 05, 2023 at 07:58:47PM +0000, Luck, Tony wrote:
> > Definitely an ambiguity / conflict, but not sure it is a bug when you
> > look at from the perspective of how would an EFI runtime service use
> > ECAM/MMCONFIG space?
> >
> > Would it be enough to add this clarification in "EFI 2.9 Table 7-6
> > Memory Type Usage after ExitBootServices()"?
> >
> > s/This memory is not used by the OS./This memory is not used by the OS,
> > unless ACPI declares it for another purpose./
>
> In the case of the EDAC driver there isn't any ACPI declaration. It just does
> pci_get_device() to find the devices it needs to use with a lookup based on
> vendor-id and device-id.
The EDAC driver wouldn't have any ACPI stuff in it; it's just that
Linux is looking for ACPI info about the ECAM area.
I think the problem here is that the ECAM/MMCONFIG code checks to make
sure the ECAM space is reserved somewhere. If it doesn't find a
reservation either in E820 [1] or by PNP0C01/PNP0C02 _CRS methods [2],
it decides it's not safe to use ECAM, which means we may only have the
old accessors that can only reach 256 bytes of config space.
Bjorn
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/pci/mmconfig-shared.c?id=v6.1#n447
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/pci/mmconfig-shared.c?id=v6.1#n428
Bjorn Helgaas wrote:
> On Thu, Jan 05, 2023 at 01:20:36PM -0800, Dan Williams wrote:
> > Bjorn Helgaas wrote:
> > > On Thu, Jan 05, 2023 at 11:44:28AM -0800, Dan Williams wrote:
> > > > Bjorn Helgaas wrote:
> > >
> > > > > Apparently the only mention of [mem 0x80000000-0x8fffffff] in the
> > > > > firmware/kernel interface is as an EfiMemoryMappedIO region.
> > > > >
> > > > > I think this is a firmware bug, but obviously we're going to have to
> > > > > figure out a way around it.
> > > >
> > > > Definitely an ambiguity / conflict, but not sure it is a bug when you
> > > > look at from the perspective of how would an EFI runtime service use
> > > > ECAM/MMCONFIG space?
> > >
> > > I think it's perfectly fine for firmware to advertise ECAM space as an
> > > EfiMemoryMappedIO region via EFI GetMemoryMap() because it certainly
> > > makes sense that EFI runtime services would use config space.
> > >
> > > My understanding is that the OS should learn about device address
> > > space via ACPI _CRS, not GetMemoryMap(). The MCFG spec (PCI Firmware
> > > Spec, r3.3, sec 4.1.2) requires ECAM space to be reserved via a
> > > PNP0C02 motherboard device _CRS.
> > >
> > > So what I think *is* a bug is that this firmware doesn't report the
> > > ECAM space via PNP0C02 _CRS.
> > >
> > > If somebody thinks the lack of this reservation is not a bug, I would
> > > love to hear ideas about how Linux *should* be handling this. There
> > > are many variations on how firmware does things like this, and it's
> > > been a nightmare trying to figure out something that works with all of
> > > them.
> >
> > I am trying to get a statement from a BIOS person, but in the meantime I
> > am confused by this lead in sentence of Note 2 in "PCI Firmware Spec
> > v3.2 Table 4-2: MCFG Table to Support Enhanced Configuration Space
> > Access":
> >
> > If the operating system does not natively comprehend reserving the MMCFG
> > region, the MMCFG region must be reserved by firmware. The address range
> > reported in the MCFG table or by _CBA method (see Section 4.1.3) must be
> > reserved by declaring a motherboard resource...
> >
> > Which seems to say it is ok for the OS to treat MMCFG space as reserved
> > by default. It certainly fails the Robustness Principle for the BIOS to
> > *assume* that the OS can natively comprehend that reservation, but it
> > seems Linux is in its rights to make that assumption.
>
> I read "OS natively comprehends MMCFG space" as meaning "the OS has
> device-specific knowledge of the PCI host bridge and the associated
> MMCFG space." But in that case, the OS wouldn't need MCFG at all, so
> maybe I'm not reading it right.
>
> There must have been some reason for that sentence, e.g., some system
> that didn't or couldn't report MMCFG via PNP0C02 _CBA, but it sure
> makes a mess of what could have been a simple "range must be reserved"
> statement.
>
> > > > Would it be enough to add this clarification in "EFI 2.9 Table 7-6
> > > > Memory Type Usage after ExitBootServices()"?
> > > >
> > > > s/This memory is not used by the OS./This memory is not used by the OS,
> > > > unless ACPI declares it for another purpose./
> > >
> > > I guess the idea is that MCFG is a form of "ACPI declaring it"? I
> > > don't have an explicit citation for it, but I infer at [1] that ACPI
> > > static tables are second-class citizens and not intended as a way of
> > > reserving address space because that would lead to problems booting
> > > old OSes on firmware that provides new tables unknown to the OS.
> >
> > Ah, true, certainly for new stuff, but what about MCFG specifically?
> > What harm is there an assuming that MMCONFIG intersecting with
> > EfiMemoryMappedIO shall be treated as reserved for MMCONFIG usage.
>
> Probably none, and I think that's what we'll have to do. Ugh.
> Another random special-case rule.
>
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/acpi-info.rst?id=v6.1#n32
I am still holding out that a BIOS developer can either say "whoops,
populating MMCONFIG in _CRS was overlooked", or point out "if you take
the derivative of the PCI spec, multiply it be the inverse of the EFI
spec and then take the cross-product with the ACPI spec then the memory
type comes out as implicitly reserved".
On Thu, Jan 05, 2023 at 01:20:36PM -0800, Dan Williams wrote:
> Bjorn Helgaas wrote:
> > On Thu, Jan 05, 2023 at 11:44:28AM -0800, Dan Williams wrote:
> > > Bjorn Helgaas wrote:
> >
> > > > Apparently the only mention of [mem 0x80000000-0x8fffffff] in the
> > > > firmware/kernel interface is as an EfiMemoryMappedIO region.
> > > >
> > > > I think this is a firmware bug, but obviously we're going to have to
> > > > figure out a way around it.
> > >
> > > Definitely an ambiguity / conflict, but not sure it is a bug when you
> > > look at from the perspective of how would an EFI runtime service use
> > > ECAM/MMCONFIG space?
> >
> > I think it's perfectly fine for firmware to advertise ECAM space as an
> > EfiMemoryMappedIO region via EFI GetMemoryMap() because it certainly
> > makes sense that EFI runtime services would use config space.
> >
> > My understanding is that the OS should learn about device address
> > space via ACPI _CRS, not GetMemoryMap(). The MCFG spec (PCI Firmware
> > Spec, r3.3, sec 4.1.2) requires ECAM space to be reserved via a
> > PNP0C02 motherboard device _CRS.
> >
> > So what I think *is* a bug is that this firmware doesn't report the
> > ECAM space via PNP0C02 _CRS.
> >
> > If somebody thinks the lack of this reservation is not a bug, I would
> > love to hear ideas about how Linux *should* be handling this. There
> > are many variations on how firmware does things like this, and it's
> > been a nightmare trying to figure out something that works with all of
> > them.
>
> I am trying to get a statement from a BIOS person, but in the meantime I
> am confused by this lead in sentence of Note 2 in "PCI Firmware Spec
> v3.2 Table 4-2: MCFG Table to Support Enhanced Configuration Space
> Access":
>
> If the operating system does not natively comprehend reserving the MMCFG
> region, the MMCFG region must be reserved by firmware. The address range
> reported in the MCFG table or by _CBA method (see Section 4.1.3) must be
> reserved by declaring a motherboard resource...
>
> Which seems to say it is ok for the OS to treat MMCFG space as reserved
> by default. It certainly fails the Robustness Principle for the BIOS to
> *assume* that the OS can natively comprehend that reservation, but it
> seems Linux is in its rights to make that assumption.
I read "OS natively comprehends MMCFG space" as meaning "the OS has
device-specific knowledge of the PCI host bridge and the associated
MMCFG space." But in that case, the OS wouldn't need MCFG at all, so
maybe I'm not reading it right.
There must have been some reason for that sentence, e.g., some system
that didn't or couldn't report MMCFG via PNP0C02 _CBA, but it sure
makes a mess of what could have been a simple "range must be reserved"
statement.
> > > Would it be enough to add this clarification in "EFI 2.9 Table 7-6
> > > Memory Type Usage after ExitBootServices()"?
> > >
> > > s/This memory is not used by the OS./This memory is not used by the OS,
> > > unless ACPI declares it for another purpose./
> >
> > I guess the idea is that MCFG is a form of "ACPI declaring it"? I
> > don't have an explicit citation for it, but I infer at [1] that ACPI
> > static tables are second-class citizens and not intended as a way of
> > reserving address space because that would lead to problems booting
> > old OSes on firmware that provides new tables unknown to the OS.
>
> Ah, true, certainly for new stuff, but what about MCFG specifically?
> What harm is there an assuming that MMCONFIG intersecting with
> EfiMemoryMappedIO shall be treated as reserved for MMCONFIG usage.
Probably none, and I think that's what we'll have to do. Ugh.
Another random special-case rule.
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/acpi-info.rst?id=v6.1#n32
>
>
> The EDAC driver wouldn't have any ACPI stuff in it; it's just that
> Linux is looking for ACPI info about the ECAM area.
Is there some way for a driver to know that pci_read_config_dword()
is going to fail for offsets in extended PCIe config space?
-Tony
On Thu, Jan 05, 2023 at 01:43:20PM -0800, Dan Williams wrote:
> Bjorn Helgaas wrote:
> > On Thu, Jan 05, 2023 at 01:20:36PM -0800, Dan Williams wrote:
> > > Bjorn Helgaas wrote:
> > > > On Thu, Jan 05, 2023 at 11:44:28AM -0800, Dan Williams wrote:
> > > > > Would it be enough to add this clarification in "EFI 2.9 Table 7-6
> > > > > Memory Type Usage after ExitBootServices()"?
> > > > >
> > > > > s/This memory is not used by the OS./This memory is not used by the OS,
> > > > > unless ACPI declares it for another purpose./
> > > >
> > > > I guess the idea is that MCFG is a form of "ACPI declaring it"? I
> > > > don't have an explicit citation for it, but I infer at [1] that ACPI
> > > > static tables are second-class citizens and not intended as a way of
> > > > reserving address space because that would lead to problems booting
> > > > old OSes on firmware that provides new tables unknown to the OS.
> > >
> > > Ah, true, certainly for new stuff, but what about MCFG specifically?
> > > What harm is there an assuming that MMCONFIG intersecting with
> > > EfiMemoryMappedIO shall be treated as reserved for MMCONFIG usage.
> >
> > Probably none, and I think that's what we'll have to do. Ugh.
> > Another random special-case rule.
> >
> > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/acpi-info.rst?id=v6.1#n32
>
> I am still holding out that a BIOS developer can either say "whoops,
> populating MMCONFIG in _CRS was overlooked", or point out "if you take
> the derivative of the PCI spec, multiply it be the inverse of the EFI
> spec and then take the cross-product with the ACPI spec then the memory
> type comes out as implicitly reserved".
Hahaha :) Yep, but even if they change it, apparently there are lots
of machines in the field that won't get updated, so we're stuck
working around it.
Or, I guess the best-case scenario would be that it's not actually a
firmware bug, and there's some clean fix we can make to Linux. But
I'm not holding my breath.
On Thu, Jan 05, 2023 at 09:49:44PM +0000, Luck, Tony wrote:
> > The EDAC driver wouldn't have any ACPI stuff in it; it's just that
> > Linux is looking for ACPI info about the ECAM area.
>
> Is there some way for a driver to know that pci_read_config_dword()
> is going to fail for offsets in extended PCIe config space?
There's no nice way to know ahead of time. "raw_pci_ext_ops ==
&pci_mmcfg" means it should work. It looks like pci_direct_conf1 and
pci_mmcfg_numachip would also work, but it's all pretty ugly.
You could check the result for PCI_POSSIBLE_ERROR(), since reads
should return ~0 (PCI_ERROR_RESPONSE) if they fail either because we
can't access extended config space or because there was some PCI
error.
Bjorn
[+cc Tony, Dan]
On Wed, Jan 04, 2023 at 09:39:56AM -0500, Liang, Kan wrote:
> Hi Bjorn,
>
> Happy new year!
>
> We found some PCI issues with the latest 6.2-rc2.
>
> - Using the lspci -xxxx, the extended PCI config space of all PCI
> devices are missed with the latest 6.2-rc2. The system we used had 932
> PCI devices, at least 800 which have extended space as seen when booted
> into a 5.15 kernel. But none of them appeared in 6.2-rc2.
> - The drivers which rely on the information in the extended PCI config
> space don't work anymore. We have confirmed that the perf uncore driver
> (uncore performance monitoring) and Intel VSEC driver (telemetry) don't
> work in 6.2-rc2. There could be more drivers which are impacted.
>
> After a bisect, we found the regression is caused by the below commit
> 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map").
> After reverting the commit, the issues are gone.
Can you try this patch (based on v6.2-rc1):
commit 89a0067217b0 ("x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space")
parent 1b929c02afd3
Author: Bjorn Helgaas <[email protected]>
Date: Thu Jan 5 16:02:58 2023 -0600
x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
Normally we reject ECAM space unless it is reported as reserved in the E820
table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
means extended config space (offsets 0x100-0xfff) may not be accessible.
Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
normally converted to an E820 entry by a bootloader or EFI stub.
07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
E820 entries that correspond to EfiMemoryMappedIO regions because some
other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
E820 entries prevent Linux from allocating BAR space for hot-added devices.
Allow use of ECAM for extended config space when the region is covered by
an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
_CRS.
Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
Link: https://lore.kernel.org/r/[email protected]
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 758cbfe55daa..4adc587a4c94 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -12,6 +12,7 @@
*/
#include <linux/acpi.h>
+#include <linux/efi.h>
#include <linux/pci.h>
#include <linux/init.h>
#include <linux/bitmap.h>
@@ -442,6 +443,25 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
return mcfg_res.flags;
}
+static bool is_efi_reserved(u64 start, u64 end, enum e820_type not_used)
+{
+ efi_memory_desc_t *md;
+ u64 size, mmio_start, mmio_end;
+
+ for_each_efi_memory_desc(md) {
+ if (md->type == EFI_MEMORY_MAPPED_IO) {
+ size = md->num_pages << EFI_PAGE_SHIFT;
+ mmio_start = md->phys_addr;
+ mmio_end = mmio_start + size - 1;
+
+ if (mmio_start <= start && end <= mmio_end)
+ return true;
+ }
+ }
+
+ return false;
+}
+
typedef bool (*check_reserved_t)(u64 start, u64 end, enum e820_type type);
static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
@@ -452,7 +472,7 @@ static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
u64 size = resource_size(&cfg->res);
u64 old_size = size;
int num_buses;
- char *method = with_e820 ? "E820" : "ACPI motherboard resources";
+ char *method = with_e820 ? "E820" : "ACPI motherboard resources or EFI";
while (!is_reserved(addr, addr + size, E820_TYPE_RESERVED)) {
size >>= 1;
@@ -502,15 +522,17 @@ pci_mmcfg_check_reserved(struct device *dev, struct pci_mmcfg_region *cfg, int e
if (!early && !acpi_disabled) {
if (is_mmconf_reserved(is_acpi_reserved, cfg, dev, 0))
return true;
+ if (is_mmconf_reserved(is_efi_reserved, cfg, dev, 0))
+ return true;
if (dev)
dev_info(dev, FW_INFO
- "MMCONFIG at %pR not reserved in "
+ "MMCONFIG at %pR not reserved in EFI "
"ACPI motherboard resources\n",
&cfg->res);
else
pr_info(FW_INFO PREFIX
- "MMCONFIG at %pR not reserved in "
+ "MMCONFIG at %pR not reserved in EFI or "
"ACPI motherboard resources\n",
&cfg->res);
}
Bjorn Helgaas wrote:
> [+cc Tony, Dan]
>
> On Wed, Jan 04, 2023 at 09:39:56AM -0500, Liang, Kan wrote:
> > Hi Bjorn,
> >
> > Happy new year!
> >
> > We found some PCI issues with the latest 6.2-rc2.
> >
> > - Using the lspci -xxxx, the extended PCI config space of all PCI
> > devices are missed with the latest 6.2-rc2. The system we used had 932
> > PCI devices, at least 800 which have extended space as seen when booted
> > into a 5.15 kernel. But none of them appeared in 6.2-rc2.
> > - The drivers which rely on the information in the extended PCI config
> > space don't work anymore. We have confirmed that the perf uncore driver
> > (uncore performance monitoring) and Intel VSEC driver (telemetry) don't
> > work in 6.2-rc2. There could be more drivers which are impacted.
> >
> > After a bisect, we found the regression is caused by the below commit
> > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map").
> > After reverting the commit, the issues are gone.
>
> Can you try this patch (based on v6.2-rc1):
Looks good to me, one question below, but either way:
Reviewed-by: Dan Williams <[email protected]>
...and Dave, who reported that CXL enumeration was busted in -rc2, says
this patch fixes that. So you can also add:
Tested-by: Dave Jiang <[email protected]>
>
>
> commit 89a0067217b0 ("x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space")
> parent 1b929c02afd3
> Author: Bjorn Helgaas <[email protected]>
> Date: Thu Jan 5 16:02:58 2023 -0600
>
> x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
>
> Normally we reject ECAM space unless it is reported as reserved in the E820
> table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
> means extended config space (offsets 0x100-0xfff) may not be accessible.
>
> Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
> mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
> normally converted to an E820 entry by a bootloader or EFI stub.
>
> 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
> E820 entries that correspond to EfiMemoryMappedIO regions because some
> other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
> E820 entries prevent Linux from allocating BAR space for hot-added devices.
>
> Allow use of ECAM for extended config space when the region is covered by
> an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
> _CRS.
>
> Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
> Link: https://lore.kernel.org/r/[email protected]
>
> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index 758cbfe55daa..4adc587a4c94 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -12,6 +12,7 @@
> */
>
> #include <linux/acpi.h>
> +#include <linux/efi.h>
> #include <linux/pci.h>
> #include <linux/init.h>
> #include <linux/bitmap.h>
> @@ -442,6 +443,25 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
> return mcfg_res.flags;
> }
>
> +static bool is_efi_reserved(u64 start, u64 end, enum e820_type not_used)
> +{
> + efi_memory_desc_t *md;
> + u64 size, mmio_start, mmio_end;
> +
> + for_each_efi_memory_desc(md) {
> + if (md->type == EFI_MEMORY_MAPPED_IO) {
Should this also consider EFI_RESERVED_TYPE? Not that any known BIOS
needs that accommodation. This is more a question than a suggestion.
> ...and Dave, who reported that CXL enumeration was busted in -rc2, says
> this patch fixes that. So you can also add:
>
> Tested-by: Dave Jiang <[email protected]>
Also seems good for my Broadwell/EDAC system.
Boot messages mentioning MMCONFIG are:
$ dmesg | grep MMCONFIG
[ 12.280360] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[ 12.291606] PCI: not using MMCONFIG
[ 12.873676] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[ 12.893266] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in ACPI motherboard resources or EFI
[ 12.903601] PCI: MMCONFIG for 0000 [bus00-7f] at [mem 0x80000000-0x87ffffff] (base 0x80000000) (size reduced!)
[ 13.385616] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended configuration space under this bridge
[ 14.115617] acpi PNP0A03:01: fail to add MMCONFIG information, can't access extended configuration space under this bridge
[ 17.969639] acpi PNP0A08:02: fail to add MMCONFIG information, can't access extended configuration space under this bridge
[ 18.298640] acpi PNP0A08:03: fail to add MMCONFIG information, can't access extended configuration space under this bridge
The EDAC driver works with this patch applied.
Tested-by: Tony Luck <[email protected]>
-Tony
On Thu, Jan 05, 2023 at 03:38:40PM -0800, Dan Williams wrote:
> Bjorn Helgaas wrote:
> > On Wed, Jan 04, 2023 at 09:39:56AM -0500, Liang, Kan wrote:
> > > Hi Bjorn,
> > >
> > > Happy new year!
> > >
> > > We found some PCI issues with the latest 6.2-rc2.
> > >
> > > - Using the lspci -xxxx, the extended PCI config space of all PCI
> > > devices are missed with the latest 6.2-rc2. The system we used had 932
> > > PCI devices, at least 800 which have extended space as seen when booted
> > > into a 5.15 kernel. But none of them appeared in 6.2-rc2.
> > > - The drivers which rely on the information in the extended PCI config
> > > space don't work anymore. We have confirmed that the perf uncore driver
> > > (uncore performance monitoring) and Intel VSEC driver (telemetry) don't
> > > work in 6.2-rc2. There could be more drivers which are impacted.
> > >
> > > After a bisect, we found the regression is caused by the below commit
> > > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map").
> > > After reverting the commit, the issues are gone.
> >
> > Can you try this patch (based on v6.2-rc1):
>
> Looks good to me, one question below, but either way:
>
> Reviewed-by: Dan Williams <[email protected]>
>
> ...and Dave, who reported that CXL enumeration was busted in -rc2, says
> this patch fixes that. So you can also add:
>
> Tested-by: Dave Jiang <[email protected]>
Thanks for all this!
> > +static bool is_efi_reserved(u64 start, u64 end, enum e820_type not_used)
> > +{
> > + efi_memory_desc_t *md;
> > + u64 size, mmio_start, mmio_end;
> > +
> > + for_each_efi_memory_desc(md) {
> > + if (md->type == EFI_MEMORY_MAPPED_IO) {
>
> Should this also consider EFI_RESERVED_TYPE? Not that any known BIOS
> needs that accommodation. This is more a question than a suggestion.
I don't think GetMemoryMap() is intended as a way to tell the OS about
device memory. The OS needs to know what address space goes with what
device and what kind of device it is. The ACPI namespace supplies all
that kind of information, so it doesn't make sense to me that we'd get
some from ACPI and some from EFI.
Also, the EFI spec says EfiReservedMemoryType is "Not usable." But if
ECAM space were described that way, obviously the OS *does* need to
use it, so it doesn't really seem to fit.
I do think the EFI spec is pretty poorly worded. EfiMemoryMappedIO is
"not used by the OS" -- misleading, since the OS *has* to use ECAM and
host bridge apertures. And "all system memory-mapped IO information
should come from ACPI tables" -- well, the EfiMemoryMappedIO region is
itself certainly some kind of information about memory-mapped IO
space! I think it should really refer to the ACPI *namespace*
specifically, not just tables that might include MCFG, etc. IMHO the
static tables like MCFG are basically just a crutch for use before we
know how to parse the namespace.
Anyway, I am inclined to do nothing with EFI_RESERVED_TYPE unless we
come across a system that needs it.
Bjorn
On Fri, Jan 06, 2023 at 12:22:09AM +0000, Luck, Tony wrote:
> > ...and Dave, who reported that CXL enumeration was busted in -rc2, says
> > this patch fixes that. So you can also add:
> >
> > Tested-by: Dave Jiang <[email protected]>
>
> Also seems good for my Broadwell/EDAC system.
>
> Boot messages mentioning MMCONFIG are:
>
> $ dmesg | grep MMCONFIG
> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> PCI: not using MMCONFIG
> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in ACPI motherboard resources or EFI
This part looks ok.
> PCI: MMCONFIG for 0000 [bus00-7f] at [mem 0x80000000-0x87ffffff] (base 0x80000000) (size reduced!)
> acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended configuration space under this bridge
> acpi PNP0A03:01: fail to add MMCONFIG information, can't access extended configuration space under this bridge
> acpi PNP0A08:02: fail to add MMCONFIG information, can't access extended configuration space under this bridge
> acpi PNP0A08:03: fail to add MMCONFIG information, can't access extended configuration space under this bridge
But the rest of this still looks like a regression. From your
previous dmesg log:
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
PNP0A03:00: host bridge to domain 0000 [bus ff]
PNP0A03:01: host bridge to domain 0000 [bus bf]
PNP0A03:02: host bridge to domain 0000 [bus 7f]
PNP0A03:03: host bridge to domain 0000 [bus 3f]
PNP0A08:00: host bridge to domain 0000 [bus 00-3e]
PNP0A08:01: host bridge to domain 0000 [bus 40-7e]
PNP0A08:02: host bridge to domain 0000 [bus 80-be]
PNP0A08:03: host bridge to domain 0000 [bus c0-fe]
That MMCONFIG space should cover all those buses, but something is
going wrong.
I'll look at this more tomorrow.
Bjorn
On 2023-01-05 5:32 p.m., Bjorn Helgaas wrote:
> [+cc Tony, Dan]
>
> On Wed, Jan 04, 2023 at 09:39:56AM -0500, Liang, Kan wrote:
>> Hi Bjorn,
>>
>> Happy new year!
>>
>> We found some PCI issues with the latest 6.2-rc2.
>>
>> - Using the lspci -xxxx, the extended PCI config space of all PCI
>> devices are missed with the latest 6.2-rc2. The system we used had 932
>> PCI devices, at least 800 which have extended space as seen when booted
>> into a 5.15 kernel. But none of them appeared in 6.2-rc2.
>> - The drivers which rely on the information in the extended PCI config
>> space don't work anymore. We have confirmed that the perf uncore driver
>> (uncore performance monitoring) and Intel VSEC driver (telemetry) don't
>> work in 6.2-rc2. There could be more drivers which are impacted.
>>
>> After a bisect, we found the regression is caused by the below commit
>> 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map").
>> After reverting the commit, the issues are gone.
>
> Can you try this patch (based on v6.2-rc1):
>
> > commit 89a0067217b0 ("x86/pci: Treat EfiMemoryMappedIO as reservation
of ECAM space")
> parent 1b929c02afd3
> Author: Bjorn Helgaas <[email protected]>
> Date: Thu Jan 5 16:02:58 2023 -0600
>
> x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
>
> Normally we reject ECAM space unless it is reported as reserved in the E820
> table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
> means extended config space (offsets 0x100-0xfff) may not be accessible.
>
> Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
> mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
> normally converted to an E820 entry by a bootloader or EFI stub.
>
> 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
> E820 entries that correspond to EfiMemoryMappedIO regions because some
> other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
> E820 entries prevent Linux from allocating BAR space for hot-added devices.
>
> Allow use of ECAM for extended config space when the region is covered by
> an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
> _CRS.
>
> Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
> Link: https://lore.kernel.org/r/[email protected]
>
The patch fixes the issue I reported.
Tested-by: Kan Liang <[email protected]>
Thanks,
Kan
> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index 758cbfe55daa..4adc587a4c94 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -12,6 +12,7 @@
> */
>
> #include <linux/acpi.h>
> +#include <linux/efi.h>
> #include <linux/pci.h>
> #include <linux/init.h>
> #include <linux/bitmap.h>
> @@ -442,6 +443,25 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
> return mcfg_res.flags;
> }
>
> +static bool is_efi_reserved(u64 start, u64 end, enum e820_type not_used)
> +{
> + efi_memory_desc_t *md;
> + u64 size, mmio_start, mmio_end;
> +
> + for_each_efi_memory_desc(md) {
> + if (md->type == EFI_MEMORY_MAPPED_IO) {
> + size = md->num_pages << EFI_PAGE_SHIFT;
> + mmio_start = md->phys_addr;
> + mmio_end = mmio_start + size - 1;
> +
> + if (mmio_start <= start && end <= mmio_end)
> + return true;
> + }
> + }
> +
> + return false;
> +}
> +
> typedef bool (*check_reserved_t)(u64 start, u64 end, enum e820_type type);
>
> static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
> @@ -452,7 +472,7 @@ static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
> u64 size = resource_size(&cfg->res);
> u64 old_size = size;
> int num_buses;
> - char *method = with_e820 ? "E820" : "ACPI motherboard resources";
> + char *method = with_e820 ? "E820" : "ACPI motherboard resources or EFI";
>
> while (!is_reserved(addr, addr + size, E820_TYPE_RESERVED)) {
> size >>= 1;
> @@ -502,15 +522,17 @@ pci_mmcfg_check_reserved(struct device *dev, struct pci_mmcfg_region *cfg, int e
> if (!early && !acpi_disabled) {
> if (is_mmconf_reserved(is_acpi_reserved, cfg, dev, 0))
> return true;
> + if (is_mmconf_reserved(is_efi_reserved, cfg, dev, 0))
> + return true;
>
> if (dev)
> dev_info(dev, FW_INFO
> - "MMCONFIG at %pR not reserved in "
> + "MMCONFIG at %pR not reserved in EFI "
> "ACPI motherboard resources\n",
> &cfg->res);
> else
> pr_info(FW_INFO PREFIX
> - "MMCONFIG at %pR not reserved in "
> + "MMCONFIG at %pR not reserved in EFI or "
> "ACPI motherboard resources\n",
> &cfg->res);
> }
[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; all text you find below is based on a few templates
paragraphs you might have encountered already already in similar form.
See link in footer if these mails annoy you.]
On 04.01.23 15:39, Liang, Kan wrote:
> Hi Bjorn,
>
> Happy new year!
>
> We found some PCI issues with the latest 6.2-rc2.
>
> - Using the lspci -xxxx, the extended PCI config space of all PCI
> devices are missed with the latest 6.2-rc2. The system we used had 932
> PCI devices, at least 800 which have extended space as seen when booted
> into a 5.15 kernel. But none of them appeared in 6.2-rc2.
> - The drivers which rely on the information in the extended PCI config
> space don't work anymore. We have confirmed that the perf uncore driver
> (uncore performance monitoring) and Intel VSEC driver (telemetry) don't
> work in 6.2-rc2. There could be more drivers which are impacted.
>
> After a bisect, we found the regression is caused by the below commit
> 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map").
> After reverting the commit, the issues are gone.
>
> Could you please take a look at the issues?
Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:
#regzbot ^introduced 07eab0901ede
#regzbot title x86/pci: extended PCI config space is missed
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.
Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (see page linked in footer for details).
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
On Thu, Jan 05, 2023 at 06:47:44PM -0600, Bjorn Helgaas wrote:
> On Fri, Jan 06, 2023 at 12:22:09AM +0000, Luck, Tony wrote:
> > > ...and Dave, who reported that CXL enumeration was busted in -rc2, says
> > > this patch fixes that. So you can also add:
> > >
> > > Tested-by: Dave Jiang <[email protected]>
> >
> > Also seems good for my Broadwell/EDAC system.
> >
> > Boot messages mentioning MMCONFIG are:
> >
> > $ dmesg | grep MMCONFIG
> > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> > PCI: not using MMCONFIG
> > PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> > PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in ACPI motherboard resources or EFI
>
> This part looks ok.
>
> > PCI: MMCONFIG for 0000 [bus00-7f] at [mem 0x80000000-0x87ffffff] (base 0x80000000) (size reduced!)
> > acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended configuration space under this bridge
> > acpi PNP0A03:01: fail to add MMCONFIG information, can't access extended configuration space under this bridge
> > acpi PNP0A08:02: fail to add MMCONFIG information, can't access extended configuration space under this bridge
> > acpi PNP0A08:03: fail to add MMCONFIG information, can't access extended configuration space under this bridge
>
> But the rest of this still looks like a regression. From your
> previous dmesg log:
>
> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> PNP0A03:00: host bridge to domain 0000 [bus ff]
> PNP0A03:01: host bridge to domain 0000 [bus bf]
> PNP0A03:02: host bridge to domain 0000 [bus 7f]
> PNP0A03:03: host bridge to domain 0000 [bus 3f]
> PNP0A08:00: host bridge to domain 0000 [bus 00-3e]
> PNP0A08:01: host bridge to domain 0000 [bus 40-7e]
> PNP0A08:02: host bridge to domain 0000 [bus 80-be]
> PNP0A08:03: host bridge to domain 0000 [bus c0-fe]
>
> That MMCONFIG space should cover all those buses, but something is
> going wrong.
Tony, would you mind collecting a dmesg log with "efi=debug"? I want
to see the EFI_MEMORY_MAPPED_IO size and what we remove from E820.
Bjorn
> Tony, would you mind collecting a dmesg log with "efi=debug"? I want
> to see the EFI_MEMORY_MAPPED_IO size and what we remove from E820.
Bjorn,
Booted the 6.2-rc1 kernel with the patch that you provided yesterday with efi=debug option.
Compressed dmesg file attached.
-Tony
On Fri, Jan 06, 2023 at 06:03:13PM +0000, Luck, Tony wrote:
> > Tony, would you mind collecting a dmesg log with "efi=debug"? I want
> > to see the EFI_MEMORY_MAPPED_IO size and what we remove from E820.
>
> Bjorn,
>
> Booted the 6.2-rc1 kernel with the patch that you provided yesterday with efi=debug option.
>
> Compressed dmesg file attached.
Thanks, Tony! Something is wrong with the EFI MMIO removal
(obviously), but I don't see what it is. Could you try the patch
below (replacement for previous one, with more debug)?
commit ce347c04cc2f ("x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space")
parent 1b929c02afd3
Author: Bjorn Helgaas <[email protected]>
Date: Thu Jan 5 16:02:58 2023 -0600
x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
Normally we reject ECAM space unless it is reported as reserved in the E820
table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
means extended config space (offsets 0x100-0xfff) may not be accessible.
Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
normally converted to an E820 entry by a bootloader or EFI stub.
07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
E820 entries that correspond to EfiMemoryMappedIO regions because some
other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
E820 entries prevent Linux from allocating BAR space for hot-added devices.
Allow use of ECAM for extended config space when the region is covered by
an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
_CRS.
Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
Link: https://lore.kernel.org/r/[email protected]
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 758cbfe55daa..07308f403649 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -12,6 +12,7 @@
*/
#include <linux/acpi.h>
+#include <linux/efi.h>
#include <linux/pci.h>
#include <linux/init.h>
#include <linux/bitmap.h>
@@ -442,6 +443,33 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
return mcfg_res.flags;
}
+static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
+{
+ efi_memory_desc_t *md;
+ u64 size, mmio_start, mmio_end;
+
+ pr_info("is_efi_mmio %#lx-%#lx\n",
+ (unsigned long) start, (unsigned long) end);
+ for_each_efi_memory_desc(md) {
+ if (md->type == EFI_MEMORY_MAPPED_IO) {
+ size = md->num_pages << EFI_PAGE_SHIFT;
+ mmio_start = md->phys_addr;
+ mmio_end = mmio_start + size - 1;
+
+ pr_info(" efi_mmio %#lx-%#lx\n",
+ (unsigned long) mmio_start,
+ (unsigned long) mmio_end);
+ if (mmio_start <= start && end <= mmio_end) {
+ pr_info("is_efi_mmio true\n");
+ return true;
+ }
+ }
+ }
+
+ pr_info("is_efi_mmio false\n");
+ return false;
+}
+
typedef bool (*check_reserved_t)(u64 start, u64 end, enum e820_type type);
static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
@@ -452,23 +480,24 @@ static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
u64 size = resource_size(&cfg->res);
u64 old_size = size;
int num_buses;
- char *method = with_e820 ? "E820" : "ACPI motherboard resources";
+ char *method = with_e820 ? "E820" : "ACPI motherboard resources or EFI";
+ pr_info("is_mmconf_reserved %ps [bus %02x-%02x] %pR\n",
+ is_reserved, cfg->start_bus, cfg->end_bus, &cfg->res);
while (!is_reserved(addr, addr + size, E820_TYPE_RESERVED)) {
+ pr_info(" %#lx-%#lx (size %#lx) not reserved\n",
+ (unsigned long) addr, (unsigned long) (addr + size - 1),
+ (unsigned long) size);
size >>= 1;
+ pr_info(" size reduced to %#lx\n", (unsigned long) size);
if (size < (16UL<<20))
break;
}
- if (size < (16UL<<20) && size != old_size)
+ if (size < (16UL<<20) && size != old_size) {
+ pr_info("is_mmconf_reserved %ps false\n", is_reserved);
return false;
-
- if (dev)
- dev_info(dev, "MMCONFIG at %pR reserved in %s\n",
- &cfg->res, method);
- else
- pr_info(PREFIX "MMCONFIG at %pR reserved in %s\n",
- &cfg->res, method);
+ }
if (old_size != size) {
/* update end_bus */
@@ -487,30 +516,42 @@ static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
&cfg->res, (unsigned long) cfg->address);
else
pr_info(PREFIX
- "MMCONFIG for %04x [bus%02x-%02x] "
+ "MMCONFIG for %04x [bus %02x-%02x] "
"at %pR (base %#lx) (size reduced!)\n",
cfg->segment, cfg->start_bus, cfg->end_bus,
&cfg->res, (unsigned long) cfg->address);
}
+ if (dev)
+ dev_info(dev, "MMCONFIG at %pR reserved in %s\n",
+ &cfg->res, method);
+ else
+ pr_info(PREFIX "MMCONFIG at %pR reserved in %s\n",
+ &cfg->res, method);
+
return true;
}
static bool __ref
pci_mmcfg_check_reserved(struct device *dev, struct pci_mmcfg_region *cfg, int early)
{
+ pr_info("pci_mmcfg_check_reserved([bus %02x-%02x] %pR, %s)\n",
+ cfg->start_bus, cfg->end_bus, &cfg->res,
+ early ? "early" : "late");
if (!early && !acpi_disabled) {
if (is_mmconf_reserved(is_acpi_reserved, cfg, dev, 0))
return true;
+ if (is_mmconf_reserved(is_efi_mmio, cfg, dev, 0))
+ return true;
if (dev)
dev_info(dev, FW_INFO
- "MMCONFIG at %pR not reserved in "
+ "MMCONFIG at %pR not reserved in EFI or "
"ACPI motherboard resources\n",
&cfg->res);
else
pr_info(FW_INFO PREFIX
- "MMCONFIG at %pR not reserved in "
+ "MMCONFIG at %pR not reserved in EFI or "
"ACPI motherboard resources\n",
&cfg->res);
}
@@ -536,6 +577,7 @@ static void __init pci_mmcfg_reject_broken(int early)
{
struct pci_mmcfg_region *cfg;
+ pr_info("pci_mmcfg_reject_broken(%s)\n", early ? "early" : "late");
list_for_each_entry(cfg, &pci_mmcfg_list, list) {
if (pci_mmcfg_check_reserved(NULL, cfg, early) == 0) {
pr_info(PREFIX "not using MMCONFIG\n");
@@ -570,6 +612,7 @@ static int __init pci_parse_mcfg(struct acpi_table_header *header)
unsigned long i;
int entries;
+ pr_info("pci_parse_mcfg\n");
if (!header)
return -EINVAL;
@@ -661,6 +704,7 @@ static int __initdata known_bridge;
void __init pci_mmcfg_early_init(void)
{
+ pr_info("pci_mmcfg_early_init\n");
if (pci_probe & PCI_PROBE_MMCONF) {
if (pci_mmcfg_check_hostbridge())
known_bridge = 1;
@@ -674,6 +718,7 @@ void __init pci_mmcfg_early_init(void)
void __init pci_mmcfg_late_init(void)
{
+ pr_info("pci_mmcfg_late_init\n");
/* MMCONFIG disabled */
if ((pci_probe & PCI_PROBE_MMCONF) == 0)
return;
@@ -725,6 +770,8 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
struct resource *tmp = NULL;
struct pci_mmcfg_region *cfg;
+ dev_info(dev, "pci_mmconfig_insert %02x-%02x addr %#lx\n",
+ start, end, (unsigned long)addr);
if (!(pci_probe & PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed)
return -ENODEV;
@@ -788,6 +835,7 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
mutex_unlock(&pci_mmcfg_lock);
+ dev_info(dev, "pci_mmconfig_insert returns %d\n", rc);
return rc;
}
> Thanks, Tony! Something is wrong with the EFI MMIO removal
> (obviously), but I don't see what it is. Could you try the patch
> below (replacement for previous one, with more debug)?
Swapped out old patch for this new one. Booted with efi-debug
New dmesg attached.
-Tony
On Fri, Jan 06, 2023 at 09:37:06PM +0000, Luck, Tony wrote:
> > Thanks, Tony! Something is wrong with the EFI MMIO removal
> > (obviously), but I don't see what it is. Could you try the patch
> > below (replacement for previous one, with more debug)?
>
> Swapped out old patch for this new one. Booted with efi-debug
>
> New dmesg attached.
Thanks! Ah, off-by-one error because e820__mapped_all() expects
"[start, end)" which means is_acpi_reserved() and is_efi_mmio() must
use the same, and I was thinking "[start, end]" like a struct
resource.
The below should work better.
commit 696ac9286d3d ("x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space")
parent 1b929c02afd3
Author: Bjorn Helgaas <[email protected]>
Date: Thu Jan 5 16:02:58 2023 -0600
x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
Normally we reject ECAM space unless it is reported as reserved in the E820
table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
means extended config space (offsets 0x100-0xfff) may not be accessible.
Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
normally converted to an E820 entry by a bootloader or EFI stub.
07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
E820 entries that correspond to EfiMemoryMappedIO regions because some
other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
E820 entries prevent Linux from allocating BAR space for hot-added devices.
Allow use of ECAM for extended config space when the region is covered by
an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
_CRS.
Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
Link: https://lore.kernel.org/r/[email protected]
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 758cbfe55daa..5c6cadd60fef 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -12,6 +12,7 @@
*/
#include <linux/acpi.h>
+#include <linux/efi.h>
#include <linux/pci.h>
#include <linux/init.h>
#include <linux/bitmap.h>
@@ -442,6 +443,34 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
return mcfg_res.flags;
}
+static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
+{
+ efi_memory_desc_t *md;
+ u64 size, mmio_start, mmio_end;
+
+ end--; /* caller supplies start, end = start + size */
+ pr_info("is_efi_mmio %#lx-%#lx\n",
+ (unsigned long) start, (unsigned long) end);
+ for_each_efi_memory_desc(md) {
+ if (md->type == EFI_MEMORY_MAPPED_IO) {
+ size = md->num_pages << EFI_PAGE_SHIFT;
+ mmio_start = md->phys_addr;
+ mmio_end = mmio_start + size - 1;
+
+ pr_info(" efi_mmio %#lx-%#lx\n",
+ (unsigned long) mmio_start,
+ (unsigned long) mmio_end);
+ if (mmio_start <= start && end <= mmio_end) {
+ pr_info("is_efi_mmio true\n");
+ return true;
+ }
+ }
+ }
+
+ pr_info("is_efi_mmio false\n");
+ return false;
+}
+
typedef bool (*check_reserved_t)(u64 start, u64 end, enum e820_type type);
static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
@@ -452,23 +481,24 @@ static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
u64 size = resource_size(&cfg->res);
u64 old_size = size;
int num_buses;
- char *method = with_e820 ? "E820" : "ACPI motherboard resources";
+ char *method = with_e820 ? "E820" : "ACPI motherboard resources or EFI";
+ pr_info("is_mmconf_reserved %ps [bus %02x-%02x] %pR\n",
+ is_reserved, cfg->start_bus, cfg->end_bus, &cfg->res);
while (!is_reserved(addr, addr + size, E820_TYPE_RESERVED)) {
+ pr_info(" %#lx-%#lx (size %#lx) not reserved\n",
+ (unsigned long) addr, (unsigned long) (addr + size - 1),
+ (unsigned long) size);
size >>= 1;
+ pr_info(" size reduced to %#lx\n", (unsigned long) size);
if (size < (16UL<<20))
break;
}
- if (size < (16UL<<20) && size != old_size)
+ if (size < (16UL<<20) && size != old_size) {
+ pr_info("is_mmconf_reserved %ps false\n", is_reserved);
return false;
-
- if (dev)
- dev_info(dev, "MMCONFIG at %pR reserved in %s\n",
- &cfg->res, method);
- else
- pr_info(PREFIX "MMCONFIG at %pR reserved in %s\n",
- &cfg->res, method);
+ }
if (old_size != size) {
/* update end_bus */
@@ -487,30 +517,42 @@ static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
&cfg->res, (unsigned long) cfg->address);
else
pr_info(PREFIX
- "MMCONFIG for %04x [bus%02x-%02x] "
+ "MMCONFIG for %04x [bus %02x-%02x] "
"at %pR (base %#lx) (size reduced!)\n",
cfg->segment, cfg->start_bus, cfg->end_bus,
&cfg->res, (unsigned long) cfg->address);
}
+ if (dev)
+ dev_info(dev, "MMCONFIG at %pR reserved in %s\n",
+ &cfg->res, method);
+ else
+ pr_info(PREFIX "MMCONFIG at %pR reserved in %s\n",
+ &cfg->res, method);
+
return true;
}
static bool __ref
pci_mmcfg_check_reserved(struct device *dev, struct pci_mmcfg_region *cfg, int early)
{
+ pr_info("pci_mmcfg_check_reserved([bus %02x-%02x] %pR, %s)\n",
+ cfg->start_bus, cfg->end_bus, &cfg->res,
+ early ? "early" : "late");
if (!early && !acpi_disabled) {
if (is_mmconf_reserved(is_acpi_reserved, cfg, dev, 0))
return true;
+ if (is_mmconf_reserved(is_efi_mmio, cfg, dev, 0))
+ return true;
if (dev)
dev_info(dev, FW_INFO
- "MMCONFIG at %pR not reserved in "
+ "MMCONFIG at %pR not reserved in EFI or "
"ACPI motherboard resources\n",
&cfg->res);
else
pr_info(FW_INFO PREFIX
- "MMCONFIG at %pR not reserved in "
+ "MMCONFIG at %pR not reserved in EFI or "
"ACPI motherboard resources\n",
&cfg->res);
}
@@ -536,6 +578,7 @@ static void __init pci_mmcfg_reject_broken(int early)
{
struct pci_mmcfg_region *cfg;
+ pr_info("pci_mmcfg_reject_broken(%s)\n", early ? "early" : "late");
list_for_each_entry(cfg, &pci_mmcfg_list, list) {
if (pci_mmcfg_check_reserved(NULL, cfg, early) == 0) {
pr_info(PREFIX "not using MMCONFIG\n");
@@ -570,6 +613,7 @@ static int __init pci_parse_mcfg(struct acpi_table_header *header)
unsigned long i;
int entries;
+ pr_info("pci_parse_mcfg\n");
if (!header)
return -EINVAL;
@@ -661,6 +705,7 @@ static int __initdata known_bridge;
void __init pci_mmcfg_early_init(void)
{
+ pr_info("pci_mmcfg_early_init\n");
if (pci_probe & PCI_PROBE_MMCONF) {
if (pci_mmcfg_check_hostbridge())
known_bridge = 1;
@@ -674,6 +719,7 @@ void __init pci_mmcfg_early_init(void)
void __init pci_mmcfg_late_init(void)
{
+ pr_info("pci_mmcfg_late_init\n");
/* MMCONFIG disabled */
if ((pci_probe & PCI_PROBE_MMCONF) == 0)
return;
@@ -725,6 +771,8 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
struct resource *tmp = NULL;
struct pci_mmcfg_region *cfg;
+ dev_info(dev, "pci_mmconfig_insert %02x-%02x addr %#lx\n",
+ start, end, (unsigned long)addr);
if (!(pci_probe & PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed)
return -ENODEV;
@@ -788,6 +836,7 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
mutex_unlock(&pci_mmcfg_lock);
+ dev_info(dev, "pci_mmconfig_insert returns %d\n", rc);
return rc;
}
> Thanks! Ah, off-by-one error because e820__mapped_all() expects
> "[start, end)" which means is_acpi_reserved() and is_efi_mmio() must
> use the same, and I was thinking "[start, end]" like a struct
> resource.
>
> The below should work better.
Applied in place of earlier patch.
The basic MMCONFIG messages look better:
$ dmesg | grep MMCONFIG
[ 12.985055] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[ 13.078050] PCI: not using MMCONFIG
[ 13.666053] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[ 13.795049] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in ACPI motherboard resources or EFI
Full dmesg attached.
-Tony
Verified this updated patch on SPR DNP and SPR MCC, it fixes the perf uncore driver not working issue.
Tested-by: Yunying Sun <[email protected]>
-Yunying
-----Original Message-----
From: Bjorn Helgaas <[email protected]>
Sent: Saturday, 7 January, 2023 06:05
To: Luck, Tony <[email protected]>
Cc: Williams, Dan J <[email protected]>; Liang, Kan <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Box, David E <[email protected]>; Sun, Yunying <[email protected]>; Jiang, Dave <[email protected]>
Subject: Re: Bug report: the extended PCI config space is missed with 6.2-rc2
On Fri, Jan 06, 2023 at 09:37:06PM +0000, Luck, Tony wrote:
> > Thanks, Tony! Something is wrong with the EFI MMIO removal
> > (obviously), but I don't see what it is. Could you try the patch
> > below (replacement for previous one, with more debug)?
>
> Swapped out old patch for this new one. Booted with efi-debug
>
> New dmesg attached.
Thanks! Ah, off-by-one error because e820__mapped_all() expects "[start, end)" which means is_acpi_reserved() and is_efi_mmio() must use the same, and I was thinking "[start, end]" like a struct resource.
The below should work better.
commit 696ac9286d3d ("x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space") parent 1b929c02afd3
Author: Bjorn Helgaas <[email protected]>
Date: Thu Jan 5 16:02:58 2023 -0600
x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
Normally we reject ECAM space unless it is reported as reserved in the E820
table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
means extended config space (offsets 0x100-0xfff) may not be accessible.
Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
normally converted to an E820 entry by a bootloader or EFI stub.
07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
E820 entries that correspond to EfiMemoryMappedIO regions because some
other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
E820 entries prevent Linux from allocating BAR space for hot-added devices.
Allow use of ECAM for extended config space when the region is covered by
an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
_CRS.
Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
Link: https://lore.kernel.org/r/[email protected]
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c index 758cbfe55daa..5c6cadd60fef 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -12,6 +12,7 @@
*/
#include <linux/acpi.h>
+#include <linux/efi.h>
#include <linux/pci.h>
#include <linux/init.h>
#include <linux/bitmap.h>
@@ -442,6 +443,34 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
return mcfg_res.flags;
}
+static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used) {
+ efi_memory_desc_t *md;
+ u64 size, mmio_start, mmio_end;
+
+ end--; /* caller supplies start, end = start + size */
+ pr_info("is_efi_mmio %#lx-%#lx\n",
+ (unsigned long) start, (unsigned long) end);
+ for_each_efi_memory_desc(md) {
+ if (md->type == EFI_MEMORY_MAPPED_IO) {
+ size = md->num_pages << EFI_PAGE_SHIFT;
+ mmio_start = md->phys_addr;
+ mmio_end = mmio_start + size - 1;
+
+ pr_info(" efi_mmio %#lx-%#lx\n",
+ (unsigned long) mmio_start,
+ (unsigned long) mmio_end);
+ if (mmio_start <= start && end <= mmio_end) {
+ pr_info("is_efi_mmio true\n");
+ return true;
+ }
+ }
+ }
+
+ pr_info("is_efi_mmio false\n");
+ return false;
+}
+
typedef bool (*check_reserved_t)(u64 start, u64 end, enum e820_type type);
static bool __ref is_mmconf_reserved(check_reserved_t is_reserved, @@ -452,23 +481,24 @@ static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
u64 size = resource_size(&cfg->res);
u64 old_size = size;
int num_buses;
- char *method = with_e820 ? "E820" : "ACPI motherboard resources";
+ char *method = with_e820 ? "E820" : "ACPI motherboard resources or
+EFI";
+ pr_info("is_mmconf_reserved %ps [bus %02x-%02x] %pR\n",
+ is_reserved, cfg->start_bus, cfg->end_bus, &cfg->res);
while (!is_reserved(addr, addr + size, E820_TYPE_RESERVED)) {
+ pr_info(" %#lx-%#lx (size %#lx) not reserved\n",
+ (unsigned long) addr, (unsigned long) (addr + size - 1),
+ (unsigned long) size);
size >>= 1;
+ pr_info(" size reduced to %#lx\n", (unsigned long) size);
if (size < (16UL<<20))
break;
}
- if (size < (16UL<<20) && size != old_size)
+ if (size < (16UL<<20) && size != old_size) {
+ pr_info("is_mmconf_reserved %ps false\n", is_reserved);
return false;
-
- if (dev)
- dev_info(dev, "MMCONFIG at %pR reserved in %s\n",
- &cfg->res, method);
- else
- pr_info(PREFIX "MMCONFIG at %pR reserved in %s\n",
- &cfg->res, method);
+ }
if (old_size != size) {
/* update end_bus */
@@ -487,30 +517,42 @@ static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
&cfg->res, (unsigned long) cfg->address);
else
pr_info(PREFIX
- "MMCONFIG for %04x [bus%02x-%02x] "
+ "MMCONFIG for %04x [bus %02x-%02x] "
"at %pR (base %#lx) (size reduced!)\n",
cfg->segment, cfg->start_bus, cfg->end_bus,
&cfg->res, (unsigned long) cfg->address);
}
+ if (dev)
+ dev_info(dev, "MMCONFIG at %pR reserved in %s\n",
+ &cfg->res, method);
+ else
+ pr_info(PREFIX "MMCONFIG at %pR reserved in %s\n",
+ &cfg->res, method);
+
return true;
}
static bool __ref
pci_mmcfg_check_reserved(struct device *dev, struct pci_mmcfg_region *cfg, int early) {
+ pr_info("pci_mmcfg_check_reserved([bus %02x-%02x] %pR, %s)\n",
+ cfg->start_bus, cfg->end_bus, &cfg->res,
+ early ? "early" : "late");
if (!early && !acpi_disabled) {
if (is_mmconf_reserved(is_acpi_reserved, cfg, dev, 0))
return true;
+ if (is_mmconf_reserved(is_efi_mmio, cfg, dev, 0))
+ return true;
if (dev)
dev_info(dev, FW_INFO
- "MMCONFIG at %pR not reserved in "
+ "MMCONFIG at %pR not reserved in EFI or "
"ACPI motherboard resources\n",
&cfg->res);
else
pr_info(FW_INFO PREFIX
- "MMCONFIG at %pR not reserved in "
+ "MMCONFIG at %pR not reserved in EFI or "
"ACPI motherboard resources\n",
&cfg->res);
}
@@ -536,6 +578,7 @@ static void __init pci_mmcfg_reject_broken(int early) {
struct pci_mmcfg_region *cfg;
+ pr_info("pci_mmcfg_reject_broken(%s)\n", early ? "early" : "late");
list_for_each_entry(cfg, &pci_mmcfg_list, list) {
if (pci_mmcfg_check_reserved(NULL, cfg, early) == 0) {
pr_info(PREFIX "not using MMCONFIG\n"); @@ -570,6 +613,7 @@ static int __init pci_parse_mcfg(struct acpi_table_header *header)
unsigned long i;
int entries;
+ pr_info("pci_parse_mcfg\n");
if (!header)
return -EINVAL;
@@ -661,6 +705,7 @@ static int __initdata known_bridge;
void __init pci_mmcfg_early_init(void)
{
+ pr_info("pci_mmcfg_early_init\n");
if (pci_probe & PCI_PROBE_MMCONF) {
if (pci_mmcfg_check_hostbridge())
known_bridge = 1;
@@ -674,6 +719,7 @@ void __init pci_mmcfg_early_init(void)
void __init pci_mmcfg_late_init(void)
{
+ pr_info("pci_mmcfg_late_init\n");
/* MMCONFIG disabled */
if ((pci_probe & PCI_PROBE_MMCONF) == 0)
return;
@@ -725,6 +771,8 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
struct resource *tmp = NULL;
struct pci_mmcfg_region *cfg;
+ dev_info(dev, "pci_mmconfig_insert %02x-%02x addr %#lx\n",
+ start, end, (unsigned long)addr);
if (!(pci_probe & PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed)
return -ENODEV;
@@ -788,6 +836,7 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
mutex_unlock(&pci_mmcfg_lock);
+ dev_info(dev, "pci_mmconfig_insert returns %d\n", rc);
return rc;
}
Hi Bjorn,
I tested this first version patch too. It fixes the uncore driver issue on SPR DNP, but does not work on SPR MCC.
Compressed dmesg files attached.
-Yunying
-----Original Message-----
From: Bjorn Helgaas <[email protected]>
Sent: Friday, 6 January, 2023 06:33
To: Liang, Kan <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Box, David E <[email protected]>; Sun, Yunying <[email protected]>; Luck, Tony <[email protected]>; Williams, Dan J <[email protected]>
Subject: Re: Bug report: the extended PCI config space is missed with 6.2-rc2
[+cc Tony, Dan]
On Wed, Jan 04, 2023 at 09:39:56AM -0500, Liang, Kan wrote:
> Hi Bjorn,
>
> Happy new year!
>
> We found some PCI issues with the latest 6.2-rc2.
>
> - Using the lspci -xxxx, the extended PCI config space of all PCI
> devices are missed with the latest 6.2-rc2. The system we used had 932
> PCI devices, at least 800 which have extended space as seen when
> booted into a 5.15 kernel. But none of them appeared in 6.2-rc2.
> - The drivers which rely on the information in the extended PCI config
> space don't work anymore. We have confirmed that the perf uncore
> driver (uncore performance monitoring) and Intel VSEC driver
> (telemetry) don't work in 6.2-rc2. There could be more drivers which are impacted.
>
> After a bisect, we found the regression is caused by the below commit
> 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map").
> After reverting the commit, the issues are gone.
Can you try this patch (based on v6.2-rc1):
commit 89a0067217b0 ("x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space") parent 1b929c02afd3
Author: Bjorn Helgaas <[email protected]>
Date: Thu Jan 5 16:02:58 2023 -0600
x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
Normally we reject ECAM space unless it is reported as reserved in the E820
table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
means extended config space (offsets 0x100-0xfff) may not be accessible.
Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
normally converted to an E820 entry by a bootloader or EFI stub.
07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
E820 entries that correspond to EfiMemoryMappedIO regions because some
other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
E820 entries prevent Linux from allocating BAR space for hot-added devices.
Allow use of ECAM for extended config space when the region is covered by
an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
_CRS.
Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
Link: https://lore.kernel.org/r/[email protected]
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c index 758cbfe55daa..4adc587a4c94 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -12,6 +12,7 @@
*/
#include <linux/acpi.h>
+#include <linux/efi.h>
#include <linux/pci.h>
#include <linux/init.h>
#include <linux/bitmap.h>
@@ -442,6 +443,25 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
return mcfg_res.flags;
}
+static bool is_efi_reserved(u64 start, u64 end, enum e820_type
+not_used) {
+ efi_memory_desc_t *md;
+ u64 size, mmio_start, mmio_end;
+
+ for_each_efi_memory_desc(md) {
+ if (md->type == EFI_MEMORY_MAPPED_IO) {
+ size = md->num_pages << EFI_PAGE_SHIFT;
+ mmio_start = md->phys_addr;
+ mmio_end = mmio_start + size - 1;
+
+ if (mmio_start <= start && end <= mmio_end)
+ return true;
+ }
+ }
+
+ return false;
+}
+
typedef bool (*check_reserved_t)(u64 start, u64 end, enum e820_type type);
static bool __ref is_mmconf_reserved(check_reserved_t is_reserved, @@ -452,7 +472,7 @@ static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
u64 size = resource_size(&cfg->res);
u64 old_size = size;
int num_buses;
- char *method = with_e820 ? "E820" : "ACPI motherboard resources";
+ char *method = with_e820 ? "E820" : "ACPI motherboard resources or
+EFI";
while (!is_reserved(addr, addr + size, E820_TYPE_RESERVED)) {
size >>= 1;
@@ -502,15 +522,17 @@ pci_mmcfg_check_reserved(struct device *dev, struct pci_mmcfg_region *cfg, int e
if (!early && !acpi_disabled) {
if (is_mmconf_reserved(is_acpi_reserved, cfg, dev, 0))
return true;
+ if (is_mmconf_reserved(is_efi_reserved, cfg, dev, 0))
+ return true;
if (dev)
dev_info(dev, FW_INFO
- "MMCONFIG at %pR not reserved in "
+ "MMCONFIG at %pR not reserved in EFI "
"ACPI motherboard resources\n",
&cfg->res);
else
pr_info(FW_INFO PREFIX
- "MMCONFIG at %pR not reserved in "
+ "MMCONFIG at %pR not reserved in EFI or "
"ACPI motherboard resources\n",
&cfg->res);
}
On Friday, January 6, 2023 11:04:49 PM CET Bjorn Helgaas wrote:
> On Fri, Jan 06, 2023 at 09:37:06PM +0000, Luck, Tony wrote:
> > > Thanks, Tony! Something is wrong with the EFI MMIO removal
> > > (obviously), but I don't see what it is. Could you try the patch
> > > below (replacement for previous one, with more debug)?
> >
> > Swapped out old patch for this new one. Booted with efi-debug
> >
> > New dmesg attached.
>
> Thanks! Ah, off-by-one error because e820__mapped_all() expects
> "[start, end)" which means is_acpi_reserved() and is_efi_mmio() must
> use the same, and I was thinking "[start, end]" like a struct
> resource.
>
> The below should work better.
>
> commit 696ac9286d3d ("x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space")
> parent 1b929c02afd3
> Author: Bjorn Helgaas <[email protected]>
> Date: Thu Jan 5 16:02:58 2023 -0600
>
> x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
>
> Normally we reject ECAM space unless it is reported as reserved in the E820
> table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
> means extended config space (offsets 0x100-0xfff) may not be accessible.
>
> Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
> mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
> normally converted to an E820 entry by a bootloader or EFI stub.
>
> 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
> E820 entries that correspond to EfiMemoryMappedIO regions because some
> other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
> E820 entries prevent Linux from allocating BAR space for hot-added devices.
>
> Allow use of ECAM for extended config space when the region is covered by
> an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
> _CRS.
>
> Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
> Link: https://lore.kernel.org/r/[email protected]
This prints quite a few debug-level messages to dmesg. I guess you'll post an
update with fewer of them?
In any case, please feel free to add
Acked-by: Rafael J. Wysocki <[email protected]>
to it.
>
> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index 758cbfe55daa..5c6cadd60fef 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -12,6 +12,7 @@
> */
>
> #include <linux/acpi.h>
> +#include <linux/efi.h>
> #include <linux/pci.h>
> #include <linux/init.h>
> #include <linux/bitmap.h>
> @@ -442,6 +443,34 @@ static bool is_acpi_reserved(u64 start, u64 end, enum e820_type not_used)
> return mcfg_res.flags;
> }
>
> +static bool is_efi_mmio(u64 start, u64 end, enum e820_type not_used)
> +{
> + efi_memory_desc_t *md;
> + u64 size, mmio_start, mmio_end;
> +
> + end--; /* caller supplies start, end = start + size */
> + pr_info("is_efi_mmio %#lx-%#lx\n",
> + (unsigned long) start, (unsigned long) end);
> + for_each_efi_memory_desc(md) {
> + if (md->type == EFI_MEMORY_MAPPED_IO) {
> + size = md->num_pages << EFI_PAGE_SHIFT;
> + mmio_start = md->phys_addr;
> + mmio_end = mmio_start + size - 1;
> +
> + pr_info(" efi_mmio %#lx-%#lx\n",
> + (unsigned long) mmio_start,
> + (unsigned long) mmio_end);
> + if (mmio_start <= start && end <= mmio_end) {
> + pr_info("is_efi_mmio true\n");
> + return true;
> + }
> + }
> + }
> +
> + pr_info("is_efi_mmio false\n");
> + return false;
> +}
> +
> typedef bool (*check_reserved_t)(u64 start, u64 end, enum e820_type type);
>
> static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
> @@ -452,23 +481,24 @@ static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
> u64 size = resource_size(&cfg->res);
> u64 old_size = size;
> int num_buses;
> - char *method = with_e820 ? "E820" : "ACPI motherboard resources";
> + char *method = with_e820 ? "E820" : "ACPI motherboard resources or EFI";
>
> + pr_info("is_mmconf_reserved %ps [bus %02x-%02x] %pR\n",
> + is_reserved, cfg->start_bus, cfg->end_bus, &cfg->res);
> while (!is_reserved(addr, addr + size, E820_TYPE_RESERVED)) {
> + pr_info(" %#lx-%#lx (size %#lx) not reserved\n",
> + (unsigned long) addr, (unsigned long) (addr + size - 1),
> + (unsigned long) size);
> size >>= 1;
> + pr_info(" size reduced to %#lx\n", (unsigned long) size);
> if (size < (16UL<<20))
> break;
> }
>
> - if (size < (16UL<<20) && size != old_size)
> + if (size < (16UL<<20) && size != old_size) {
> + pr_info("is_mmconf_reserved %ps false\n", is_reserved);
> return false;
> -
> - if (dev)
> - dev_info(dev, "MMCONFIG at %pR reserved in %s\n",
> - &cfg->res, method);
> - else
> - pr_info(PREFIX "MMCONFIG at %pR reserved in %s\n",
> - &cfg->res, method);
> + }
>
> if (old_size != size) {
> /* update end_bus */
> @@ -487,30 +517,42 @@ static bool __ref is_mmconf_reserved(check_reserved_t is_reserved,
> &cfg->res, (unsigned long) cfg->address);
> else
> pr_info(PREFIX
> - "MMCONFIG for %04x [bus%02x-%02x] "
> + "MMCONFIG for %04x [bus %02x-%02x] "
> "at %pR (base %#lx) (size reduced!)\n",
> cfg->segment, cfg->start_bus, cfg->end_bus,
> &cfg->res, (unsigned long) cfg->address);
> }
>
> + if (dev)
> + dev_info(dev, "MMCONFIG at %pR reserved in %s\n",
> + &cfg->res, method);
> + else
> + pr_info(PREFIX "MMCONFIG at %pR reserved in %s\n",
> + &cfg->res, method);
> +
> return true;
> }
>
> static bool __ref
> pci_mmcfg_check_reserved(struct device *dev, struct pci_mmcfg_region *cfg, int early)
> {
> + pr_info("pci_mmcfg_check_reserved([bus %02x-%02x] %pR, %s)\n",
> + cfg->start_bus, cfg->end_bus, &cfg->res,
> + early ? "early" : "late");
> if (!early && !acpi_disabled) {
> if (is_mmconf_reserved(is_acpi_reserved, cfg, dev, 0))
> return true;
> + if (is_mmconf_reserved(is_efi_mmio, cfg, dev, 0))
> + return true;
>
> if (dev)
> dev_info(dev, FW_INFO
> - "MMCONFIG at %pR not reserved in "
> + "MMCONFIG at %pR not reserved in EFI or "
> "ACPI motherboard resources\n",
> &cfg->res);
> else
> pr_info(FW_INFO PREFIX
> - "MMCONFIG at %pR not reserved in "
> + "MMCONFIG at %pR not reserved in EFI or "
> "ACPI motherboard resources\n",
> &cfg->res);
> }
> @@ -536,6 +578,7 @@ static void __init pci_mmcfg_reject_broken(int early)
> {
> struct pci_mmcfg_region *cfg;
>
> + pr_info("pci_mmcfg_reject_broken(%s)\n", early ? "early" : "late");
> list_for_each_entry(cfg, &pci_mmcfg_list, list) {
> if (pci_mmcfg_check_reserved(NULL, cfg, early) == 0) {
> pr_info(PREFIX "not using MMCONFIG\n");
> @@ -570,6 +613,7 @@ static int __init pci_parse_mcfg(struct acpi_table_header *header)
> unsigned long i;
> int entries;
>
> + pr_info("pci_parse_mcfg\n");
> if (!header)
> return -EINVAL;
>
> @@ -661,6 +705,7 @@ static int __initdata known_bridge;
>
> void __init pci_mmcfg_early_init(void)
> {
> + pr_info("pci_mmcfg_early_init\n");
> if (pci_probe & PCI_PROBE_MMCONF) {
> if (pci_mmcfg_check_hostbridge())
> known_bridge = 1;
> @@ -674,6 +719,7 @@ void __init pci_mmcfg_early_init(void)
>
> void __init pci_mmcfg_late_init(void)
> {
> + pr_info("pci_mmcfg_late_init\n");
> /* MMCONFIG disabled */
> if ((pci_probe & PCI_PROBE_MMCONF) == 0)
> return;
> @@ -725,6 +771,8 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
> struct resource *tmp = NULL;
> struct pci_mmcfg_region *cfg;
>
> + dev_info(dev, "pci_mmconfig_insert %02x-%02x addr %#lx\n",
> + start, end, (unsigned long)addr);
> if (!(pci_probe & PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed)
> return -ENODEV;
>
> @@ -788,6 +836,7 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
>
> mutex_unlock(&pci_mmcfg_lock);
>
> + dev_info(dev, "pci_mmconfig_insert returns %d\n", rc);
> return rc;
> }
>
>
On Tue, Jan 10, 2023 at 07:12:42PM +0100, Rafael J. Wysocki wrote:
> On Friday, January 6, 2023 11:04:49 PM CET Bjorn Helgaas wrote:
> ...
> > The below should work better.
> >
> > commit 696ac9286d3d ("x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space")
> > parent 1b929c02afd3
> > Author: Bjorn Helgaas <[email protected]>
> > Date: Thu Jan 5 16:02:58 2023 -0600
> >
> > x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
> >
> > Normally we reject ECAM space unless it is reported as reserved in the E820
> > table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). This
> > means extended config space (offsets 0x100-0xfff) may not be accessible.
> >
> > Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
> > mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
> > normally converted to an E820 entry by a bootloader or EFI stub.
> >
> > 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
> > E820 entries that correspond to EfiMemoryMappedIO regions because some
> > other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
> > E820 entries prevent Linux from allocating BAR space for hot-added devices.
> >
> > Allow use of ECAM for extended config space when the region is covered by
> > an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
> > _CRS.
> >
> > Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
> > Link: https://lore.kernel.org/r/[email protected]
>
> This prints quite a few debug-level messages to dmesg. I guess you'll post an
> update with fewer of them?
Right, this was a debugging patch.
> In any case, please feel free to add
>
> Acked-by: Rafael J. Wysocki <[email protected]>
Thanks!