Add a writeup about how PCI host bridges should be described in ACPI
using PNP0A03/PNP0A08 devices, PNP0C02 devices, and the MCFG table.
Signed-off-by: Bjorn Helgaas <[email protected]>
---
Documentation/PCI/00-INDEX | 2 +
Documentation/PCI/acpi-info.txt | 136 +++++++++++++++++++++++++++++++++++++++
2 files changed, 138 insertions(+)
create mode 100644 Documentation/PCI/acpi-info.txt
diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX
index 147231f..0780280 100644
--- a/Documentation/PCI/00-INDEX
+++ b/Documentation/PCI/00-INDEX
@@ -1,5 +1,7 @@
00-INDEX
- this file
+acpi-info.txt
+ - info on how PCI host bridges are represented in ACPI
MSI-HOWTO.txt
- the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ.
PCIEBUS-HOWTO.txt
diff --git a/Documentation/PCI/acpi-info.txt b/Documentation/PCI/acpi-info.txt
new file mode 100644
index 0000000..ccbcfda
--- /dev/null
+++ b/Documentation/PCI/acpi-info.txt
@@ -0,0 +1,136 @@
+ ACPI considerations for PCI host bridges
+
+The basic requirement is that the ACPI namespace should describe
+*everything* that consumes address space unless there's another
+standard way for the OS to find it [1, 2]. For example, windows that
+are forwarded to PCI by a PCI host bridge should be described via ACPI
+devices, since the OS can't locate the host bridge by itself. PCI
+devices *below* the host bridge do not need to be described via ACPI,
+because the resources they consume are inside the host bridge windows,
+and the OS can discover them via the standard PCI enumeration
+mechanism (using config accesses to read and size the BARs).
+
+This ACPI resource description is done via _CRS methods of devices in
+the ACPI namespace [2]. _CRS methods are like generalized PCI BARs:
+the OS can read _CRS and figure out what resource is being consumed
+even if it doesn't have a driver for the device [3]. That's important
+because it means an old OS can work correctly even on a system with
+new devices unknown to the OS. The new devices won't do anything, but
+the OS can at least make sure no resources conflict with them.
+
+Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
+reserving address space! The static tables are for things the OS
+needs to know early in boot, before it can parse the ACPI namespace.
+If a new table is defined, an old OS needs to operate correctly even
+though it ignores the table. _CRS allows that because it is generic
+and understood by the old OS; a static table does not.
+
+If the OS is expected to manage an ACPI device, that device will have
+a specific _HID/_CID that tells the OS what driver to bind to it, and
+the _CRS tells the OS and the driver where the device's registers are.
+
+PNP0C02 "motherboard" devices are basically a catch-all. There's no
+programming model for them other than "don't use these resources for
+anything else." So any address space that is (1) not claimed by some
+other ACPI device and (2) should not be assigned by the OS to
+something else, should be claimed by a PNP0C02 _CRS method.
+
+PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
+describe all the address space they consume. In principle, this would
+be all the windows they forward down to the PCI bus, as well as the
+bridge registers themselves. The bridge registers include things like
+secondary/subordinate bus registers that determine the bus range below
+the bridge, window registers that describe the apertures, etc. These
+are all device-specific, non-architected things, so the only way a
+PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
+contain the device-specific details. These bridge registers also
+include ECAM space, since it is consumed by the bridge.
+
+ACPI defined a Producer/Consumer bit that was intended to distinguish
+the bridge apertures from the bridge registers [4, 5]. However,
+BIOSes didn't use that bit correctly, and the result is that OSes have
+to assume that everything in a PCI host bridge _CRS is a window. That
+leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
+device itself.
+
+The workaround is to describe the bridge registers (including ECAM
+space) in PNP0C02 catch-all devices [6]. With the exception of ECAM,
+the bridge register space is device-specific anyway, so the generic
+PNP0A03/PNP0A08 driver (pci_root.c) has no need to know about it. For
+ECAM, pci_root.c learns about the space from either MCFG or the _CBA
+method.
+
+Note that the PCIe spec actually does require ECAM unless there's a
+standard firmware interface for config access, e.g., the ia64 SAL
+interface [7]. One reason is that we want a generic host bridge
+driver (pci_root.c), and a generic driver requires a generic way to
+access config space.
+
+
+[1] ACPI 6.0, sec 6.1:
+ For any device that is on a non-enumerable type of bus (for
+ example, an ISA bus), OSPM enumerates the devices' identifier(s)
+ and the ACPI system firmware must supply an _HID object ... for
+ each device to enable OSPM to do that.
+
+[2] ACPI 6.0, sec 3.7:
+ The OS enumerates motherboard devices simply by reading through
+ the ACPI Namespace looking for devices with hardware IDs.
+
+ Each device enumerated by ACPI includes ACPI-defined objects in
+ the ACPI Namespace that report the hardware resources the device
+ could occupy [_PRS], an object that reports the resources that are
+ currently used by the device [_CRS], and objects for configuring
+ those resources [_SRS]. The information is used by the Plug and
+ Play OS (OSPM) to configure the devices.
+
+[3] ACPI 6.0, sec 6.2:
+ OSPM uses device configuration objects to configure hardware
+ resources for devices enumerated via ACPI. Device configuration
+ objects provide information about current and possible resource
+ requirements, the relationship between shared resources, and
+ methods for configuring hardware resources.
+
+ When OSPM enumerates a device, it calls _PRS to determine the
+ resource requirements of the device. It may also call _CRS to
+ find the current resource settings for the device. Using this
+ information, the Plug and Play system determines what resources
+ the device should consume and sets those resources by calling the
+ device’s _SRS control method.
+
+ In ACPI, devices can consume resources (for example, legacy
+ keyboards), provide resources (for example, a proprietary PCI
+ bridge), or do both. Unless otherwise specified, resources for a
+ device are assumed to be taken from the nearest matching resource
+ above the device in the device hierarchy.
+
+[4] ACPI 6.0, sec 6.4.3.5.4:
+ Extended Address Space Descriptor
+ General Flags: Bit [0] Consumer/Producer:
+ 1–This device consumes this resource
+ 0–This device produces and consumes this resource
+
+[5] ACPI 6.0, sec 19.6.43:
+ ResourceUsage specifies whether the Memory range is consumed by
+ this device (ResourceConsumer) or passed on to child devices
+ (ResourceProducer). If nothing is specified, then
+ ResourceConsumer is assumed.
+
+[6] PCI Firmware 3.0, sec 4.1.2:
+ If the operating system does not natively comprehend reserving the
+ MMCFG region, the MMCFG region must be reserved by firmware. The
+ address range reported in the MCFG table or by _CBA method (see
+ Section 4.1.3) must be reserved by declaring a motherboard
+ resource. For most systems, the motherboard resource would appear
+ at the root of the ACPI namespace (under \_SB) in a node with a
+ _HID of EISAID (PNP0C02), and the resources in this case should
+ not be claimed in the root PCI bus’s _CRS. The resources can
+ optionally be returned in Int15 E820 or EFIGetMemoryMap as
+ reserved memory but must always be reported through ACPI as a
+ motherboard resource.
+
+[7] PCI Express 3.0, sec 7.2.2:
+ For systems that are PC-compatible, or that do not implement a
+ processor-architecture-specific firmware interface standard that
+ allows access to the Configuration Space, the ECAM is required as
+ defined in this section.
Hi Bjorn
Many thanks for putting this together, it really helps!
One thing below..
> -----Original Message-----
> From: [email protected] [mailto:linux-kernel-
> [email protected]] On Behalf Of Bjorn Helgaas
> Sent: 17 November 2016 18:00
> To: [email protected]
> Cc: [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Subject: [PATCH] PCI: Add information about describing PCI in ACPI
>
> Add a writeup about how PCI host bridges should be described in ACPI
> using PNP0A03/PNP0A08 devices, PNP0C02 devices, and the MCFG table.
>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> ---
> Documentation/PCI/00-INDEX | 2 +
> Documentation/PCI/acpi-info.txt | 136
> +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 138 insertions(+)
> create mode 100644 Documentation/PCI/acpi-info.txt
>
> diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX
> index 147231f..0780280 100644
> --- a/Documentation/PCI/00-INDEX
> +++ b/Documentation/PCI/00-INDEX
> @@ -1,5 +1,7 @@
> 00-INDEX
> - this file
> +acpi-info.txt
> + - info on how PCI host bridges are represented in ACPI
> MSI-HOWTO.txt
> - the Message Signaled Interrupts (MSI) Driver Guide HOWTO and
> FAQ.
> PCIEBUS-HOWTO.txt
> diff --git a/Documentation/PCI/acpi-info.txt b/Documentation/PCI/acpi-
> info.txt
> new file mode 100644
> index 0000000..ccbcfda
> --- /dev/null
> +++ b/Documentation/PCI/acpi-info.txt
> @@ -0,0 +1,136 @@
> + ACPI considerations for PCI host bridges
> +
> +The basic requirement is that the ACPI namespace should describe
> +*everything* that consumes address space unless there's another
> +standard way for the OS to find it [1, 2]. For example, windows that
> +are forwarded to PCI by a PCI host bridge should be described via ACPI
> +devices, since the OS can't locate the host bridge by itself. PCI
> +devices *below* the host bridge do not need to be described via ACPI,
> +because the resources they consume are inside the host bridge windows,
> +and the OS can discover them via the standard PCI enumeration
> +mechanism (using config accesses to read and size the BARs).
> +
> +This ACPI resource description is done via _CRS methods of devices in
> +the ACPI namespace [2]. _CRS methods are like generalized PCI BARs:
> +the OS can read _CRS and figure out what resource is being consumed
> +even if it doesn't have a driver for the device [3]. That's important
> +because it means an old OS can work correctly even on a system with
> +new devices unknown to the OS. The new devices won't do anything, but
> +the OS can at least make sure no resources conflict with them.
> +
> +Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
> +reserving address space! The static tables are for things the OS
> +needs to know early in boot, before it can parse the ACPI namespace.
> +If a new table is defined, an old OS needs to operate correctly even
> +though it ignores the table. _CRS allows that because it is generic
> +and understood by the old OS; a static table does not.
Right so if my understanding is correct you are saying that resources
described in the MCFG table should also be declared in PNP0C02 devices
so that the PNP driver can reserve these resources.
On the other side the PCI Root bridge driver should not reserve such
resources.
Well if my understanding is correct I think we have a problem here:
http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
As you can see pci_ecam_create() will conflict with the pnp driver
as it will try to reserve the resources from the MCFG table...
Maybe we need to rework pci_ecam_create() ?
Thanks
Gab
> +
> +If the OS is expected to manage an ACPI device, that device will have
> +a specific _HID/_CID that tells the OS what driver to bind to it, and
> +the _CRS tells the OS and the driver where the device's registers are.
> +
> +PNP0C02 "motherboard" devices are basically a catch-all. There's no
> +programming model for them other than "don't use these resources for
> +anything else." So any address space that is (1) not claimed by some
> +other ACPI device and (2) should not be assigned by the OS to
> +something else, should be claimed by a PNP0C02 _CRS method.
> +
> +PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
> +describe all the address space they consume. In principle, this would
> +be all the windows they forward down to the PCI bus, as well as the
> +bridge registers themselves. The bridge registers include things like
> +secondary/subordinate bus registers that determine the bus range below
> +the bridge, window registers that describe the apertures, etc. These
> +are all device-specific, non-architected things, so the only way a
> +PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
> +contain the device-specific details. These bridge registers also
> +include ECAM space, since it is consumed by the bridge.
> +
> +ACPI defined a Producer/Consumer bit that was intended to distinguish
> +the bridge apertures from the bridge registers [4, 5]. However,
> +BIOSes didn't use that bit correctly, and the result is that OSes have
> +to assume that everything in a PCI host bridge _CRS is a window. That
> +leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
> +device itself.
> +
> +The workaround is to describe the bridge registers (including ECAM
> +space) in PNP0C02 catch-all devices [6]. With the exception of ECAM,
> +the bridge register space is device-specific anyway, so the generic
> +PNP0A03/PNP0A08 driver (pci_root.c) has no need to know about it. For
> +ECAM, pci_root.c learns about the space from either MCFG or the _CBA
> +method.
> +
> +Note that the PCIe spec actually does require ECAM unless there's a
> +standard firmware interface for config access, e.g., the ia64 SAL
> +interface [7]. One reason is that we want a generic host bridge
> +driver (pci_root.c), and a generic driver requires a generic way to
> +access config space.
> +
> +
> +[1] ACPI 6.0, sec 6.1:
> + For any device that is on a non-enumerable type of bus (for
> + example, an ISA bus), OSPM enumerates the devices' identifier(s)
> + and the ACPI system firmware must supply an _HID object ... for
> + each device to enable OSPM to do that.
> +
> +[2] ACPI 6.0, sec 3.7:
> + The OS enumerates motherboard devices simply by reading through
> + the ACPI Namespace looking for devices with hardware IDs.
> +
> + Each device enumerated by ACPI includes ACPI-defined objects in
> + the ACPI Namespace that report the hardware resources the device
> + could occupy [_PRS], an object that reports the resources that are
> + currently used by the device [_CRS], and objects for configuring
> + those resources [_SRS]. The information is used by the Plug and
> + Play OS (OSPM) to configure the devices.
> +
> +[3] ACPI 6.0, sec 6.2:
> + OSPM uses device configuration objects to configure hardware
> + resources for devices enumerated via ACPI. Device configuration
> + objects provide information about current and possible resource
> + requirements, the relationship between shared resources, and
> + methods for configuring hardware resources.
> +
> + When OSPM enumerates a device, it calls _PRS to determine the
> + resource requirements of the device. It may also call _CRS to
> + find the current resource settings for the device. Using this
> + information, the Plug and Play system determines what resources
> + the device should consume and sets those resources by calling the
> + device’s _SRS control method.
> +
> + In ACPI, devices can consume resources (for example, legacy
> + keyboards), provide resources (for example, a proprietary PCI
> + bridge), or do both. Unless otherwise specified, resources for a
> + device are assumed to be taken from the nearest matching resource
> + above the device in the device hierarchy.
> +
> +[4] ACPI 6.0, sec 6.4.3.5.4:
> + Extended Address Space Descriptor
> + General Flags: Bit [0] Consumer/Producer:
> + 1–This device consumes this resource
> + 0–This device produces and consumes this resource
> +
> +[5] ACPI 6.0, sec 19.6.43:
> + ResourceUsage specifies whether the Memory range is consumed by
> + this device (ResourceConsumer) or passed on to child devices
> + (ResourceProducer). If nothing is specified, then
> + ResourceConsumer is assumed.
> +
> +[6] PCI Firmware 3.0, sec 4.1.2:
> + If the operating system does not natively comprehend reserving the
> + MMCFG region, the MMCFG region must be reserved by firmware. The
> + address range reported in the MCFG table or by _CBA method (see
> + Section 4.1.3) must be reserved by declaring a motherboard
> + resource. For most systems, the motherboard resource would appear
> + at the root of the ACPI namespace (under \_SB) in a node with a
> + _HID of EISAID (PNP0C02), and the resources in this case should
> + not be claimed in the root PCI bus’s _CRS. The resources can
> + optionally be returned in Int15 E820 or EFIGetMemoryMap as
> + reserved memory but must always be reported through ACPI as a
> + motherboard resource.
> +
> +[7] PCI Express 3.0, sec 7.2.2:
> + For systems that are PC-compatible, or that do not implement a
> + processor-architecture-specific firmware interface standard that
> + allows access to the Configuration Space, the ECAM is required as
> + defined in this section.
On Fri, Nov 18, 2016 at 05:17:34PM +0000, Gabriele Paoloni wrote:
> > -----Original Message-----
> > From: [email protected] [mailto:linux-kernel-
> > [email protected]] On Behalf Of Bjorn Helgaas
> > Sent: 17 November 2016 18:00
> > +Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
> > +reserving address space! The static tables are for things the OS
> > +needs to know early in boot, before it can parse the ACPI namespace.
> > +If a new table is defined, an old OS needs to operate correctly even
> > +though it ignores the table. _CRS allows that because it is generic
> > +and understood by the old OS; a static table does not.
>
> Right so if my understanding is correct you are saying that resources
> described in the MCFG table should also be declared in PNP0C02 devices
> so that the PNP driver can reserve these resources.
Yes.
> On the other side the PCI Root bridge driver should not reserve such
> resources.
>
> Well if my understanding is correct I think we have a problem here:
> http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
>
> As you can see pci_ecam_create() will conflict with the pnp driver
> as it will try to reserve the resources from the MCFG table...
>
> Maybe we need to rework pci_ecam_create() ?
I think it's OK as it is.
The pnp/system.c driver does try to reserve PNP0C02 resources, and it
marks them as "not busy". That way they appear in /proc/iomem and
won't be allocated for anything else, but they can still be requested
by drivers, e.g., pci/ecam.c, which will mark them "busy".
This is analogous to what the PCI core does in pci_claim_resource().
This is really a function of the ACPI/PNP *core*, which should reserve
all _CRS resources for all devices (not just PNP0C02 devices). But
it's done by pnp/system.c, and only for PNP0C02, because there's a
bunch of historical baggage there.
You'll also notice that in this case, things are out of order:
logically the pnp/system.c reservation should happen first, but in
fact the pci/ecam.c request happens *before* the pnp/system.c one.
That means the pnp/system.c one might fail and complain "[mem ...]
could not be reserved".
Bjorn
On Thu, Nov 17, 2016 at 6:59 PM, Bjorn Helgaas <[email protected]> wrote:
> Add a writeup about how PCI host bridges should be described in ACPI
> using PNP0A03/PNP0A08 devices, PNP0C02 devices, and the MCFG table.
>
> Signed-off-by: Bjorn Helgaas <[email protected]>
Looks great overall, but I have a few comments (below).
> ---
> Documentation/PCI/00-INDEX | 2 +
> Documentation/PCI/acpi-info.txt | 136 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 138 insertions(+)
> create mode 100644 Documentation/PCI/acpi-info.txt
>
> diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX
> index 147231f..0780280 100644
> --- a/Documentation/PCI/00-INDEX
> +++ b/Documentation/PCI/00-INDEX
> @@ -1,5 +1,7 @@
> 00-INDEX
> - this file
> +acpi-info.txt
> + - info on how PCI host bridges are represented in ACPI
> MSI-HOWTO.txt
> - the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ.
> PCIEBUS-HOWTO.txt
> diff --git a/Documentation/PCI/acpi-info.txt b/Documentation/PCI/acpi-info.txt
> new file mode 100644
> index 0000000..ccbcfda
> --- /dev/null
> +++ b/Documentation/PCI/acpi-info.txt
> @@ -0,0 +1,136 @@
> + ACPI considerations for PCI host bridges
> +
> +The basic requirement is that the ACPI namespace should describe
> +*everything* that consumes address space unless there's another
> +standard way for the OS to find it [1, 2]. For example, windows that
> +are forwarded to PCI by a PCI host bridge should be described via ACPI
> +devices, since the OS can't locate the host bridge by itself. PCI
> +devices *below* the host bridge do not need to be described via ACPI,
> +because the resources they consume are inside the host bridge windows,
> +and the OS can discover them via the standard PCI enumeration
> +mechanism (using config accesses to read and size the BARs).
> +
> +This ACPI resource description is done via _CRS methods of devices in
To be painfully precise, those need not be methods. They may be
static objects too.
> +the ACPI namespace [2]. _CRS methods are like generalized PCI BARs:
> +the OS can read _CRS and figure out what resource is being consumed
> +even if it doesn't have a driver for the device [3]. That's important
> +because it means an old OS can work correctly even on a system with
> +new devices unknown to the OS. The new devices won't do anything, but
> +the OS can at least make sure no resources conflict with them.
> +
> +Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
> +reserving address space! The static tables are for things the OS
> +needs to know early in boot, before it can parse the ACPI namespace.
> +If a new table is defined, an old OS needs to operate correctly even
> +though it ignores the table. _CRS allows that because it is generic
> +and understood by the old OS; a static table does not.
> +
> +If the OS is expected to manage an ACPI device, that device will have
I'm not very keen on using the term "ACPI device" in documentation as
it is not particularly well defined. As a rule, I prefer to talk
about "non-discoverable devices described via ACPI" or similar.
Accordingly, I'd change the above line to something like "If the OS is
expected to manage a non-discoverable device described via ACPI, that
device will have".
> +a specific _HID/_CID that tells the OS what driver to bind to it, and
> +the _CRS tells the OS and the driver where the device's registers are.
> +
> +PNP0C02 "motherboard" devices are basically a catch-all. There's no
> +programming model for them other than "don't use these resources for
> +anything else." So any address space that is (1) not claimed by some
> +other ACPI device and (2) should not be assigned by the OS to
Here I'd say "any address space that is (1) not claimed by _CRS under
any other device object in the ACPI namespace and (2) ...".
> +something else, should be claimed by a PNP0C02 _CRS method.
Thanks,
Rafael
Hi Bjorn
> -----Original Message-----
> From: Bjorn Helgaas [mailto:[email protected]]
> Sent: 18 November 2016 17:54
> To: Gabriele Paoloni
> Cc: Bjorn Helgaas; [email protected]; linux-
> [email protected]; [email protected]; linux-arm-
> [email protected]; [email protected]
> Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
>
> On Fri, Nov 18, 2016 at 05:17:34PM +0000, Gabriele Paoloni wrote:
> > > -----Original Message-----
> > > From: [email protected] [mailto:linux-kernel-
> > > [email protected]] On Behalf Of Bjorn Helgaas
> > > Sent: 17 November 2016 18:00
>
> > > +Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms
> for
> > > +reserving address space! The static tables are for things the OS
> > > +needs to know early in boot, before it can parse the ACPI
> namespace.
> > > +If a new table is defined, an old OS needs to operate correctly
> even
> > > +though it ignores the table. _CRS allows that because it is
> generic
> > > +and understood by the old OS; a static table does not.
> >
> > Right so if my understanding is correct you are saying that resources
> > described in the MCFG table should also be declared in PNP0C02
> devices
> > so that the PNP driver can reserve these resources.
>
> Yes.
>
> > On the other side the PCI Root bridge driver should not reserve such
> > resources.
> >
> > Well if my understanding is correct I think we have a problem here:
> > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
> >
> > As you can see pci_ecam_create() will conflict with the pnp driver
> > as it will try to reserve the resources from the MCFG table...
> >
> > Maybe we need to rework pci_ecam_create() ?
>
> I think it's OK as it is.
>
> The pnp/system.c driver does try to reserve PNP0C02 resources, and it
> marks them as "not busy". That way they appear in /proc/iomem and
> won't be allocated for anything else, but they can still be requested
> by drivers, e.g., pci/ecam.c, which will mark them "busy".
>
> This is analogous to what the PCI core does in pci_claim_resource().
> This is really a function of the ACPI/PNP *core*, which should reserve
> all _CRS resources for all devices (not just PNP0C02 devices). But
> it's done by pnp/system.c, and only for PNP0C02, because there's a
> bunch of historical baggage there.
>
> You'll also notice that in this case, things are out of order:
> logically the pnp/system.c reservation should happen first, but in
> fact the pci/ecam.c request happens *before* the pnp/system.c one.
> That means the pnp/system.c one might fail and complain "[mem ...]
> could not be reserved".
Correct me if I am wrong...
So currently we are relying on the fact that pci_ecam_create() is called
before the pnp driver.
If the pnp driver came first we would end up in pci_ecam_create() failing
here:
http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76
I am not sure but it seems to me like a bit weak condition to rely on...
what about removing the error condition in pci_ecam_create() and logging
just a dev_info()?
Thanks
Gab
>
> Bjorn
On Sat, Nov 19, 2016 at 12:02:24AM +0100, Rafael J. Wysocki wrote:
> On Thu, Nov 17, 2016 at 6:59 PM, Bjorn Helgaas <[email protected]> wrote:
> > Add a writeup about how PCI host bridges should be described in ACPI
> > using PNP0A03/PNP0A08 devices, PNP0C02 devices, and the MCFG table.
> >
> > Signed-off-by: Bjorn Helgaas <[email protected]>
>
> Looks great overall, but I have a few comments (below).
Thanks a lot for taking a look, Rafael! I applied all your suggestions.
Bjorn
On Mon, Nov 21, 2016 at 08:52:52AM +0000, Gabriele Paoloni wrote:
> Hi Bjorn
>
> > -----Original Message-----
> > From: Bjorn Helgaas [mailto:[email protected]]
> > Sent: 18 November 2016 17:54
> > To: Gabriele Paoloni
> > Cc: Bjorn Helgaas; [email protected]; linux-
> > [email protected]; [email protected]; linux-arm-
> > [email protected]; [email protected]
> > Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
> >
> > On Fri, Nov 18, 2016 at 05:17:34PM +0000, Gabriele Paoloni wrote:
> > > > -----Original Message-----
> > > > From: [email protected] [mailto:linux-kernel-
> > > > [email protected]] On Behalf Of Bjorn Helgaas
> > > > Sent: 17 November 2016 18:00
> >
> > > > +Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms
> > for
> > > > +reserving address space! The static tables are for things the OS
> > > > +needs to know early in boot, before it can parse the ACPI
> > namespace.
> > > > +If a new table is defined, an old OS needs to operate correctly
> > even
> > > > +though it ignores the table. _CRS allows that because it is
> > generic
> > > > +and understood by the old OS; a static table does not.
> > >
> > > Right so if my understanding is correct you are saying that resources
> > > described in the MCFG table should also be declared in PNP0C02
> > devices
> > > so that the PNP driver can reserve these resources.
> >
> > Yes.
> >
> > > On the other side the PCI Root bridge driver should not reserve such
> > > resources.
> > >
> > > Well if my understanding is correct I think we have a problem here:
> > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
> > >
> > > As you can see pci_ecam_create() will conflict with the pnp driver
> > > as it will try to reserve the resources from the MCFG table...
> > >
> > > Maybe we need to rework pci_ecam_create() ?
> >
> > I think it's OK as it is.
> >
> > The pnp/system.c driver does try to reserve PNP0C02 resources, and it
> > marks them as "not busy". That way they appear in /proc/iomem and
> > won't be allocated for anything else, but they can still be requested
> > by drivers, e.g., pci/ecam.c, which will mark them "busy".
> >
> > This is analogous to what the PCI core does in pci_claim_resource().
> > This is really a function of the ACPI/PNP *core*, which should reserve
> > all _CRS resources for all devices (not just PNP0C02 devices). But
> > it's done by pnp/system.c, and only for PNP0C02, because there's a
> > bunch of historical baggage there.
> >
> > You'll also notice that in this case, things are out of order:
> > logically the pnp/system.c reservation should happen first, but in
> > fact the pci/ecam.c request happens *before* the pnp/system.c one.
> > That means the pnp/system.c one might fail and complain "[mem ...]
> > could not be reserved".
>
> Correct me if I am wrong...
>
> So currently we are relying on the fact that pci_ecam_create() is called
> before the pnp driver.
> If the pnp driver came first we would end up in pci_ecam_create() failing
> here:
> http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76
>
> I am not sure but it seems to me like a bit weak condition to rely on...
> what about removing the error condition in pci_ecam_create() and logging
> just a dev_info()?
Huh. I'm confused. I *thought* it would be safe to reverse the
order, which would effectively be this:
system_pnp_probe
reserve_resources_of_dev
reserve_range
request_mem_region([mem 0xb0000000-0xb1ffffff])
...
pci_ecam_create
request_resource_conflict([mem 0xb0000000-0xb1ffffff])
but I experimented with the patch below on qemu, and it failed as you
predicted:
** res test **
requested [mem 0xa0000000-0xafffffff]
can't claim ECAM area [mem 0xa0000000-0xafffffff]: conflict with ECAM PNP [mem 0xa0000000-0xafffffff]
I expected the request_resource_conflict() to succeed since it's
completely contained in the "ECAM PNP" region. But I guess I don't
understand kernel/resource.c well enough.
I'm not sure we need to fix anything yet, since we currently do the
ecam.c request before the system.c one, and any change there would be
a long ways off. If/when that *does* change, I think the correct fix
would be to change ecam.c so its request succeeds (by changing the way
it does the request, fixing kernel/resource.c, or whatever) rather
than to reduce the log level and ignore the failure.
Bjorn
diff --git a/arch/x86/pci/init.c b/arch/x86/pci/init.c
index adb62aa..5a35638 100644
--- a/arch/x86/pci/init.c
+++ b/arch/x86/pci/init.c
@@ -7,6 +7,8 @@
in the right sequence from here. */
static __init int pci_arch_init(void)
{
+ struct resource *res, *conflict;
+ static struct resource cfg;
#ifdef CONFIG_PCI_DIRECT
int type = 0;
@@ -39,6 +41,26 @@ static __init int pci_arch_init(void)
dmi_check_skip_isa_align();
+ printk("\n** res test **\n");
+
+ res = request_mem_region(0xa0000000, 0x10000000, "ECAM PNP");
+ printk("requested %pR\n", res);
+ if (!res)
+ return 0;
+ res->flags &= ~IORESOURCE_BUSY;
+
+ cfg.start = 0xa0000000;
+ cfg.end = 0xafffffff;
+ cfg.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+ cfg.name = "PCI ECAM";
+
+ conflict = request_resource_conflict(&iomem_resource, &cfg);
+ if (conflict)
+ printk("can't claim ECAM area %pR: conflict with %s %pR\n",
+ &cfg, conflict->name, conflict);
+
+ printk("\n");
+
return 0;
}
arch_initcall(pci_arch_init);
Hi Bjorn
> -----Original Message-----
> From: [email protected] [mailto:linux-pci-
> [email protected]] On Behalf Of Bjorn Helgaas
> Sent: 21 November 2016 16:47
> To: Gabriele Paoloni
> Cc: Bjorn Helgaas; [email protected]; linux-
> [email protected]; [email protected]; linux-arm-
> [email protected]; [email protected]
> Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
>
> On Mon, Nov 21, 2016 at 08:52:52AM +0000, Gabriele Paoloni wrote:
> > Hi Bjorn
> >
> > > -----Original Message-----
> > > From: Bjorn Helgaas [mailto:[email protected]]
> > > Sent: 18 November 2016 17:54
> > > To: Gabriele Paoloni
> > > Cc: Bjorn Helgaas; [email protected]; linux-
> > > [email protected]; [email protected]; linux-arm-
> > > [email protected]; [email protected]
> > > Subject: Re: [PATCH] PCI: Add information about describing PCI in
> ACPI
> > >
> > > On Fri, Nov 18, 2016 at 05:17:34PM +0000, Gabriele Paoloni wrote:
> > > > > -----Original Message-----
> > > > > From: [email protected] [mailto:linux-kernel-
> > > > > [email protected]] On Behalf Of Bjorn Helgaas
> > > > > Sent: 17 November 2016 18:00
> > >
> > > > > +Static tables like MCFG, HPET, ECDT, etc., are *not*
> mechanisms
> > > for
> > > > > +reserving address space! The static tables are for things the
> OS
> > > > > +needs to know early in boot, before it can parse the ACPI
> > > namespace.
> > > > > +If a new table is defined, an old OS needs to operate
> correctly
> > > even
> > > > > +though it ignores the table. _CRS allows that because it is
> > > generic
> > > > > +and understood by the old OS; a static table does not.
> > > >
> > > > Right so if my understanding is correct you are saying that
> resources
> > > > described in the MCFG table should also be declared in PNP0C02
> > > devices
> > > > so that the PNP driver can reserve these resources.
> > >
> > > Yes.
> > >
> > > > On the other side the PCI Root bridge driver should not reserve
> such
> > > > resources.
> > > >
> > > > Well if my understanding is correct I think we have a problem
> here:
> > > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
> > > >
> > > > As you can see pci_ecam_create() will conflict with the pnp
> driver
> > > > as it will try to reserve the resources from the MCFG table...
> > > >
> > > > Maybe we need to rework pci_ecam_create() ?
> > >
> > > I think it's OK as it is.
> > >
> > > The pnp/system.c driver does try to reserve PNP0C02 resources, and
> it
> > > marks them as "not busy". That way they appear in /proc/iomem and
> > > won't be allocated for anything else, but they can still be
> requested
> > > by drivers, e.g., pci/ecam.c, which will mark them "busy".
> > >
> > > This is analogous to what the PCI core does in
> pci_claim_resource().
> > > This is really a function of the ACPI/PNP *core*, which should
> reserve
> > > all _CRS resources for all devices (not just PNP0C02 devices). But
> > > it's done by pnp/system.c, and only for PNP0C02, because there's a
> > > bunch of historical baggage there.
> > >
> > > You'll also notice that in this case, things are out of order:
> > > logically the pnp/system.c reservation should happen first, but in
> > > fact the pci/ecam.c request happens *before* the pnp/system.c one.
> > > That means the pnp/system.c one might fail and complain "[mem ...]
> > > could not be reserved".
> >
> > Correct me if I am wrong...
> >
> > So currently we are relying on the fact that pci_ecam_create() is
> called
> > before the pnp driver.
> > If the pnp driver came first we would end up in pci_ecam_create()
> failing
> > here:
> > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76
> >
> > I am not sure but it seems to me like a bit weak condition to rely
> on...
> > what about removing the error condition in pci_ecam_create() and
> logging
> > just a dev_info()?
>
> Huh. I'm confused. I *thought* it would be safe to reverse the
> order, which would effectively be this:
>
> system_pnp_probe
> reserve_resources_of_dev
> reserve_range
> request_mem_region([mem 0xb0000000-0xb1ffffff])
> ...
> pci_ecam_create
> request_resource_conflict([mem 0xb0000000-0xb1ffffff])
>
>
> but I experimented with the patch below on qemu, and it failed as you
> predicted:
>
> ** res test **
> requested [mem 0xa0000000-0xafffffff]
> can't claim ECAM area [mem 0xa0000000-0xafffffff]: conflict with ECAM
> PNP [mem 0xa0000000-0xafffffff]
>
> I expected the request_resource_conflict() to succeed since it's
> completely contained in the "ECAM PNP" region. But I guess I don't
> understand kernel/resource.c well enough.
I think it fails because effectively the PNP driver is populating the
iomem_resource resource tree and therefore pci_ecam_create() finds that
it cannot add the cfg resource to the same hierarchy as it is already
there...
>
> I'm not sure we need to fix anything yet, since we currently do the
> ecam.c request before the system.c one, and any change there would be
> a long ways off. If/when that *does* change, I think the correct fix
> would be to change ecam.c so its request succeeds (by changing the way
> it does the request, fixing kernel/resource.c, or whatever) rather
> than to reduce the log level and ignore the failure.
Well in my mind I didn't want just to make the error disappear...
If all the resources should be reserved by the PNP driver then ideally
we could take away request_resource_conflict() from pci_ecam_create(),
but this would make buggy some systems with an already shipped BIOS
that relied on pci_ecam_create() reservation rather than PNP reservation.
Just removing the error condition and converting dev_err() into
dev_info() seems to me like accommodating already shipped BIOS images
and flagging a reservation that is already done by somebody else
without compromising the functionality of the PCI Root bridge driver
(so far the only reason why I can see the error condition there is
to catch a buggy MCFG with overlapping addresses; so if this is the
case maybe we need to have a different diagnostic check to make sure
that the MCFG table is alright)
BTW if you think that so far we can keep this as it is I am ok.
Many Thanks
Gab
>
> Bjorn
>
>
> diff --git a/arch/x86/pci/init.c b/arch/x86/pci/init.c
> index adb62aa..5a35638 100644
> --- a/arch/x86/pci/init.c
> +++ b/arch/x86/pci/init.c
> @@ -7,6 +7,8 @@
> in the right sequence from here. */
> static __init int pci_arch_init(void)
> {
> + struct resource *res, *conflict;
> + static struct resource cfg;
> #ifdef CONFIG_PCI_DIRECT
> int type = 0;
>
> @@ -39,6 +41,26 @@ static __init int pci_arch_init(void)
>
> dmi_check_skip_isa_align();
>
> + printk("\n** res test **\n");
> +
> + res = request_mem_region(0xa0000000, 0x10000000, "ECAM PNP");
> + printk("requested %pR\n", res);
> + if (!res)
> + return 0;
> + res->flags &= ~IORESOURCE_BUSY;
> +
> + cfg.start = 0xa0000000;
> + cfg.end = 0xafffffff;
> + cfg.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> + cfg.name = "PCI ECAM";
> +
> + conflict = request_resource_conflict(&iomem_resource, &cfg);
> + if (conflict)
> + printk("can't claim ECAM area %pR: conflict with %s %pR\n",
> + &cfg, conflict->name, conflict);
> +
> + printk("\n");
> +
> return 0;
> }
> arch_initcall(pci_arch_init);
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 21, 2016 at 05:23:11PM +0000, Gabriele Paoloni wrote:
> Hi Bjorn
>
> > -----Original Message-----
> > From: [email protected] [mailto:linux-pci-
> > [email protected]] On Behalf Of Bjorn Helgaas
> > Sent: 21 November 2016 16:47
> > To: Gabriele Paoloni
> > Cc: Bjorn Helgaas; [email protected]; linux-
> > [email protected]; [email protected]; linux-arm-
> > [email protected]; [email protected]
> > Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
> >
> > On Mon, Nov 21, 2016 at 08:52:52AM +0000, Gabriele Paoloni wrote:
> > > Hi Bjorn
> > >
> > > > -----Original Message-----
> > > > From: Bjorn Helgaas [mailto:[email protected]]
> > > > Sent: 18 November 2016 17:54
> > > > To: Gabriele Paoloni
> > > > Cc: Bjorn Helgaas; [email protected]; linux-
> > > > [email protected]; [email protected]; linux-arm-
> > > > [email protected]; [email protected]
> > > > Subject: Re: [PATCH] PCI: Add information about describing PCI in
> > ACPI
> > > >
> > > > On Fri, Nov 18, 2016 at 05:17:34PM +0000, Gabriele Paoloni wrote:
> > > > > > -----Original Message-----
> > > > > > From: [email protected] [mailto:linux-kernel-
> > > > > > [email protected]] On Behalf Of Bjorn Helgaas
> > > > > > Sent: 17 November 2016 18:00
> > > >
> > > > > > +Static tables like MCFG, HPET, ECDT, etc., are *not*
> > mechanisms
> > > > for
> > > > > > +reserving address space! The static tables are for things the
> > OS
> > > > > > +needs to know early in boot, before it can parse the ACPI
> > > > namespace.
> > > > > > +If a new table is defined, an old OS needs to operate
> > correctly
> > > > even
> > > > > > +though it ignores the table. _CRS allows that because it is
> > > > generic
> > > > > > +and understood by the old OS; a static table does not.
> > > > >
> > > > > Right so if my understanding is correct you are saying that
> > resources
> > > > > described in the MCFG table should also be declared in PNP0C02
> > > > devices
> > > > > so that the PNP driver can reserve these resources.
> > > >
> > > > Yes.
> > > >
> > > > > On the other side the PCI Root bridge driver should not reserve
> > such
> > > > > resources.
> > > > >
> > > > > Well if my understanding is correct I think we have a problem
> > here:
> > > > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
> > > > >
> > > > > As you can see pci_ecam_create() will conflict with the pnp
> > driver
> > > > > as it will try to reserve the resources from the MCFG table...
> > > > >
> > > > > Maybe we need to rework pci_ecam_create() ?
> > > >
> > > > I think it's OK as it is.
> > > >
> > > > The pnp/system.c driver does try to reserve PNP0C02 resources, and
> > it
> > > > marks them as "not busy". That way they appear in /proc/iomem and
> > > > won't be allocated for anything else, but they can still be
> > requested
> > > > by drivers, e.g., pci/ecam.c, which will mark them "busy".
> > > >
> > > > This is analogous to what the PCI core does in
> > pci_claim_resource().
> > > > This is really a function of the ACPI/PNP *core*, which should
> > reserve
> > > > all _CRS resources for all devices (not just PNP0C02 devices). But
> > > > it's done by pnp/system.c, and only for PNP0C02, because there's a
> > > > bunch of historical baggage there.
> > > >
> > > > You'll also notice that in this case, things are out of order:
> > > > logically the pnp/system.c reservation should happen first, but in
> > > > fact the pci/ecam.c request happens *before* the pnp/system.c one.
> > > > That means the pnp/system.c one might fail and complain "[mem ...]
> > > > could not be reserved".
> > >
> > > Correct me if I am wrong...
> > >
> > > So currently we are relying on the fact that pci_ecam_create() is
> > called
> > > before the pnp driver.
> > > If the pnp driver came first we would end up in pci_ecam_create()
> > failing
> > > here:
> > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76
> > >
> > > I am not sure but it seems to me like a bit weak condition to rely
> > on...
> > > what about removing the error condition in pci_ecam_create() and
> > logging
> > > just a dev_info()?
> >
> > Huh. I'm confused. I *thought* it would be safe to reverse the
> > order, which would effectively be this:
> >
> > system_pnp_probe
> > reserve_resources_of_dev
> > reserve_range
> > request_mem_region([mem 0xb0000000-0xb1ffffff])
> > ...
> > pci_ecam_create
> > request_resource_conflict([mem 0xb0000000-0xb1ffffff])
> >
> >
> > but I experimented with the patch below on qemu, and it failed as you
> > predicted:
> >
> > ** res test **
> > requested [mem 0xa0000000-0xafffffff]
> > can't claim ECAM area [mem 0xa0000000-0xafffffff]: conflict with ECAM
> > PNP [mem 0xa0000000-0xafffffff]
> >
> > I expected the request_resource_conflict() to succeed since it's
> > completely contained in the "ECAM PNP" region. But I guess I don't
> > understand kernel/resource.c well enough.
>
> I think it fails because effectively the PNP driver is populating the
> iomem_resource resource tree and therefore pci_ecam_create() finds that
> it cannot add the cfg resource to the same hierarchy as it is already
> there...
Right. I'm just surprised because the PNP reservation is marked
"not busy", and a driver (e.g., ECAM) should still be able to request
the resource.
> > I'm not sure we need to fix anything yet, since we currently do the
> > ecam.c request before the system.c one, and any change there would be
> > a long ways off. If/when that *does* change, I think the correct fix
> > would be to change ecam.c so its request succeeds (by changing the way
> > it does the request, fixing kernel/resource.c, or whatever) rather
> > than to reduce the log level and ignore the failure.
>
> Well in my mind I didn't want just to make the error disappear...
> If all the resources should be reserved by the PNP driver then ideally
> we could take away request_resource_conflict() from pci_ecam_create(),
> but this would make buggy some systems with an already shipped BIOS
> that relied on pci_ecam_create() reservation rather than PNP reservation.
I don't want remove the request from ecam.c. Ideally, there should be
TWO lines in /proc/iomem: one from system.c for "pnp 00:01" or
whatever it is, and a second from ecam.c. The first is the generic
one saying "this region is consumed by a piece of hardware, so don't
put anything else here." The second is the driver-specific one saying
"PCI ECAM owns this region, nobody else can use it."
This is the same way we handle PCI BAR resources. Here are two
examples from my laptop. The first (00:08.0) only has one line:
it has a BAR that consumes address space, but I don't have a driver
for it loaded. The second (00:16.0) does have a driver loaded, so it
has a second line showing that the driver owns the space:
f124a000-f124afff : 0000:00:08.0 # from PCI core
f124d000-f124dfff : 0000:00:16.0 # from PCI core
f124d000-f124dfff : mei_me # from mei_me driver
> Just removing the error condition and converting dev_err() into
> dev_info() seems to me like accommodating already shipped BIOS images
> and flagging a reservation that is already done by somebody else
> without compromising the functionality of the PCI Root bridge driver
> (so far the only reason why I can see the error condition there is
> to catch a buggy MCFG with overlapping addresses; so if this is the
> case maybe we need to have a different diagnostic check to make sure
> that the MCFG table is alright)
Ideally I think we should end up with this:
a0000000-afffffff : pnp 00:01
a0000000-afffffff : PCI ECAM
Realistically right now we'll probably end up with only the "PCI ECAM"
line in /proc/iomem and a warning from system.c about not being able
to reserve the space.
If we ever change things to do the generic PNP reservation first, then
we should fix things so ecam.c can claim the space without an error.
On 17 November 2016 at 17:59, Bjorn Helgaas <[email protected]> wrote:
> Add a writeup about how PCI host bridges should be described in ACPI
> using PNP0A03/PNP0A08 devices, PNP0C02 devices, and the MCFG table.
>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> ---
> Documentation/PCI/00-INDEX | 2 +
> Documentation/PCI/acpi-info.txt | 136 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 138 insertions(+)
> create mode 100644 Documentation/PCI/acpi-info.txt
>
> diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX
> index 147231f..0780280 100644
> --- a/Documentation/PCI/00-INDEX
> +++ b/Documentation/PCI/00-INDEX
> @@ -1,5 +1,7 @@
> 00-INDEX
> - this file
> +acpi-info.txt
> + - info on how PCI host bridges are represented in ACPI
> MSI-HOWTO.txt
> - the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ.
> PCIEBUS-HOWTO.txt
> diff --git a/Documentation/PCI/acpi-info.txt b/Documentation/PCI/acpi-info.txt
> new file mode 100644
> index 0000000..ccbcfda
> --- /dev/null
> +++ b/Documentation/PCI/acpi-info.txt
> @@ -0,0 +1,136 @@
> + ACPI considerations for PCI host bridges
> +
> +The basic requirement is that the ACPI namespace should describe
> +*everything* that consumes address space unless there's another
> +standard way for the OS to find it [1, 2]. For example, windows that
> +are forwarded to PCI by a PCI host bridge should be described via ACPI
> +devices, since the OS can't locate the host bridge by itself. PCI
> +devices *below* the host bridge do not need to be described via ACPI,
> +because the resources they consume are inside the host bridge windows,
> +and the OS can discover them via the standard PCI enumeration
> +mechanism (using config accesses to read and size the BARs).
> +
> +This ACPI resource description is done via _CRS methods of devices in
> +the ACPI namespace [2]. _CRS methods are like generalized PCI BARs:
> +the OS can read _CRS and figure out what resource is being consumed
> +even if it doesn't have a driver for the device [3]. That's important
> +because it means an old OS can work correctly even on a system with
> +new devices unknown to the OS. The new devices won't do anything, but
> +the OS can at least make sure no resources conflict with them.
> +
> +Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
> +reserving address space! The static tables are for things the OS
> +needs to know early in boot, before it can parse the ACPI namespace.
> +If a new table is defined, an old OS needs to operate correctly even
> +though it ignores the table. _CRS allows that because it is generic
> +and understood by the old OS; a static table does not.
> +
> +If the OS is expected to manage an ACPI device, that device will have
> +a specific _HID/_CID that tells the OS what driver to bind to it, and
> +the _CRS tells the OS and the driver where the device's registers are.
> +
> +PNP0C02 "motherboard" devices are basically a catch-all. There's no
> +programming model for them other than "don't use these resources for
> +anything else." So any address space that is (1) not claimed by some
> +other ACPI device and (2) should not be assigned by the OS to
> +something else, should be claimed by a PNP0C02 _CRS method.
> +
> +PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
> +describe all the address space they consume. In principle, this would
> +be all the windows they forward down to the PCI bus, as well as the
> +bridge registers themselves. The bridge registers include things like
> +secondary/subordinate bus registers that determine the bus range below
> +the bridge, window registers that describe the apertures, etc. These
> +are all device-specific, non-architected things, so the only way a
> +PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
> +contain the device-specific details. These bridge registers also
> +include ECAM space, since it is consumed by the bridge.
> +
> +ACPI defined a Producer/Consumer bit that was intended to distinguish
> +the bridge apertures from the bridge registers [4, 5]. However,
> +BIOSes didn't use that bit correctly, and the result is that OSes have
> +to assume that everything in a PCI host bridge _CRS is a window. That
> +leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
> +device itself.
> +
Is that universally true? Or is it still possible to do the right
thing here on new ACPI architectures such as arm64?
> +The workaround is to describe the bridge registers (including ECAM
> +space) in PNP0C02 catch-all devices [6]. With the exception of ECAM,
> +the bridge register space is device-specific anyway, so the generic
> +PNP0A03/PNP0A08 driver (pci_root.c) has no need to know about it. For
> +ECAM, pci_root.c learns about the space from either MCFG or the _CBA
> +method.
> +
> +Note that the PCIe spec actually does require ECAM unless there's a
> +standard firmware interface for config access, e.g., the ia64 SAL
> +interface [7]. One reason is that we want a generic host bridge
> +driver (pci_root.c), and a generic driver requires a generic way to
> +access config space.
> +
> +
> +[1] ACPI 6.0, sec 6.1:
> + For any device that is on a non-enumerable type of bus (for
> + example, an ISA bus), OSPM enumerates the devices' identifier(s)
> + and the ACPI system firmware must supply an _HID object ... for
> + each device to enable OSPM to do that.
> +
> +[2] ACPI 6.0, sec 3.7:
> + The OS enumerates motherboard devices simply by reading through
> + the ACPI Namespace looking for devices with hardware IDs.
> +
> + Each device enumerated by ACPI includes ACPI-defined objects in
> + the ACPI Namespace that report the hardware resources the device
> + could occupy [_PRS], an object that reports the resources that are
> + currently used by the device [_CRS], and objects for configuring
> + those resources [_SRS]. The information is used by the Plug and
> + Play OS (OSPM) to configure the devices.
> +
> +[3] ACPI 6.0, sec 6.2:
> + OSPM uses device configuration objects to configure hardware
> + resources for devices enumerated via ACPI. Device configuration
> + objects provide information about current and possible resource
> + requirements, the relationship between shared resources, and
> + methods for configuring hardware resources.
> +
> + When OSPM enumerates a device, it calls _PRS to determine the
> + resource requirements of the device. It may also call _CRS to
> + find the current resource settings for the device. Using this
> + information, the Plug and Play system determines what resources
> + the device should consume and sets those resources by calling the
> + device’s _SRS control method.
> +
> + In ACPI, devices can consume resources (for example, legacy
> + keyboards), provide resources (for example, a proprietary PCI
> + bridge), or do both. Unless otherwise specified, resources for a
> + device are assumed to be taken from the nearest matching resource
> + above the device in the device hierarchy.
> +
> +[4] ACPI 6.0, sec 6.4.3.5.4:
> + Extended Address Space Descriptor
> + General Flags: Bit [0] Consumer/Producer:
> + 1–This device consumes this resource
> + 0–This device produces and consumes this resource
> +
> +[5] ACPI 6.0, sec 19.6.43:
> + ResourceUsage specifies whether the Memory range is consumed by
> + this device (ResourceConsumer) or passed on to child devices
> + (ResourceProducer). If nothing is specified, then
> + ResourceConsumer is assumed.
> +
> +[6] PCI Firmware 3.0, sec 4.1.2:
> + If the operating system does not natively comprehend reserving the
> + MMCFG region, the MMCFG region must be reserved by firmware. The
> + address range reported in the MCFG table or by _CBA method (see
> + Section 4.1.3) must be reserved by declaring a motherboard
> + resource. For most systems, the motherboard resource would appear
> + at the root of the ACPI namespace (under \_SB) in a node with a
> + _HID of EISAID (PNP0C02), and the resources in this case should
> + not be claimed in the root PCI bus’s _CRS. The resources can
> + optionally be returned in Int15 E820 or EFIGetMemoryMap as
> + reserved memory but must always be reported through ACPI as a
> + motherboard resource.
> +
> +[7] PCI Express 3.0, sec 7.2.2:
> + For systems that are PC-compatible, or that do not implement a
> + processor-architecture-specific firmware interface standard that
> + allows access to the Configuration Space, the ECAM is required as
> + defined in this section.
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Hi Bjorn
> -----Original Message-----
> From: Bjorn Helgaas [mailto:[email protected]]
> Sent: 21 November 2016 20:10
> To: Gabriele Paoloni
> Cc: Bjorn Helgaas; [email protected]; linux-
> [email protected]; [email protected]; linux-arm-
> [email protected]; [email protected]
> Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
>
> On Mon, Nov 21, 2016 at 05:23:11PM +0000, Gabriele Paoloni wrote:
> > Hi Bjorn
> >
> > > -----Original Message-----
> > > From: [email protected] [mailto:linux-pci-
> > > [email protected]] On Behalf Of Bjorn Helgaas
> > > Sent: 21 November 2016 16:47
> > > To: Gabriele Paoloni
> > > Cc: Bjorn Helgaas; [email protected]; linux-
> > > [email protected]; [email protected]; linux-arm-
> > > [email protected]; [email protected]
> > > Subject: Re: [PATCH] PCI: Add information about describing PCI in
> ACPI
> > >
> > > On Mon, Nov 21, 2016 at 08:52:52AM +0000, Gabriele Paoloni wrote:
> > > > Hi Bjorn
> > > >
> > > > > -----Original Message-----
> > > > > From: Bjorn Helgaas [mailto:[email protected]]
> > > > > Sent: 18 November 2016 17:54
> > > > > To: Gabriele Paoloni
> > > > > Cc: Bjorn Helgaas; [email protected]; linux-
> > > > > [email protected]; [email protected]; linux-arm-
> > > > > [email protected]; [email protected]
> > > > > Subject: Re: [PATCH] PCI: Add information about describing PCI
> in
> > > ACPI
> > > > >
> > > > > On Fri, Nov 18, 2016 at 05:17:34PM +0000, Gabriele Paoloni
> wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: [email protected] [mailto:linux-
> kernel-
> > > > > > > [email protected]] On Behalf Of Bjorn Helgaas
> > > > > > > Sent: 17 November 2016 18:00
> > > > >
> > > > > > > +Static tables like MCFG, HPET, ECDT, etc., are *not*
> > > mechanisms
> > > > > for
> > > > > > > +reserving address space! The static tables are for things
> the
> > > OS
> > > > > > > +needs to know early in boot, before it can parse the ACPI
> > > > > namespace.
> > > > > > > +If a new table is defined, an old OS needs to operate
> > > correctly
> > > > > even
> > > > > > > +though it ignores the table. _CRS allows that because it
> is
> > > > > generic
> > > > > > > +and understood by the old OS; a static table does not.
> > > > > >
> > > > > > Right so if my understanding is correct you are saying that
> > > resources
> > > > > > described in the MCFG table should also be declared in
> PNP0C02
> > > > > devices
> > > > > > so that the PNP driver can reserve these resources.
> > > > >
> > > > > Yes.
> > > > >
> > > > > > On the other side the PCI Root bridge driver should not
> reserve
> > > such
> > > > > > resources.
> > > > > >
> > > > > > Well if my understanding is correct I think we have a problem
> > > here:
> > > > > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
> > > > > >
> > > > > > As you can see pci_ecam_create() will conflict with the pnp
> > > driver
> > > > > > as it will try to reserve the resources from the MCFG
> table...
> > > > > >
> > > > > > Maybe we need to rework pci_ecam_create() ?
> > > > >
> > > > > I think it's OK as it is.
> > > > >
> > > > > The pnp/system.c driver does try to reserve PNP0C02 resources,
> and
> > > it
> > > > > marks them as "not busy". That way they appear in /proc/iomem
> and
> > > > > won't be allocated for anything else, but they can still be
> > > requested
> > > > > by drivers, e.g., pci/ecam.c, which will mark them "busy".
> > > > >
> > > > > This is analogous to what the PCI core does in
> > > pci_claim_resource().
> > > > > This is really a function of the ACPI/PNP *core*, which should
> > > reserve
> > > > > all _CRS resources for all devices (not just PNP0C02 devices).
> But
> > > > > it's done by pnp/system.c, and only for PNP0C02, because
> there's a
> > > > > bunch of historical baggage there.
> > > > >
> > > > > You'll also notice that in this case, things are out of order:
> > > > > logically the pnp/system.c reservation should happen first, but
> in
> > > > > fact the pci/ecam.c request happens *before* the pnp/system.c
> one.
> > > > > That means the pnp/system.c one might fail and complain "[mem
> ...]
> > > > > could not be reserved".
> > > >
> > > > Correct me if I am wrong...
> > > >
> > > > So currently we are relying on the fact that pci_ecam_create() is
> > > called
> > > > before the pnp driver.
> > > > If the pnp driver came first we would end up in pci_ecam_create()
> > > failing
> > > > here:
> > > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76
> > > >
> > > > I am not sure but it seems to me like a bit weak condition to
> rely
> > > on...
> > > > what about removing the error condition in pci_ecam_create() and
> > > logging
> > > > just a dev_info()?
> > >
> > > Huh. I'm confused. I *thought* it would be safe to reverse the
> > > order, which would effectively be this:
> > >
> > > system_pnp_probe
> > > reserve_resources_of_dev
> > > reserve_range
> > > request_mem_region([mem 0xb0000000-0xb1ffffff])
> > > ...
> > > pci_ecam_create
> > > request_resource_conflict([mem 0xb0000000-0xb1ffffff])
> > >
> > >
> > > but I experimented with the patch below on qemu, and it failed as
> you
> > > predicted:
> > >
> > > ** res test **
> > > requested [mem 0xa0000000-0xafffffff]
> > > can't claim ECAM area [mem 0xa0000000-0xafffffff]: conflict with
> ECAM
> > > PNP [mem 0xa0000000-0xafffffff]
> > >
> > > I expected the request_resource_conflict() to succeed since it's
> > > completely contained in the "ECAM PNP" region. But I guess I don't
> > > understand kernel/resource.c well enough.
> >
> > I think it fails because effectively the PNP driver is populating the
> > iomem_resource resource tree and therefore pci_ecam_create() finds
> that
> > it cannot add the cfg resource to the same hierarchy as it is already
> > there...
>
> Right. I'm just surprised because the PNP reservation is marked
> "not busy", and a driver (e.g., ECAM) should still be able to request
> the resource.
Yes unfortunately pci_ecam_create() is not flexible on the conflict as
pci_request_regions():
http://lxr.free-electrons.com/source/kernel/resource.c#L1155
if the conflict resource is not busy pci_request_regions() will create
a child resource under the conflict sibling and mark it as busy...
or at least this is my understanding...
>
> > > I'm not sure we need to fix anything yet, since we currently do the
> > > ecam.c request before the system.c one, and any change there would
> be
> > > a long ways off. If/when that *does* change, I think the correct
> fix
> > > would be to change ecam.c so its request succeeds (by changing the
> way
> > > it does the request, fixing kernel/resource.c, or whatever) rather
> > > than to reduce the log level and ignore the failure.
> >
> > Well in my mind I didn't want just to make the error disappear...
> > If all the resources should be reserved by the PNP driver then
> ideally
> > we could take away request_resource_conflict() from
> pci_ecam_create(),
> > but this would make buggy some systems with an already shipped BIOS
> > that relied on pci_ecam_create() reservation rather than PNP
> reservation.
>
> I don't want remove the request from ecam.c. Ideally, there should be
> TWO lines in /proc/iomem: one from system.c for "pnp 00:01" or
> whatever it is, and a second from ecam.c. The first is the generic
> one saying "this region is consumed by a piece of hardware, so don't
> put anything else here." The second is the driver-specific one saying
> "PCI ECAM owns this region, nobody else can use it."
>
> This is the same way we handle PCI BAR resources. Here are two
> examples from my laptop. The first (00:08.0) only has one line:
> it has a BAR that consumes address space, but I don't have a driver
> for it loaded. The second (00:16.0) does have a driver loaded, so it
> has a second line showing that the driver owns the space:
>
> f124a000-f124afff : 0000:00:08.0 # from PCI core
>
> f124d000-f124dfff : 0000:00:16.0 # from PCI core
> f124d000-f124dfff : mei_me # from mei_me driver
>
> > Just removing the error condition and converting dev_err() into
> > dev_info() seems to me like accommodating already shipped BIOS images
> > and flagging a reservation that is already done by somebody else
> > without compromising the functionality of the PCI Root bridge driver
> > (so far the only reason why I can see the error condition there is
> > to catch a buggy MCFG with overlapping addresses; so if this is the
> > case maybe we need to have a different diagnostic check to make sure
> > that the MCFG table is alright)
>
> Ideally I think we should end up with this:
>
> a0000000-afffffff : pnp 00:01
> a0000000-afffffff : PCI ECAM
I think that for PCIe device drivers it works ok because it is guaranteed
that their own pci_request_regions() is called always after
pci_claim_resource() of the bridge that is on top of them...
I.e. pci_claim_resource() reserves the resources as not busy and
pci_request_regions() will create a child busy resource
>
> Realistically right now we'll probably end up with only the "PCI ECAM"
> line in /proc/iomem and a warning from system.c about not being able
> to reserve the space.
>
> If we ever change things to do the generic PNP reservation first, then
> we should fix things so ecam.c can claim the space without an error.
Maybe the patch below could be a sort of solution...effectively pci_ecam
should succeed in reserving a busy resource under the conflict resource
in case of PNP driver allocating a non BUSY resource first...
---
drivers/pci/ecam.c | 16 +++++-----------
drivers/pci/host/pci-thunder-ecam.c | 2 +-
include/linux/pci-ecam.h | 2 +-
3 files changed, 7 insertions(+), 13 deletions(-)
diff --git a/drivers/pci/ecam.c b/drivers/pci/ecam.c
index 43ed08d..999b6ef 100644
--- a/drivers/pci/ecam.c
+++ b/drivers/pci/ecam.c
@@ -66,16 +66,10 @@ struct pci_config_window *pci_ecam_create(struct device *dev,
}
bsz = 1 << ops->bus_shift;
- cfg->res.start = cfgres->start;
- cfg->res.end = cfgres->end;
- cfg->res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
- cfg->res.name = "PCI ECAM";
-
- conflict = request_resource_conflict(&iomem_resource, &cfg->res);
- if (conflict) {
+ cfg->res = request_mem_region(cfgres->start, resource_size(cfgres), "PCI ECAM");
+ if (!cfg->res) {
err = -EBUSY;
- dev_err(dev, "can't claim ECAM area %pR: address conflict with %s %pR\n",
- &cfg->res, conflict->name, conflict);
+ dev_err(dev, "can't claim ECAM area %pR\n", &cfg->res);
goto err_exit;
}
@@ -126,8 +120,8 @@ void pci_ecam_free(struct pci_config_window *cfg)
if (cfg->win)
iounmap(cfg->win);
}
- if (cfg->res.parent)
- release_resource(&cfg->res);
+ if (cfg->res->parent)
+ release_region(cfg->res->start, resource_size(cfg->res));
kfree(cfg);
}
diff --git a/drivers/pci/host/pci-thunder-ecam.c b/drivers/pci/host/pci-thunder-ecam.c
index d50a3dc..2e48d9d 100644
--- a/drivers/pci/host/pci-thunder-ecam.c
+++ b/drivers/pci/host/pci-thunder-ecam.c
@@ -117,7 +117,7 @@ static int thunder_ecam_p2_config_read(struct pci_bus *bus, unsigned int devfn,
* the config space access window. Since we are working with
* the high-order 32 bits, shift everything down by 32 bits.
*/
- node_bits = (cfg->res.start >> 32) & (1 << 12);
+ node_bits = (cfg->res->start >> 32) & (1 << 12);
v |= node_bits;
set_val(v, where, size, val);
diff --git a/include/linux/pci-ecam.h b/include/linux/pci-ecam.h
index 7adad20..f30a4ea 100644
--- a/include/linux/pci-ecam.h
+++ b/include/linux/pci-ecam.h
@@ -36,7 +36,7 @@ struct pci_ecam_ops {
* use ECAM.
*/
struct pci_config_window {
- struct resource res;
+ struct resource *res;
struct resource busr;
void *priv;
struct pci_ecam_ops *ops;
--
2.7.4
On Tue, Nov 22, 2016 at 10:09:50AM +0000, Ard Biesheuvel wrote:
> On 17 November 2016 at 17:59, Bjorn Helgaas <[email protected]> wrote:
> > +PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
> > +describe all the address space they consume. In principle, this would
> > +be all the windows they forward down to the PCI bus, as well as the
> > +bridge registers themselves. The bridge registers include things like
> > +secondary/subordinate bus registers that determine the bus range below
> > +the bridge, window registers that describe the apertures, etc. These
> > +are all device-specific, non-architected things, so the only way a
> > +PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
> > +contain the device-specific details. These bridge registers also
> > +include ECAM space, since it is consumed by the bridge.
> > +
> > +ACPI defined a Producer/Consumer bit that was intended to distinguish
> > +the bridge apertures from the bridge registers [4, 5]. However,
> > +BIOSes didn't use that bit correctly, and the result is that OSes have
> > +to assume that everything in a PCI host bridge _CRS is a window. That
> > +leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
> > +device itself.
>
> Is that universally true? Or is it still possible to do the right
> thing here on new ACPI architectures such as arm64?
That's a very good question. I had thought that the ACPI spec had
given up on Consumer/Producer completely, but I was wrong. In the 6.0
spec, the Consumer/Producer bit is still documented in the Extended
Address Space Descriptor (sec 6.4.3.5.4). It is documented as
"ignored" in the QWord, DWord, and Word descriptors (sec 6.4.3.5.1,2,3).
Linux looks at the producer_consumer bit in acpi_decode_space(), which
I think is used for all these descriptors (QWord, DWord, Word, and
Extended). This doesn't quite follow the spec -- we probably should
ignore it except for Extended. In any event, acpi_decode_space() sets
IORESOURCE_WINDOW for Producer descriptors, but we don't test
IORESOURCE_WINDOW in the PCI host bridge code.
x86 and ia64 supply their own pci_acpi_root_prepare_resources()
functions that call acpi_pci_probe_root_resources(), which parses _CRS
and looks at producer_consumer. Then they do a little arch-specific
stuff on the result.
On arm64 we use acpi_pci_probe_root_resources() directly, with no
arch-specific stuff.
On all three arches, we ignore the Consumer/Producer bit, so all the
resources are treated as Producers, e.g., as bridge windows.
I think we *could* implement an arm64 version of
pci_acpi_root_prepare_resources() that would pay attention to the
Consumer/Producer bit by checking IORESOURCE_WINDOW. To be spec
compliant, we would have to use Extended descriptors for all bridge
windows, even if they would fit in a DWord or QWord.
Should we do that? I dunno. I'd like to hear your opinion(s).
It *would* be nice to have bridge registers in the bridge _CRS. That
would eliminate the need for looking up the HISI0081/PNP0C02 devices
to find the bridge registers. Avoiding that lookup is only a
temporary advantage -- the next round of bridges are supposed to fully
implement ECAM, and then we won't need to know where the registers
are.
Apart from the lookup, there's still some advantage in describing the
registers in the PNP0A03 device instead of an unrelated PNP0C02
device, because it makes /proc/iomem more accurate and potentially
makes host bridge hotplug cleaner. We would have to enhance the host
bridge driver to do the reservations currently done by pnp/system.c.
There's some value in doing it the same way as on x86, even though
that way is somewhat broken.
Whatever we decide, I think it's very important to get it figured out
ASAP because it affects the ECAM quirks that we're trying to merge in
v4.10.
> > +The workaround is to describe the bridge registers (including ECAM
> > +space) in PNP0C02 catch-all devices [6]. With the exception of ECAM,
> > +the bridge register space is device-specific anyway, so the generic
> > +PNP0A03/PNP0A08 driver (pci_root.c) has no need to know about it. For
> > +ECAM, pci_root.c learns about the space from either MCFG or the _CBA
> > +method.
> > +
> > +Note that the PCIe spec actually does require ECAM unless there's a
> > +standard firmware interface for config access, e.g., the ia64 SAL
> > +interface [7]. One reason is that we want a generic host bridge
> > +driver (pci_root.c), and a generic driver requires a generic way to
> > +access config space.
> > +
> > +
> > +[1] ACPI 6.0, sec 6.1:
> > + For any device that is on a non-enumerable type of bus (for
> > + example, an ISA bus), OSPM enumerates the devices' identifier(s)
> > + and the ACPI system firmware must supply an _HID object ... for
> > + each device to enable OSPM to do that.
> > +
> > +[2] ACPI 6.0, sec 3.7:
> > + The OS enumerates motherboard devices simply by reading through
> > + the ACPI Namespace looking for devices with hardware IDs.
> > +
> > + Each device enumerated by ACPI includes ACPI-defined objects in
> > + the ACPI Namespace that report the hardware resources the device
> > + could occupy [_PRS], an object that reports the resources that are
> > + currently used by the device [_CRS], and objects for configuring
> > + those resources [_SRS]. The information is used by the Plug and
> > + Play OS (OSPM) to configure the devices.
> > +
> > +[3] ACPI 6.0, sec 6.2:
> > + OSPM uses device configuration objects to configure hardware
> > + resources for devices enumerated via ACPI. Device configuration
> > + objects provide information about current and possible resource
> > + requirements, the relationship between shared resources, and
> > + methods for configuring hardware resources.
> > +
> > + When OSPM enumerates a device, it calls _PRS to determine the
> > + resource requirements of the device. It may also call _CRS to
> > + find the current resource settings for the device. Using this
> > + information, the Plug and Play system determines what resources
> > + the device should consume and sets those resources by calling the
> > + device’s _SRS control method.
> > +
> > + In ACPI, devices can consume resources (for example, legacy
> > + keyboards), provide resources (for example, a proprietary PCI
> > + bridge), or do both. Unless otherwise specified, resources for a
> > + device are assumed to be taken from the nearest matching resource
> > + above the device in the device hierarchy.
> > +
> > +[4] ACPI 6.0, sec 6.4.3.5.4:
> > + Extended Address Space Descriptor
> > + General Flags: Bit [0] Consumer/Producer:
> > + 1–This device consumes this resource
> > + 0–This device produces and consumes this resource
> > +
> > +[5] ACPI 6.0, sec 19.6.43:
> > + ResourceUsage specifies whether the Memory range is consumed by
> > + this device (ResourceConsumer) or passed on to child devices
> > + (ResourceProducer). If nothing is specified, then
> > + ResourceConsumer is assumed.
> > +
> > +[6] PCI Firmware 3.0, sec 4.1.2:
> > + If the operating system does not natively comprehend reserving the
> > + MMCFG region, the MMCFG region must be reserved by firmware. The
> > + address range reported in the MCFG table or by _CBA method (see
> > + Section 4.1.3) must be reserved by declaring a motherboard
> > + resource. For most systems, the motherboard resource would appear
> > + at the root of the ACPI namespace (under \_SB) in a node with a
> > + _HID of EISAID (PNP0C02), and the resources in this case should
> > + not be claimed in the root PCI bus’s _CRS. The resources can
> > + optionally be returned in Int15 E820 or EFIGetMemoryMap as
> > + reserved memory but must always be reported through ACPI as a
> > + motherboard resource.
> > +
> > +[7] PCI Express 3.0, sec 7.2.2:
> > + For systems that are PC-compatible, or that do not implement a
> > + processor-architecture-specific firmware interface standard that
> > + allows access to the Configuration Space, the ECAM is required as
> > + defined in this section.
> >
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Bjorn
Thanks for the documentation.
It really helps!
However I have a question below.
> From: [email protected] [mailto:[email protected]] On Behalf Of Bjorn
> Helgaas
> Subject: [PATCH] PCI: Add information about describing PCI in ACPI
>
> Add a writeup about how PCI host bridges should be described in ACPI
> using PNP0A03/PNP0A08 devices, PNP0C02 devices, and the MCFG table.
>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> ---
> Documentation/PCI/00-INDEX | 2 +
> Documentation/PCI/acpi-info.txt | 136 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 138 insertions(+)
> create mode 100644 Documentation/PCI/acpi-info.txt
>
> diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX
> index 147231f..0780280 100644
> --- a/Documentation/PCI/00-INDEX
> +++ b/Documentation/PCI/00-INDEX
> @@ -1,5 +1,7 @@
> 00-INDEX
> - this file
> +acpi-info.txt
> + - info on how PCI host bridges are represented in ACPI
> MSI-HOWTO.txt
> - the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ.
> PCIEBUS-HOWTO.txt
> diff --git a/Documentation/PCI/acpi-info.txt b/Documentation/PCI/acpi-info.txt
> new file mode 100644
> index 0000000..ccbcfda
> --- /dev/null
> +++ b/Documentation/PCI/acpi-info.txt
> @@ -0,0 +1,136 @@
> + ACPI considerations for PCI host bridges
> +
> +The basic requirement is that the ACPI namespace should describe
> +*everything* that consumes address space unless there's another
> +standard way for the OS to find it [1, 2]. For example, windows that
> +are forwarded to PCI by a PCI host bridge should be described via ACPI
> +devices, since the OS can't locate the host bridge by itself. PCI
> +devices *below* the host bridge do not need to be described via ACPI,
> +because the resources they consume are inside the host bridge windows,
> +and the OS can discover them via the standard PCI enumeration
> +mechanism (using config accesses to read and size the BARs).
> +
> +This ACPI resource description is done via _CRS methods of devices in
> +the ACPI namespace [2]. _CRS methods are like generalized PCI BARs:
> +the OS can read _CRS and figure out what resource is being consumed
> +even if it doesn't have a driver for the device [3]. That's important
> +because it means an old OS can work correctly even on a system with
> +new devices unknown to the OS. The new devices won't do anything, but
> +the OS can at least make sure no resources conflict with them.
> +
> +Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
> +reserving address space! The static tables are for things the OS
> +needs to know early in boot, before it can parse the ACPI namespace.
> +If a new table is defined, an old OS needs to operate correctly even
> +though it ignores the table. _CRS allows that because it is generic
> +and understood by the old OS; a static table does not.
The entire document doesn't talk about the details of _CBA.
There is only one line below mentioned _CBA as an example.
> +
> +If the OS is expected to manage an ACPI device, that device will have
> +a specific _HID/_CID that tells the OS what driver to bind to it, and
> +the _CRS tells the OS and the driver where the device's registers are.
> +
> +PNP0C02 "motherboard" devices are basically a catch-all. There's no
> +programming model for them other than "don't use these resources for
> +anything else." So any address space that is (1) not claimed by some
> +other ACPI device and (2) should not be assigned by the OS to
> +something else, should be claimed by a PNP0C02 _CRS method.
> +
> +PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
> +describe all the address space they consume. In principle, this would
> +be all the windows they forward down to the PCI bus, as well as the
> +bridge registers themselves. The bridge registers include things like
> +secondary/subordinate bus registers that determine the bus range below
> +the bridge, window registers that describe the apertures, etc. These
> +are all device-specific, non-architected things, so the only way a
> +PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
> +contain the device-specific details. These bridge registers also
> +include ECAM space, since it is consumed by the bridge.
> +
> +ACPI defined a Producer/Consumer bit that was intended to distinguish
> +the bridge apertures from the bridge registers [4, 5]. However,
> +BIOSes didn't use that bit correctly, and the result is that OSes have
> +to assume that everything in a PCI host bridge _CRS is a window. That
> +leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
> +device itself.
> +
> +The workaround is to describe the bridge registers (including ECAM
> +space) in PNP0C02 catch-all devices [6]. With the exception of ECAM,
> +the bridge register space is device-specific anyway, so the generic
> +PNP0A03/PNP0A08 driver (pci_root.c) has no need to know about it. For
> +ECAM, pci_root.c learns about the space from either MCFG or the _CBA
> +method.
Should the relationship of MCFG and _CBA be covered in this document?
Thanks and best regards
Lv
> +
> +Note that the PCIe spec actually does require ECAM unless there's a
> +standard firmware interface for config access, e.g., the ia64 SAL
> +interface [7]. One reason is that we want a generic host bridge
> +driver (pci_root.c), and a generic driver requires a generic way to
> +access config space.
> +
> +
> +[1] ACPI 6.0, sec 6.1:
> + For any device that is on a non-enumerable type of bus (for
> + example, an ISA bus), OSPM enumerates the devices' identifier(s)
> + and the ACPI system firmware must supply an _HID object ... for
> + each device to enable OSPM to do that.
> +
> +[2] ACPI 6.0, sec 3.7:
> + The OS enumerates motherboard devices simply by reading through
> + the ACPI Namespace looking for devices with hardware IDs.
> +
> + Each device enumerated by ACPI includes ACPI-defined objects in
> + the ACPI Namespace that report the hardware resources the device
> + could occupy [_PRS], an object that reports the resources that are
> + currently used by the device [_CRS], and objects for configuring
> + those resources [_SRS]. The information is used by the Plug and
> + Play OS (OSPM) to configure the devices.
> +
> +[3] ACPI 6.0, sec 6.2:
> + OSPM uses device configuration objects to configure hardware
> + resources for devices enumerated via ACPI. Device configuration
> + objects provide information about current and possible resource
> + requirements, the relationship between shared resources, and
> + methods for configuring hardware resources.
> +
> + When OSPM enumerates a device, it calls _PRS to determine the
> + resource requirements of the device. It may also call _CRS to
> + find the current resource settings for the device. Using this
> + information, the Plug and Play system determines what resources
> + the device should consume and sets those resources by calling the
> + device’s _SRS control method.
> +
> + In ACPI, devices can consume resources (for example, legacy
> + keyboards), provide resources (for example, a proprietary PCI
> + bridge), or do both. Unless otherwise specified, resources for a
> + device are assumed to be taken from the nearest matching resource
> + above the device in the device hierarchy.
> +
> +[4] ACPI 6.0, sec 6.4.3.5.4:
> + Extended Address Space Descriptor
> + General Flags: Bit [0] Consumer/Producer:
> + 1–This device consumes this resource
> + 0–This device produces and consumes this resource
> +
> +[5] ACPI 6.0, sec 19.6.43:
> + ResourceUsage specifies whether the Memory range is consumed by
> + this device (ResourceConsumer) or passed on to child devices
> + (ResourceProducer). If nothing is specified, then
> + ResourceConsumer is assumed.
> +
> +[6] PCI Firmware 3.0, sec 4.1.2:
> + If the operating system does not natively comprehend reserving the
> + MMCFG region, the MMCFG region must be reserved by firmware. The
> + address range reported in the MCFG table or by _CBA method (see
> + Section 4.1.3) must be reserved by declaring a motherboard
> + resource. For most systems, the motherboard resource would appear
> + at the root of the ACPI namespace (under \_SB) in a node with a
> + _HID of EISAID (PNP0C02), and the resources in this case should
> + not be claimed in the root PCI bus’s _CRS. The resources can
> + optionally be returned in Int15 E820 or EFIGetMemoryMap as
> + reserved memory but must always be reported through ACPI as a
> + motherboard resource.
> +
> +[7] PCI Express 3.0, sec 7.2.2:
> + For systems that are PC-compatible, or that do not implement a
> + processor-architecture-specific firmware interface standard that
> + allows access to the Configuration Space, the ECAM is required as
> + defined in this section.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On 23 November 2016 at 01:06, Bjorn Helgaas <[email protected]> wrote:
> On Tue, Nov 22, 2016 at 10:09:50AM +0000, Ard Biesheuvel wrote:
>> On 17 November 2016 at 17:59, Bjorn Helgaas <[email protected]> wrote:
>
>> > +PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
>> > +describe all the address space they consume. In principle, this would
>> > +be all the windows they forward down to the PCI bus, as well as the
>> > +bridge registers themselves. The bridge registers include things like
>> > +secondary/subordinate bus registers that determine the bus range below
>> > +the bridge, window registers that describe the apertures, etc. These
>> > +are all device-specific, non-architected things, so the only way a
>> > +PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
>> > +contain the device-specific details. These bridge registers also
>> > +include ECAM space, since it is consumed by the bridge.
>> > +
>> > +ACPI defined a Producer/Consumer bit that was intended to distinguish
>> > +the bridge apertures from the bridge registers [4, 5]. However,
>> > +BIOSes didn't use that bit correctly, and the result is that OSes have
>> > +to assume that everything in a PCI host bridge _CRS is a window. That
>> > +leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
>> > +device itself.
>>
>> Is that universally true? Or is it still possible to do the right
>> thing here on new ACPI architectures such as arm64?
>
> That's a very good question. I had thought that the ACPI spec had
> given up on Consumer/Producer completely, but I was wrong. In the 6.0
> spec, the Consumer/Producer bit is still documented in the Extended
> Address Space Descriptor (sec 6.4.3.5.4). It is documented as
> "ignored" in the QWord, DWord, and Word descriptors (sec 6.4.3.5.1,2,3).
>
> Linux looks at the producer_consumer bit in acpi_decode_space(), which
> I think is used for all these descriptors (QWord, DWord, Word, and
> Extended). This doesn't quite follow the spec -- we probably should
> ignore it except for Extended. In any event, acpi_decode_space() sets
> IORESOURCE_WINDOW for Producer descriptors, but we don't test
> IORESOURCE_WINDOW in the PCI host bridge code.
>
> x86 and ia64 supply their own pci_acpi_root_prepare_resources()
> functions that call acpi_pci_probe_root_resources(), which parses _CRS
> and looks at producer_consumer. Then they do a little arch-specific
> stuff on the result.
>
> On arm64 we use acpi_pci_probe_root_resources() directly, with no
> arch-specific stuff.
>
> On all three arches, we ignore the Consumer/Producer bit, so all the
> resources are treated as Producers, e.g., as bridge windows.
>
> I think we *could* implement an arm64 version of
> pci_acpi_root_prepare_resources() that would pay attention to the
> Consumer/Producer bit by checking IORESOURCE_WINDOW. To be spec
> compliant, we would have to use Extended descriptors for all bridge
> windows, even if they would fit in a DWord or QWord.
>
> Should we do that? I dunno. I'd like to hear your opinion(s).
>
Yes, I think we should. If the spec allows for a way for a PNP0A03
device to describe all of its resources unambiguously, we should not
be relying on workarounds that were designed for another architecture
in another decade (for, presumably, another OS)
Just for my understanding, we will need to use extended descriptors
for all consumed *and* produced regions, even though dword/qword are
implicitly produced-only, due to the fact that the bit is ignored?
> It *would* be nice to have bridge registers in the bridge _CRS. That
> would eliminate the need for looking up the HISI0081/PNP0C02 devices
> to find the bridge registers. Avoiding that lookup is only a
> temporary advantage -- the next round of bridges are supposed to fully
> implement ECAM, and then we won't need to know where the registers
> are.
>
> Apart from the lookup, there's still some advantage in describing the
> registers in the PNP0A03 device instead of an unrelated PNP0C02
> device, because it makes /proc/iomem more accurate and potentially
> makes host bridge hotplug cleaner. We would have to enhance the host
> bridge driver to do the reservations currently done by pnp/system.c.
>
> There's some value in doing it the same way as on x86, even though
> that way is somewhat broken.
>
> Whatever we decide, I think it's very important to get it figured out
> ASAP because it affects the ECAM quirks that we're trying to merge in
> v4.10.
>
I agree. What exactly is the impact for the quirks mechanism as proposed?
On Wed, Nov 23, 2016 at 07:28:12AM +0000, Ard Biesheuvel wrote:
> On 23 November 2016 at 01:06, Bjorn Helgaas <[email protected]> wrote:
> > On Tue, Nov 22, 2016 at 10:09:50AM +0000, Ard Biesheuvel wrote:
> >> On 17 November 2016 at 17:59, Bjorn Helgaas <[email protected]> wrote:
> >
> >> > +PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
> >> > +describe all the address space they consume. In principle, this would
> >> > +be all the windows they forward down to the PCI bus, as well as the
> >> > +bridge registers themselves. The bridge registers include things like
> >> > +secondary/subordinate bus registers that determine the bus range below
> >> > +the bridge, window registers that describe the apertures, etc. These
> >> > +are all device-specific, non-architected things, so the only way a
> >> > +PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
> >> > +contain the device-specific details. These bridge registers also
> >> > +include ECAM space, since it is consumed by the bridge.
> >> > +
> >> > +ACPI defined a Producer/Consumer bit that was intended to distinguish
> >> > +the bridge apertures from the bridge registers [4, 5]. However,
> >> > +BIOSes didn't use that bit correctly, and the result is that OSes have
> >> > +to assume that everything in a PCI host bridge _CRS is a window. That
> >> > +leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
> >> > +device itself.
> >>
> >> Is that universally true? Or is it still possible to do the right
> >> thing here on new ACPI architectures such as arm64?
> >
> > That's a very good question. I had thought that the ACPI spec had
> > given up on Consumer/Producer completely, but I was wrong. In the 6.0
> > spec, the Consumer/Producer bit is still documented in the Extended
> > Address Space Descriptor (sec 6.4.3.5.4). It is documented as
> > "ignored" in the QWord, DWord, and Word descriptors (sec 6.4.3.5.1,2,3).
> >
> > Linux looks at the producer_consumer bit in acpi_decode_space(), which
> > I think is used for all these descriptors (QWord, DWord, Word, and
> > Extended). This doesn't quite follow the spec -- we probably should
> > ignore it except for Extended. In any event, acpi_decode_space() sets
> > IORESOURCE_WINDOW for Producer descriptors, but we don't test
> > IORESOURCE_WINDOW in the PCI host bridge code.
> >
> > x86 and ia64 supply their own pci_acpi_root_prepare_resources()
> > functions that call acpi_pci_probe_root_resources(), which parses _CRS
> > and looks at producer_consumer. Then they do a little arch-specific
> > stuff on the result.
> >
> > On arm64 we use acpi_pci_probe_root_resources() directly, with no
> > arch-specific stuff.
> >
> > On all three arches, we ignore the Consumer/Producer bit, so all the
> > resources are treated as Producers, e.g., as bridge windows.
> >
> > I think we *could* implement an arm64 version of
> > pci_acpi_root_prepare_resources() that would pay attention to the
> > Consumer/Producer bit by checking IORESOURCE_WINDOW. To be spec
> > compliant, we would have to use Extended descriptors for all bridge
> > windows, even if they would fit in a DWord or QWord.
> >
> > Should we do that? I dunno. I'd like to hear your opinion(s).
> >
>
> Yes, I think we should. If the spec allows for a way for a PNP0A03
> device to describe all of its resources unambiguously, we should not
> be relying on workarounds that were designed for another architecture
> in another decade (for, presumably, another OS)
That was the idea I floated at LPC16. We can override the
acpi_pci_root_ops prepare_resources() function pointer with a function
that checks IORESOURCE_WINDOW and filters resources accordingly (and
specific quirk "drivers" may know how to intepret resources that aren't
IORESOURCE_WINDOW - ie they can use it to describe the PCI ECAM config
space quirk region in their _CRS).
In a way that's something that makes sense anyway because given
that we are starting from a clean slate on ARM64 considering resources
that are not IORESOURCE_WINDOW as host bridge windows is just something
we are inheriting from x86, it is not really ACPI specs compliant (is
it ?).
> Just for my understanding, we will need to use extended descriptors
> for all consumed *and* produced regions, even though dword/qword are
> implicitly produced-only, due to the fact that the bit is ignored?
That's something that has to be clarified within the ASWG ie why the
consumer bit is ignored for *some* descriptors and not for others.
As things stand unfortunately the answer seems yes (I do not know
why).
> > It *would* be nice to have bridge registers in the bridge _CRS. That
> > would eliminate the need for looking up the HISI0081/PNP0C02 devices
> > to find the bridge registers. Avoiding that lookup is only a
> > temporary advantage -- the next round of bridges are supposed to fully
> > implement ECAM, and then we won't need to know where the registers
> > are.
> >
> > Apart from the lookup, there's still some advantage in describing the
> > registers in the PNP0A03 device instead of an unrelated PNP0C02
> > device, because it makes /proc/iomem more accurate and potentially
> > makes host bridge hotplug cleaner. We would have to enhance the host
> > bridge driver to do the reservations currently done by pnp/system.c.
> >
> > There's some value in doing it the same way as on x86, even though
> > that way is somewhat broken.
> >
> > Whatever we decide, I think it's very important to get it figured out
> > ASAP because it affects the ECAM quirks that we're trying to merge in
> > v4.10.
> >
>
> I agree. What exactly is the impact for the quirks mechanism as proposed?
The impact is that we could just use the PNP0A03 _CRS to report the PCI
ECAM config space quirk region through a consumer resource keeping in
mind what I say above (actually I think that's what was done on APM
firmware initially, for the records).
Lorenzo
On Wed, Nov 23, 2016 at 07:28:12AM +0000, Ard Biesheuvel wrote:
> On 23 November 2016 at 01:06, Bjorn Helgaas <[email protected]> wrote:
> > On Tue, Nov 22, 2016 at 10:09:50AM +0000, Ard Biesheuvel wrote:
> >> On 17 November 2016 at 17:59, Bjorn Helgaas <[email protected]> wrote:
> >
> >> > +PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
> >> > +describe all the address space they consume. In principle, this would
> >> > +be all the windows they forward down to the PCI bus, as well as the
> >> > +bridge registers themselves. The bridge registers include things like
> >> > +secondary/subordinate bus registers that determine the bus range below
> >> > +the bridge, window registers that describe the apertures, etc. These
> >> > +are all device-specific, non-architected things, so the only way a
> >> > +PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
> >> > +contain the device-specific details. These bridge registers also
> >> > +include ECAM space, since it is consumed by the bridge.
> >> > +
> >> > +ACPI defined a Producer/Consumer bit that was intended to distinguish
> >> > +the bridge apertures from the bridge registers [4, 5]. However,
> >> > +BIOSes didn't use that bit correctly, and the result is that OSes have
> >> > +to assume that everything in a PCI host bridge _CRS is a window. That
> >> > +leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
> >> > +device itself.
> >>
> >> Is that universally true? Or is it still possible to do the right
> >> thing here on new ACPI architectures such as arm64?
> >
> > That's a very good question. I had thought that the ACPI spec had
> > given up on Consumer/Producer completely, but I was wrong. In the 6.0
> > spec, the Consumer/Producer bit is still documented in the Extended
> > Address Space Descriptor (sec 6.4.3.5.4). It is documented as
> > "ignored" in the QWord, DWord, and Word descriptors (sec 6.4.3.5.1,2,3).
> >
> > Linux looks at the producer_consumer bit in acpi_decode_space(), which
> > I think is used for all these descriptors (QWord, DWord, Word, and
> > Extended). This doesn't quite follow the spec -- we probably should
> > ignore it except for Extended. In any event, acpi_decode_space() sets
> > IORESOURCE_WINDOW for Producer descriptors, but we don't test
> > IORESOURCE_WINDOW in the PCI host bridge code.
> >
> > x86 and ia64 supply their own pci_acpi_root_prepare_resources()
> > functions that call acpi_pci_probe_root_resources(), which parses _CRS
> > and looks at producer_consumer. Then they do a little arch-specific
> > stuff on the result.
> >
> > On arm64 we use acpi_pci_probe_root_resources() directly, with no
> > arch-specific stuff.
> >
> > On all three arches, we ignore the Consumer/Producer bit, so all the
> > resources are treated as Producers, e.g., as bridge windows.
> >
> > I think we *could* implement an arm64 version of
> > pci_acpi_root_prepare_resources() that would pay attention to the
> > Consumer/Producer bit by checking IORESOURCE_WINDOW. To be spec
> > compliant, we would have to use Extended descriptors for all bridge
> > windows, even if they would fit in a DWord or QWord.
> >
> > Should we do that? I dunno. I'd like to hear your opinion(s).
> >
>
> Yes, I think we should. If the spec allows for a way for a PNP0A03
> device to describe all of its resources unambiguously, we should not
> be relying on workarounds that were designed for another architecture
> in another decade (for, presumably, another OS)
>
> Just for my understanding, we will need to use extended descriptors
> for all consumed *and* produced regions, even though dword/qword are
> implicitly produced-only, due to the fact that the bit is ignored?
>From an ACPI spec point of view, I would say QWord/DWord/Word
descriptors are implicitly *consumer*-only because ResourceConsumer
is the default and they don't have a bit to indicate otherwise.
The current code assumes all PNP0A03 resources are producers. If we
implement an arm64 pci_acpi_root_prepare_resources() that pays
attention to the Consumer/Producer bit, we would have to:
- Reserve all producer regions in the iomem/ioport trees. This is
already done via pci_acpi_root_add_resources(), but we might need
a new check to handle consumers differently.
- Reserve all consumer regions. This corresponds to what
pnp/system.c does for PNP0C02 devices. This is similar to the
producer regions, but I think the consumer ones should be marked
IORESOURCE_BUSY.
- Use every producer (IORESOURCE_WINDOW) as a host bridge window.
I think it's a bug that acpi_decode_space() looks at producer_consumer
for QWord/DWord/Word descriptors, but I think QWord/DWord/Word
descriptors for consumed regions should be safe, as long as they don't
set the Consumer/Producer bit.
On Wed, Nov 23, 2016 at 4:30 AM, Lorenzo Pieralisi
<[email protected]> wrote:
> On Wed, Nov 23, 2016 at 07:28:12AM +0000, Ard Biesheuvel wrote:
>> On 23 November 2016 at 01:06, Bjorn Helgaas <[email protected]> wrote:
>> > On Tue, Nov 22, 2016 at 10:09:50AM +0000, Ard Biesheuvel wrote:
>> >> On 17 November 2016 at 17:59, Bjorn Helgaas <[email protected]> wrote:
>> >
>> >> > +PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
>> >> > +describe all the address space they consume. In principle, this would
>> >> > +be all the windows they forward down to the PCI bus, as well as the
>> >> > +bridge registers themselves. The bridge registers include things like
>> >> > +secondary/subordinate bus registers that determine the bus range below
>> >> > +the bridge, window registers that describe the apertures, etc. These
>> >> > +are all device-specific, non-architected things, so the only way a
>> >> > +PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
>> >> > +contain the device-specific details. These bridge registers also
>> >> > +include ECAM space, since it is consumed by the bridge.
>> >> > +
>> >> > +ACPI defined a Producer/Consumer bit that was intended to distinguish
>> >> > +the bridge apertures from the bridge registers [4, 5]. However,
>> >> > +BIOSes didn't use that bit correctly, and the result is that OSes have
>> >> > +to assume that everything in a PCI host bridge _CRS is a window. That
>> >> > +leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
>> >> > +device itself.
>> >>
>> >> Is that universally true? Or is it still possible to do the right
>> >> thing here on new ACPI architectures such as arm64?
>> >
>> > That's a very good question. I had thought that the ACPI spec had
>> > given up on Consumer/Producer completely, but I was wrong. In the 6.0
>> > spec, the Consumer/Producer bit is still documented in the Extended
>> > Address Space Descriptor (sec 6.4.3.5.4). It is documented as
>> > "ignored" in the QWord, DWord, and Word descriptors (sec 6.4.3.5.1,2,3).
>> >
>> > Linux looks at the producer_consumer bit in acpi_decode_space(), which
>> > I think is used for all these descriptors (QWord, DWord, Word, and
>> > Extended). This doesn't quite follow the spec -- we probably should
>> > ignore it except for Extended. In any event, acpi_decode_space() sets
>> > IORESOURCE_WINDOW for Producer descriptors, but we don't test
>> > IORESOURCE_WINDOW in the PCI host bridge code.
>> >
>> > x86 and ia64 supply their own pci_acpi_root_prepare_resources()
>> > functions that call acpi_pci_probe_root_resources(), which parses _CRS
>> > and looks at producer_consumer. Then they do a little arch-specific
>> > stuff on the result.
>> >
>> > On arm64 we use acpi_pci_probe_root_resources() directly, with no
>> > arch-specific stuff.
>> >
>> > On all three arches, we ignore the Consumer/Producer bit, so all the
>> > resources are treated as Producers, e.g., as bridge windows.
>> >
>> > I think we *could* implement an arm64 version of
>> > pci_acpi_root_prepare_resources() that would pay attention to the
>> > Consumer/Producer bit by checking IORESOURCE_WINDOW. To be spec
>> > compliant, we would have to use Extended descriptors for all bridge
>> > windows, even if they would fit in a DWord or QWord.
>> >
>> > Should we do that? I dunno. I'd like to hear your opinion(s).
>> >
>>
>> Yes, I think we should. If the spec allows for a way for a PNP0A03
>> device to describe all of its resources unambiguously, we should not
>> be relying on workarounds that were designed for another architecture
>> in another decade (for, presumably, another OS)
>
> That was the idea I floated at LPC16. We can override the
> acpi_pci_root_ops prepare_resources() function pointer with a function
> that checks IORESOURCE_WINDOW and filters resources accordingly (and
> specific quirk "drivers" may know how to intepret resources that aren't
> IORESOURCE_WINDOW - ie they can use it to describe the PCI ECAM config
> space quirk region in their _CRS).
>
> In a way that's something that makes sense anyway because given
> that we are starting from a clean slate on ARM64 considering resources
> that are not IORESOURCE_WINDOW as host bridge windows is just something
> we are inheriting from x86, it is not really ACPI specs compliant (is
> it ?).
>
>> Just for my understanding, we will need to use extended descriptors
>> for all consumed *and* produced regions, even though dword/qword are
>> implicitly produced-only, due to the fact that the bit is ignored?
>
> That's something that has to be clarified within the ASWG ie why the
> consumer bit is ignored for *some* descriptors and not for others.
>
> As things stand unfortunately the answer seems yes (I do not know
> why).
>
>> > It *would* be nice to have bridge registers in the bridge _CRS. That
>> > would eliminate the need for looking up the HISI0081/PNP0C02 devices
>> > to find the bridge registers. Avoiding that lookup is only a
>> > temporary advantage -- the next round of bridges are supposed to fully
>> > implement ECAM, and then we won't need to know where the registers
>> > are.
>> >
>> > Apart from the lookup, there's still some advantage in describing the
>> > registers in the PNP0A03 device instead of an unrelated PNP0C02
>> > device, because it makes /proc/iomem more accurate and potentially
>> > makes host bridge hotplug cleaner. We would have to enhance the host
>> > bridge driver to do the reservations currently done by pnp/system.c.
>> >
>> > There's some value in doing it the same way as on x86, even though
>> > that way is somewhat broken.
>> >
>> > Whatever we decide, I think it's very important to get it figured out
>> > ASAP because it affects the ECAM quirks that we're trying to merge in
>> > v4.10.
>> >
>>
>> I agree. What exactly is the impact for the quirks mechanism as proposed?
> The impact is that we could just use the PNP0A03 _CRS to report the PCI
> ECAM config space quirk region through a consumer resource keeping in
> mind what I say above (actually I think that's what was done on APM
> firmware initially, for the records).
Just to clarify: APM firmware initially has a _CSR region to declare
the controller register region. We don't know that we need to declare
the reserved space for ECAM until Bjorn pointed out recently (with the
usage of PNP0C02).
I really like this idea about declaring ECAM space and any additional
spaces required for ECAM quirk inside PNP0A03 _CRS. For the firmware
that already shipped, the quirk will need to add additional resources
(for ECAM and other needed regions) into the root-bus. If we decided
to go with this, do we still have time to make additional adjustment
for the current ECAM quirk and the foundation patches before
v4.10-rc1?
>
> Lorenzo
> _______________________________________________
> Linaro-acpi mailing list
> [email protected]
> https://lists.linaro.org/mailman/listinfo/linaro-acpi
Regards,
Duc Dang.
On Wed, Nov 23, 2016 at 09:06:33AM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 23, 2016 at 07:28:12AM +0000, Ard Biesheuvel wrote:
> > On 23 November 2016 at 01:06, Bjorn Helgaas <[email protected]> wrote:
> > > On Tue, Nov 22, 2016 at 10:09:50AM +0000, Ard Biesheuvel wrote:
> > >> On 17 November 2016 at 17:59, Bjorn Helgaas <[email protected]> wrote:
> > >
> > >> > +PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
> > >> > +describe all the address space they consume. In principle, this would
> > >> > +be all the windows they forward down to the PCI bus, as well as the
> > >> > +bridge registers themselves. The bridge registers include things like
> > >> > +secondary/subordinate bus registers that determine the bus range below
> > >> > +the bridge, window registers that describe the apertures, etc. These
> > >> > +are all device-specific, non-architected things, so the only way a
> > >> > +PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which
> > >> > +contain the device-specific details. These bridge registers also
> > >> > +include ECAM space, since it is consumed by the bridge.
> > >> > +
> > >> > +ACPI defined a Producer/Consumer bit that was intended to distinguish
> > >> > +the bridge apertures from the bridge registers [4, 5]. However,
> > >> > +BIOSes didn't use that bit correctly, and the result is that OSes have
> > >> > +to assume that everything in a PCI host bridge _CRS is a window. That
> > >> > +leaves no way to describe the bridge registers in the PNP0A03/PNP0A08
> > >> > +device itself.
> > >>
> > >> Is that universally true? Or is it still possible to do the right
> > >> thing here on new ACPI architectures such as arm64?
> > >
> > > That's a very good question. I had thought that the ACPI spec had
> > > given up on Consumer/Producer completely, but I was wrong. In the 6.0
> > > spec, the Consumer/Producer bit is still documented in the Extended
> > > Address Space Descriptor (sec 6.4.3.5.4). It is documented as
> > > "ignored" in the QWord, DWord, and Word descriptors (sec 6.4.3.5.1,2,3).
> > >
> > > Linux looks at the producer_consumer bit in acpi_decode_space(), which
> > > I think is used for all these descriptors (QWord, DWord, Word, and
> > > Extended). This doesn't quite follow the spec -- we probably should
> > > ignore it except for Extended. In any event, acpi_decode_space() sets
> > > IORESOURCE_WINDOW for Producer descriptors, but we don't test
> > > IORESOURCE_WINDOW in the PCI host bridge code.
> > >
> > > x86 and ia64 supply their own pci_acpi_root_prepare_resources()
> > > functions that call acpi_pci_probe_root_resources(), which parses _CRS
> > > and looks at producer_consumer. Then they do a little arch-specific
> > > stuff on the result.
> > >
> > > On arm64 we use acpi_pci_probe_root_resources() directly, with no
> > > arch-specific stuff.
> > >
> > > On all three arches, we ignore the Consumer/Producer bit, so all the
> > > resources are treated as Producers, e.g., as bridge windows.
> > >
> > > I think we *could* implement an arm64 version of
> > > pci_acpi_root_prepare_resources() that would pay attention to the
> > > Consumer/Producer bit by checking IORESOURCE_WINDOW. To be spec
> > > compliant, we would have to use Extended descriptors for all bridge
> > > windows, even if they would fit in a DWord or QWord.
> > >
> > > Should we do that? I dunno. I'd like to hear your opinion(s).
> > >
> >
> > Yes, I think we should. If the spec allows for a way for a PNP0A03
> > device to describe all of its resources unambiguously, we should not
> > be relying on workarounds that were designed for another architecture
> > in another decade (for, presumably, another OS)
> >
> > Just for my understanding, we will need to use extended descriptors
> > for all consumed *and* produced regions, even though dword/qword are
> > implicitly produced-only, due to the fact that the bit is ignored?
>
> From an ACPI spec point of view, I would say QWord/DWord/Word
> descriptors are implicitly *consumer*-only because ResourceConsumer
> is the default and they don't have a bit to indicate otherwise.
>
> The current code assumes all PNP0A03 resources are producers. If we
> implement an arm64 pci_acpi_root_prepare_resources() that pays
> attention to the Consumer/Producer bit, we would have to:
>
> - Reserve all producer regions in the iomem/ioport trees. This is
> already done via pci_acpi_root_add_resources(), but we might need
> a new check to handle consumers differently.
>
> - Reserve all consumer regions. This corresponds to what
> pnp/system.c does for PNP0C02 devices. This is similar to the
> producer regions, but I think the consumer ones should be marked
> IORESOURCE_BUSY.
>
> - Use every producer (IORESOURCE_WINDOW) as a host bridge window.
>
> I think it's a bug that acpi_decode_space() looks at producer_consumer
> for QWord/DWord/Word descriptors, but I think QWord/DWord/Word
> descriptors for consumed regions should be safe, as long as they don't
> set the Consumer/Producer bit.
I'm going to post a couple very lightly-tested patches that should
make us ignore the Consumer/Producer bit for QWord/DWord/Word. I'd
appreciate any discussion about whether that's the right approach.
Bjorn
On Wed, Nov 23, 2016 at 03:23:35AM +0000, Zheng, Lv wrote:
> Hi, Bjorn
>
> Thanks for the documentation.
> It really helps!
>
> However I have a question below.
>
> > From: [email protected] [mailto:[email protected]] On Behalf Of Bjorn
> > Helgaas
> > Subject: [PATCH] PCI: Add information about describing PCI in ACPI
> >
> > Add a writeup about how PCI host bridges should be described in ACPI
> > using PNP0A03/PNP0A08 devices, PNP0C02 devices, and the MCFG table.
> >
> > Signed-off-by: Bjorn Helgaas <[email protected]>
> > ---
> > Documentation/PCI/00-INDEX | 2 +
> > Documentation/PCI/acpi-info.txt | 136 +++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 138 insertions(+)
> > create mode 100644 Documentation/PCI/acpi-info.txt
> >
> > diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX
> > index 147231f..0780280 100644
> > --- a/Documentation/PCI/00-INDEX
> > +++ b/Documentation/PCI/00-INDEX
> > @@ -1,5 +1,7 @@
> > 00-INDEX
> > - this file
> > +acpi-info.txt
> > + - info on how PCI host bridges are represented in ACPI
> > MSI-HOWTO.txt
> > - the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ.
> > PCIEBUS-HOWTO.txt
> > diff --git a/Documentation/PCI/acpi-info.txt b/Documentation/PCI/acpi-info.txt
> > new file mode 100644
> > index 0000000..ccbcfda
> > --- /dev/null
> > +++ b/Documentation/PCI/acpi-info.txt
> > @@ -0,0 +1,136 @@
> > + ACPI considerations for PCI host bridges
> > +
> > +The basic requirement is that the ACPI namespace should describe
> > +*everything* that consumes address space unless there's another
> > +standard way for the OS to find it [1, 2]. ?For example, windows that
> > +are forwarded to PCI by a PCI host bridge should be described via ACPI
> > +devices, since the OS can't locate the host bridge by itself. ?PCI
> > +devices *below* the host bridge do not need to be described via ACPI,
> > +because the resources they consume are inside the host bridge windows,
> > +and the OS can discover them via the standard PCI enumeration
> > +mechanism (using config accesses to read and size the BARs).
> > +
> > +This ACPI resource description is done via _CRS methods of devices in
> > +the ACPI namespace [2]. ? _CRS methods are like generalized PCI BARs:
> > +the OS can read _CRS and figure out what resource is being consumed
> > +even if it doesn't have a driver for the device [3]. ?That's important
> > +because it means an old OS can work correctly even on a system with
> > +new devices unknown to the OS. ?The new devices won't do anything, but
> > +the OS can at least make sure no resources conflict with them.
> > +
> > +Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
> > +reserving address space! The static tables are for things the OS
> > +needs to know early in boot, before it can parse the ACPI namespace.
> > +If a new table is defined, an old OS needs to operate correctly even
> > +though it ignores the table. _CRS allows that because it is generic
> > +and understood by the old OS; a static table does not.
>
> The entire document doesn't talk about the details of _CBA.
> There is only one line below mentioned _CBA as an example.
Yes, that's a good point. I'll add some more details about MCFG
and _CBA.
Bjorn