2019-09-12 19:53:39

by Igor Druzhinin

[permalink] [raw]
Subject: [PATCH v2] xen/pci: reserve MCFG areas earlier

If MCFG area is not reserved in E820, Xen by default will defer its usage
until Dom0 registers it explicitly after ACPI parser recognizes it as
a reserved resource in DSDT. Having it reserved in E820 is not
mandatory according to "PCI Firmware Specification, rev 3.2" (par. 4.1.2)
and firmware is free to keep a hole in E820 in that place. Xen doesn't know
what exactly is inside this hole since it lacks full ACPI view of the
platform therefore it's potentially harmful to access MCFG region
without additional checks as some machines are known to provide
inconsistent information on the size of the region.

Now xen_mcfg_late() runs after acpi_init() which is too late as some basic
PCI enumeration starts exactly there as well. Trying to register a device
prior to MCFG reservation causes multiple problems with PCIe extended
capability initializations in Xen (e.g. SR-IOV VF BAR sizing). There are
no convenient hooks for us to subscribe to so register MCFG areas earlier
upon the first invocation of xen_add_device(). It should be safe to do once
since all the boot time buses must have their MCFG areas in MCFG table
already and we don't support PCI bus hot-plug.

Signed-off-by: Igor Druzhinin <[email protected]>
---
drivers/xen/pci.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 7494dbe..db58aaa 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -29,6 +29,8 @@
#include "../pci/pci.h"
#ifdef CONFIG_PCI_MMCONFIG
#include <asm/pci_x86.h>
+
+static int xen_mcfg_late(void);
#endif

static bool __read_mostly pci_seg_supported = true;
@@ -40,7 +42,18 @@ static int xen_add_device(struct device *dev)
#ifdef CONFIG_PCI_IOV
struct pci_dev *physfn = pci_dev->physfn;
#endif
-
+#ifdef CONFIG_PCI_MMCONFIG
+ static bool pci_mcfg_reserved = false;
+ /*
+ * Reserve MCFG areas in Xen on first invocation due to this being
+ * potentially called from inside of acpi_init immediately after
+ * MCFG table has been finally parsed.
+ */
+ if (!pci_mcfg_reserved) {
+ xen_mcfg_late();
+ pci_mcfg_reserved = true;
+ }
+#endif
if (pci_seg_supported) {
struct {
struct physdev_pci_device_add add;
@@ -213,7 +226,7 @@ static int __init register_xen_pci_notifier(void)
arch_initcall(register_xen_pci_notifier);

#ifdef CONFIG_PCI_MMCONFIG
-static int __init xen_mcfg_late(void)
+static int xen_mcfg_late(void)
{
struct pci_mmcfg_region *cfg;
int rc;
@@ -252,8 +265,4 @@ static int __init xen_mcfg_late(void)
}
return 0;
}
-/*
- * Needs to be done after acpi_init which are subsys_initcall.
- */
-subsys_initcall_sync(xen_mcfg_late);
#endif
--
2.7.4


2019-09-13 20:01:10

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH v2] xen/pci: reserve MCFG areas earlier

On 9/12/19 2:31 PM, Igor Druzhinin wrote:
> If MCFG area is not reserved in E820, Xen by default will defer its usage
> until Dom0 registers it explicitly after ACPI parser recognizes it as
> a reserved resource in DSDT. Having it reserved in E820 is not
> mandatory according to "PCI Firmware Specification, rev 3.2" (par. 4.1.2)
> and firmware is free to keep a hole in E820 in that place. Xen doesn't know
> what exactly is inside this hole since it lacks full ACPI view of the
> platform therefore it's potentially harmful to access MCFG region
> without additional checks as some machines are known to provide
> inconsistent information on the size of the region.
>
> Now xen_mcfg_late() runs after acpi_init() which is too late as some basic
> PCI enumeration starts exactly there as well. Trying to register a device
> prior to MCFG reservation causes multiple problems with PCIe extended
> capability initializations in Xen (e.g. SR-IOV VF BAR sizing). There are
> no convenient hooks for us to subscribe to so register MCFG areas earlier
> upon the first invocation of xen_add_device(). It should be safe to do once
> since all the boot time buses must have their MCFG areas in MCFG table
> already and we don't support PCI bus hot-plug.
>
> Signed-off-by: Igor Druzhinin <[email protected]>

Reviewed-by: Boris Ostrovsky <[email protected]>

and applied to for-linus-5.4