2013-10-25 15:03:49

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH v1] Set 1-1 P2M for PCI BARs and MCFG regions - if needed.

These two patches check and mark the P2M region (if needed) for
PCIe BARs and for MCFG regions.

The reason this is required is that anytime any driver calls ioremap_pfn
or constructs PTEs - we consult the P2M. If we find that for the PFN
the region is INVALID_P2M_ENTRY we return the 0 PFN. If we find that
for the PFN the region is IDENTITY_FRAME we ruturn the PFN (so
pfn_to_mfn(pfn) == pfn) - which makes device drivers happy.

We initially set this up when scanning the E820 and selecting the
E820_RSRV and the gaps between the E820 regions as such. But we also
have to careful as there are potential balloon regions so we can't
assume that region past the E820 is all 1-1.

There are alternative ways of solving this:
1) Mark all regions past the E820 and past the balloon regions as
1-1 regions. That requires some surgery in the P2M code to deal
with the p2m_mid_missing and p2m_mid_identity (new) for the different
levels in the tree. The code is not for the faint of heart.

2). Assume that INVALID_P2M_ENTRY is considered 1-1. That would require
auditing of the code and also making sure that any xen_privcmd_mmap
calls use the m2p_add_override. However there are some
valid cases in which we need to check for INVALID_P2M_ENTRY -
balloon driver - that might be relaxed. However again the P2M code
and the ABI it places are not for the faint of the heart.

As such these two patches just add extra code to set the IDENTITY_FRAME
on the BARs and for MCFG areas.


drivers/xen/balloon.c | 19 +++++
drivers/xen/pci.c | 164 +++++++++++++++++++++++++++++++++++++++-
include/xen/balloon.h | 1 +
include/xen/interface/physdev.h | 11 +++
4 files changed, 192 insertions(+), 3 deletions(-)


Konrad Rzeszutek Wilk (2):
xen/p2m: Create identity mappings for PFNs beyound E820 and PCI BARs
xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas and setup 1-1 P2M


2013-10-25 15:03:44

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [v1 1/2] xen/p2m: Create identity mappings for PFNs beyound E820 and PCI BARs

On bootup the E820 "gaps" or E820_RESV regions are marked as
identity regions. Meaning that any lookup done in the P2M
will return the same value: pfn_to_mfn(pfn) == pfn.

This is needed for PCI devices so that drivers can reference
the correct bus address. Unfortunatly there are also PCIe
devices which setup their MMIO region above the E820. By default
we assume in the P2M that any region above the last E820 entry
is to be used for ballooning. That is not true - and we don't
mark such regions as IDENTIY_FRAME but INVALID_P2M_ENTRY.
The result is that any lookup in the P2M (pfn_to_mfn(pfn) == 0)
gets us the 0 bus address which is hardly correct.

A solution that this patch implements is to walk the PCI device
BAR regions and mark them as IDENTITY_FRAMEs in the P2M.
Naturally some checks are needed such as making sure that the
BAR regions are not part of the balloon pages, nor regular RAM.

We only set them to IDENTITY if the:
pfn_to_mfn(pfn) == INVALID_P2M_ENTRY.

Another solution would be to mark all P2M entries beyond the
last E820 entry _and_ not in the balloon regions as IDENTITY.

If done, that means in worst case we have to reserve MAX_DOMAIN_PAGES
pages (so 2MB) of virtual space in case we have to create
new P2M leafs. We could be fancy and make the P2M code understand
p2m_mid_missing and p2_mid_identity and do the right things.
But that is quite complex while this particular patch only
gets invoked if there are PCI devices. Another solution (David
Vrabel ideas) was to consider INVALID_P2M_ENTRY as 1-1 regions.
The author of this patch is not sure of the ramifications
as it would require surgery in various P2M codebits.

Reported-by: Lukas Hejtmanek <[email protected]>
Reported-by: Lance Larsh <[email protected]>
CC: Boris Ostrovsky <[email protected]>
CC: David Vrabel <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/xen/balloon.c | 19 +++++++++++++
drivers/xen/pci.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++--
include/xen/balloon.h | 1 +
3 files changed, 96 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index b232908..258e3f9 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -133,6 +133,25 @@ static void balloon_append(struct page *page)
adjust_managed_page_count(page, -1);
}

+/*
+ * Check if any the balloon pages overlap with the supplied
+ * pfn and its range.
+ */
+bool balloon_pfn(unsigned long pfn, unsigned long nr)
+{
+ struct page *page;
+
+ if (list_empty(&ballooned_pages))
+ return false;
+
+ list_for_each_entry(page, &ballooned_pages, lru) {
+ unsigned long b_pfn = page_to_pfn(page);
+
+ if (b_pfn >= pfn && b_pfn < pfn + nr)
+ return true;
+ }
+ return false;
+}
/* balloon_retrieve: rescue a page from the balloon, if it is not empty. */
static struct page *balloon_retrieve(bool prefer_highmem)
{
diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 18fff88..6b86eda 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -22,6 +22,9 @@
#include <xen/xen.h>
#include <xen/interface/physdev.h>
#include <xen/interface/xen.h>
+#include <xen/interface/memory.h>
+#include <xen/page.h>
+#include <xen/balloon.h>

#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>
@@ -127,6 +130,72 @@ static int xen_add_device(struct device *dev)
return r;
}

+static void xen_p2m_add_device(struct device *dev)
+{
+ int i;
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+
+ /* Verify whether the MMIO BARs are 1-1 in the P2M. */
+ for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+ unsigned long pfn, start, end, ok_pfns;
+ char bus_addr[64];
+ char *fmt;
+
+ if (!pci_resource_len(pci_dev, i))
+ continue;
+
+ if (pci_resource_flags(pci_dev, i) == IORESOURCE_IO)
+ fmt = " (bus address [%#06llx-%#06llx])";
+ else
+ fmt = " (bus address [%#010llx-%#010llx])";
+
+ snprintf(bus_addr, sizeof(bus_addr), fmt,
+ (unsigned long long) (pci_resource_start(pci_dev, i)),
+ (unsigned long long) (pci_resource_end(pci_dev, i)));
+
+ start = pci_resource_start(pci_dev, i) >> PAGE_SHIFT;
+ end = pci_resource_end(pci_dev, i) >> PAGE_SHIFT;
+
+ /*
+ * We don't worry about the balloon scratch page as it has a
+ * valid PFN - which means we will catch in the loop below.
+ */
+ if (balloon_pfn(start, end - start)) {
+ dev_warn(dev, "%s is within balloon pages!\n", bus_addr);
+ continue;
+ }
+
+ for (ok_pfns = 0, pfn = start; pfn < end; pfn++) {
+ unsigned long mfn = pfn_to_mfn(pfn);
+
+ if (mfn == pfn) {
+ ok_pfns++;
+ continue;
+ }
+ if (mfn != INVALID_P2M_ENTRY) { /* RAM */
+ dev_warn(dev, "%s is within RAM [%lx] region!\n", bus_addr, pfn);
+ break;
+ }
+ }
+ if (ok_pfns == end - start) /* All good. */
+ continue;
+
+ if (pfn != end - 1) /* We broke out of the loop above. */
+ continue;
+
+ /* This BAR was not detected during E820 parsing. */
+ for (pfn = start; pfn < end; pfn++) {
+ if (!set_phys_to_machine(pfn, pfn))
+ break;
+ }
+ WARN(pfn != end - 1, "Only set %ld instead of %ld PFNs!\n",
+ end - pfn, end - start);
+
+ dev_info(dev, "%s set %ld page(s) to 1-1 mapping.\n",
+ bus_addr, end - pfn);
+ }
+}
+
static int xen_remove_device(struct device *dev)
{
int r;
@@ -164,10 +233,14 @@ static int xen_pci_notifier(struct notifier_block *nb,

switch (action) {
case BUS_NOTIFY_ADD_DEVICE:
- r = xen_add_device(dev);
+ if (xen_initial_domain())
+ r = xen_add_device(dev);
+ if (r == 0)
+ xen_p2m_add_device(dev);
break;
case BUS_NOTIFY_DEL_DEVICE:
- r = xen_remove_device(dev);
+ if (xen_initial_domain())
+ r = xen_remove_device(dev);
break;
default:
return NOTIFY_DONE;
@@ -185,7 +258,7 @@ static struct notifier_block device_nb = {

static int __init register_xen_pci_notifier(void)
{
- if (!xen_initial_domain())
+ if (!xen_domain())
return 0;

return bus_register_notifier(&pci_bus_type, &device_nb);
diff --git a/include/xen/balloon.h b/include/xen/balloon.h
index a4c1c6a..60ecc50 100644
--- a/include/xen/balloon.h
+++ b/include/xen/balloon.h
@@ -41,3 +41,4 @@ static inline int register_xen_selfballooning(struct device *dev)
return -ENOSYS;
}
#endif
+bool balloon_pfn(unsigned long pfn, unsigned long nr);
--
1.8.3.1

2013-10-25 15:03:48

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [v1 2/2] xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas and setup 1-1 P2M

if they aren't there.

The PCI MMCONFIG area is usually reserved via the E820 so the Xen hypervisor
is aware of these regions. But they can also be enumerated in the ACPI
DSDT which means the hypervisor won't know of them until the initial
domain informs it of via PHYSDEVOP_pci_mmcfg_reserved.

This is what this patch does for all of the MCFG regions that the
initial domain is aware of (E820 enumerated and ACPI). Furtheremore
it also makes sure that the P2M area in the guest for the MCFG region
is marked as 1-1 so that any read/writes to it will work.

This setup is done right after the ACPI routines have called and had
mapped alread the MCFG region so we have to potentially tear them down
and recreate them - fortunatly for us the MCFG API provides a mechanism
for us to do that.

Lastly we also make the appropiate hypercall to inform Xen
of the MCFG regions.

Reported-by: Santosh Jodh <[email protected]>
CC: Jan Beulich <[email protected]>
CC: Boris Ostrovsky <[email protected]>
CC: David Vrabel <[email protected]>
CC: Mukesh Rathor <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/xen/pci.c | 153 +++++++++++++++++++++++++++++++---------
include/xen/interface/physdev.h | 11 +++
2 files changed, 130 insertions(+), 34 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 6b86eda..749abf3 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -29,6 +29,7 @@
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>
#include "../pci/pci.h"
+#include <asm/pci_x86.h>

static bool __read_mostly pci_seg_supported = true;

@@ -129,6 +130,49 @@ static int xen_add_device(struct device *dev)

return r;
}
+static long xen_p2m_add_pfn(struct device *dev, char *prefix,
+ unsigned long start, unsigned long end)
+{
+ unsigned long pfn, ok_pfns;
+
+ if (balloon_pfn(start, end - start)) {
+ pr_warn("%s%s is within balloon pages!\n",
+ dev ? dev_name(dev) : "", prefix);
+ return -ENOMEM;
+ }
+
+ for (ok_pfns = 0, pfn = start; pfn < end; pfn++) {
+ unsigned long mfn = pfn_to_mfn(pfn);
+
+ if (mfn == pfn) {
+ ok_pfns++;
+ continue;
+ }
+ if (mfn != INVALID_P2M_ENTRY) { /* RAM */
+ pr_warn("%s%s is within RAM [%lx] region!\n",
+ dev ? dev_name(dev) : "", prefix, pfn);
+ break;
+ }
+ }
+ if (ok_pfns == end - start) /* All good. */
+ return 0;
+
+ if (pfn != end - 1) /* We broke out of the loop above. */
+ return -ENOMEM;
+
+ /* This BAR was not detected during E820 parsing. */
+ for (pfn = start; pfn < end; pfn++) {
+ if (!set_phys_to_machine(pfn, pfn))
+ break;
+ }
+ WARN(pfn != end - 1, "Only set %ld instead of %ld PFNs!\n",
+ end - pfn, end - start);
+
+ pr_info("%s%s set %ld page(s) to 1-1 mapping.\n",
+ dev ? dev_name(dev) : "", prefix, end - pfn);
+
+ return end - pfn;
+}

static void xen_p2m_add_device(struct device *dev)
{
@@ -137,7 +181,7 @@ static void xen_p2m_add_device(struct device *dev)

/* Verify whether the MMIO BARs are 1-1 in the P2M. */
for (i = 0; i < PCI_NUM_RESOURCES; i++) {
- unsigned long pfn, start, end, ok_pfns;
+ unsigned long start, end;
char bus_addr[64];
char *fmt;

@@ -160,39 +204,7 @@ static void xen_p2m_add_device(struct device *dev)
* We don't worry about the balloon scratch page as it has a
* valid PFN - which means we will catch in the loop below.
*/
- if (balloon_pfn(start, end - start)) {
- dev_warn(dev, "%s is within balloon pages!\n", bus_addr);
- continue;
- }
-
- for (ok_pfns = 0, pfn = start; pfn < end; pfn++) {
- unsigned long mfn = pfn_to_mfn(pfn);
-
- if (mfn == pfn) {
- ok_pfns++;
- continue;
- }
- if (mfn != INVALID_P2M_ENTRY) { /* RAM */
- dev_warn(dev, "%s is within RAM [%lx] region!\n", bus_addr, pfn);
- break;
- }
- }
- if (ok_pfns == end - start) /* All good. */
- continue;
-
- if (pfn != end - 1) /* We broke out of the loop above. */
- continue;
-
- /* This BAR was not detected during E820 parsing. */
- for (pfn = start; pfn < end; pfn++) {
- if (!set_phys_to_machine(pfn, pfn))
- break;
- }
- WARN(pfn != end - 1, "Only set %ld instead of %ld PFNs!\n",
- end - pfn, end - start);
-
- dev_info(dev, "%s set %ld page(s) to 1-1 mapping.\n",
- bus_addr, end - pfn);
+ (void)xen_p2m_add_pfn(dev, bus_addr, start, end);
}
}

@@ -265,3 +277,76 @@ static int __init register_xen_pci_notifier(void)
}

arch_initcall(register_xen_pci_notifier);
+
+static int __init xen_mcfg_late(void)
+{
+ struct pci_mmcfg_region *cfg;
+ int rc;
+ bool remap = false;
+
+ if ((pci_probe & PCI_PROBE_MMCONF) == 0)
+ return 0;
+
+ if (list_empty(&pci_mmcfg_list))
+ return 0;
+
+ /* Check whether they are in the right area. */
+ list_for_each_entry(cfg, &pci_mmcfg_list, list) {
+ unsigned long start, end;
+ long pfns;
+ struct physdev_pci_mmcfg_reserved r;
+
+ start = cfg->address + PCI_MMCFG_BUS_OFFSET(cfg->start_bus);
+ end = cfg->address + PCI_MMCFG_BUS_OFFSET(cfg->end_bus + 1) - 1;
+
+ start = PFN_DOWN(start);
+ end = PFN_DOWN(end);
+
+ pfns = xen_p2m_add_pfn(NULL, cfg->name, start, end);
+
+ pr_debug("%s: [%lx->%lx] PFNs: %ld\n", cfg->name, start, end, pfns);
+
+ if (pfns < 0)
+ continue;
+
+ if (pfns > 0)
+ remap = true;
+
+ r.address = cfg->address;
+ r.segment = cfg->segment;
+ r.start_bus = cfg->start_bus;
+ r.end_bus = cfg->end_bus;
+ r.flags = XEN_PCI_MMCFG_RESERVED;
+
+ rc = HYPERVISOR_physdev_op(PHYSDEVOP_pci_mmcfg_reserved, &r);
+ switch (rc) {
+ case 0:
+ case -ENOSYS:
+ continue;
+
+ default:
+ pr_warn("Failed to report MMCONFIG reservation"
+ " state for %s to hypervisor"
+ " (%d)\n",
+ cfg->name, rc);
+ }
+ }
+ /*
+ * Unmap the PTEs and remap them. The P2M will now have the
+ * correct PFN values.
+ */
+ if (remap) {
+ /*
+ * For 32-bit we end up using fixmap which for FIX_PCIE_MCFG
+ * in xen_set_fixmap ends up using the _PAGE_IOMAP so
+ * 1-1 mapping. The calls below are nops.
+ */
+ pci_mmcfg_arch_free();
+ pci_mmcfg_arch_init();
+ }
+ return 0;
+}
+/*
+ * Needs to be done after balloon_init and acpi_init which are subsys_initcall.
+ */
+subsys_initcall_sync(xen_mcfg_late);
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index 7000bb1..42721d1 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -231,6 +231,17 @@ struct physdev_get_free_pirq {
#define XEN_PCI_DEV_VIRTFN 0x2
#define XEN_PCI_DEV_PXM 0x4

+#define XEN_PCI_MMCFG_RESERVED 0x1
+
+#define PHYSDEVOP_pci_mmcfg_reserved 24
+struct physdev_pci_mmcfg_reserved {
+ uint64_t address;
+ uint16_t segment;
+ uint8_t start_bus;
+ uint8_t end_bus;
+ uint32_t flags;
+};
+
#define PHYSDEVOP_pci_device_add 25
struct physdev_pci_device_add {
/* IN */
--
1.8.3.1

2013-10-25 16:38:48

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [v1 1/2] xen/p2m: Create identity mappings for PFNs beyound E820 and PCI BARs

On Fri, Oct 25, 2013 at 11:03:20AM -0400, Konrad Rzeszutek Wilk wrote:
> On bootup the E820 "gaps" or E820_RESV regions are marked as
> identity regions. Meaning that any lookup done in the P2M
> will return the same value: pfn_to_mfn(pfn) == pfn.
>
> This is needed for PCI devices so that drivers can reference
> the correct bus address. Unfortunatly there are also PCIe
> devices which setup their MMIO region above the E820. By default
> we assume in the P2M that any region above the last E820 entry
> is to be used for ballooning. That is not true - and we don't
> mark such regions as IDENTIY_FRAME but INVALID_P2M_ENTRY.
> The result is that any lookup in the P2M (pfn_to_mfn(pfn) == 0)
> gets us the 0 bus address which is hardly correct.
>
> A solution that this patch implements is to walk the PCI device
> BAR regions and mark them as IDENTITY_FRAMEs in the P2M.
> Naturally some checks are needed such as making sure that the
> BAR regions are not part of the balloon pages, nor regular RAM.
>
> We only set them to IDENTITY if the:
> pfn_to_mfn(pfn) == INVALID_P2M_ENTRY.
>
> Another solution would be to mark all P2M entries beyond the
> last E820 entry _and_ not in the balloon regions as IDENTITY.
>
> If done, that means in worst case we have to reserve MAX_DOMAIN_PAGES
> pages (so 2MB) of virtual space in case we have to create
> new P2M leafs. We could be fancy and make the P2M code understand
> p2m_mid_missing and p2_mid_identity and do the right things.
> But that is quite complex while this particular patch only
> gets invoked if there are PCI devices. Another solution (David
> Vrabel ideas) was to consider INVALID_P2M_ENTRY as 1-1 regions.
> The author of this patch is not sure of the ramifications
> as it would require surgery in various P2M codebits.
>
> Reported-by: Lukas Hejtmanek <[email protected]>
> Reported-by: Lance Larsh <[email protected]>
> CC: Boris Ostrovsky <[email protected]>
> CC: David Vrabel <[email protected]>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> drivers/xen/balloon.c | 19 +++++++++++++
> drivers/xen/pci.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++--
> include/xen/balloon.h | 1 +
> 3 files changed, 96 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index b232908..258e3f9 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -133,6 +133,25 @@ static void balloon_append(struct page *page)
> adjust_managed_page_count(page, -1);
> }
>
> +/*
> + * Check if any the balloon pages overlap with the supplied
> + * pfn and its range.
> + */
> +bool balloon_pfn(unsigned long pfn, unsigned long nr)
> +{
> + struct page *page;
> +
> + if (list_empty(&ballooned_pages))
> + return false;
> +
> + list_for_each_entry(page, &ballooned_pages, lru) {
> + unsigned long b_pfn = page_to_pfn(page);
> +
> + if (b_pfn >= pfn && b_pfn < pfn + nr)
> + return true;
> + }
> + return false;
> +}
> /* balloon_retrieve: rescue a page from the balloon, if it is not empty. */
> static struct page *balloon_retrieve(bool prefer_highmem)
> {
> diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
> index 18fff88..6b86eda 100644
> --- a/drivers/xen/pci.c
> +++ b/drivers/xen/pci.c
> @@ -22,6 +22,9 @@
> #include <xen/xen.h>
> #include <xen/interface/physdev.h>
> #include <xen/interface/xen.h>
> +#include <xen/interface/memory.h>
> +#include <xen/page.h>
> +#include <xen/balloon.h>
>
> #include <asm/xen/hypervisor.h>
> #include <asm/xen/hypercall.h>
> @@ -127,6 +130,72 @@ static int xen_add_device(struct device *dev)
> return r;
> }
>
> +static void xen_p2m_add_device(struct device *dev)
> +{
> + int i;
> + struct pci_dev *pci_dev = to_pci_dev(dev);
> +
> + /* Verify whether the MMIO BARs are 1-1 in the P2M. */
> + for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> + unsigned long pfn, start, end, ok_pfns;
> + char bus_addr[64];
> + char *fmt;
> +
> + if (!pci_resource_len(pci_dev, i))
> + continue;
> +
> + if (pci_resource_flags(pci_dev, i) == IORESOURCE_IO)
> + fmt = " (bus address [%#06llx-%#06llx])";
> + else
> + fmt = " (bus address [%#010llx-%#010llx])";
> +
> + snprintf(bus_addr, sizeof(bus_addr), fmt,
> + (unsigned long long) (pci_resource_start(pci_dev, i)),
> + (unsigned long long) (pci_resource_end(pci_dev, i)));
> +
> + start = pci_resource_start(pci_dev, i) >> PAGE_SHIFT;
> + end = pci_resource_end(pci_dev, i) >> PAGE_SHIFT;
> +
> + /*
> + * We don't worry about the balloon scratch page as it has a
> + * valid PFN - which means we will catch in the loop below.
> + */
> + if (balloon_pfn(start, end - start)) {
> + dev_warn(dev, "%s is within balloon pages!\n", bus_addr);
> + continue;
> + }
> +
> + for (ok_pfns = 0, pfn = start; pfn < end; pfn++) {
> + unsigned long mfn = pfn_to_mfn(pfn);
> +
> + if (mfn == pfn) {
> + ok_pfns++;
> + continue;
> + }
> + if (mfn != INVALID_P2M_ENTRY) { /* RAM */
> + dev_warn(dev, "%s is within RAM [%lx] region!\n", bus_addr, pfn);
> + break;
> + }
> + }
> + if (ok_pfns == end - start) /* All good. */
> + continue;
> +
> + if (pfn != end - 1) /* We broke out of the loop above. */
> + continue;

There are some bugs in this code so please wait until the next posting
which hopefully will also include me testing it on affected hardware.

2013-10-25 22:08:41

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [v1 1/2] xen/p2m: Create identity mappings for PFNs beyound E820 and PCI BARs

On Fri, Oct 25, 2013 at 9:03 AM, Konrad Rzeszutek Wilk
<[email protected]> wrote:
> On bootup the E820 "gaps" or E820_RESV regions are marked as
> identity regions. Meaning that any lookup done in the P2M
> will return the same value: pfn_to_mfn(pfn) == pfn.
>
> This is needed for PCI devices so that drivers can reference
> the correct bus address. Unfortunatly there are also PCIe
> devices which setup their MMIO region above the E820. By default
> we assume in the P2M that any region above the last E820 entry
> is to be used for ballooning. That is not true - and we don't
> mark such regions as IDENTIY_FRAME but INVALID_P2M_ENTRY.
> The result is that any lookup in the P2M (pfn_to_mfn(pfn) == 0)
> gets us the 0 bus address which is hardly correct.
>
> A solution that this patch implements is to walk the PCI device
> BAR regions and mark them as IDENTITY_FRAMEs in the P2M.
> Naturally some checks are needed such as making sure that the
> BAR regions are not part of the balloon pages, nor regular RAM.
>
> We only set them to IDENTITY if the:
> pfn_to_mfn(pfn) == INVALID_P2M_ENTRY.
>
> Another solution would be to mark all P2M entries beyond the
> last E820 entry _and_ not in the balloon regions as IDENTITY.
>
> If done, that means in worst case we have to reserve MAX_DOMAIN_PAGES
> pages (so 2MB) of virtual space in case we have to create
> new P2M leafs. We could be fancy and make the P2M code understand
> p2m_mid_missing and p2_mid_identity and do the right things.
> But that is quite complex while this particular patch only
> gets invoked if there are PCI devices. Another solution (David
> Vrabel ideas) was to consider INVALID_P2M_ENTRY as 1-1 regions.
> The author of this patch is not sure of the ramifications
> as it would require surgery in various P2M codebits.
>
> Reported-by: Lukas Hejtmanek <[email protected]>
> Reported-by: Lance Larsh <[email protected]>
> CC: Boris Ostrovsky <[email protected]>
> CC: David Vrabel <[email protected]>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> drivers/xen/balloon.c | 19 +++++++++++++
> drivers/xen/pci.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++--
> include/xen/balloon.h | 1 +
> 3 files changed, 96 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index b232908..258e3f9 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -133,6 +133,25 @@ static void balloon_append(struct page *page)
> adjust_managed_page_count(page, -1);
> }
>
> +/*
> + * Check if any the balloon pages overlap with the supplied
> + * pfn and its range.
> + */
> +bool balloon_pfn(unsigned long pfn, unsigned long nr)
> +{
> + struct page *page;
> +
> + if (list_empty(&ballooned_pages))
> + return false;
> +
> + list_for_each_entry(page, &ballooned_pages, lru) {
> + unsigned long b_pfn = page_to_pfn(page);
> +
> + if (b_pfn >= pfn && b_pfn < pfn + nr)
> + return true;
> + }
> + return false;
> +}
> /* balloon_retrieve: rescue a page from the balloon, if it is not empty. */
> static struct page *balloon_retrieve(bool prefer_highmem)
> {
> diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
> index 18fff88..6b86eda 100644
> --- a/drivers/xen/pci.c
> +++ b/drivers/xen/pci.c
> @@ -22,6 +22,9 @@
> #include <xen/xen.h>
> #include <xen/interface/physdev.h>
> #include <xen/interface/xen.h>
> +#include <xen/interface/memory.h>
> +#include <xen/page.h>
> +#include <xen/balloon.h>
>
> #include <asm/xen/hypervisor.h>
> #include <asm/xen/hypercall.h>
> @@ -127,6 +130,72 @@ static int xen_add_device(struct device *dev)
> return r;
> }
>
> +static void xen_p2m_add_device(struct device *dev)
> +{
> + int i;
> + struct pci_dev *pci_dev = to_pci_dev(dev);
> +
> + /* Verify whether the MMIO BARs are 1-1 in the P2M. */
> + for (i = 0; i < PCI_NUM_RESOURCES; i++) {

Kudos for avoiding the usual "for_each_pci_dev()" trap (which totally
ignores hot-added devices).

But this still seems a bit fragile because:

(1) I would guess the same consideration applies to all PCI MMIO
space, regardless of whether it is actually assigned to a BAR. So
maybe you want to do the same thing for currently-unassigned MMIO
space.

(2) It will break if PCI ever reassigns device resources. We
currently don't reassign anything after we finish the initial
enumeration and configuration of the device, but it's conceivable that
we could someday.

If you can look at PCI host bridge apertures instead of BARs, that
would solve both problems. Reassigning those apertures is
theoretically possible but is not even a gleam in our eyes yet.

> + unsigned long pfn, start, end, ok_pfns;
> + char bus_addr[64];
> + char *fmt;
> +
> + if (!pci_resource_len(pci_dev, i))
> + continue;
> +
> + if (pci_resource_flags(pci_dev, i) == IORESOURCE_IO)

You probably want:

if (pci_resource_flags(pci_dev, i) & IORESOURCE_IO)

here. And I don't understand what you're doing below, but my
impression is you only care about MEM regions, not IO regions, so it
seems like you might want to skip IO completely.

> + fmt = " (bus address [%#06llx-%#06llx])";
> + else
> + fmt = " (bus address [%#010llx-%#010llx])";
> +
> + snprintf(bus_addr, sizeof(bus_addr), fmt,
> + (unsigned long long) (pci_resource_start(pci_dev, i)),
> + (unsigned long long) (pci_resource_end(pci_dev, i)));
> +
> + start = pci_resource_start(pci_dev, i) >> PAGE_SHIFT;
> + end = pci_resource_end(pci_dev, i) >> PAGE_SHIFT;
> +
> + /*
> + * We don't worry about the balloon scratch page as it has a
> + * valid PFN - which means we will catch in the loop below.
> + */
> + if (balloon_pfn(start, end - start)) {
> + dev_warn(dev, "%s is within balloon pages!\n", bus_addr);
> + continue;
> + }
> +
> + for (ok_pfns = 0, pfn = start; pfn < end; pfn++) {
> + unsigned long mfn = pfn_to_mfn(pfn);
> +
> + if (mfn == pfn) {
> + ok_pfns++;
> + continue;
> + }
> + if (mfn != INVALID_P2M_ENTRY) { /* RAM */
> + dev_warn(dev, "%s is within RAM [%lx] region!\n", bus_addr, pfn);
> + break;
> + }
> + }
> + if (ok_pfns == end - start) /* All good. */
> + continue;
> +
> + if (pfn != end - 1) /* We broke out of the loop above. */
> + continue;
> +
> + /* This BAR was not detected during E820 parsing. */
> + for (pfn = start; pfn < end; pfn++) {
> + if (!set_phys_to_machine(pfn, pfn))
> + break;
> + }
> + WARN(pfn != end - 1, "Only set %ld instead of %ld PFNs!\n",
> + end - pfn, end - start);
> +
> + dev_info(dev, "%s set %ld page(s) to 1-1 mapping.\n",
> + bus_addr, end - pfn);
> + }
> +}
> +
> static int xen_remove_device(struct device *dev)
> {
> int r;
> @@ -164,10 +233,14 @@ static int xen_pci_notifier(struct notifier_block *nb,
>
> switch (action) {
> case BUS_NOTIFY_ADD_DEVICE:
> - r = xen_add_device(dev);
> + if (xen_initial_domain())
> + r = xen_add_device(dev);
> + if (r == 0)
> + xen_p2m_add_device(dev);
> break;
> case BUS_NOTIFY_DEL_DEVICE:
> - r = xen_remove_device(dev);
> + if (xen_initial_domain())
> + r = xen_remove_device(dev);
> break;
> default:
> return NOTIFY_DONE;
> @@ -185,7 +258,7 @@ static struct notifier_block device_nb = {
>
> static int __init register_xen_pci_notifier(void)
> {
> - if (!xen_initial_domain())
> + if (!xen_domain())
> return 0;
>
> return bus_register_notifier(&pci_bus_type, &device_nb);
> diff --git a/include/xen/balloon.h b/include/xen/balloon.h
> index a4c1c6a..60ecc50 100644
> --- a/include/xen/balloon.h
> +++ b/include/xen/balloon.h
> @@ -41,3 +41,4 @@ static inline int register_xen_selfballooning(struct device *dev)
> return -ENOSYS;
> }
> #endif
> +bool balloon_pfn(unsigned long pfn, unsigned long nr);
> --
> 1.8.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2013-10-25 22:59:03

by David Vrabel

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v1] Set 1-1 P2M for PCI BARs and MCFG regions - if needed.

On 25/10/13 16:03, Konrad Rzeszutek Wilk wrote:
>
> 2). Assume that INVALID_P2M_ENTRY is considered 1-1. That would require
> auditing of the code and also making sure that any xen_privcmd_mmap
> calls use the m2p_add_override.

I'm not sure I understand the concern about the privcmd driver's foreign
mapping ioctls (MMAPBATCH_V2 and friends) as these all deal with MFNs
directly and don't use the p2m.

> However there are some
> valid cases in which we need to check for INVALID_P2M_ENTRY -
> balloon driver - that might be relaxed.

Yes. And this behaviour will be retained. I think we only need to do
the INVALID_P2M_ENTRY => 1:1 when constructing PTEs so all other callers
will see INVALID_P2M_ENTRY as before.


> However again the P2M code
> and the ABI it places are not for the faint of the heart.

I have no fear.

If the above suggestion does work out it will reduce the amount of p2m
code /and/ remove a Xen-specific PTE bit. So I think it is worth pursuing.

In the meantime, the following minimal patch is a likely workaround.

David

8<-------------------------------
x86/mm: set _PAGE_IOMAP in io_remap_pfn_range().

Signed-off-by: David Vrabel <[email protected]>
---
arch/x86/include/asm/pgtable.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 3d19994..d59b22a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -868,6 +868,9 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
}

+#define io_remap_pfn_range(vma, addr, pfn, size, prot) \
+ remap_pfn_range(vma, addr, pfn, size, (prot) | _PAGE_IOMAP)
+
#include <asm-generic/pgtable.h>
#endif /* __ASSEMBLY__ */

--
1.7.10.4

2013-10-25 23:39:34

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v1] Set 1-1 P2M for PCI BARs and MCFG regions - if needed.

David Vrabel <[email protected]> wrote:
>On 25/10/13 16:03, Konrad Rzeszutek Wilk wrote:
>>
>> 2). Assume that INVALID_P2M_ENTRY is considered 1-1. That would
>require
>> auditing of the code and also making sure that any
>xen_privcmd_mmap
>> calls use the m2p_add_override.
>
>I'm not sure I understand the concern about the privcmd driver's
>foreign
>mapping ioctls (MMAPBATCH_V2 and friends) as these all deal with MFNs
>directly and don't use the p2m.
>
>> However there are some
>> valid cases in which we need to check for INVALID_P2M_ENTRY -
>> balloon driver - that might be relaxed.
>
>Yes. And this behaviour will be retained. I think we only need to do
>
>the INVALID_P2M_ENTRY => 1:1 when constructing PTEs so all other
>callers
>will see INVALID_P2M_ENTRY as before.

You are suggesting not touching the p2m code and instead change the the xen-make-pte to assume that INVALID and IDENTITY should have the same behaviour. Interesting.
>
>
>> However again the P2M code
>> and the ABI it places are not for the faint of the heart.
>
>I have no fear.

:-)
>
>If the above suggestion does work out it will reduce the amount of p2m
>code /and/ remove a Xen-specific PTE bit. So I think it is worth
>pursuing.
>
>In the meantime, the following minimal patch is a likely workaround.
>
>David
>
>8<-------------------------------
>x86/mm: set _PAGE_IOMAP in io_remap_pfn_range().
>
>Signed-off-by: David Vrabel <[email protected]>
>---
> arch/x86/include/asm/pgtable.h | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/arch/x86/include/asm/pgtable.h
>b/arch/x86/include/asm/pgtable.h
>index 3d19994..d59b22a 100644
>--- a/arch/x86/include/asm/pgtable.h
>+++ b/arch/x86/include/asm/pgtable.h
>@@ -868,6 +868,9 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t
>pte)
> return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> }
>
>+#define io_remap_pfn_range(vma, addr, pfn, size, prot) \
>+ remap_pfn_range(vma, addr, pfn, size, (prot) | _PAGE_IOMAP)
>+
> #include <asm-generic/pgtable.h>
> #endif /* __ASSEMBLY__ */

You are welcome to suggest it to hpa. Last time a similar patch was proposed he was not too thrilled about it.

Looking forward to your p2m code surgery!

2013-10-28 09:51:30

by Jan Beulich

[permalink] [raw]
Subject: Re: [v1 1/2] xen/p2m: Create identity mappings for PFNs beyound E820 and PCI BARs

>>> On 26.10.13 at 00:08, Bjorn Helgaas <[email protected]> wrote:
> On Fri, Oct 25, 2013 at 9:03 AM, Konrad Rzeszutek Wilk <[email protected]> wrote:
>> + unsigned long pfn, start, end, ok_pfns;
>> + char bus_addr[64];
>> + char *fmt;
>> +
>> + if (!pci_resource_len(pci_dev, i))
>> + continue;
>> +
>> + if (pci_resource_flags(pci_dev, i) == IORESOURCE_IO)
>
> You probably want:
>
> if (pci_resource_flags(pci_dev, i) & IORESOURCE_IO)
>
> here. And I don't understand what you're doing below, but my
> impression is you only care about MEM regions, not IO regions, so it
> seems like you might want to skip IO completely.

Yes indeed.

Jan

2013-10-28 16:53:07

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v1] Set 1-1 P2M for PCI BARs and MCFG regions - if needed.

On Fri, Oct 25, 2013 at 11:58:34PM +0100, David Vrabel wrote:
> On 25/10/13 16:03, Konrad Rzeszutek Wilk wrote:
> >
> > 2). Assume that INVALID_P2M_ENTRY is considered 1-1. That would require
> > auditing of the code and also making sure that any xen_privcmd_mmap
> > calls use the m2p_add_override.
>
> I'm not sure I understand the concern about the privcmd driver's
> foreign mapping ioctls (MMAPBATCH_V2 and friends) as these all deal
> with MFNs directly and don't use the p2m.
>
> > However there are some
> > valid cases in which we need to check for INVALID_P2M_ENTRY -
> > balloon driver - that might be relaxed.
>
> Yes. And this behaviour will be retained. I think we only need to
> do the INVALID_P2M_ENTRY => 1:1 when constructing PTEs so all other
> callers will see INVALID_P2M_ENTRY as before.
>
>
> > However again the P2M code
> > and the ABI it places are not for the faint of the heart.
>
> I have no fear.

<sigh> I booted up the kernel on the machine that has the affected
hardware and hit this:

806 if (unlikely(pfn >= MAX_P2M_PFN)) {
807 BUG_ON(mfn != INVALID_P2M_ENTRY);
808 return true;
809 }
810

in __set_phys_to_machine. This is of course b/c the region is way
out of the 500GB limit we have for P2M.

My original idea of fiddling with P2M is insufficient for this
particular bug (thought it would work for PCI devices under MAX_PAGES).

Anyhow, that means going down the P2M path as you suggested is the
only way (well, besides the work-around). So you got any patches ready :-) ?

>
> If the above suggestion does work out it will reduce the amount of
> p2m code /and/ remove a Xen-specific PTE bit. So I think it is
> worth pursuing.
>
> In the meantime, the following minimal patch is a likely workaround.
>
> David
>
> 8<-------------------------------
> x86/mm: set _PAGE_IOMAP in io_remap_pfn_range().
>
> Signed-off-by: David Vrabel <[email protected]>
> ---
> arch/x86/include/asm/pgtable.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 3d19994..d59b22a 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -868,6 +868,9 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
> }
>
> +#define io_remap_pfn_range(vma, addr, pfn, size, prot) \
> + remap_pfn_range(vma, addr, pfn, size, (prot) | _PAGE_IOMAP)
> +
> #include <asm-generic/pgtable.h>
> #endif /* __ASSEMBLY__ */
>
> --
> 1.7.10.4
>

2013-10-28 16:59:05

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [v1 1/2] xen/p2m: Create identity mappings for PFNs beyound E820 and PCI BARs

On Fri, Oct 25, 2013 at 04:08:19PM -0600, Bjorn Helgaas wrote:
> On Fri, Oct 25, 2013 at 9:03 AM, Konrad Rzeszutek Wilk
> <[email protected]> wrote:
> > On bootup the E820 "gaps" or E820_RESV regions are marked as
> > identity regions. Meaning that any lookup done in the P2M
> > will return the same value: pfn_to_mfn(pfn) == pfn.
> >
> > This is needed for PCI devices so that drivers can reference
> > the correct bus address. Unfortunatly there are also PCIe
> > devices which setup their MMIO region above the E820. By default
> > we assume in the P2M that any region above the last E820 entry
> > is to be used for ballooning. That is not true - and we don't
> > mark such regions as IDENTIY_FRAME but INVALID_P2M_ENTRY.
> > The result is that any lookup in the P2M (pfn_to_mfn(pfn) == 0)
> > gets us the 0 bus address which is hardly correct.
> >
> > A solution that this patch implements is to walk the PCI device
> > BAR regions and mark them as IDENTITY_FRAMEs in the P2M.
> > Naturally some checks are needed such as making sure that the
> > BAR regions are not part of the balloon pages, nor regular RAM.
> >
> > We only set them to IDENTITY if the:
> > pfn_to_mfn(pfn) == INVALID_P2M_ENTRY.
> >
> > Another solution would be to mark all P2M entries beyond the
> > last E820 entry _and_ not in the balloon regions as IDENTITY.
> >
> > If done, that means in worst case we have to reserve MAX_DOMAIN_PAGES
> > pages (so 2MB) of virtual space in case we have to create
> > new P2M leafs. We could be fancy and make the P2M code understand
> > p2m_mid_missing and p2_mid_identity and do the right things.
> > But that is quite complex while this particular patch only
> > gets invoked if there are PCI devices. Another solution (David
> > Vrabel ideas) was to consider INVALID_P2M_ENTRY as 1-1 regions.
> > The author of this patch is not sure of the ramifications
> > as it would require surgery in various P2M codebits.
> >
> > Reported-by: Lukas Hejtmanek <[email protected]>
> > Reported-by: Lance Larsh <[email protected]>
> > CC: Boris Ostrovsky <[email protected]>
> > CC: David Vrabel <[email protected]>
> > Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> > ---
> > drivers/xen/balloon.c | 19 +++++++++++++
> > drivers/xen/pci.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++--
> > include/xen/balloon.h | 1 +
> > 3 files changed, 96 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> > index b232908..258e3f9 100644
> > --- a/drivers/xen/balloon.c
> > +++ b/drivers/xen/balloon.c
> > @@ -133,6 +133,25 @@ static void balloon_append(struct page *page)
> > adjust_managed_page_count(page, -1);
> > }
> >
> > +/*
> > + * Check if any the balloon pages overlap with the supplied
> > + * pfn and its range.
> > + */
> > +bool balloon_pfn(unsigned long pfn, unsigned long nr)
> > +{
> > + struct page *page;
> > +
> > + if (list_empty(&ballooned_pages))
> > + return false;
> > +
> > + list_for_each_entry(page, &ballooned_pages, lru) {
> > + unsigned long b_pfn = page_to_pfn(page);
> > +
> > + if (b_pfn >= pfn && b_pfn < pfn + nr)
> > + return true;
> > + }
> > + return false;
> > +}
> > /* balloon_retrieve: rescue a page from the balloon, if it is not empty. */
> > static struct page *balloon_retrieve(bool prefer_highmem)
> > {
> > diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
> > index 18fff88..6b86eda 100644
> > --- a/drivers/xen/pci.c
> > +++ b/drivers/xen/pci.c
> > @@ -22,6 +22,9 @@
> > #include <xen/xen.h>
> > #include <xen/interface/physdev.h>
> > #include <xen/interface/xen.h>
> > +#include <xen/interface/memory.h>
> > +#include <xen/page.h>
> > +#include <xen/balloon.h>
> >
> > #include <asm/xen/hypervisor.h>
> > #include <asm/xen/hypercall.h>
> > @@ -127,6 +130,72 @@ static int xen_add_device(struct device *dev)
> > return r;
> > }
> >
> > +static void xen_p2m_add_device(struct device *dev)
> > +{
> > + int i;
> > + struct pci_dev *pci_dev = to_pci_dev(dev);
> > +
> > + /* Verify whether the MMIO BARs are 1-1 in the P2M. */
> > + for (i = 0; i < PCI_NUM_RESOURCES; i++) {
>
> Kudos for avoiding the usual "for_each_pci_dev()" trap (which totally
> ignores hot-added devices).

Thank you :-)
>
> But this still seems a bit fragile because:
>
> (1) I would guess the same consideration applies to all PCI MMIO
> space, regardless of whether it is actually assigned to a BAR. So
> maybe you want to do the same thing for currently-unassigned MMIO
> space.

OK. Let me dig in the code to find where that is..
>
> (2) It will break if PCI ever reassigns device resources. We
> currently don't reassign anything after we finish the initial
> enumeration and configuration of the device, but it's conceivable that
> we could someday.

<nods>
>
> If you can look at PCI host bridge apertures instead of BARs, that
> would solve both problems. Reassigning those apertures is
> theoretically possible but is not even a gleam in our eyes yet.

<nods> I think I have to have both (BARs and host bridge apertures) as when
we do PCI passthrough to a guest - we might do it without a bridge.


>
> > + unsigned long pfn, start, end, ok_pfns;
> > + char bus_addr[64];
> > + char *fmt;
> > +
> > + if (!pci_resource_len(pci_dev, i))
> > + continue;
> > +
> > + if (pci_resource_flags(pci_dev, i) == IORESOURCE_IO)
>
> You probably want:
>
> if (pci_resource_flags(pci_dev, i) & IORESOURCE_IO)
>
> here. And I don't understand what you're doing below, but my
> impression is you only care about MEM regions, not IO regions, so it
> seems like you might want to skip IO completely.

<nods> Good catch!

However there are is one thing this patch does not cover - that is if the
PCI bridge window is outside the P2M tree we keep (the P2M tree is a lookup
system to construct the PTE's with valid and correct physical addresses) we
end up always returning 0. David Vrabel suggests to return just the PFN value
which would certainly solve this particular problem - but I am bit
uncomfortable of doing surgery in that code.

2013-10-29 08:23:42

by Jan Beulich

[permalink] [raw]
Subject: Re: [v1 1/2] xen/p2m: Create identity mappings for PFNs beyound E820 and PCI BARs

>>> On 28.10.13 at 17:58, Konrad Rzeszutek Wilk <[email protected]> wrote:
> On Fri, Oct 25, 2013 at 04:08:19PM -0600, Bjorn Helgaas wrote:
>> If you can look at PCI host bridge apertures instead of BARs, that
>> would solve both problems. Reassigning those apertures is
>> theoretically possible but is not even a gleam in our eyes yet.
>
> <nods> I think I have to have both (BARs and host bridge apertures) as when
> we do PCI passthrough to a guest - we might do it without a bridge.

Why? Aren't the host bridge ranges necessarily a superset of the
individual devices' BARs?

Jan

2013-10-29 14:46:46

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [v1 1/2] xen/p2m: Create identity mappings for PFNs beyound E820 and PCI BARs

On Tue, Oct 29, 2013 at 08:23:30AM +0000, Jan Beulich wrote:
> >>> On 28.10.13 at 17:58, Konrad Rzeszutek Wilk <[email protected]> wrote:
> > On Fri, Oct 25, 2013 at 04:08:19PM -0600, Bjorn Helgaas wrote:
> >> If you can look at PCI host bridge apertures instead of BARs, that
> >> would solve both problems. Reassigning those apertures is
> >> theoretically possible but is not even a gleam in our eyes yet.
> >
> > <nods> I think I have to have both (BARs and host bridge apertures) as when
> > we do PCI passthrough to a guest - we might do it without a bridge.
>
> Why? Aren't the host bridge ranges necessarily a superset of the
> individual devices' BARs?

Yes. But when you pass in a PCI device to a PV guest you don't pass in the
bridge. Just the PCI device itself.


>
> Jan
>

2013-10-29 14:55:19

by Jan Beulich

[permalink] [raw]
Subject: Re: [v1 1/2] xen/p2m: Create identity mappings for PFNs beyound E820 and PCI BARs

>>> On 29.10.13 at 15:45, Konrad Rzeszutek Wilk <[email protected]> wrote:
> On Tue, Oct 29, 2013 at 08:23:30AM +0000, Jan Beulich wrote:
>> >>> On 28.10.13 at 17:58, Konrad Rzeszutek Wilk <[email protected]> wrote:
>> > On Fri, Oct 25, 2013 at 04:08:19PM -0600, Bjorn Helgaas wrote:
>> >> If you can look at PCI host bridge apertures instead of BARs, that
>> >> would solve both problems. Reassigning those apertures is
>> >> theoretically possible but is not even a gleam in our eyes yet.
>> >
>> > <nods> I think I have to have both (BARs and host bridge apertures) as when
>> > we do PCI passthrough to a guest - we might do it without a bridge.
>>
>> Why? Aren't the host bridge ranges necessarily a superset of the
>> individual devices' BARs?
>
> Yes. But when you pass in a PCI device to a PV guest you don't pass in the
> bridge. Just the PCI device itself.

Right you are. Which means that basing the whole logic on the
PCI device BARs is likely wrong anyway, not just because it
doesn't account for other MMIO ranges.

Jan

2013-10-29 15:12:10

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [v1 1/2] xen/p2m: Create identity mappings for PFNs beyound E820 and PCI BARs

On Tue, Oct 29, 2013 at 02:55:13PM +0000, Jan Beulich wrote:
> >>> On 29.10.13 at 15:45, Konrad Rzeszutek Wilk <[email protected]> wrote:
> > On Tue, Oct 29, 2013 at 08:23:30AM +0000, Jan Beulich wrote:
> >> >>> On 28.10.13 at 17:58, Konrad Rzeszutek Wilk <[email protected]> wrote:
> >> > On Fri, Oct 25, 2013 at 04:08:19PM -0600, Bjorn Helgaas wrote:
> >> >> If you can look at PCI host bridge apertures instead of BARs, that
> >> >> would solve both problems. Reassigning those apertures is
> >> >> theoretically possible but is not even a gleam in our eyes yet.
> >> >
> >> > <nods> I think I have to have both (BARs and host bridge apertures) as when
> >> > we do PCI passthrough to a guest - we might do it without a bridge.
> >>
> >> Why? Aren't the host bridge ranges necessarily a superset of the
> >> individual devices' BARs?
> >
> > Yes. But when you pass in a PCI device to a PV guest you don't pass in the
> > bridge. Just the PCI device itself.
>
> Right you are. Which means that basing the whole logic on the
> PCI device BARs is likely wrong anyway, not just because it
> doesn't account for other MMIO ranges.

Right, but that is OK. When you pass in a PCI device to a PV guest you
only care about that specific device driver being able to access its BARs.

>
> Jan
>

2013-10-29 15:30:25

by Jan Beulich

[permalink] [raw]
Subject: Re: [v1 1/2] xen/p2m: Create identity mappings for PFNs beyound E820 and PCI BARs

>>> On 29.10.13 at 16:11, Konrad Rzeszutek Wilk <[email protected]> wrote:
> On Tue, Oct 29, 2013 at 02:55:13PM +0000, Jan Beulich wrote:
>> >>> On 29.10.13 at 15:45, Konrad Rzeszutek Wilk <[email protected]> wrote:
>> > On Tue, Oct 29, 2013 at 08:23:30AM +0000, Jan Beulich wrote:
>> >> >>> On 28.10.13 at 17:58, Konrad Rzeszutek Wilk <[email protected]> wrote:
>> >> > On Fri, Oct 25, 2013 at 04:08:19PM -0600, Bjorn Helgaas wrote:
>> >> >> If you can look at PCI host bridge apertures instead of BARs, that
>> >> >> would solve both problems. Reassigning those apertures is
>> >> >> theoretically possible but is not even a gleam in our eyes yet.
>> >> >
>> >> > <nods> I think I have to have both (BARs and host bridge apertures) as when
>> >> > we do PCI passthrough to a guest - we might do it without a bridge.
>> >>
>> >> Why? Aren't the host bridge ranges necessarily a superset of the
>> >> individual devices' BARs?
>> >
>> > Yes. But when you pass in a PCI device to a PV guest you don't pass in the
>> > bridge. Just the PCI device itself.
>>
>> Right you are. Which means that basing the whole logic on the
>> PCI device BARs is likely wrong anyway, not just because it
>> doesn't account for other MMIO ranges.
>
> Right, but that is OK. When you pass in a PCI device to a PV guest you
> only care about that specific device driver being able to access its BARs.

Yes, but again - this is not the only possible way of allowing a
guest access to MMIO. See the "iomem" config option.

Jan

2013-10-29 16:09:01

by Gordan Bobic

[permalink] [raw]
Subject: Re: [Xen-devel] [v1 1/2] xen/p2m: Create identity mappings for PFNs beyound E820 and PCI BARs

On Tue, 29 Oct 2013 11:11:29 -0400, Konrad Rzeszutek Wilk
<[email protected]> wrote:
> On Tue, Oct 29, 2013 at 02:55:13PM +0000, Jan Beulich wrote:
>> >>> On 29.10.13 at 15:45, Konrad Rzeszutek Wilk
>> <[email protected]> wrote:
>> > On Tue, Oct 29, 2013 at 08:23:30AM +0000, Jan Beulich wrote:
>> >> >>> On 28.10.13 at 17:58, Konrad Rzeszutek Wilk
>> <[email protected]> wrote:
>> >> > On Fri, Oct 25, 2013 at 04:08:19PM -0600, Bjorn Helgaas wrote:
>> >> >> If you can look at PCI host bridge apertures instead of BARs,
>> that
>> >> >> would solve both problems. Reassigning those apertures is
>> >> >> theoretically possible but is not even a gleam in our eyes
>> yet.
>> >> >
>> >> > <nods> I think I have to have both (BARs and host bridge
>> apertures) as when
>> >> > we do PCI passthrough to a guest - we might do it without a
>> bridge.
>> >>
>> >> Why? Aren't the host bridge ranges necessarily a superset of the
>> >> individual devices' BARs?
>> >
>> > Yes. But when you pass in a PCI device to a PV guest you don't
>> pass in the
>> > bridge. Just the PCI device itself.
>>
>> Right you are. Which means that basing the whole logic on the
>> PCI device BARs is likely wrong anyway, not just because it
>> doesn't account for other MMIO ranges.
>
> Right, but that is OK. When you pass in a PCI device to a PV guest
> you
> only care about that specific device driver being able to access its
> BARs.

Maybe I'm reading this wrong, but my understanding is that you cannot
pass PCI bridges to domU anyway (in HVM - I tried). Is that not the
case?
I'm particularly interested in this for two reasons:

1) Some GPUs (Nvidia?) use bus resets to reset the GPU

2) Multi-GPU cards (e.g. GTX690/Grid K2) come with a PCIe bridge
of their own. I am successfully passing a modified GTX680 (as either
a Grid K2 or Quadro K5000) to a domU, but have completely failed to
get a modified GTX690 (Grid K2, exact same GPU as the GTX680) to work
with passthru. The only theory I have is that the extra PCIe bridge
is the problem (possibly a compound problem that only manigests when
there is a PLX (as per GTX690) PCIe bridge behind a NF200 PCIe bridge,
behind an Intel PCIe bridge.

Gordan