2015-12-31 08:52:22

by Yongji Xie

Subject: [RFC PATCH v2 0/3] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform

The current vfio-pci implementation does not allow mmap of
sub-page (size < PAGE_SIZE) MMIO BARs or of the MSI-X table. A
sub-page BAR's MMIO page may be shared with other BARs, and the MSI-X
table should not be accessed directly from the guest for security reasons.

But these restrictions easily cause performance problems for MMIO
accesses in the guest when vfio passes through sub-page BARs or BARs
containing the MSI-X table on the PPC64 platform. PAGE_SIZE is 64KB by
default on PPC64, so many MMIO BARs are smaller than a page and cannot
be mmapped, and the 64KB page that holds the MSI-X table, which is also
left unmapped, is likely to contain other registers as well. In both
cases the accesses fall back to MMIO emulation in the host.
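
To make the sharing problem concrete, the following minimal sketch checks
whether two BARs touch a common host page when pages are 64KB. The helper
name bars_share_page(), the SKETCH_PAGE_SHIFT constant and the example
addresses are illustrative assumptions, not part of this patchset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: the 64KB default page size on PPC64. */
#define SKETCH_PAGE_SHIFT 16

/*
 * Hypothetical helper: returns true when the byte ranges
 * [a_start, a_start + a_len) and [b_start, b_start + b_len)
 * touch at least one common page. Both lengths must be non-zero.
 */
static bool bars_share_page(uint64_t a_start, uint64_t a_len,
                            uint64_t b_start, uint64_t b_len)
{
    uint64_t a_first = a_start >> SKETCH_PAGE_SHIFT;
    uint64_t a_last = (a_start + a_len - 1) >> SKETCH_PAGE_SHIFT;
    uint64_t b_first = b_start >> SKETCH_PAGE_SHIFT;
    uint64_t b_last = (b_start + b_len - 1) >> SKETCH_PAGE_SHIFT;

    /* Two page-index intervals intersect iff each starts before the other ends. */
    return a_first <= b_last && b_first <= a_last;
}
```

Under these assumptions, two 4KB BARs placed back to back at 0x80000000
and 0x80001000 share a page, while a BAR placed at 0x80010000 does not
share one with the BAR at 0x80000000.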

For the sub-page MMIO BARs, this patchset adds a kernel parameter
for the PCI resource allocator that enforces the alignment of all
MMIO BARs to be at least PAGE_SIZE, and enables it by default on the
PPC64 platform, so that a sub-page BAR's MMIO page is not shared
with other BARs. With this parameter enabled, the vfio-pci driver can
mmap sub-page MMIO BARs.
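
The effect of the enforcement on a single BAR can be sketched as follows.
This is a standalone illustration, not the kernel code: the sketch_ names
are hypothetical, the page size is fixed at 64KB, and a BAR's natural
alignment is assumed to equal its size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: the 64KB default page size on PPC64. */
#define SKETCH_PAGE_SIZE UINT64_C(0x10000)

/* Round x up to the next multiple of the (power-of-two) page size. */
static uint64_t sketch_page_align(uint64_t x)
{
    return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Hypothetical helper: alignment requested for a BAR of the given size.
 * Without enforcement the natural alignment (assumed equal to the size)
 * is used; with enforcement it is rounded up to a whole page.
 */
static uint64_t sketch_bar_alignment(uint64_t bar_size, bool enforce_page_aligned)
{
    if (enforce_page_aligned)
        return sketch_page_align(bar_size);
    return bar_size;
}
```

With enforcement, a 4KB BAR is placed on a 64KB boundary, so no other BAR
can start inside its page. BARs that are already at least one page in
size keep their natural alignment.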

For the MSI-X table, we think it is safe to access the table directly
from userspace when the EEH mechanism is enabled, because EEH ensures
that a given PCI device can only shoot the MSIs assigned to its PE. So
we add a Kconfig option that supports mmapping the MSI-X table in the
vfio-pci driver if EEH is supported.

With this patchset applied, we see almost a 100% performance
improvement for MMIO accesses in our test when we pass through
sub-page BARs to the guest.

The last two patches are based on the proposed patchset[1].

Changelog v2:
- Rebase on v4.4-rc6 with the patchset[1] applied.
- Use a kernel parameter to enforce page alignment of all MMIO BARs
  in PCI core code instead of doing it in PPC64 arch code.
- Remove flags: VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED,
  VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP
- Add a Kconfig option to support mmapping the MSI-X table.

[1] https://lkml.org/lkml/2015/11/23/748

Yongji Xie (3):
PCI: Add support for enforcing all MMIO BARs to be page aligned
vfio-pci: Allow to mmap sub-page MMIO BARs if all MMIO BARs are page aligned
vfio-pci: Allow to mmap MSI-X table if EEH is supported

Documentation/kernel-parameters.txt | 4 ++++
arch/powerpc/include/asm/pci.h | 11 +++++++++++
drivers/pci/pci.c | 17 +++++++++++++++++
drivers/pci/pci.h | 7 ++++++-
drivers/vfio/pci/Kconfig | 4 ++++
drivers/vfio/pci/vfio_pci.c | 13 ++++++++++---
include/linux/pci.h | 2 ++
7 files changed, 54 insertions(+), 4 deletions(-)

--
1.7.9.5


2015-12-31 08:52:19

by Yongji Xie

Subject: [RFC PATCH v2 1/3] PCI: Add support for enforcing all MMIO BARs to be page aligned

When vfio passes through a PCI device whose MMIO BARs
are smaller than PAGE_SIZE, the guest cannot access those
BARs directly, and every MMIO access to them is emulated
in the host.

This is because vfio does not allow passing through a
BAR's MMIO page that may be shared with other BARs.

To solve this performance issue, this patch adds a kernel
parameter "pci=resource_page_aligned=on" that enforces
the alignment of all MMIO BARs to be at least PAGE_SIZE,
so that one BAR's MMIO page is not shared with other
BARs. The enforcement can be disabled with the kernel
parameter "pci=resource_page_aligned=off".

We think the default value of this parameter should be chosen
per architecture, so we add a macro,
PCI_ENABLE_RESOURCE_PAGE_ALIGNED, that an architecture can define
to enable the parameter by default. We define this macro on the
PPC64 platform, which can easily hit this performance issue
because its PAGE_SIZE is 64KB.
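
How the "on"/"off" value interacts with the per-architecture default can
be sketched in plain C. The function name sketch_parse_page_aligned() and
its signature are hypothetical. Only the prefix comparison mirrors the
strncmp() checks used in this patch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns the new setting for a parameter value.
 * "off" disables, "on" enables, and anything else leaves the current
 * (default) setting unchanged.
 */
static bool sketch_parse_page_aligned(const char *value, bool current)
{
    if (!strncmp(value, "off", 3))
        return false;
    if (!strncmp(value, "on", 2))
        return true;
    return current;
}
```

Because only a prefix is compared, a value such as "online" is treated
like "on". An unrecognized value silently keeps the default, which is
true on PPC64 and false elsewhere with this patch.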

Signed-off-by: Yongji Xie <[email protected]>
---
Documentation/kernel-parameters.txt | 4 ++++
arch/powerpc/include/asm/pci.h | 11 +++++++++++
drivers/pci/pci.c | 17 +++++++++++++++++
drivers/pci/pci.h | 7 ++++++-
include/linux/pci.h | 2 ++
5 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 742f69d..a53aaee 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2857,6 +2857,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
PAGE_SIZE is used as alignment.
PCI-PCI bridge can be specified, if resource
windows need to be expanded.
+ resource_page_aligned= Enable/disable enforcing the alignment
+ of all PCI devices' memory resources to be
+ at least PAGE_SIZE.
+ Format: { "on" | "off" }
ecrc= Enable/disable PCIe ECRC (transaction layer
end-to-end CRC checking).
bios: Use BIOS/firmware settings. This is the
diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 3453bd8..27bff59 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -136,6 +136,17 @@ extern pgprot_t pci_phys_mem_access_prot(struct file *file,
unsigned long pfn,
unsigned long size,
pgprot_t prot);
+#ifdef CONFIG_PPC64
+
+/* For PPC64, we enforce all PCI MMIO BARs to be page aligned
+ * by default. This helps to improve performance when we pass
+ * through a PCI device whose BARs are smaller than
+ * PAGE_SIZE (64KB). The boot parameter
+ * "pci=resource_page_aligned=off" disables it.
+ */
+#define PCI_ENABLE_RESOURCE_PAGE_ALIGNED
+
+#endif

#define HAVE_ARCH_PCI_RESOURCE_TO_USER
extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 314db8c..9f14ba5 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -99,6 +99,13 @@ u8 pci_cache_line_size;
*/
unsigned int pcibios_max_latency = 255;

+#ifdef PCI_ENABLE_RESOURCE_PAGE_ALIGNED
+bool pci_resource_page_aligned = true;
+#else
+bool pci_resource_page_aligned;
+#endif
+EXPORT_SYMBOL(pci_resource_page_aligned);
+
/* If set, the PCIe ARI capability will not be used. */
static bool pcie_ari_disabled;

@@ -4746,6 +4753,14 @@ static ssize_t pci_resource_alignment_store(struct bus_type *bus,
BUS_ATTR(resource_alignment, 0644, pci_resource_alignment_show,
pci_resource_alignment_store);

+static void pci_resource_get_page_aligned(char *str)
+{
+ if (!strncmp(str, "off", 3))
+ pci_resource_page_aligned = false;
+ else if (!strncmp(str, "on", 2))
+ pci_resource_page_aligned = true;
+}
+
static int __init pci_resource_alignment_sysfs_init(void)
{
return bus_create_file(&pci_bus_type,
@@ -4859,6 +4874,8 @@ static int __init pci_setup(char *str)
} else if (!strncmp(str, "resource_alignment=", 19)) {
pci_set_resource_alignment_param(str + 19,
strlen(str + 19));
+ } else if (!strncmp(str, "resource_page_aligned=", 22)) {
+ pci_resource_get_page_aligned(str + 22);
} else if (!strncmp(str, "ecrc=", 5)) {
pcie_ecrc_get_policy(str + 5);
} else if (!strncmp(str, "hpiosize=", 9)) {
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d390fc1..e16e48c 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -312,11 +312,16 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
#ifdef CONFIG_PCI_IOV
int resno = res - dev->resource;

- if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END)
+ if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END) {
+ if (pci_resource_page_aligned && res->flags & IORESOURCE_MEM)
+ return PAGE_ALIGN(pci_sriov_resource_alignment(dev, resno));
return pci_sriov_resource_alignment(dev, resno);
+ }
#endif
if (dev->class >> 8 == PCI_CLASS_BRIDGE_CARDBUS)
return pci_cardbus_resource_alignment(res);
+ if (pci_resource_page_aligned && res->flags & IORESOURCE_MEM)
+ return PAGE_ALIGN(resource_alignment(res));
return resource_alignment(res);
}

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 6ae25aa..0ca57f1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1517,6 +1517,8 @@ static inline int pci_get_new_domain_nr(void) { return -ENOSYS; }

#include <asm/pci.h>

+extern bool pci_resource_page_aligned;
+
/* these helpers provide future and backwards compatibility
* for accessing popular PCI BAR info */
#define pci_resource_start(dev, bar) ((dev)->resource[(bar)].start)
--
1.7.9.5

2015-12-31 08:52:27

by Yongji Xie

Subject: [RFC PATCH v2 2/3] vfio-pci: Allow to mmap sub-page MMIO BARs if all MMIO BARs are page aligned

The current vfio-pci implementation does not allow mmap of
sub-page (size < PAGE_SIZE) MMIO BARs because such a BAR's MMIO
page may be shared with other BARs.

But we should allow mmap of these sub-page MMIO BARs if the PCI
resource allocator enforces the alignment of all MMIO BARs
to be at least PAGE_SIZE.
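
The resulting eligibility rule can be sketched as a small predicate. The
name sketch_region_can_mmap() is hypothetical, the page size is assumed
to be 64KB, and the CONFIG_VFIO_PCI_MMAP check made by the real driver is
left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: the 64KB default page size on PPC64. */
#define SKETCH_PAGE_SIZE UINT64_C(0x10000)

/*
 * Hypothetical helper: a memory BAR may be mmapped when it covers at
 * least one full page, or when the allocator guarantees that every MMIO
 * BAR is page aligned and therefore owns its page exclusively.
 */
static bool sketch_region_can_mmap(bool is_mem, uint64_t size, bool page_aligned)
{
    return is_mem && (size >= SKETCH_PAGE_SIZE || page_aligned);
}
```

A 4KB memory BAR is rejected without the alignment guarantee and accepted
with it. I/O port BARs are rejected in both cases.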

Signed-off-by: Yongji Xie <[email protected]>
---
drivers/vfio/pci/vfio_pci.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 7e9f497..09b3805 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -552,7 +552,8 @@ static long vfio_pci_ioctl(void *device_data,
VFIO_REGION_INFO_FLAG_WRITE;
if (IS_ENABLED(CONFIG_VFIO_PCI_MMAP) &&
pci_resource_flags(pdev, info.index) &
- IORESOURCE_MEM && info.size >= PAGE_SIZE) {
+ IORESOURCE_MEM && (info.size >= PAGE_SIZE ||
+ pci_resource_page_aligned)) {
info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
if (info.index == vdev->msix_bar) {
ret = msix_sparse_mmap_cap(vdev, &caps);
@@ -954,6 +955,10 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
return -EINVAL;

phys_len = pci_resource_len(pdev, index);
+
+ if (pci_resource_page_aligned)
+ phys_len = PAGE_ALIGN(phys_len);
+
req_len = vma->vm_end - vma->vm_start;
pgoff = vma->vm_pgoff &
((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
--
1.7.9.5

2015-12-31 08:52:31

by Yongji Xie

Subject: [RFC PATCH v2 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

The current vfio-pci implementation does not allow mmap of the
MSI-X table, so that users do not get to touch it directly.

However, the EEH mechanism can ensure that a given PCI device
can only shoot the MSIs assigned to its PE. So we think
it is safe to expose the MSI-X table to userspace, because
the exposed MSI-X table cannot be used to harm other
memory space.

With the MSI-X table mmapped, the performance issues that
arise when PCI adapters have critical registers in the
same page as the MSI-X table are also resolved.

So this patch adds a Kconfig option, VFIO_PCI_MMAP_MSIX,
to support mmapping the MSI-X table.
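
The change in behaviour for the MSI-X BAR can be sketched as follows.
This is an illustration only: sketch_mmap_allowed() and its parameters
are hypothetical names, and the overlap test assumes that the unpatched
driver rejects any mapping of the MSI-X BAR that overlaps the table.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper for a request [req_start, req_start + req_len)
 * inside the BAR that holds the MSI-X table at
 * [table_offset, table_offset + table_size).
 * With the option disabled, overlapping requests are refused; with it
 * enabled, the overlap check is skipped entirely.
 */
static bool sketch_mmap_allowed(uint64_t req_start, uint64_t req_len,
                                uint64_t table_offset, uint64_t table_size,
                                bool mmap_msix_enabled)
{
    bool overlaps = req_start < table_offset + table_size &&
                    table_offset < req_start + req_len;

    return mmap_msix_enabled || !overlaps;
}
```

With 64KB pages, a table at offset 0x2000 makes the whole first page of
the BAR unmappable when the option is disabled, including any device
registers that happen to live in that page.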

Signed-off-by: Yongji Xie <[email protected]>
---
drivers/vfio/pci/Kconfig | 4 ++++
drivers/vfio/pci/vfio_pci.c | 6 ++++--
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 02912f1..67b0a2c 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -23,6 +23,10 @@ config VFIO_PCI_MMAP
depends on VFIO_PCI
def_bool y if !S390

+config VFIO_PCI_MMAP_MSIX
+ depends on VFIO_PCI_MMAP
+ def_bool y if EEH
+
config VFIO_PCI_INTX
depends on VFIO_PCI
def_bool y if !S390
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 09b3805..d536985 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -555,7 +555,8 @@ static long vfio_pci_ioctl(void *device_data,
IORESOURCE_MEM && (info.size >= PAGE_SIZE ||
pci_resource_page_aligned)) {
info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
- if (info.index == vdev->msix_bar) {
+ if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP_MSIX) &&
+ info.index == vdev->msix_bar) {
ret = msix_sparse_mmap_cap(vdev, &caps);
if (ret)
return ret;
@@ -967,7 +968,8 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
return -EINVAL;

- if (index == vdev->msix_bar) {
+ if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP_MSIX) &&
+ index == vdev->msix_bar) {
/*
* Disallow mmaps overlapping the MSI-X table; users don't
* get to touch this directly. We could find somewhere
--
1.7.9.5