2016-04-27 12:19:40

by Yongji Xie

[permalink] [raw]
Subject: [PATCH 0/4] PCI: Add support for enforcing all MMIO BARs not to share PAGE_SIZE

This series aims to add an option for PCI resource allocator to
force BARs not to share PAGE_SIZE. This would make sense to VFIO
driver. Because current VFIO implementation disallows to mmap
sub-page(size < PAGE_SIZE) MMIO BARs which may share the same page
with other BARs for security reasons. Thus, we have to handle mmio
access to these BARs in QEMU emulation rather than in guest which
will cause some performance loss.

To achieve that, we would like to make use of the existing
resource_alignment kernel parameter and force a minimum alignment
of PAGE_SIZE. It's flexible. And we can default to enable it on
some archs which may easily hit the performance issue because of
their 64K page.

In this series, patch 1 fixed a bug of resource_alignment; patch 2,3
tried to add a new option for resource_alignment to use
IORESOURCE_STARTALIGN to specify the alignment of PCI BARs; patch 4
modified resource_alignment to support syntax which can be used to
enforce the alignment of all MMIO BARs to be at least PAGE_SIZE.

Yongji Xie (4):
PCI: Ignore resource_alignment if PCI_PROBE_ONLY was set
PCI: Do not Use IORESOURCE_STARTALIGN to identify bridge resources
PCI: Add a new option for resource_alignment to reassign alignment
PCI: Add support for enforcing all MMIO BARs to be page aligned

Documentation/kernel-parameters.txt | 7 ++-
arch/powerpc/include/asm/pci.h | 2 +
drivers/pci/pci.c | 105 ++++++++++++++++++++++++++++-------
drivers/pci/setup-bus.c | 9 ++-
4 files changed, 98 insertions(+), 25 deletions(-)

--
1.7.9.5


2016-04-27 12:19:00

by Yongji Xie

[permalink] [raw]
Subject: [PATCH 1/4] PCI: Ignore resource_alignment if PCI_PROBE_ONLY was set

The resource_alignment will releases memory resources allocated
by firmware so that kernel can reassign new resources later on.
But this will cause the problem that no resources can be
allocated by kernel if PCI_PROBE_ONLY was set, e.g. on pSeries
platform because PCI_PROBE_ONLY force kernel to use firmware
setup and not to reassign any resources.

To solve this problem, this patch ignores resource_alignment
if PCI_PROBE_ONLY was set.

Signed-off-by: Yongji Xie <[email protected]>
---
drivers/pci/pci.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 25e0327..19b9560 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4617,6 +4617,12 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
spin_lock(&resource_alignment_lock);
p = resource_alignment_param;
while (*p) {
+ if (pci_has_flag(PCI_PROBE_ONLY)) {
+ printk(KERN_ERR "PCI: Ignore resource_alignment parameter: %s with PCI_PROBE_ONLY set\n",
+ p);
+ *p = 0;
+ break;
+ }
count = 0;
if (sscanf(p, "%d%n", &align_order, &count) == 1 &&
p[count] == '@') {
--
1.7.9.5

2016-04-27 12:20:13

by Yongji Xie

[permalink] [raw]
Subject: [PATCH 4/4] PCI: Add support for enforcing all MMIO BARs to be page aligned

When vfio passthrough a PCI device of which MMIO BARs are
smaller than PAGE_SIZE, guest will not handle the mmio
accesses to the BARs which leads to mmio emulations in host.

This is because vfio will not allow to passthrough one BAR's
mmio page which may be shared with other BARs. Otherwise,
there will be a backdoor that guest can use to access BARs
of other device.

To solve this performance issue, this patch modifies
resource_alignment to support syntax where multiple devices
get the same alignment. So we can use something like
"pci=resource_alignment=*:*:*.*:noresize" to enforce the
alignment of all MMIO BARs to be at least PAGE_SIZE so that
one BAR's mmio page would not be shared with other BARs.

And we also define a macro PCIBIOS_MIN_ALIGNMENT to enable this
automatically on PPC64 platform which can easily hit this issue
because its PAGE_SIZE is 64KB.

Signed-off-by: Yongji Xie <[email protected]>
---
Documentation/kernel-parameters.txt | 2 ++
arch/powerpc/include/asm/pci.h | 2 ++
drivers/pci/pci.c | 64 +++++++++++++++++++++++++++++------
3 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 13c25bf..d9cf5ae4 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2964,6 +2964,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
aligned memory resources.
If <order of align> is not specified,
PAGE_SIZE is used as alignment.
+ <domain>, <bus>, <slot> and <func> can be set to
+ "*" which means match all values.
PCI-PCI bridge can be specified, if resource
windows need to be expanded.
noresize: Don't change the resources' sizes when
diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index a6f3ac0..742fd34 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -28,6 +28,8 @@
#define PCIBIOS_MIN_IO 0x1000
#define PCIBIOS_MIN_MEM 0x10000000

+#define PCIBIOS_MIN_ALIGNMENT PAGE_SIZE
+
struct pci_dev;

/* Values for the `which' argument to sys_pciconfig_iobase syscall. */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a6d2b66..52df5fd 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4615,7 +4615,12 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
int seg, bus, slot, func, align_order, count;
resource_size_t align = 0;
char *p;
+ bool invalid = false;

+#ifdef PCIBIOS_MIN_ALIGNMENT
+ align = PCIBIOS_MIN_ALIGNMENT;
+ *resize = false;
+#endif
spin_lock(&resource_alignment_lock);
p = resource_alignment_param;
while (*p) {
@@ -4632,16 +4637,49 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
} else {
align_order = -1;
}
- if (sscanf(p, "%x:%x:%x.%x%n",
- &seg, &bus, &slot, &func, &count) != 4) {
+ if (p[0] == '*' && p[1] == ':') {
+ seg = -1;
+ count = 1;
+ } else if (sscanf(p, "%x%n", &seg, &count) != 1 ||
+ p[count] != ':') {
+ invalid = true;
+ break;
+ }
+ p += count + 1;
+ if (*p == '*') {
+ bus = -1;
+ count = 1;
+ } else if (sscanf(p, "%x%n", &bus, &count) != 1) {
+ invalid = true;
+ break;
+ }
+ p += count;
+ if (*p == '.') {
+ slot = bus;
+ bus = seg;
seg = 0;
- if (sscanf(p, "%x:%x.%x%n",
- &bus, &slot, &func, &count) != 3) {
- /* Invalid format */
- printk(KERN_ERR "PCI: Can't parse resource_alignment parameter: %s\n",
- p);
+ p++;
+ } else if (*p == ':') {
+ p++;
+ if (p[0] == '*' && p[1] == '.') {
+ slot = -1;
+ count = 1;
+ } else if (sscanf(p, "%x%n", &slot, &count) != 1 ||
+ p[count] != '.') {
+ invalid = true;
break;
}
+ p += count + 1;
+ } else {
+ invalid = true;
+ break;
+ }
+ if (*p == '*') {
+ func = -1;
+ count = 1;
+ } else if (sscanf(p, "%x%n", &func, &count) != 1) {
+ invalid = true;
+ break;
}
p += count;
if (!strncmp(p, ":noresize", 9)) {
@@ -4649,10 +4687,10 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
p += 9;
} else
*resize = true;
- if (seg == pci_domain_nr(dev->bus) &&
- bus == dev->bus->number &&
- slot == PCI_SLOT(dev->devfn) &&
- func == PCI_FUNC(dev->devfn)) {
+ if ((seg == pci_domain_nr(dev->bus) || seg == -1) &&
+ (bus == dev->bus->number || bus == -1) &&
+ (slot == PCI_SLOT(dev->devfn) || slot == -1) &&
+ (func == PCI_FUNC(dev->devfn) || func == -1)) {
if (align_order == -1)
align = PAGE_SIZE;
else
@@ -4662,10 +4700,14 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
}
if (*p != ';' && *p != ',') {
/* End of param or invalid format */
+ invalid = true;
break;
}
p++;
}
+ if (invalid)
+ printk(KERN_ERR "PCI: Can't parse resource_alignment parameter:%s\n",
+ p);
spin_unlock(&resource_alignment_lock);
return align;
}
--
1.7.9.5

2016-04-27 12:20:21

by Yongji Xie

[permalink] [raw]
Subject: [PATCH 3/4] PCI: Add a new option for resource_alignment to reassign alignment

When using resource_alignment kernel parameter, the current
implement reassigns the alignment by changing resources' size
which can potentially break some drivers. For example, the driver
uses the size to locate some register whose length is related
to the size.

This patch adds a new option "noresize" for the parameter to
solve this problem.

Signed-off-by: Yongji Xie <[email protected]>
---
Documentation/kernel-parameters.txt | 5 ++++-
drivers/pci/pci.c | 35 +++++++++++++++++++++++++----------
2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 0b3de80..13c25bf 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2958,13 +2958,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
window. The default value is 64 megabytes.
resource_alignment=
Format:
- [<order of align>@][<domain>:]<bus>:<slot>.<func>[; ...]
+ [<order of align>@][<domain>:]<bus>:<slot>.<func>
+ [:noresize][; ...]
Specifies alignment and device to reassign
aligned memory resources.
If <order of align> is not specified,
PAGE_SIZE is used as alignment.
PCI-PCI bridge can be specified, if resource
windows need to be expanded.
+ noresize: Don't change the resources' sizes when
+ reassigning alignment.
ecrc= Enable/disable PCIe ECRC (transaction layer
end-to-end CRC checking).
bios: Use BIOS/firmware settings. This is the
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 19b9560..a6d2b66 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4604,11 +4604,13 @@ static DEFINE_SPINLOCK(resource_alignment_lock);
/**
* pci_specified_resource_alignment - get resource alignment specified by user.
* @dev: the PCI device to get
+ * @resize: whether or not to change resources' size when reassigning alignment
*
* RETURNS: Resource alignment if it is specified.
* Zero if it is not specified.
*/
-static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
+static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
+ bool *resize)
{
int seg, bus, slot, func, align_order, count;
resource_size_t align = 0;
@@ -4642,6 +4644,11 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
}
}
p += count;
+ if (!strncmp(p, ":noresize", 9)) {
+ *resize = false;
+ p += 9;
+ } else
+ *resize = true;
if (seg == pci_domain_nr(dev->bus) &&
bus == dev->bus->number &&
slot == PCI_SLOT(dev->devfn) &&
@@ -4674,11 +4681,12 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
{
int i;
struct resource *r;
+ bool resize = true;
resource_size_t align, size;
u16 command;

/* check if specified PCI is target device to reassign */
- align = pci_specified_resource_alignment(dev);
+ align = pci_specified_resource_alignment(dev, &resize);
if (!align)
return;

@@ -4700,15 +4708,22 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
if (!(r->flags & IORESOURCE_MEM))
continue;
size = resource_size(r);
- if (size < align) {
- size = align;
- dev_info(&dev->dev,
- "Rounding up size of resource #%d to %#llx.\n",
- i, (unsigned long long)size);
+ if (resize) {
+ if (size < align) {
+ size = align;
+ dev_info(&dev->dev,
+ "Rounding up size of resource #%d to %#llx.\n",
+ i, (unsigned long long)size);
+ }
+ r->flags |= IORESOURCE_UNSET;
+ r->end = size - 1;
+ r->start = 0;
+ } else {
+ r->flags &= ~IORESOURCE_SIZEALIGN;
+ r->flags |= IORESOURCE_STARTALIGN | IORESOURCE_UNSET;
+ r->start = max(align, size);
+ r->end = r->start + size - 1;
}
- r->flags |= IORESOURCE_UNSET;
- r->end = size - 1;
- r->start = 0;
}
/* Need to disable bridge's resource window,
* to enable the kernel to reassign new resource
--
1.7.9.5

2016-04-27 12:20:44

by Yongji Xie

[permalink] [raw]
Subject: [PATCH 2/4] PCI: Do not Use IORESOURCE_STARTALIGN to identify bridge resources

Now we use the IORESOURCE_STARTALIGN to identify bridge resources
in __assign_resources_sorted(). That's quite fragile. We can't
make sure that the PCI devices' resources will not use
IORESOURCE_STARTALIGN any more.

In this patch, we try to use a more robust way to identify
bridge resources.

Signed-off-by: Yongji Xie <[email protected]>
---
drivers/pci/setup-bus.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 55641a3..216ddbc 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -390,6 +390,7 @@ static void __assign_resources_sorted(struct list_head *head,
struct pci_dev_resource *dev_res, *tmp_res, *dev_res2;
unsigned long fail_type;
resource_size_t add_align, align;
+ int index;

/* Check if optional add_size is there */
if (!realloc_head || list_empty(realloc_head))
@@ -410,11 +411,13 @@ static void __assign_resources_sorted(struct list_head *head,

/*
* There are two kinds of additional resources in the list:
- * 1. bridge resource -- IORESOURCE_STARTALIGN
- * 2. SR-IOV resource -- IORESOURCE_SIZEALIGN
+ * 1. bridge resource
+ * 2. SR-IOV resource
* Here just fix the additional alignment for bridge
*/
- if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
+ index = dev_res->res - dev_res->dev->resource;
+ if (index < PCI_BRIDGE_RESOURCES ||
+ index > PCI_BRIDGE_RESOURCE_END)
continue;

add_align = get_res_add_align(realloc_head, dev_res->res);
--
1.7.9.5