Greetings,
The following patches add support for the SR-IOV capability to the
Linux kernel. With these patches, a PCI device that has this
capability can be presented as multiple devices from the software
perspective, which benefits KVM and serves other purposes such as
QoS and security.
[PATCH 1/8 v4] PCI: define PCI resource names in a 'enum'
[PATCH 2/8 v4] PCI: export __pci_read_base
[PATCH 3/8 v4] PCI: export pci_alloc_child_bus
[PATCH 4/8 v4] PCI: add a wrapper for resource_alignment
[PATCH 5/8 v4] PCI: add a new function to map BAR offset
[PATCH 6/8 v4] PCI: support the SR-IOV capability
[PATCH 7/8 v4] PCI: reserve bus range for the SR-IOV device
[PATCH 8/8 v4] PCI: document the changes
---
b/Documentation/DocBook/kernel-api.tmpl | 1
b/Documentation/PCI/pci-iov-howto.txt | 223 ++++++++
b/drivers/pci/Kconfig | 12
b/drivers/pci/Makefile | 2
b/drivers/pci/iov.c | 853 ++++++++++++++++++++++++++++++++
b/drivers/pci/pci-sysfs.c | 4
b/drivers/pci/pci.c | 19
b/drivers/pci/pci.h | 9
b/drivers/pci/probe.c | 2
b/drivers/pci/proc.c | 7
b/drivers/pci/setup-bus.c | 4
b/drivers/pci/setup-res.c | 8
b/include/linux/pci.h | 38 -
b/include/linux/pci_regs.h | 22
drivers/pci/iov.c | 24
drivers/pci/pci-sysfs.c | 4
drivers/pci/pci.c | 61 ++
drivers/pci/pci.h | 65 ++
drivers/pci/probe.c | 39 -
drivers/pci/setup-res.c | 14
include/linux/pci.h | 57 ++
21 files changed, 1397 insertions(+), 71 deletions(-)
---
The Single Root I/O Virtualization (SR-IOV) capability, defined by
PCI-SIG, is intended to let multiple system software instances share
PCI hardware resources. A PCI device that supports this capability can
be extended to one Physical Function plus multiple Virtual Functions.
The Physical Function, which can be considered the "real" PCI device,
reflects the hardware instance and manages all physical resources.
Virtual Functions are associated with a Physical Function and share
physical resources with it. Software can control the allocation of
Virtual Functions via registers encapsulated in the capability
structure.
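As a rough illustration of how Virtual Functions share a resource managed by the Physical Function, the sketch below computes the address window of a single VF BAR. It assumes, as the iov.c code in this series does, that each VF BAR in the capability reports the size of one VF's window, that the PF reserves that size times TotalVFs as one contiguous region, and that VF n (0-based) uses the n-th slice. The struct and function names are hypothetical userspace stand-ins, not kernel API.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct resource: an inclusive address range. */
struct vf_window {
	uint64_t start;
	uint64_t end;
};

/* Space to reserve for one VF BAR across all TotalVFs Virtual Functions. */
static uint64_t vf_region_size(uint64_t per_vf_size, unsigned int totalvfs)
{
	return per_vf_size * totalvfs;
}

/* Window used by 0-based Virtual Function `vfn` inside that region. */
static struct vf_window vf_bar_window(uint64_t region_start,
				      uint64_t per_vf_size, unsigned int vfn)
{
	struct vf_window w;

	w.start = region_start + per_vf_size * vfn;
	w.end = w.start + per_vf_size - 1;
	return w;
}
```

For example, with a 16 KB window per VF and 8 VFs, the PF reserves 128 KB, and VF 3 of a region based at 0xd0000000 occupies 0xd000c000-0xd000ffff.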
The SR-IOV specification can be found at:
http://www.pcisig.com/members/downloads/specifications/iov/sr-iov1.0_11Sep07.pdf
Devices that support SR-IOV are available from the following vendors:
http://download.intel.com/design/network/ProdBrf/320025.pdf
http://www.netxen.com/products/chipsolutions/NX3031.html
http://www.neterion.com/products/x3100.html
This patch moves all definitions of PCI resource names into an 'enum'
and replaces some hard-coded resource numbers with symbolic names.
This eases the introduction of device-specific resources.
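One way to check that the new enum is a drop-in replacement is to compile a copy of it in userspace and compare it against the removed #define values (PCI_ROM_RESOURCE 6, PCI_BRIDGE_RESOURCES 7, PCI_NUM_RESOURCES 11, DEVICE_COUNT_RESOURCE 12). The snippet below is such a copy, assuming the default PCI_BRIDGE_RES_NUM of 4; it is a sanity check, not part of the patch.

```c
/* Userspace copy of the enum from this patch, default PCI_BRIDGE_RES_NUM. */
#ifndef PCI_BRIDGE_RES_NUM
#define PCI_BRIDGE_RES_NUM 4
#endif

enum {
	PCI_STD_RESOURCES,		/* 0: first standard BAR */
	PCI_STD_RESOURCES_END = 5,	/* 5: last standard BAR */
	PCI_ROM_RESOURCE,		/* 6: expansion ROM */
	PCI_BRIDGE_RESOURCES,		/* 7: first bridge window */
	PCI_BRIDGE_RES_END = PCI_BRIDGE_RESOURCES + PCI_BRIDGE_RES_NUM - 1,
	PCI_NUM_RESOURCES,		/* 11 with the default above */
	DEVICE_COUNT_RESOURCE		/* 12 with the default above */
};

/* Fail the build if the enum drifts from the removed #define values. */
_Static_assert(PCI_ROM_RESOURCE == 6, "PCI_ROM_RESOURCE changed");
_Static_assert(PCI_BRIDGE_RESOURCES == 7, "PCI_BRIDGE_RESOURCES changed");
_Static_assert(PCI_NUM_RESOURCES == 11, "PCI_NUM_RESOURCES changed");
_Static_assert(DEVICE_COUNT_RESOURCE == 12, "DEVICE_COUNT_RESOURCE changed");
```

Because every later value is derived from its predecessor, raising PCI_BRIDGE_RES_NUM on an architecture shifts PCI_NUM_RESOURCES and DEVICE_COUNT_RESOURCE automatically, which a fixed list of #defines could not do.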
Signed-off-by: Yu Zhao <[email protected]>
---
drivers/pci/pci-sysfs.c | 4 +++-
drivers/pci/pci.c | 19 ++-----------------
drivers/pci/probe.c | 2 +-
drivers/pci/proc.c | 7 ++++---
include/linux/pci.h | 37 ++++++++++++++++++++++++-------------
5 files changed, 34 insertions(+), 35 deletions(-)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 2cad6da..c41b783 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -101,11 +101,13 @@ resource_show(struct device * dev, struct device_attribute *attr, char * buf)
struct pci_dev * pci_dev = to_pci_dev(dev);
char * str = buf;
int i;
- int max = 7;
+ int max;
resource_size_t start, end;
if (pci_dev->subordinate)
max = DEVICE_COUNT_RESOURCE;
+ else
+ max = PCI_BRIDGE_RESOURCES;
for (i = 0; i < max; i++) {
struct resource *res = &pci_dev->resource[i];
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 5ecd2d7..a9c64b0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -359,24 +359,9 @@ pci_find_parent_resource(const struct pci_dev *dev, struct resource *res)
static void
pci_restore_bars(struct pci_dev *dev)
{
- int i, numres;
-
- switch (dev->hdr_type) {
- case PCI_HEADER_TYPE_NORMAL:
- numres = 6;
- break;
- case PCI_HEADER_TYPE_BRIDGE:
- numres = 2;
- break;
- case PCI_HEADER_TYPE_CARDBUS:
- numres = 1;
- break;
- default:
- /* Should never get here, but just in case... */
- return;
- }
+ int i;
- for (i = 0; i < numres; i++)
+ for (i = 0; i < PCI_BRIDGE_RESOURCES; i++)
pci_update_resource(dev, i);
}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index dcd6bf1..03ddfee 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -492,7 +492,7 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
child->subordinate = 0xff;
/* Set up default resource pointers and names.. */
- for (i = 0; i < 4; i++) {
+ for (i = 0; i < PCI_BRIDGE_RES_NUM; i++) {
child->resource[i] = &bridge->resource[PCI_BRIDGE_RESOURCES+i];
child->resource[i]->name = child->name;
}
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index e1098c3..f6f2a59 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -352,15 +352,16 @@ static int show_device(struct seq_file *m, void *v)
dev->vendor,
dev->device,
dev->irq);
- /* Here should be 7 and not PCI_NUM_RESOURCES as we need to preserve compatibility */
- for (i=0; i<7; i++) {
+
+ /* only print standard and ROM resources to preserve compatibility */
+ for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
resource_size_t start, end;
pci_resource_to_user(dev, i, &dev->resource[i], &start, &end);
seq_printf(m, "\t%16llx",
(unsigned long long)(start |
(dev->resource[i].flags & PCI_REGION_FLAG_MASK)));
}
- for (i=0; i<7; i++) {
+ for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
resource_size_t start, end;
pci_resource_to_user(dev, i, &dev->resource[i], &start, &end);
seq_printf(m, "\t%16llx",
diff --git a/include/linux/pci.h b/include/linux/pci.h
index f280783..497d639 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -76,7 +76,30 @@ enum pci_mmap_state {
#define PCI_DMA_FROMDEVICE 2
#define PCI_DMA_NONE 3
-#define DEVICE_COUNT_RESOURCE 12
+/*
+ * For PCI devices, the region numbers are assigned this way:
+ */
+enum {
+ /* #0-5: standard PCI regions */
+ PCI_STD_RESOURCES,
+ PCI_STD_RESOURCES_END = 5,
+
+ /* #6: expansion ROM */
+ PCI_ROM_RESOURCE,
+
+ /* address space assigned to buses behind the bridge */
+#ifndef PCI_BRIDGE_RES_NUM
+#define PCI_BRIDGE_RES_NUM 4
+#endif
+ PCI_BRIDGE_RESOURCES,
+ PCI_BRIDGE_RES_END = PCI_BRIDGE_RESOURCES + PCI_BRIDGE_RES_NUM - 1,
+
+ /* total resources associated with a PCI device */
+ PCI_NUM_RESOURCES,
+
+ /* preserve this for compatibility */
+ DEVICE_COUNT_RESOURCE
+};
typedef int __bitwise pci_power_t;
@@ -262,18 +285,6 @@ static inline void pci_add_saved_cap(struct pci_dev *pci_dev,
hlist_add_head(&new_cap->next, &pci_dev->saved_cap_space);
}
-/*
- * For PCI devices, the region numbers are assigned this way:
- *
- * 0-5 standard PCI regions
- * 6 expansion ROM
- * 7-10 bridges: address space assigned to buses behind the bridge
- */
-
-#define PCI_ROM_RESOURCE 6
-#define PCI_BRIDGE_RESOURCES 7
-#define PCI_NUM_RESOURCES 11
-
#ifndef PCI_BUS_NUM_RESOURCES
#define PCI_BUS_NUM_RESOURCES 16
#endif
--
1.5.6.4
Export __pci_read_base() so it can be used by the whole PCI subsystem.
Signed-off-by: Yu Zhao <[email protected]>
---
drivers/pci/pci.h | 9 +++++++++
drivers/pci/probe.c | 20 +++++++++-----------
2 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 69b6365..922b742 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -150,6 +150,15 @@ struct pci_slot_attribute {
};
#define to_pci_slot_attr(s) container_of(s, struct pci_slot_attribute, attr)
+enum pci_bar_type {
+ pci_bar_unknown, /* Standard PCI BAR probe */
+ pci_bar_io, /* An io port BAR */
+ pci_bar_mem32, /* A 32-bit memory BAR */
+ pci_bar_mem64, /* A 64-bit memory BAR */
+};
+
+extern int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
+ struct resource *res, unsigned int reg);
extern void pci_enable_ari(struct pci_dev *dev);
/**
* pci_ari_enabled - query ARI forwarding status
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 03ddfee..2326609 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -201,13 +201,6 @@ static u64 pci_size(u64 base, u64 maxbase, u64 mask)
return size;
}
-enum pci_bar_type {
- pci_bar_unknown, /* Standard PCI BAR probe */
- pci_bar_io, /* An io port BAR */
- pci_bar_mem32, /* A 32-bit memory BAR */
- pci_bar_mem64, /* A 64-bit memory BAR */
-};
-
static inline enum pci_bar_type decode_bar(struct resource *res, u32 bar)
{
if ((bar & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
@@ -222,11 +215,16 @@ static inline enum pci_bar_type decode_bar(struct resource *res, u32 bar)
return pci_bar_mem32;
}
-/*
- * If the type is not unknown, we assume that the lowest bit is 'enable'.
- * Returns 1 if the BAR was 64-bit and 0 if it was 32-bit.
+/**
+ * __pci_read_base - read a PCI BAR
+ * @dev: the PCI device
+ * @type: type of the BAR
+ * @res: resource buffer to be filled in
+ * @pos: BAR position in the config space
+ *
+ * Returns 1 if the BAR is 64-bit, or 0 if 32-bit.
*/
-static int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
+int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
struct resource *res, unsigned int pos)
{
u32 l, sz, mask;
--
1.5.6.4
Export pci_alloc_child_bus(), and make it able to handle buses without
bridge devices. Some devices, such as SR-IOV devices, use more than one
bus number even though there is no explicit bridge device, because they
have an internal routing mechanism.
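To see why an SR-IOV device can need bus numbers beyond its own, the sketch below computes each VF's Routing ID from the PF's Routing ID plus the First VF Offset and VF Stride fields (the same arithmetic as vf_rid() in iov.c) and reports the highest bus number the VFs occupy. The function names and the sample offset/stride values are illustrative assumptions.

```c
#include <stdint.h>

/* Routing ID (bus << 8 | devfn) of 0-based Virtual Function `vfn`. */
static uint16_t vf_routing_id(uint8_t pf_bus, uint8_t pf_devfn,
			      uint16_t offset, uint16_t stride,
			      unsigned int vfn)
{
	return (uint16_t)(((unsigned int)pf_bus << 8) + pf_devfn +
			  offset + (unsigned int)stride * vfn);
}

/*
 * Highest bus number used by VFs 0..numvfs-1 (numvfs >= 1). Assumes the
 * Routing IDs do not wrap past 16 bits, so they grow with vfn and the
 * last VF sits on the highest bus.
 */
static uint8_t vf_last_bus(uint8_t pf_bus, uint8_t pf_devfn,
			   uint16_t offset, uint16_t stride,
			   unsigned int numvfs)
{
	return (uint8_t)(vf_routing_id(pf_bus, pf_devfn, offset, stride,
				       numvfs - 1) >> 8);
}
```

With a PF at 01:00.0, an offset of 128 and a stride of 2, VFs 0-63 fit on bus 1 (Routing IDs 0x180-0x1fe), but a 65th VF lands on bus 2. No bridge sits in front of that bus, which is why pci_alloc_child_bus() must accept a NULL bridge.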
Signed-off-by: Yu Zhao <[email protected]>
---
drivers/pci/pci.h | 2 ++
drivers/pci/probe.c | 9 ++++++---
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 922b742..c6fa8ab 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -159,6 +159,8 @@ enum pci_bar_type {
extern int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
struct resource *res, unsigned int reg);
+extern struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
+ struct pci_dev *bridge, int busnr);
extern void pci_enable_ari(struct pci_dev *dev);
/**
* pci_ari_enabled - query ARI forwarding status
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2326609..9c680b8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -454,7 +454,7 @@ static struct pci_bus * pci_alloc_bus(void)
return b;
}
-static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
+struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
struct pci_dev *bridge, int busnr)
{
struct pci_bus *child;
@@ -467,12 +467,10 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
if (!child)
return NULL;
- child->self = bridge;
child->parent = parent;
child->ops = parent->ops;
child->sysdata = parent->sysdata;
child->bus_flags = parent->bus_flags;
- child->bridge = get_device(&bridge->dev);
/* initialize some portions of the bus device, but don't register it
* now as the parent is not properly set up yet. This device will get
@@ -489,6 +487,11 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
child->primary = parent->secondary;
child->subordinate = 0xff;
+ if (!bridge)
+ return child;
+
+ child->self = bridge;
+ child->bridge = get_device(&bridge->dev);
/* Set up default resource pointers and names.. */
for (i = 0; i < PCI_BRIDGE_RES_NUM; i++) {
child->resource[i] = &bridge->resource[PCI_BRIDGE_RESOURCES+i];
--
1.5.6.4
Add a wrapper for resource_alignment() so that it can handle
device-specific resource alignment.
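The decision order of the new wrapper can be restated as a userspace sketch: an explicit alignment flag on the resource wins; otherwise standard BARs and the ROM BAR fall back to their size, and bridge windows fall back to their start address, as pci_resource_alignment() does. The flag bits and names here are hypothetical stand-ins for IORESOURCE_SIZEALIGN, IORESOURCE_STARTALIGN and the kernel's resource structures, and the resource-number limits assume the default layout from patch 1.

```c
#include <stdint.h>

/* Hypothetical flags mirroring IORESOURCE_SIZEALIGN / _STARTALIGN. */
#define RES_SIZEALIGN  (1u << 0)
#define RES_STARTALIGN (1u << 1)

/* Resource-number layout from patch 1 (default of 4 bridge windows). */
enum { ROM_RESOURCE = 6, BRIDGE_RES_END = 10 };

struct res {
	uint64_t start, end;	/* inclusive range */
	unsigned int flags;
};

static uint64_t res_size(const struct res *r)
{
	return r->end - r->start + 1;
}

/* Generic rule: honour an explicit alignment flag, else report none. */
static uint64_t generic_alignment(const struct res *r)
{
	switch (r->flags & (RES_SIZEALIGN | RES_STARTALIGN)) {
	case RES_SIZEALIGN:
		return res_size(r);
	case RES_STARTALIGN:
		return r->start;
	default:
		return 0;
	}
}

/* Wrapper: fall back on the resource number when no flag is set. */
static uint64_t resource_alignment_by_resno(const struct res *r, int resno)
{
	uint64_t align = generic_alignment(r);

	if (align)
		return align;
	if (resno <= ROM_RESOURCE)
		return res_size(r);	/* BARs are naturally aligned */
	if (resno <= BRIDGE_RES_END)
		return r->start;	/* fallback used for bridge windows */
	return 0;			/* unknown resource number */
}
```

The sketch returns a 64-bit value; the patch's int return type cannot represent alignments of 2 GB or more, so it may be worth returning resource_size_t instead.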
Signed-off-by: Yu Zhao <[email protected]>
---
drivers/pci/pci.c | 25 +++++++++++++++++++++++++
drivers/pci/pci.h | 1 +
drivers/pci/setup-bus.c | 4 ++--
drivers/pci/setup-res.c | 7 ++++---
4 files changed, 32 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a9c64b0..381e958 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1884,6 +1884,31 @@ int pci_select_bars(struct pci_dev *dev, unsigned long flags)
return bars;
}
+/**
+ * pci_resource_alignment - get a PCI BAR resource alignment
+ * @dev: the PCI device
+ * @resno: the resource number
+ *
+ * Returns alignment size on success, or 0 on error.
+ */
+int pci_resource_alignment(struct pci_dev *dev, int resno)
+{
+ resource_size_t align;
+ struct resource *res = dev->resource + resno;
+
+ align = resource_alignment(res);
+ if (align)
+ return align;
+
+ if (resno <= PCI_ROM_RESOURCE)
+ return resource_size(res);
+ else if (resno <= PCI_BRIDGE_RES_END)
+ return res->start;
+
+ dev_err(&dev->dev, "alignment: invalid resource #%d\n", resno);
+ return 0;
+}
+
static void __devinit pci_no_domains(void)
{
#ifdef CONFIG_PCI_DOMAINS
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c6fa8ab..720b7d6 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -161,6 +161,7 @@ extern int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
struct resource *res, unsigned int reg);
extern struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
struct pci_dev *bridge, int busnr);
+extern int pci_resource_alignment(struct pci_dev *dev, int resno);
extern void pci_enable_ari(struct pci_dev *dev);
/**
* pci_ari_enabled - query ARI forwarding status
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 6c78cf8..d454ec3 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -25,6 +25,7 @@
#include <linux/ioport.h>
#include <linux/cache.h>
#include <linux/slab.h>
+#include "pci.h"
static void pbus_assign_resources_sorted(struct pci_bus *bus)
@@ -351,8 +352,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask, unsigned long
if (r->parent || (r->flags & mask) != type)
continue;
r_size = resource_size(r);
- /* For bridges size != alignment */
- align = resource_alignment(r);
+ align = pci_resource_alignment(dev, i);
order = __ffs(align) - 20;
if (order > 11) {
dev_warn(&dev->dev, "BAR %d bad alignment %llx: "
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index a81caac..ecff483 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -137,7 +137,7 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
size = resource_size(res);
min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
- align = resource_alignment(res);
+ align = pci_resource_alignment(dev, resno);
if (!align) {
dev_err(&dev->dev, "BAR %d: can't allocate resource (bogus "
"alignment) [%#llx-%#llx] flags %#lx\n",
@@ -235,7 +235,7 @@ void pdev_sort_resources(struct pci_dev *dev, struct resource_list *head)
if (!(r->flags) || r->parent)
continue;
- r_align = resource_alignment(r);
+ r_align = pci_resource_alignment(dev, i);
if (!r_align) {
dev_warn(&dev->dev, "BAR %d: bogus alignment "
"[%#llx-%#llx] flags %#lx\n",
@@ -248,7 +248,8 @@ void pdev_sort_resources(struct pci_dev *dev, struct resource_list *head)
struct resource_list *ln = list->next;
if (ln)
- align = resource_alignment(ln->res);
+ align = pci_resource_alignment(ln->dev,
+ ln->res - ln->dev->resource);
if (r_align > align) {
tmp = kmalloc(sizeof(*tmp), GFP_KERNEL);
--
1.5.6.4
Add a new function to map a resource number to its base address register
(offset and type).
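The mapping can be illustrated with a userspace sketch. It uses the standard configuration header layout: BAR0 at offset 0x10 with six 32-bit BAR slots 4 bytes apart, and the expansion ROM BAR at 0x30 in a type 0 header (0x38 in a type 1 header, which is why the patch reads dev->rom_base_reg). The names are hypothetical stand-ins for the kernel's pci_bar_type values and helpers.

```c
/* Configuration space offsets from the PCI specification. */
#define CFG_BASE_ADDRESS_0 0x10	/* first BAR */
#define CFG_ROM_ADDRESS    0x30	/* ROM BAR in a type 0 header */

enum bar_type { BAR_UNKNOWN, BAR_IO, BAR_MEM32, BAR_MEM64 };

enum { ROM_RESOURCE = 6 };

/* Return the config offset of the BAR behind `resno`, or 0 if none. */
static int resource_to_bar(int resno, int rom_base_reg, enum bar_type *type)
{
	if (resno >= 0 && resno < ROM_RESOURCE) {
		*type = BAR_UNKNOWN;	/* decoded later from the BAR itself */
		return CFG_BASE_ADDRESS_0 + 4 * resno;
	}
	if (resno == ROM_RESOURCE) {
		*type = BAR_MEM32;	/* the ROM BAR is a 32-bit memory BAR */
		return rom_base_reg;
	}
	return 0;			/* no BAR backs this resource */
}
```

Returning 0 for "no BAR" is unambiguous because offset 0 holds the Vendor ID, never a BAR; pci_update_resource() relies on this to bail out early.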
Signed-off-by: Yu Zhao <[email protected]>
---
drivers/pci/pci.c | 22 ++++++++++++++++++++++
drivers/pci/pci.h | 2 ++
drivers/pci/setup-res.c | 13 +++++--------
3 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 381e958..3575124 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1909,6 +1909,28 @@ int pci_resource_alignment(struct pci_dev *dev, int resno)
return 0;
}
+/**
+ * pci_resource_bar - get position of the BAR associated with a resource
+ * @dev: the PCI device
+ * @resno: the resource number
+ * @type: the BAR type to be filled in
+ *
+ * Returns BAR position in config space, or 0 if the BAR is invalid.
+ */
+int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type)
+{
+ if (resno < PCI_ROM_RESOURCE) {
+ *type = pci_bar_unknown;
+ return PCI_BASE_ADDRESS_0 + 4 * resno;
+ } else if (resno == PCI_ROM_RESOURCE) {
+ *type = pci_bar_mem32;
+ return dev->rom_base_reg;
+ }
+
+ dev_err(&dev->dev, "BAR: invalid resource #%d\n", resno);
+ return 0;
+}
+
static void __devinit pci_no_domains(void)
{
#ifdef CONFIG_PCI_DOMAINS
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 720b7d6..e2237ad 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -162,6 +162,8 @@ extern int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
extern struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
struct pci_dev *bridge, int busnr);
extern int pci_resource_alignment(struct pci_dev *dev, int resno);
+extern int pci_resource_bar(struct pci_dev *dev, int resno,
+ enum pci_bar_type *type);
extern void pci_enable_ari(struct pci_dev *dev);
/**
* pci_ari_enabled - query ARI forwarding status
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index ecff483..c3585a0 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -31,6 +31,7 @@ void pci_update_resource(struct pci_dev *dev, int resno)
struct pci_bus_region region;
u32 new, check, mask;
int reg;
+ enum pci_bar_type type;
struct resource *res = dev->resource + resno;
/*
@@ -64,17 +65,13 @@ void pci_update_resource(struct pci_dev *dev, int resno)
else
mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
- if (resno < 6) {
- reg = PCI_BASE_ADDRESS_0 + 4 * resno;
- } else if (resno == PCI_ROM_RESOURCE) {
+ reg = pci_resource_bar(dev, resno, &type);
+ if (!reg)
+ return;
+ if (type != pci_bar_unknown) {
if (!(res->flags & IORESOURCE_ROM_ENABLE))
return;
new |= PCI_ROM_ADDRESS_ENABLE;
- reg = dev->rom_base_reg;
- } else {
- /* Hmm, non-standard resource. */
-
- return; /* kill uninitialised var warning */
}
pci_write_config_dword(dev, reg, new);
--
1.5.6.4
Support Single Root I/O Virtualization (SR-IOV) capability.
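Part of the capability setup in iov.c is choosing the System Page Size, which also fixes the alignment of every VF BAR. The following userspace sketch restates that step from pci_iov_init(), assuming (per the SR-IOV specification) that bit n of the Supported Page Sizes register stands for a page of 2^(n+12) bytes. The function name and signature are hypothetical.

```c
#include <stdint.h>

/*
 * Pick the System Page Size to program into the SR-IOV capability.
 * Sizes smaller than the host page (1 << page_shift) are masked off and
 * the smallest remaining one is chosen. Stores the register value in
 * *sys_pgsz and returns the resulting VF BAR alignment in bytes, or 0
 * if the device offers no usable page size.
 */
static uint64_t iov_pick_page_size(uint32_t supported, unsigned int page_shift,
				   uint32_t *sys_pgsz)
{
	unsigned int skip = page_shift > 12 ? page_shift - 12 : 0;

	supported &= ~((1u << skip) - 1);	/* drop sizes below host page */
	if (!supported)
		return 0;

	supported &= ~(supported - 1);		/* keep only the lowest set bit */
	*sys_pgsz = supported;
	return (uint64_t)supported << 12;	/* bit n -> 4 KB << n bytes */
}
```

For a device advertising 0x553 (4 KB, 8 KB, 64 KB, 256 KB, 1 MB and 4 MB pages), a 4 KB-page host picks 4 KB, while a 64 KB-page host skips the two smaller sizes and picks 64 KB.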
Signed-off-by: Yu Zhao <[email protected]>
---
drivers/pci/Kconfig | 12 +
drivers/pci/Makefile | 2 +
drivers/pci/iov.c | 853 ++++++++++++++++++++++++++++++++++++++++++++++
drivers/pci/pci-sysfs.c | 4 +
drivers/pci/pci.c | 14 +-
drivers/pci/pci.h | 55 +++
drivers/pci/probe.c | 4 +
include/linux/pci.h | 57 +++
include/linux/pci_regs.h | 21 ++
9 files changed, 1021 insertions(+), 1 deletions(-)
create mode 100644 drivers/pci/iov.c
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index e1ca425..e7c0836 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -50,3 +50,15 @@ config HT_IRQ
This allows native hypertransport devices to use interrupts.
If unsure say Y.
+
+config PCI_IOV
+ bool "PCI SR-IOV support"
+ depends on PCI
+ select PCI_MSI
+ default n
+ help
+ This option allows device drivers to enable Single Root I/O
+ Virtualization. Each Virtual Function's PCI configuration
+ space can be accessed using its own Bus, Device and Function
+ Number (Routing ID). Each Virtual Function also has PCI Memory
+ Space, which is used to map its own register set.
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 7d63f8c..47bb456 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PCI_SYSCALL) += syscall.o
ifeq ($(CONFIG_PCI_DEBUG),y)
EXTRA_CFLAGS += -DDEBUG
endif
+
+obj-$(CONFIG_PCI_IOV) += iov.o
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
new file mode 100644
index 0000000..3cf9709
--- /dev/null
+++ b/drivers/pci/iov.c
@@ -0,0 +1,853 @@
+/*
+ * drivers/pci/iov.c
+ *
+ * Copyright (C) 2008 Intel Corporation
+ *
+ * PCI Express Single Root I/O Virtualization capability support.
+ */
+
+#include <linux/ctype.h>
+#include <linux/string.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <asm/page.h>
+#include "pci.h"
+
+#define VF_NAME_LEN 8
+
+
+struct iov_attr {
+ struct attribute attr;
+ ssize_t (*show)(struct kobject *,
+ struct iov_attr *, char *);
+ ssize_t (*store)(struct kobject *,
+ struct iov_attr *, const char *, size_t);
+};
+
+#define iov_config_attr(field) \
+static ssize_t field##_show(struct kobject *kobj, \
+ struct iov_attr *attr, char *buf) \
+{ \
+ struct pci_iov *iov = container_of(kobj, struct pci_iov, kobj); \
+ \
+ return sprintf(buf, "%d\n", iov->field); \
+}
+
+iov_config_attr(is_enabled);
+iov_config_attr(totalvfs);
+iov_config_attr(initialvfs);
+iov_config_attr(numvfs);
+
+struct vf_entry {
+ int vfn;
+ struct kobject kobj;
+ struct pci_iov *iov;
+ struct iov_attr *attr;
+ char name[VF_NAME_LEN];
+ char (*param)[PCI_IOV_PARAM_LEN];
+};
+
+static ssize_t iov_attr_show(struct kobject *kobj,
+ struct attribute *attr, char *buf)
+{
+ struct iov_attr *ia = container_of(attr, struct iov_attr, attr);
+
+ return ia->show ? ia->show(kobj, ia, buf) : -EIO;
+}
+
+static ssize_t iov_attr_store(struct kobject *kobj,
+ struct attribute *attr, const char *buf, size_t len)
+{
+ struct iov_attr *ia = container_of(attr, struct iov_attr, attr);
+
+ return ia->store ? ia->store(kobj, ia, buf, len) : -EIO;
+}
+
+static struct sysfs_ops iov_attr_ops = {
+ .show = iov_attr_show,
+ .store = iov_attr_store,
+};
+
+static struct kobj_type iov_ktype = {
+ .sysfs_ops = &iov_attr_ops,
+};
+
+static inline void vf_rid(struct pci_dev *dev, int vfn, u8 *busnr, u8 *devfn)
+{
+ u16 rid;
+
+ rid = (dev->bus->number << 8) + dev->devfn +
+ dev->iov->offset + dev->iov->stride * vfn;
+ *busnr = rid >> 8;
+ *devfn = rid & 0xff;
+}
+
+static int vf_add(struct pci_dev *dev, int vfn)
+{
+ int i;
+ int rc;
+ u8 busnr, devfn;
+ unsigned long size;
+ struct pci_dev *new;
+ struct pci_bus *bus;
+ struct resource *res;
+
+ vf_rid(dev, vfn, &busnr, &devfn);
+
+ new = alloc_pci_dev();
+ if (!new)
+ return -ENOMEM;
+
+ if (dev->bus->number == busnr)
+ new->bus = bus = dev->bus;
+ else {
+ list_for_each_entry(bus, &dev->bus->children, node)
+ if (bus->number == busnr) {
+ new->bus = bus;
+ break;
+ }
+ BUG_ON(!new->bus);
+ }
+
+ new->sysdata = bus->sysdata;
+ new->dev.parent = dev->dev.parent;
+ new->dev.bus = dev->dev.bus;
+ new->devfn = devfn;
+ new->hdr_type = PCI_HEADER_TYPE_NORMAL;
+ new->multifunction = 0;
+ new->vendor = dev->vendor;
+ pci_read_config_word(dev, dev->iov->cap + PCI_IOV_VF_DID, &new->device);
+ new->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
+ new->error_state = pci_channel_io_normal;
+ new->is_pcie = 1;
+ new->pcie_type = PCI_EXP_TYPE_ENDPOINT;
+ new->dma_mask = 0xffffffff;
+
+ dev_set_name(&new->dev, "%04x:%02x:%02x.%d", pci_domain_nr(bus),
+ busnr, PCI_SLOT(devfn), PCI_FUNC(devfn));
+
+ pci_read_config_byte(new, PCI_REVISION_ID, &new->revision);
+ new->class = dev->class;
+ new->current_state = PCI_UNKNOWN;
+ new->irq = 0;
+
+ for (i = 0; i < PCI_IOV_NUM_BAR; i++) {
+ res = dev->resource + PCI_IOV_RESOURCES + i;
+ if (!res->parent)
+ continue;
+ new->resource[i].name = pci_name(new);
+ new->resource[i].flags = res->flags;
+ size = resource_size(res) / dev->iov->totalvfs;
+ new->resource[i].start = res->start + size * vfn;
+ new->resource[i].end = new->resource[i].start + size - 1;
+ rc = request_resource(res, &new->resource[i]);
+ BUG_ON(rc);
+ }
+
+ new->subsystem_vendor = dev->subsystem_vendor;
+ pci_read_config_word(new, PCI_SUBSYSTEM_ID, &new->subsystem_device);
+
+ pci_device_add(new, bus);
+ return pci_bus_add_device(new);
+}
+
+static void vf_remove(struct pci_dev *dev, int vfn)
+{
+ u8 busnr, devfn;
+ struct pci_dev *tmp;
+
+ vf_rid(dev, vfn, &busnr, &devfn);
+
+ tmp = pci_get_bus_and_slot(busnr, devfn);
+ if (!tmp)
+ return;
+
+ pci_dev_put(tmp);
+ pci_remove_bus_device(tmp);
+}
+
+static int iov_enable(struct pci_iov *iov)
+{
+ int rc;
+ int i, j;
+ u16 ctrl;
+
+ if (!iov->notify)
+ return -ENODEV;
+
+ if (iov->is_enabled)
+ return 0;
+
+ iov->notify(iov->dev, iov->numvfs | PCI_IOV_ENABLE);
+ pci_read_config_word(iov->dev, iov->cap + PCI_IOV_CTRL, &ctrl);
+ ctrl |= (PCI_IOV_CTRL_VFE | PCI_IOV_CTRL_MSE);
+ pci_write_config_word(iov->dev, iov->cap + PCI_IOV_CTRL, ctrl);
+ ssleep(1);
+
+ for (i = 0; i < iov->numvfs; i++) {
+ rc = vf_add(iov->dev, i);
+ if (rc)
+ goto failed;
+ }
+
+ iov->notify(iov->dev, iov->numvfs |
+ PCI_IOV_ENABLE | PCI_IOV_POST_EVENT);
+ iov->is_enabled = 1;
+ return 0;
+
+failed:
+ for (j = 0; j < i; j++)
+ vf_remove(iov->dev, j);
+
+ pci_read_config_word(iov->dev, iov->cap + PCI_IOV_CTRL, &ctrl);
+ ctrl &= ~(PCI_IOV_CTRL_VFE | PCI_IOV_CTRL_MSE);
+ pci_write_config_word(iov->dev, iov->cap + PCI_IOV_CTRL, ctrl);
+ ssleep(1);
+
+ return rc;
+}
+
+static int iov_disable(struct pci_iov *iov)
+{
+ int i;
+ u16 ctrl;
+
+ if (!iov->notify)
+ return -ENODEV;
+
+ if (!iov->is_enabled)
+ return 0;
+
+ iov->notify(iov->dev, PCI_IOV_DISABLE);
+ for (i = 0; i < iov->numvfs; i++)
+ vf_remove(iov->dev, i);
+
+ pci_read_config_word(iov->dev, iov->cap + PCI_IOV_CTRL, &ctrl);
+ ctrl &= ~(PCI_IOV_CTRL_VFE | PCI_IOV_CTRL_MSE);
+ pci_write_config_word(iov->dev, iov->cap + PCI_IOV_CTRL, ctrl);
+ ssleep(1);
+
+ iov->notify(iov->dev, PCI_IOV_DISABLE | PCI_IOV_POST_EVENT);
+ iov->is_enabled = 0;
+ return 0;
+}
+
+static int iov_set_numvfs(struct pci_iov *iov, int numvfs)
+{
+ u16 offset, stride;
+
+ if (!iov->notify)
+ return -ENODEV;
+
+ if (numvfs == iov->numvfs)
+ return 0;
+
+ if (numvfs < 0 || numvfs > iov->initialvfs || iov->is_enabled)
+ return -EINVAL;
+
+ pci_write_config_word(iov->dev, iov->cap + PCI_IOV_NUM_VF, numvfs);
+ pci_read_config_word(iov->dev, iov->cap + PCI_IOV_VF_OFFSET, &offset);
+ pci_read_config_word(iov->dev, iov->cap + PCI_IOV_VF_STRIDE, &stride);
+ if ((numvfs && !offset) || (numvfs > 1 && !stride))
+ return -EIO;
+
+ iov->offset = offset;
+ iov->stride = stride;
+ iov->numvfs = numvfs;
+ return 0;
+}
+
+static ssize_t is_enabled_store(struct kobject *kobj, struct iov_attr *attr,
+ const char *buf, size_t count)
+{
+ int rc;
+ long enable;
+ struct pci_iov *iov = container_of(kobj, struct pci_iov, kobj);
+
+ rc = strict_strtol(buf, 0, &enable);
+ if (rc)
+ return rc;
+
+ mutex_lock(&iov->mutex);
+ switch (enable) {
+ case 0:
+ rc = iov_disable(iov);
+ break;
+ case 1:
+ rc = iov_enable(iov);
+ break;
+ default:
+ rc = -EINVAL;
+ }
+ mutex_unlock(&iov->mutex);
+
+ return rc ? rc : count;
+}
+
+static ssize_t numvfs_store(struct kobject *kobj, struct iov_attr *attr,
+ const char *buf, size_t count)
+{
+ int rc;
+ long numvfs;
+ struct pci_iov *iov = container_of(kobj, struct pci_iov, kobj);
+
+ rc = strict_strtol(buf, 0, &numvfs);
+ if (rc)
+ return rc;
+
+ mutex_lock(&iov->mutex);
+ rc = iov_set_numvfs(iov, numvfs);
+ mutex_unlock(&iov->mutex);
+
+ return rc ? rc : count;
+}
+
+
+static struct iov_attr iov_attr[] = {
+ __ATTR_RO(totalvfs),
+ __ATTR_RO(initialvfs),
+ __ATTR(numvfs, S_IWUSR | S_IRUGO, numvfs_show, numvfs_store),
+ __ATTR(enable, S_IWUSR | S_IRUGO, is_enabled_show, is_enabled_store),
+};
+
+static ssize_t vf_show(struct kobject *kobj, struct iov_attr *attr,
+ char *buf)
+{
+ int vfn;
+ struct vf_entry *ve = container_of(kobj, struct vf_entry, kobj);
+
+ vfn = attr - ve->attr;
+ ve->iov->notify(ve->iov->dev, vfn | PCI_IOV_RD_CONF);
+
+ return sprintf(buf, "%s\n", ve->param[vfn]);
+}
+
+static ssize_t vf_store(struct kobject *kobj, struct iov_attr *attr,
+ const char *buf, size_t count)
+{
+ int vfn;
+ struct vf_entry *ve = container_of(kobj, struct vf_entry, kobj);
+
+ vfn = attr - ve->attr;
+ sscanf(buf, "%63s", ve->param[vfn]);
+ ve->iov->notify(ve->iov->dev, vfn | PCI_IOV_WR_CONF);
+
+ return count;
+}
+
+static ssize_t rid_show(struct kobject *kobj, struct iov_attr *attr,
+ char *buf)
+{
+ u8 busnr, devfn;
+ struct vf_entry *ve = container_of(kobj, struct vf_entry, kobj);
+
+ vf_rid(ve->iov->dev, ve->vfn, &busnr, &devfn);
+
+ return sprintf(buf, "%04x:%02x:%02x.%d\n",
+ pci_domain_nr(ve->iov->dev->bus),
+ busnr, PCI_SLOT(devfn), PCI_FUNC(devfn));
+}
+
+static struct iov_attr vf_attr = __ATTR_RO(rid);
+
+int iov_alloc_bus(struct pci_bus *bus, int busnr)
+{
+ int i;
+ int rc = 0;
+ struct pci_bus *child, *next;
+ struct list_head head;
+
+ INIT_LIST_HEAD(&head);
+
+ down_write(&pci_bus_sem);
+
+ for (i = bus->number + 1; i <= busnr; i++) {
+ list_for_each_entry(child, &bus->children, node)
+ if (child->number == i)
+ break;
+ if (child->number == i)
+ continue;
+ child = pci_alloc_child_bus(bus, NULL, i);
+ if (!child) {
+ rc = -ENOMEM;
+ break;
+ }
+ child->subordinate = i;
+ child->dev.parent = bus->bridge;
+ rc = device_register(&child->dev);
+ if (rc) {
+ kfree(child);
+ break;
+ }
+ child->is_added = 1;
+ list_add_tail(&child->node, &head);
+ }
+
+ if (rc)
+ list_for_each_entry_safe(child, next, &head, node) {
+ device_unregister(&child->dev);
+ kfree(child);
+ }
+ else
+ list_for_each_entry_safe(child, next, &head, node)
+ list_move_tail(&child->node, &bus->children);
+
+ up_write(&pci_bus_sem);
+
+ return rc;
+}
+
+void iov_release_bus(struct pci_bus *bus)
+{
+ struct pci_dev *dev;
+ struct pci_bus *child, *next;
+ struct list_head head;
+
+ INIT_LIST_HEAD(&head);
+
+ down_write(&pci_bus_sem);
+
+ list_for_each_entry(dev, &bus->devices, bus_list)
+ if (dev->iov && dev->iov->notify)
+ goto done;
+
+ list_for_each_entry_safe(child, next, &bus->children, node)
+ if (!child->bridge)
+ list_move(&child->node, &head);
+done:
+ up_write(&pci_bus_sem);
+
+ list_for_each_entry_safe(child, next, &head, node)
+ pci_remove_bus(child);
+}
+
+/**
+ * pci_iov_init - initialize device's SR-IOV capability
+ * @dev: the PCI device
+ *
+ * Returns 0 on success, or negative on failure.
+ *
+ * The major differences between a Virtual Function and a PCI device are:
+ * 1) a device that uses multiple bus numbers routes internally, so
+ *    there is no explicit bridge device in this case.
+ * 2) Virtual Function memory spaces are designated by BARs encapsulated
+ *    in the capability structure, and the BARs in the Virtual Function's
+ *    PCI configuration space are read-only zero.
+ */
+int pci_iov_init(struct pci_dev *dev)
+{
+ int i;
+ int pos;
+ u32 pgsz;
+ u16 ctrl, total, initial, offset, stride;
+ struct pci_iov *iov;
+ struct resource *res;
+
+ if (!dev->is_pcie || (dev->pcie_type != PCI_EXP_TYPE_RC_END &&
+ dev->pcie_type != PCI_EXP_TYPE_ENDPOINT))
+ return -ENODEV;
+
+ pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_IOV);
+ if (!pos)
+ return -ENODEV;
+
+ ctrl = pci_ari_enabled(dev) ? PCI_IOV_CTRL_ARI : 0;
+ pci_write_config_word(dev, pos + PCI_IOV_CTRL, ctrl);
+ ssleep(1);
+
+ pci_read_config_word(dev, pos + PCI_IOV_TOTAL_VF, &total);
+ pci_read_config_word(dev, pos + PCI_IOV_INITIAL_VF, &initial);
+ pci_write_config_word(dev, pos + PCI_IOV_NUM_VF, initial);
+ pci_read_config_word(dev, pos + PCI_IOV_VF_OFFSET, &offset);
+ pci_read_config_word(dev, pos + PCI_IOV_VF_STRIDE, &stride);
+ if (!total || initial > total || (initial && !offset) ||
+ (initial > 1 && !stride))
+ return -EIO;
+
+ pci_read_config_dword(dev, pos + PCI_IOV_SUP_PGSIZE, &pgsz);
+ i = PAGE_SHIFT > 12 ? PAGE_SHIFT - 12 : 0;
+ pgsz &= ~((1 << i) - 1);
+ if (!pgsz)
+ return -EIO;
+
+ pgsz &= ~(pgsz - 1);
+ pci_write_config_dword(dev, pos + PCI_IOV_SYS_PGSIZE, pgsz);
+
+ iov = kzalloc(sizeof(*iov), GFP_KERNEL);
+ if (!iov)
+ return -ENOMEM;
+
+ iov->dev = dev;
+ iov->cap = pos;
+ iov->totalvfs = total;
+ iov->initialvfs = initial;
+ iov->offset = offset;
+ iov->stride = stride;
+ iov->align = pgsz << 12;
+ mutex_init(&iov->mutex);
+
+ for (i = 0; i < PCI_IOV_NUM_BAR; i++) {
+ res = dev->resource + PCI_IOV_RESOURCES + i;
+ pos = iov->cap + PCI_IOV_BAR_0 + i * 4;
+ i += __pci_read_base(dev, pci_bar_unknown, res, pos);
+ if (!res->flags)
+ continue;
+ res->flags &= ~IORESOURCE_SIZEALIGN;
+ res->end = res->start + resource_size(res) * total - 1;
+ }
+
+ dev->iov = iov;
+
+ return 0;
+}
+
+/**
+ * pci_iov_release - release resources used by SR-IOV capability
+ * @dev: the PCI device
+ */
+void pci_iov_release(struct pci_dev *dev)
+{
+ if (!dev->iov)
+ return;
+
+ mutex_destroy(&dev->iov->mutex);
+ kfree(dev->iov);
+ dev->iov = NULL;
+}
+
+/**
+ * pci_iov_create_sysfs - create sysfs for SR-IOV capability
+ * @dev: the PCI device
+ */
+void pci_iov_create_sysfs(struct pci_dev *dev)
+{
+ int rc;
+ int i, j;
+ struct pci_iov *iov = dev->iov;
+
+ if (!iov)
+ return;
+
+ iov->ve = kzalloc(sizeof(*iov->ve) * iov->totalvfs, GFP_KERNEL);
+ if (!iov->ve)
+ return;
+
+ for (i = 0; i < iov->totalvfs; i++) {
+ iov->ve[i].vfn = i;
+ iov->ve[i].iov = iov;
+ }
+
+ rc = kobject_init_and_add(&iov->kobj, &iov_ktype,
+ &dev->dev.kobj, "iov");
+ if (rc)
+ goto failed1;
+
+ for (i = 0; i < ARRAY_SIZE(iov_attr); i++) {
+ rc = sysfs_create_file(&iov->kobj, &iov_attr[i].attr);
+ if (rc)
+ goto failed2;
+ }
+
+ for (i = 0; i < iov->totalvfs; i++) {
+ sprintf(iov->ve[i].name, "%d", i);
+ rc = kobject_init_and_add(&iov->ve[i].kobj, &iov_ktype,
+ &iov->kobj, iov->ve[i].name);
+ if (rc)
+ goto failed3;
+ rc = sysfs_create_file(&iov->ve[i].kobj, &vf_attr.attr);
+ if (rc) {
+ kobject_put(&iov->ve[i].kobj);
+ goto failed3;
+ }
+ }
+
+ return;
+
+failed3:
+ for (j = 0; j < i; j++) {
+ sysfs_remove_file(&iov->ve[j].kobj, &vf_attr.attr);
+ kobject_put(&iov->ve[j].kobj);
+ }
+ /* every iov_attr file was created before the VF loop started */
+ i = ARRAY_SIZE(iov_attr);
+failed2:
+ for (j = 0; j < i; j++)
+ sysfs_remove_file(&dev->iov->kobj, &iov_attr[j].attr);
+ kobject_put(&iov->kobj);
+failed1:
+ kfree(iov->ve);
+ iov->ve = NULL;
+
+ dev_err(&dev->dev, "can't create sysfs for SR-IOV.\n");
+}
+
+/**
+ * pci_iov_remove_sysfs - remove sysfs of SR-IOV capability
+ * @dev: the PCI device
+ */
+void pci_iov_remove_sysfs(struct pci_dev *dev)
+{
+ int i;
+ struct pci_iov *iov = dev->iov;
+
+ if (!iov || !iov->ve)
+ return;
+
+ for (i = 0; i < iov->totalvfs; i++) {
+ sysfs_remove_file(&iov->ve[i].kobj, &vf_attr.attr);
+ kobject_put(&iov->ve[i].kobj);
+ }
+
+ for (i = 0; i < ARRAY_SIZE(iov_attr); i++)
+ sysfs_remove_file(&dev->iov->kobj, &iov_attr[i].attr);
+
+ kobject_put(&iov->kobj);
+ kfree(iov->ve);
+ iov->ve = NULL;
+}
+
+int pci_iov_resource_align(struct pci_dev *dev, int resno)
+{
+ if (resno < PCI_IOV_RESOURCES || resno > PCI_IOV_RESOURCES_END)
+ return 0;
+
+ BUG_ON(!dev->iov);
+
+ return dev->iov->align;
+}
+
+int pci_iov_resource_bar(struct pci_dev *dev, int resno,
+ enum pci_bar_type *type)
+{
+ if (resno < PCI_IOV_RESOURCES || resno > PCI_IOV_RESOURCES_END)
+ return 0;
+
+ BUG_ON(!dev->iov);
+
+ *type = pci_bar_unknown;
+ return dev->iov->cap + PCI_IOV_BAR_0 +
+ 4 * (resno - PCI_IOV_RESOURCES);
+}
+
+/**
+ * pci_iov_register - register SR-IOV service
+ * @dev: the PCI device
+ * @notify: callback function for SR-IOV events
+ * @entries: sysfs entries used by Physical Function driver
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int pci_iov_register(struct pci_dev *dev, int (*notify)(struct pci_dev *, u32),
+ char **entries)
+{
+ int rc;
+ int n, i, j, k;
+ u8 busnr, devfn;
+ struct iov_attr *attr;
+ struct pci_iov *iov = dev->iov;
+
+ if (!iov || !iov->ve)
+ return -ENODEV;
+
+ if (!notify)
+ return -EINVAL;
+
+ vf_rid(dev, iov->totalvfs - 1, &busnr, &devfn);
+ if (busnr > dev->bus->subordinate)
+ return -EIO;
+
+ iov->notify = notify;
+ rc = iov_alloc_bus(dev->bus, busnr);
+ if (rc)
+ return rc;
+
+ for (n = 0; entries && entries[n] && *entries[n]; n++)
+ ;
+ if (!n)
+ return 0;
+
+ for (i = 0; i < iov->totalvfs; i++) {
+ rc = -ENOMEM;
+ iov->ve[i].param = kzalloc(PCI_IOV_PARAM_LEN * n, GFP_KERNEL);
+ if (!iov->ve[i].param)
+ goto failed;
+ attr = kzalloc(sizeof(*attr) * n, GFP_KERNEL);
+ if (!attr) {
+ kfree(iov->ve[i].param);
+ goto failed;
+ }
+ iov->ve[i].attr = attr;
+ for (j = 0; j < n; j++) {
+ attr[j].attr.name = entries[j];
+ attr[j].attr.mode = S_IWUSR | S_IRUGO;
+ attr[j].show = vf_show;
+ attr[j].store = vf_store;
+ rc = sysfs_create_file(&iov->ve[i].kobj, &attr[j].attr);
+ if (rc) {
+ while (j--)
+ sysfs_remove_file(&iov->ve[i].kobj,
+ &attr[j].attr);
+ kfree(iov->ve[i].attr);
+ kfree(iov->ve[i].param);
+ goto failed;
+ }
+ }
+ }
+
+ iov->nentries = n;
+ return 0;
+
+failed:
+ for (k = 0; k < i; k++) {
+ for (j = 0; j < n; j++)
+ sysfs_remove_file(&iov->ve[k].kobj,
+ &iov->ve[k].attr[j].attr);
+ kfree(iov->ve[k].attr);
+ kfree(iov->ve[k].param);
+ }
+
+ return rc;
+}
+EXPORT_SYMBOL_GPL(pci_iov_register);
+
+/**
+ * pci_iov_unregister - unregister SR-IOV service
+ * @dev: the PCI device
+ */
+void pci_iov_unregister(struct pci_dev *dev)
+{
+ int i, j;
+ struct pci_iov *iov = dev->iov;
+
+ BUG_ON(!iov || !iov->notify);
+
+ /* PF driver sysfs entries exist only when nentries is non-zero */
+ for (i = 0; iov->nentries && i < iov->totalvfs; i++) {
+ for (j = 0; j < iov->nentries; j++)
+ sysfs_remove_file(&iov->ve[i].kobj,
+ &iov->ve[i].attr[j].attr);
+ kfree(iov->ve[i].attr);
+ kfree(iov->ve[i].param);
+ }
+
+ iov->nentries = 0;
+ iov->notify = NULL;
+ iov_release_bus(dev->bus);
+}
+EXPORT_SYMBOL_GPL(pci_iov_unregister);
+
+/**
+ * pci_iov_enable - enable SR-IOV capability
+ * @dev: the PCI device
+ * @numvfs: number of VFs to make available
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int pci_iov_enable(struct pci_dev *dev, int numvfs)
+{
+ int rc;
+ struct pci_iov *iov = dev->iov;
+
+ if (!iov)
+ return -ENODEV;
+
+ if (!iov->notify)
+ return -EINVAL;
+
+ mutex_lock(&iov->mutex);
+ rc = iov_set_numvfs(iov, numvfs);
+ if (rc)
+ goto done;
+ rc = iov_enable(iov);
+done:
+ mutex_unlock(&iov->mutex);
+
+ return rc;
+}
+EXPORT_SYMBOL_GPL(pci_iov_enable);
+
+/**
+ * pci_iov_disable - disable SR-IOV capability
+ * @dev: the PCI device
+ *
+ * Should be called on Physical Function driver removal and on power
+ * state changes. All previously allocated Virtual Functions are reclaimed.
+ */
+void pci_iov_disable(struct pci_dev *dev)
+{
+ struct pci_iov *iov = dev->iov;
+
+ BUG_ON(!iov || !iov->notify);
+ mutex_lock(&iov->mutex);
+ iov_disable(iov);
+ mutex_unlock(&iov->mutex);
+}
+EXPORT_SYMBOL_GPL(pci_iov_disable);
+
+/**
+ * pci_iov_read_config - read SR-IOV configurations
+ * @dev: the PCI device
+ * @vfn: Virtual Function Number
+ * @entry: the entry to be read
+ * @buf: the buffer to be filled
+ * @size: size of the buffer
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int pci_iov_read_config(struct pci_dev *dev, int vfn,
+ char *entry, char *buf, int size)
+{
+ int i;
+ struct pci_iov *iov = dev->iov;
+
+ if (!iov)
+ return -ENODEV;
+
+ if (!iov->notify || !iov->ve || !iov->nentries)
+ return -EINVAL;
+
+ if (vfn < 0 || vfn >= iov->totalvfs)
+ return -EINVAL;
+
+ for (i = 0; i < iov->nentries; i++)
+ if (!strcmp(iov->ve[vfn].attr[i].attr.name, entry)) {
+ strncpy(buf, iov->ve[vfn].param[i], size);
+ buf[size - 1] = '\0';
+ return 0;
+ }
+
+ return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(pci_iov_read_config);
+
+/**
+ * pci_iov_write_config - write SR-IOV configurations
+ * @dev: the PCI device
+ * @vfn: Virtual Function Number
+ * @entry: the entry to be written
+ * @buf: the buffer containing the configuration
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int pci_iov_write_config(struct pci_dev *dev, int vfn,
+ char *entry, char *buf)
+{
+ int i;
+ struct pci_iov *iov = dev->iov;
+
+ if (!iov)
+ return -ENODEV;
+
+ if (!iov->notify || !iov->ve || !iov->nentries)
+ return -EINVAL;
+
+ if (vfn < 0 || vfn >= iov->totalvfs)
+ return -EINVAL;
+
+ for (i = 0; i < iov->nentries; i++)
+ if (!strcmp(iov->ve[vfn].attr[i].attr.name, entry)) {
+ strncpy(iov->ve[vfn].param[i], buf, PCI_IOV_PARAM_LEN);
+ iov->ve[vfn].param[i][PCI_IOV_PARAM_LEN - 1] = '\0';
+ return 0;
+ }
+
+ return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(pci_iov_write_config);
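A note on the System Page Size logic in pci_iov_init() above: bit n of the Supported Page Sizes register stands for a 2^(n+12)-byte page, so the code discards sizes below the host page, keeps the lowest remaining bit, uses pgsz << 12 as the VF BAR alignment, and scales each VF BAR by TotalVFs. The standalone C below is a minimal userspace sketch of that arithmetic only; iov_pick_pgsz(), iov_align_bytes() and iov_bar_region() are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the System Page Size choice in pci_iov_init().
 * Bit n of Supported Page Sizes means a page of 2^(n + 12) bytes.
 * Returns the single bit to program into System Page Size, or 0 when
 * no supported size is at least as large as the host page.
 */
static uint32_t iov_pick_pgsz(uint32_t supported, unsigned int page_shift)
{
	unsigned int skip = page_shift > 12 ? page_shift - 12 : 0;

	assert(skip < 32);
	/* drop the page sizes smaller than the host page */
	supported &= ~((1U << skip) - 1);
	if (!supported)
		return 0;

	/* isolate the lowest remaining bit: the smallest usable size */
	return supported & ~(supported - 1);
}

/* Alignment in bytes of each VF BAR (the patch keeps this in iov->align). */
static uint64_t iov_align_bytes(uint32_t pgsz)
{
	return (uint64_t)pgsz << 12;
}

/* Size of one VF BAR region: the per-VF aperture repeated TotalVFs times. */
static uint64_t iov_bar_region(uint64_t per_vf_size, uint16_t totalvfs)
{
	return per_vf_size * totalvfs;
}
```

For example, with 64 KiB host pages (page_shift 16) and a Supported Page Sizes value of 0x553, the sketch selects 0x10, i.e. a 64 KiB alignment for every VF BAR.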
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index c41b783..9494659 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -764,6 +764,9 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
/* Active State Power Management */
pcie_aspm_create_sysfs_dev_files(dev);
+ /* Single Root I/O Virtualization */
+ pci_iov_create_sysfs(dev);
+
return 0;
}
@@ -849,6 +852,7 @@ static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
}
pcie_aspm_remove_sysfs_dev_files(dev);
+ pci_iov_remove_sysfs(dev);
}
/**
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 3575124..4cfdbdb 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1902,7 +1902,12 @@ int pci_resource_alignment(struct pci_dev *dev, int resno)
if (resno <= PCI_ROM_RESOURCE)
return resource_size(res);
- else if (resno <= PCI_BRIDGE_RES_END)
+ else if (resno < PCI_BRIDGE_RESOURCES) {
+ /* may be device specific resource */
+ align = pci_iov_resource_align(dev, resno);
+ if (align)
+ return align;
+ } else if (resno <= PCI_BRIDGE_RES_END)
return res->start;
dev_err(&dev->dev, "alignment: invalid resource #%d\n", resno);
@@ -1919,12 +1924,19 @@ int pci_resource_alignment(struct pci_dev *dev, int resno)
*/
int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type)
{
+ int reg;
+
if (resno < PCI_ROM_RESOURCE) {
*type = pci_bar_unknown;
return PCI_BASE_ADDRESS_0 + 4 * resno;
} else if (resno == PCI_ROM_RESOURCE) {
*type = pci_bar_mem32;
return dev->rom_base_reg;
+ } else if (resno < PCI_BRIDGE_RESOURCES) {
+ /* may be device specific resource */
+ reg = pci_iov_resource_bar(dev, resno, type);
+ if (reg)
+ return reg;
}
dev_err(&dev->dev, "BAR: invalid resource #%d\n", resno);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e2237ad..c66a4bd 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -176,4 +176,59 @@ static inline int pci_ari_enabled(struct pci_dev *dev)
return dev->ari_enabled;
}
+/* Single Root I/O Virtualization */
+#define PCI_IOV_PARAM_LEN 64
+
+struct vf_entry;
+
+struct pci_iov {
+ int cap; /* capability position */
+ int align; /* page size used to map memory space */
+ int is_enabled; /* status of SR-IOV */
+ int nentries; /* number of sysfs entries used by PF driver */
+ u16 totalvfs; /* total VFs associated with the PF */
+ u16 initialvfs; /* initial VFs associated with the PF */
+ u16 numvfs; /* number of VFs available */
+ u16 offset; /* first VF Routing ID offset */
+ u16 stride; /* following VF stride */
+ struct mutex mutex; /* lock for SR-IOV */
+ struct kobject kobj; /* kobject for IOV */
+ struct pci_dev *dev; /* Physical Function */
+ struct vf_entry *ve; /* Virtual Function related */
+ int (*notify)(struct pci_dev *, u32); /* event callback function */
+};
+
+#ifdef CONFIG_PCI_IOV
+extern int pci_iov_init(struct pci_dev *dev);
+extern void pci_iov_release(struct pci_dev *dev);
+void pci_iov_create_sysfs(struct pci_dev *dev);
+void pci_iov_remove_sysfs(struct pci_dev *dev);
+extern int pci_iov_resource_align(struct pci_dev *dev, int resno);
+extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
+ enum pci_bar_type *type);
+#else
+static inline int pci_iov_init(struct pci_dev *dev)
+{
+ return -EIO;
+}
+static inline void pci_iov_release(struct pci_dev *dev)
+{
+}
+static inline void pci_iov_create_sysfs(struct pci_dev *dev)
+{
+}
+static inline void pci_iov_remove_sysfs(struct pci_dev *dev)
+{
+}
+static inline int pci_iov_resource_align(struct pci_dev *dev, int resno)
+{
+ return 0;
+}
+static inline int pci_iov_resource_bar(struct pci_dev *dev, int resno,
+ enum pci_bar_type *type)
+{
+ return 0;
+}
+#endif /* CONFIG_PCI_IOV */
+
#endif /* DRIVERS_PCI_H */
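struct pci_iov keeps only the First VF Offset and VF Stride; the VF Routing IDs are derived from them (the patch does this in vf_rid(), whose body is not part of this excerpt). As a sketch based on the SR-IOV rule that a VF's Routing ID is the PF's Routing ID plus First VF Offset plus a stride per VF (VFs counted from 0 here, matching vfn in the patch), the standalone C below computes bus and devfn. struct vf_layout and vf_routing_id() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the values the patch keeps in struct pci_iov. */
struct vf_layout {
	uint8_t pf_bus;		/* bus number of the Physical Function */
	uint8_t pf_devfn;	/* device/function number of the PF */
	uint16_t offset;	/* First VF Offset */
	uint16_t stride;	/* VF Stride */
};

/*
 * Routing ID of VF vfn (counted from 0, like vfn in the patch):
 * RID = PF RID + First VF Offset + vfn * VF Stride.
 * Bits 15:8 of the RID are the bus number, bits 7:0 the devfn.
 */
static void vf_routing_id(const struct vf_layout *l, int vfn,
			  uint8_t *bus, uint8_t *devfn)
{
	uint32_t rid = ((uint32_t)l->pf_bus << 8) | l->pf_devfn;

	assert(vfn >= 0);
	rid += l->offset + (uint32_t)vfn * l->stride;
	assert(rid <= 0xffff);	/* a Routing ID is only 16 bits wide */
	*bus = (uint8_t)(rid >> 8);
	*devfn = (uint8_t)(rid & 0xff);
}
```

With a PF at 02:00.0, offset 128 and stride 2, VF 0 lands at devfn 0x80 and VF 63 at 0xfe on bus 2, while VF 64 spills over to bus 3, which is why the series reserves extra bus numbers.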
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 9c680b8..831d8d0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -845,6 +845,7 @@ static int pci_setup_device(struct pci_dev * dev)
static void pci_release_capabilities(struct pci_dev *dev)
{
pci_vpd_release(dev);
+ pci_iov_release(dev);
}
/**
@@ -1023,6 +1024,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
/* Alternative Routing-ID Forwarding */
pci_enable_ari(dev);
+
+ /* Single Root I/O Virtualization */
+ pci_iov_init(dev);
}
void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 497d639..a7d2fd4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -87,6 +87,12 @@ enum {
/* #6: expansion ROM */
PCI_ROM_RESOURCE,
+ /* device specific resources */
+#ifdef CONFIG_PCI_IOV
+ PCI_IOV_RESOURCES,
+ PCI_IOV_RESOURCES_END = PCI_IOV_RESOURCES + PCI_IOV_NUM_BAR - 1,
+#endif
+
/* address space assigned to buses behind the bridge */
#ifndef PCI_BRIDGE_RES_NUM
#define PCI_BRIDGE_RES_NUM 4
@@ -165,6 +171,7 @@ struct pci_cap_saved_state {
struct pcie_link_state;
struct pci_vpd;
+struct pci_iov;
/*
* The pci_dev structure is used to describe PCI devices.
@@ -253,6 +260,7 @@ struct pci_dev {
struct list_head msi_list;
#endif
struct pci_vpd *vpd;
+ struct pci_iov *iov;
};
extern struct pci_dev *alloc_pci_dev(void);
@@ -1128,5 +1136,54 @@ static inline void pci_mmcfg_early_init(void) { }
static inline void pci_mmcfg_late_init(void) { }
#endif
+/* SR-IOV event masks */
+#define PCI_IOV_VIRTFN_ID 0x0000FFFFU /* Virtual Function Number */
+#define PCI_IOV_NUM_VIRTFN 0x0000FFFFU /* number of Virtual Functions */
+#define PCI_IOV_EVENT_TYPE 0x80000000U /* event type (pre/post) */
+/* SR-IOV event values */
+#define PCI_IOV_ENABLE 0x00010000U /* SR-IOV enable request */
+#define PCI_IOV_DISABLE 0x00020000U /* SR-IOV disable request */
+#define PCI_IOV_RD_CONF 0x00040000U /* read configuration */
+#define PCI_IOV_WR_CONF 0x00080000U /* write configuration */
+#define PCI_IOV_POST_EVENT 0x80000000U /* post event */
+
+#ifdef CONFIG_PCI_IOV
+extern int pci_iov_enable(struct pci_dev *dev, int numvfs);
+extern void pci_iov_disable(struct pci_dev *dev);
+extern int pci_iov_register(struct pci_dev *dev,
+ int (*notify)(struct pci_dev *dev, u32 event), char **entries);
+extern void pci_iov_unregister(struct pci_dev *dev);
+extern int pci_iov_read_config(struct pci_dev *dev, int id,
+ char *entry, char *buf, int size);
+extern int pci_iov_write_config(struct pci_dev *dev, int id,
+ char *entry, char *buf);
+#else
+static inline int pci_iov_enable(struct pci_dev *dev, int numvfs)
+{
+ return -EIO;
+}
+static inline void pci_iov_disable(struct pci_dev *dev)
+{
+}
+static inline int pci_iov_register(struct pci_dev *dev,
+ int (*notify)(struct pci_dev *dev, u32 event), char **entries)
+{
+ return -EIO;
+}
+static inline void pci_iov_unregister(struct pci_dev *dev)
+{
+}
+static inline int pci_iov_read_config(struct pci_dev *dev, int id,
+ char *entry, char *buf, int size)
+{
+ return -EIO;
+}
+static inline int pci_iov_write_config(struct pci_dev *dev, int id,
+ char *entry, char *buf)
+{
+ return -EIO;
+}
+#endif /* CONFIG_PCI_IOV */
+
#endif /* __KERNEL__ */
#endif /* LINUX_PCI_H */
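The notify callback receives a single u32. Judging from the masks above, the low 16 bits carry the argument (NumVFs or the VF number), bits 16-19 select the request, and bit 31 marks a post event. The C sketch below copies those constants into a userspace file and shows how a PF driver might decode an event. iov_event(), which builds an event, is a hypothetical helper that assumes the core simply ORs these fields together, since the notifying code in iov.c is not part of this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the proposed include/linux/pci.h for a userspace demo. */
#define PCI_IOV_VIRTFN_ID	0x0000FFFFU
#define PCI_IOV_NUM_VIRTFN	0x0000FFFFU
#define PCI_IOV_EVENT_TYPE	0x80000000U
#define PCI_IOV_ENABLE		0x00010000U
#define PCI_IOV_DISABLE		0x00020000U
#define PCI_IOV_RD_CONF		0x00040000U
#define PCI_IOV_WR_CONF		0x00080000U
#define PCI_IOV_POST_EVENT	0x80000000U

/* Hypothetical encoder: request bit | optional post flag | 16-bit argument. */
static uint32_t iov_event(uint32_t type, int post, uint16_t arg)
{
	assert(type & (PCI_IOV_ENABLE | PCI_IOV_DISABLE |
		       PCI_IOV_RD_CONF | PCI_IOV_WR_CONF));
	return type | (post ? PCI_IOV_POST_EVENT : 0) | arg;
}

/* Low 16 bits: NumVFs for enable requests, VF number for config events. */
static uint16_t iov_event_arg(uint32_t event)
{
	return (uint16_t)(event & PCI_IOV_VIRTFN_ID);
}

/* Non-zero if the event is a post (after-the-fact) notification. */
static int iov_event_is_post(uint32_t event)
{
	return (event & PCI_IOV_EVENT_TYPE) == PCI_IOV_POST_EVENT;
}
```

Because PCI_IOV_VIRTFN_ID and PCI_IOV_NUM_VIRTFN are the same mask, the driver must look at the request bit first to know which meaning the low 16 bits carry.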
diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index eb6686b..1b28b3f 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -363,6 +363,7 @@
#define PCI_EXP_TYPE_UPSTREAM 0x5 /* Upstream Port */
#define PCI_EXP_TYPE_DOWNSTREAM 0x6 /* Downstream Port */
#define PCI_EXP_TYPE_PCI_BRIDGE 0x7 /* PCI/PCI-X Bridge */
+#define PCI_EXP_TYPE_RC_END 0x9 /* Root Complex Integrated Endpoint */
#define PCI_EXP_FLAGS_SLOT 0x0100 /* Slot implemented */
#define PCI_EXP_FLAGS_IRQ 0x3e00 /* Interrupt message number */
#define PCI_EXP_DEVCAP 4 /* Device capabilities */
@@ -434,6 +435,7 @@
#define PCI_EXT_CAP_ID_DSN 3
#define PCI_EXT_CAP_ID_PWR 4
#define PCI_EXT_CAP_ID_ARI 14
+#define PCI_EXT_CAP_ID_IOV 16
/* Advanced Error Reporting */
#define PCI_ERR_UNCOR_STATUS 4 /* Uncorrectable Error Status */
@@ -551,4 +553,23 @@
#define PCI_ARI_CTRL_ACS 0x0002 /* ACS Function Groups Enable */
#define PCI_ARI_CTRL_FG(x) (((x) >> 4) & 7) /* Function Group */
+/* Single Root I/O Virtualization */
+#define PCI_IOV_CAP 0x04 /* SR-IOV Capabilities */
+#define PCI_IOV_CTRL 0x08 /* SR-IOV Control */
+#define PCI_IOV_CTRL_VFE 0x01 /* VF Enable */
+#define PCI_IOV_CTRL_MSE 0x08 /* VF Memory Space Enable */
+#define PCI_IOV_CTRL_ARI 0x10 /* ARI Capable Hierarchy */
+#define PCI_IOV_STATUS 0x0a /* SR-IOV Status */
+#define PCI_IOV_INITIAL_VF 0x0c /* Initial VFs */
+#define PCI_IOV_TOTAL_VF 0x0e /* Total VFs */
+#define PCI_IOV_NUM_VF 0x10 /* Number of VFs */
+#define PCI_IOV_FUNC_LINK 0x12 /* Function Dependency Link */
+#define PCI_IOV_VF_OFFSET 0x14 /* First VF Offset */
+#define PCI_IOV_VF_STRIDE 0x16 /* Following VF Stride */
+#define PCI_IOV_VF_DID 0x1a /* VF Device ID */
+#define PCI_IOV_SUP_PGSIZE 0x1c /* Supported Page Sizes */
+#define PCI_IOV_SYS_PGSIZE 0x20 /* System Page Size */
+#define PCI_IOV_BAR_0 0x24 /* VF BAR0 */
+#define PCI_IOV_NUM_BAR 6 /* Number of VF BARs */
+
#endif /* LINUX_PCI_REGS_H */
--
1.5.6.4
Create a how-to for SR-IOV users and device driver developers.
Signed-off-by: Yu Zhao <[email protected]>
---
Documentation/DocBook/kernel-api.tmpl | 1 +
Documentation/PCI/pci-iov-howto.txt | 222 +++++++++++++++++++++++++++++++++
2 files changed, 223 insertions(+), 0 deletions(-)
create mode 100644 Documentation/PCI/pci-iov-howto.txt
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index b7b1482..5cb6491 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -251,6 +251,7 @@ X!Edrivers/pci/hotplug.c
-->
!Edrivers/pci/probe.c
!Edrivers/pci/rom.c
+!Edrivers/pci/iov.c
</sect1>
<sect1><title>PCI Hotplug Support Library</title>
!Edrivers/pci/hotplug/pci_hotplug_core.c
diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
new file mode 100644
index 0000000..15d846d
--- /dev/null
+++ b/Documentation/PCI/pci-iov-howto.txt
@@ -0,0 +1,222 @@
+ PCI Express Single Root I/O Virtualization HOWTO
+ Copyright (C) 2008 Intel Corporation
+
+
+1. Overview
+
+1.1 What is SR-IOV
+
+Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
+capability which makes one physical device appear as multiple virtual
+devices. The physical device is referred to as the Physical Function (PF)
+while the virtual devices are referred to as Virtual Functions (VFs).
+Allocation of Virtual Functions can be dynamically controlled by the
+Physical Function via registers encapsulated in the capability. By
+default, this feature is not enabled and the Physical Function behaves
+as a traditional PCIe device. Once it's turned on, each Virtual
+Function's PCI configuration space can be accessed by its own Bus,
+Device and Function Number (Routing ID). Each Virtual Function also has
+its own PCI Memory Space, which is used to map its register set. The
+Virtual Function device driver operates on this register set, so the VF
+is functional and appears as a real PCI device.
+
+2. User Guide
+
+2.1 How can I manage SR-IOV
+
+If a device supports SR-IOV, the following entries appear under the
+Physical Function's PCI device directory:
+ - /sys/bus/pci/devices/XXXX:BB:DD.F/iov/
+ (XXXX:BB:DD.F is domain:bus:device.function)
+and
+ - /sys/bus/pci/devices/XXXX:BB:DD.F/iov/N
+ (N is the VF number, from 0 to totalvfs-1)
+
+To enable or disable SR-IOV:
+ - /sys/bus/pci/devices/XXXX:BB:DD.F/iov/enable
+ (write 1 to enable or 0 to disable the VFs; the PF driver is
+ notified of the state change)
+
+To change the number of Virtual Functions:
+ - /sys/bus/pci/devices/XXXX:BB:DD.F/iov/numvfs
+ (writing a positive integer to this file changes NumVFs)
+
+The total and initial numbers of VFs can be read from:
+ - /sys/bus/pci/devices/XXXX:BB:DD.F/iov/totalvfs
+ - /sys/bus/pci/devices/XXXX:BB:DD.F/iov/initialvfs
+
+The identifier (Routing ID) of VF N belonging to this PF can be read from:
+ - /sys/bus/pci/devices/XXXX:BB:DD.F/iov/N/rid
+
+2.2 How can I use Virtual Functions
+
+Virtual Functions are treated as hot-plugged PCI devices in the kernel,
+so they should work in the same way as real PCI devices.
+NOTE: a Virtual Function device driver must be loaded for a VF to work.
+
+
+3. Developer Guide
+
+3.1 SR-IOV APIs
+
+To register the SR-IOV service, the Physical Function device driver calls:
+ int pci_iov_register(struct pci_dev *dev,
+ int (*notify)(struct pci_dev *, u32), char **entries)
+ 'notify' is a callback that the SR-IOV code invokes when events
+ related to VFs occur (e.g. a user reads or writes the sysfs entries).
+ The first argument is the PF itself; the second encodes the event
+ type and value. The following event types are currently supported:
+ - PCI_IOV_ENABLE: SR-IOV enable request
+ - PCI_IOV_DISABLE: SR-IOV disable request
+ - PCI_IOV_RD_CONF: read configuration
+ - PCI_IOV_WR_CONF: write configuration
+ - PCI_IOV_POST_EVENT: post event
+ Event values can be extracted using the following masks:
+ - PCI_IOV_VIRTFN_ID: Virtual Function Number
+ - PCI_IOV_NUM_VIRTFN: number of Virtual Functions
+ - PCI_IOV_EVENT_TYPE: event type (pre/post)
+ 'entries' is a list of sysfs entry names, terminated by NULL or an
+ empty string, that the SR-IOV code creates in each VF's directory.
+
+Note: 'entries' may be NULL if the PF driver doesn't want to create new
+entries under /sys/bus/pci/devices/XXXX:BB:DD.F/iov/N/.
+
+To unregister the SR-IOV service, the Physical Function device driver calls:
+ void pci_iov_unregister(struct pci_dev *dev)
+
+To enable SR-IOV, the Physical Function device driver calls:
+ int pci_iov_enable(struct pci_dev *dev, int numvfs)
+ 'numvfs' is the number of VFs that the PF wants to enable.
+ pci_iov_enable() returns -EINVAL if pci_iov_register() has not
+ been called first.
+
+To disable SR-IOV, the Physical Function device driver calls:
+ void pci_iov_disable(struct pci_dev *dev)
+
+Note: the above two functions sleep for 1 second, waiting for hardware
+transactions to complete as required by the SR-IOV specification.
+
+To read or write a VF's configuration:
+ - int pci_iov_read_config(struct pci_dev *dev, int vfn,
+ char *entry, char *buf, int size);
+ - int pci_iov_write_config(struct pci_dev *dev, int vfn,
+ char *entry, char *buf);
+
+3.2 Usage example
+
+The following skeleton illustrates the usage of the above APIs; '...'
+marks driver-specific code.
+
+static char *entries[] = { "foo", "bar", NULL };
+
+static int callback(struct pci_dev *dev, u32 event)
+{
+ int i;
+ int err = 0;
+ int vfn;
+ int numvfs;
+ char buf[64]; /* per-entry value buffer, size chosen for illustration */
+
+ if (event & PCI_IOV_ENABLE) {
+ /*
+ * request to enable SR-IOV, NumVFs is available.
+ * Note: if the PF wants to support PM, it has to
+ * check the device power state here to see if
+ * the request is allowed or not.
+ */
+
+ numvfs = event & PCI_IOV_NUM_VIRTFN;
+ ...
+
+ } else if (event & PCI_IOV_DISABLE) {
+ /*
+ * request to disable SR-IOV.
+ */
+ ...
+
+ } else if (event & PCI_IOV_RD_CONF) {
+ /*
+ * request to read VF configuration, Virtual
+ * Function Number is available. The callback is
+ * not told which entry is read, so refresh them all.
+ */
+
+ vfn = event & PCI_IOV_VIRTFN_ID;
+
+ /* pass the config to SR-IOV code so user can read it */
+ for (i = 0; entries[i] && !err; i++) {
+ ... /* fill buf with the current value of entries[i] */
+ err = pci_iov_write_config(dev, vfn, entries[i], buf);
+ }
+
+ } else if (event & PCI_IOV_WR_CONF) {
+ /*
+ * request to write VF configuration, Virtual
+ * Function Number is available.
+ */
+
+ vfn = event & PCI_IOV_VIRTFN_ID;
+
+ /* read the config that has been written by user */
+ for (i = 0; entries[i] && !err; i++) {
+ err = pci_iov_read_config(dev, vfn, entries[i],
+ buf, sizeof(buf));
+ ... /* apply buf to VF 'vfn' as the value of entries[i] */
+ }
+
+ } else
+ return -EINVAL;
+
+ return err;
+}
+
+static int __devinit dev_probe(struct pci_dev *dev,
+ const struct pci_device_id *id)
+{
+ int err;
+
+ err = pci_iov_register(dev, callback, entries);
+ ...
+
+ err = pci_iov_enable(dev, nr_virtfn);
+
+ ...
+
+ return err;
+}
+
+static void __devexit dev_remove(struct pci_dev *dev)
+{
+ ...
+
+ pci_iov_disable(dev);
+
+ ...
+
+ pci_iov_unregister(dev);
+
+ ...
+}
+
+#ifdef CONFIG_PM
+/*
+ * If the Physical Function supports power management, SR-IOV
+ * needs to be disabled before the adapter goes to sleep, because
+ * Virtual Functions do not work while the adapter is in a
+ * power-saving mode.
+ * SR-IOV can be enabled again after the adapter wakes up.
+ */
+static int dev_suspend(struct pci_dev *dev, pm_message_t state)
+{
+ ...
+
+ pci_iov_disable(dev);
+
+ ...
+
+ return 0;
+}
+
+static int dev_resume(struct pci_dev *dev)
+{
+ ...
+
+ pci_iov_enable(dev, numvfs);
+
+ ...
+
+ return 0;
+}
+#endif
+
+static struct pci_driver dev_driver = {
+ .name = "SR-IOV Physical Function driver",
+ .id_table = dev_id_table,
+ .probe = dev_probe,
+ .remove = __devexit_p(dev_remove),
+#ifdef CONFIG_PM
+ .suspend = dev_suspend,
+ .resume = dev_resume,
+#endif
+};
--
1.5.6.4
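Section 2.1 of the HOWTO describes the sysfs interface only in prose. As an illustration, here is a minimal userspace C sketch that reads and writes those integer attributes. iov_write_attr() and iov_read_attr() are hypothetical helpers; the directory is passed in so the code can be exercised outside sysfs, and driving the interface by writing NumVFs first and then "enable" is an assumption, not something the HOWTO states.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write an integer to an SR-IOV attribute such as "numvfs" or "enable"
 * in dir, e.g. "/sys/bus/pci/devices/0000:01:00.0/iov".
 * Returns 0 on success, -1 on failure.
 */
static int iov_write_attr(const char *dir, const char *attr, int value)
{
	char path[256];
	FILE *f;
	int n, rc;

	assert(dir && attr);
	n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	rc = fprintf(f, "%d\n", value) < 0 ? -1 : 0;
	/* stdio buffers the write, so a sysfs store error shows up here */
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}

/* Read an integer attribute such as "totalvfs"; returns -1 on failure. */
static int iov_read_attr(const char *dir, const char *attr, int *value)
{
	char path[256];
	FILE *f;
	int n, ok;

	assert(dir && attr && value);
	n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

For a PF at 0000:01:00.0 one would pass "/sys/bus/pci/devices/0000:01:00.0/iov" as dir, read "totalvfs", write the desired count (no larger than totalvfs) to "numvfs", and then write 1 to "enable".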
Reserve the bus range for SR-IOV at the device scanning stage.
Signed-off-by: Yu Zhao <[email protected]>
---
drivers/pci/iov.c | 24 ++++++++++++++++++++++++
drivers/pci/pci.h | 5 +++++
drivers/pci/probe.c | 3 +++
3 files changed, 32 insertions(+), 0 deletions(-)
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 3cf9709..7685c6b 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -603,6 +603,30 @@ void pci_iov_remove_sysfs(struct pci_dev *dev)
kfree(iov->ve);
}
+/**
+ * pci_iov_bus_range - find bus range used by SR-IOV capability
+ * @bus: the PCI bus
+ *
+ * Returns the max number of buses (excluding the current one) used by
+ * Virtual Functions.
+ */
+int pci_iov_bus_range(struct pci_bus *bus)
+{
+ int max = 0;
+ u8 busnr, devfn;
+ struct pci_dev *dev;
+
+ list_for_each_entry(dev, &bus->devices, bus_list) {
+ if (!dev->iov)
+ continue;
+ vf_rid(dev, dev->iov->totalvfs - 1, &busnr, &devfn);
+ if (busnr > max)
+ max = busnr;
+ }
+
+ return max ? max - bus->number : 0;
+}
+
int pci_iov_resource_align(struct pci_dev *dev, int resno)
{
if (resno < PCI_IOV_RESOURCES || resno > PCI_IOV_RESOURCES_END)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c66a4bd..71149b5 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -206,6 +206,7 @@ void pci_iov_remove_sysfs(struct pci_dev *dev);
extern int pci_iov_resource_align(struct pci_dev *dev, int resno);
extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
enum pci_bar_type *type);
+extern int pci_iov_bus_range(struct pci_bus *bus);
#else
static inline int pci_iov_init(struct pci_dev *dev)
{
@@ -229,6 +230,10 @@ static inline int pci_iov_resource_bar(struct pci_dev *dev, int resno,
{
return 0;
}
+static inline int pci_iov_bus_range(struct pci_bus *bus)
+{
+ return 0;
+}
#endif /* CONFIG_PCI_IOV */
#endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 831d8d0..b11f4b8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1129,6 +1129,9 @@ unsigned int __devinit pci_scan_child_bus(struct pci_bus *bus)
for (devfn = 0; devfn < 0x100; devfn += 8)
pci_scan_slot(bus, devfn);
+ /* Reserve buses for SR-IOV capability. */
+ max += pci_iov_bus_range(bus);
+
/*
* After performing arch-dependent fixup of the bus, look behind
* all PCI-to-PCI bridges on this bus.
--
1.5.6.4
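pci_iov_bus_range() above returns how many buses beyond the PF's own must be reserved so that the last VF of every PF on the bus gets a valid bus number, and pci_scan_child_bus() adds that to max. The standalone C below is a userspace sketch of the same calculation over a plain array instead of bus->devices. struct pf_info, last_vf_bus() and iov_bus_range() are hypothetical names, and the Routing ID arithmetic (PF RID + First VF Offset + vfn * VF Stride) stands in for vf_rid(), which is not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-PF summary; the kernel walks bus->devices instead. */
struct pf_info {
	uint8_t devfn;
	uint16_t totalvfs;	/* 0 if the device has no SR-IOV capability */
	uint16_t offset;	/* First VF Offset */
	uint16_t stride;	/* VF Stride */
};

/* Bus number holding the last VF (vfn = totalvfs - 1) of one PF. */
static unsigned int last_vf_bus(uint8_t busnr, const struct pf_info *pf)
{
	uint32_t rid = ((uint32_t)busnr << 8) | pf->devfn;

	assert(pf->totalvfs > 0);
	rid += pf->offset + (uint32_t)(pf->totalvfs - 1) * pf->stride;
	return rid >> 8;
}

/*
 * Sketch of pci_iov_bus_range(): extra buses (beyond busnr) to reserve
 * so every VF of every SR-IOV PF on this bus has a bus number.
 */
static unsigned int iov_bus_range(uint8_t busnr, const struct pf_info *pfs,
				  size_t n)
{
	unsigned int max = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int b;

		if (!pfs[i].totalvfs)
			continue;	/* not an SR-IOV Physical Function */
		b = last_vf_bus(busnr, &pfs[i]);
		if (b > max)
			max = b;
	}
	return max ? max - busnr : 0;
}
```

One deliberate difference from the patch: the sketch keeps the bus number in an unsigned int rather than a u8, so a result beyond bus 255 stays visible instead of silently wrapping.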
On Tue, Oct 14, 2008 at 06:59:28PM +0800, Yu Zhao wrote:
> +++ b/drivers/pci/pci.h
> @@ -176,4 +176,59 @@ static inline int pci_ari_enabled(struct pci_dev *dev)
> +struct pci_iov {
> + int cap; /* capability position */
> + int align; /* page size used to map memory space */
> + int is_enabled; /* status of SR-IOV */
> + int nentries; /* number of sysfs entries used by PF driver */
> + u16 totalvfs; /* total VFs associated with the PF */
> + u16 initialvfs; /* initial VFs associated with the PF */
> + u16 numvfs; /* number of VFs available */
> + u16 offset; /* first VF Routing ID offset */
> + u16 stride; /* following VF stride */
> + struct mutex mutex; /* lock for SR-IOV */
> + struct kobject kobj; /* koject for IOV */
> + struct pci_dev *dev; /* Physical Function */
> + struct vf_entry *ve; /* Virtual Function related */
> + int (*notify)(struct pci_dev *, u32); /* event callback function */
> +};
> +++ b/include/linux/pci.h
> @@ -87,6 +87,12 @@ enum {
> /* #6: expansion ROM */
> PCI_ROM_RESOURCE,
>
> + /* device specific resources */
> +#ifdef CONFIG_PCI_IOV
> + PCI_IOV_RESOURCES,
> + PCI_IOV_RESOURCES_END = PCI_IOV_RESOURCES + PCI_IOV_NUM_BAR - 1,
> +#endif
> +
> /* address space assigned to buses behind the bridge */
> #ifndef PCI_BRIDGE_RES_NUM
> #define PCI_BRIDGE_RES_NUM 4
Why expand the number of resources in struct pci_dev instead of putting
the new resources in struct pci_iov?
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
On Tue, Oct 14, 2008 at 06:59:28PM +0800, Yu Zhao wrote:
> +struct pci_iov {
> + int cap; /* capability position */
> + int align; /* page size used to map memory space */
> + int is_enabled; /* status of SR-IOV */
> + int nentries; /* number of sysfs entries used by PF driver */
> + u16 totalvfs; /* total VFs associated with the PF */
> + u16 initialvfs; /* initial VFs associated with the PF */
> + u16 numvfs; /* number of VFs available */
> + u16 offset; /* first VF Routing ID offset */
> + u16 stride; /* following VF stride */
> + struct mutex mutex; /* lock for SR-IOV */
> + struct kobject kobj; /* koject for IOV */
Why isn't this a real struct device?
That way you get all of the proper userspace notification and the like,
with kobjects, you do not.
thanks,
greg k-h
Matthew Wilcox wrote:
> On Tue, Oct 14, 2008 at 06:59:28PM +0800, Yu Zhao wrote:
>> +++ b/include/linux/pci.h
>> @@ -87,6 +87,12 @@ enum {
>> /* #6: expansion ROM */
>> PCI_ROM_RESOURCE,
>>
>> + /* device specific resources */
>> +#ifdef CONFIG_PCI_IOV
>> + PCI_IOV_RESOURCES,
>> + PCI_IOV_RESOURCES_END = PCI_IOV_RESOURCES + PCI_IOV_NUM_BAR - 1,
>> +#endif
>> +
>> /* address space assigned to buses behind the bridge */
>> #ifndef PCI_BRIDGE_RES_NUM
>> #define PCI_BRIDGE_RES_NUM 4
>
> Why expand the number of resources in struct pci_dev instead of putting
> the new resources in struct pci_iov?
Yes, it's supposed to be in 'struct pci_iov', and the resources used
to be there in an early version. But later I found that all resource
related functions, such as pci_assign_resource, pdev_sort_resources,
pbus_size_mem, etc., assume the resources are bundled with 'struct
pci_dev' and address them by their indexes. Encapsulating the resources
in 'pci_iov' would affect all these functions. I think we can postpone
changing these functions until the PCIM comes out, if IOV is the only
user of non-standard resources.
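To make the index-based addressing concrete, here is a small standalone sketch of the resource index layout with CONFIG_PCI_IOV enabled, plus the index-to-register mapping done by pci_iov_resource_bar(). The enum restates values from the patch (ROM at #6, PCI_IOV_NUM_BAR = 6, PCI_BRIDGE_RES_NUM = 4) and omits the names of the six standard BARs; iov_resource_bar() is a hypothetical userspace copy, not kernel code.

```c
#include <assert.h>

/*
 * Sketch of the struct pci_dev resource indexes with CONFIG_PCI_IOV=y.
 * #0-5 are the standard BARs (their enum names are omitted here).
 */
enum {
	PCI_ROM_RESOURCE = 6,	/* #6: expansion ROM */
	PCI_IOV_RESOURCES,	/* #7-12: VF BAR0-5 */
	PCI_IOV_RESOURCES_END = PCI_IOV_RESOURCES + 6 - 1,
	PCI_BRIDGE_RESOURCES,	/* #13-16: bridge windows */
	PCI_BRIDGE_RES_END = PCI_BRIDGE_RESOURCES + 4 - 1,
};

#define PCI_IOV_BAR_0	0x24	/* VF BAR0 offset inside the SR-IOV capability */

/*
 * Map a resource index to the config space offset of its VF BAR,
 * given the SR-IOV capability position; 0 if resno is not a VF BAR.
 */
static int iov_resource_bar(int cap, int resno)
{
	assert(cap >= 0x100);	/* extended capabilities start at 0x100 */
	if (resno < PCI_IOV_RESOURCES || resno > PCI_IOV_RESOURCES_END)
		return 0;
	return cap + PCI_IOV_BAR_0 + 4 * (resno - PCI_IOV_RESOURCES);
}
```

Note that enabling CONFIG_PCI_IOV moves the bridge windows from indexes 7-10 to 13-16, which is harmless only because the resource helpers use the enum names rather than literal numbers.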
Hi!
> Create how-to for SR-IOV user and device driver developer.
>
> Signed-off-by: Yu Zhao <[email protected]>
> +1.1 What is SR-IOV
> +
> +Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
> +capability which makes one physical device appear as multiple virtual
> +devices. The physical device is referred to as Physical Function while
> +the virtual devices are referred to as Virtual Functions. Allocation
> +of Virtual Functions can be dynamically controlled by Physical Function
> +via registers encapsulated in the capability. By default, this feature
> +is not enabled and the Physical Function behaves as traditional PCIe
> +device. Once it's turned on, each Virtual Function's PCI configuration
> +space can be accessed by its own Bus, Device and Function Number (Routing
> +ID). And each Virtual Function also has PCI Memory Space, which is used
OK, why is this optional? If Intel cares about virtualization, it
should enable this by default. I don't see why this should be
configurable.
> +#ifdef CONFIG_PM
> +/*
> + * If Physical Function supports the power management, then the
> + * SR-IOV needs to be disabled before the adapter goes to sleep,
> + * because Virtual Functions will not work when the adapter is in
> + * the power-saving mode.
> + * The SR-IOV can be enabled again after the adapter wakes up.
> + */
How beautiful :-(.
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html