2016-11-09 01:15:48

by Christoph Hellwig

[permalink] [raw]
Subject: support for partial irq affinity assignment V3

This series adds support for automatic interrupt assignment to devices
that have a few vectors that are set aside for admin or config purposes
and thus should not fall into the general per-cpu assginment pool.

The first patch adds that support to the core IRQ and PCI/msi code,
and the second is a small tweak to a block layer helper to make use
of it. I'd love to have both go into the same tree so that consumers
of this (e.g. the virtio, scsi and rdma trees) only need to pull in
one of these trees as dependency.

Changes since V3:
- lots of typo fixes
- fixed an argument name

Changes since V1:
- split up into more patches (thanks to Thomas for the initial work)
- simplify internal parameter passing a bit (also thanks to Thomas)


2016-11-09 01:15:58

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 6/7] PCI: Remove the irq_affinity mask from struct pci_dev

This has never been used, and now is totally unreferenced. Nuke it.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
---
include/linux/pci.h | 1 -
1 file changed, 1 deletion(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7090f5f..f2ba6ac 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -333,7 +333,6 @@ struct pci_dev {
* directly, use the values stored here. They might be different!
*/
unsigned int irq;
- struct cpumask *irq_affinity;
struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */

bool match_driver; /* Skip attaching driver */
--
2.1.4

2016-11-09 01:16:02

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 7/7] blk-mq: add a first_vec argument to blk_mq_pci_map_queues

This allows skipping the first N IRQ vectors in case they are used for
control or admin interrupts.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
---
block/blk-mq-pci.c | 6 ++++--
drivers/nvme/host/pci.c | 2 +-
include/linux/blk-mq-pci.h | 3 ++-
3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
index 966c216..03ff7c4 100644
--- a/block/blk-mq-pci.c
+++ b/block/blk-mq-pci.c
@@ -21,6 +21,7 @@
* blk_mq_pci_map_queues - provide a default queue mapping for PCI device
* @set: tagset to provide the mapping for
* @pdev: PCI device associated with @set.
+ * @first_vec: first interrupt vectors to use for queues (usually 0)
*
* This function assumes the PCI device @pdev has at least as many available
* interrupt vetors as @set has queues. It will then queuery the vector
@@ -28,12 +29,13 @@
* that maps a queue to the CPUs that have irq affinity for the corresponding
* vector.
*/
-int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
+int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
+ int first_vec)
{
const struct cpumask *mask;
unsigned int queue, cpu;

- for (queue = 0; queue < set->nr_hw_queues; queue++) {
+ for (queue = first_vec; queue < set->nr_hw_queues; queue++) {
mask = pci_irq_get_affinity(pdev, queue);
if (!mask)
return -EINVAL;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 0248d0e..6e6b917 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -273,7 +273,7 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
{
struct nvme_dev *dev = set->driver_data;

- return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev));
+ return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
}

/**
diff --git a/include/linux/blk-mq-pci.h b/include/linux/blk-mq-pci.h
index 6ab5952..fde26d2 100644
--- a/include/linux/blk-mq-pci.h
+++ b/include/linux/blk-mq-pci.h
@@ -4,6 +4,7 @@
struct blk_mq_tag_set;
struct pci_dev;

-int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev);
+int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
+ int first_vec);

#endif /* _LINUX_BLK_MQ_PCI_H */
--
2.1.4

2016-11-09 01:15:56

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 2/7] genirq/affinity: Handle pre/post vectors in irq_calc_affinity_vectors()

Only calculate the affinity for the main I/O vectors, and skip the
pre or post vectors specified by struct irq_affinity.

Also remove the irq_affinity cpumask argument that has never been used.
If we ever need it in the future we can pass it through struct
irq_affinity.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/pci/msi.c | 8 ++++----
include/linux/interrupt.h | 4 ++--
kernel/irq/affinity.c | 24 ++++++++++--------------
3 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index ad70507..dad2da7 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1061,6 +1061,7 @@ EXPORT_SYMBOL(pci_msi_enabled);
static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
unsigned int flags)
{
+ static const struct irq_affinity default_affd;
bool affinity = flags & PCI_IRQ_AFFINITY;
int nvec;
int rc;
@@ -1091,8 +1092,7 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,

for (;;) {
if (affinity) {
- nvec = irq_calc_affinity_vectors(dev->irq_affinity,
- nvec);
+ nvec = irq_calc_affinity_vectors(nvec, &default_affd);
if (nvec < minvec)
return -ENOSPC;
}
@@ -1132,6 +1132,7 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
struct msix_entry *entries, int minvec, int maxvec,
unsigned int flags)
{
+ static const struct irq_affinity default_affd;
bool affinity = flags & PCI_IRQ_AFFINITY;
int rc, nvec = maxvec;

@@ -1140,8 +1141,7 @@ static int __pci_enable_msix_range(struct pci_dev *dev,

for (;;) {
if (affinity) {
- nvec = irq_calc_affinity_vectors(dev->irq_affinity,
- nvec);
+ nvec = irq_calc_affinity_vectors(nvec, &default_affd);
if (nvec < minvec)
return -ENOSPC;
}
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 6b52686..9081f23b 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -291,7 +291,7 @@ extern int
irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);

struct cpumask *irq_create_affinity_masks(const struct cpumask *affinity, int nvec);
-int irq_calc_affinity_vectors(const struct cpumask *affinity, int maxvec);
+int irq_calc_affinity_vectors(int maxvec, const struct irq_affinity *affd);

#else /* CONFIG_SMP */

@@ -331,7 +331,7 @@ irq_create_affinity_masks(const struct cpumask *affinity, int nvec)
}

static inline int
-irq_calc_affinity_vectors(const struct cpumask *affinity, int maxvec)
+irq_calc_affinity_vectors(int maxvec, const struct irq_affinity *affd)
{
return maxvec;
}
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 17f51d63..8d92597 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -131,24 +131,20 @@ struct cpumask *irq_create_affinity_masks(const struct cpumask *affinity,
}

/**
- * irq_calc_affinity_vectors - Calculate to optimal number of vectors for a given affinity mask
- * @affinity: The affinity mask to spread. If NULL cpu_online_mask
- * is used
- * @maxvec: The maximum number of vectors available
+ * irq_calc_affinity_vectors - Calculate the optimal number of vectors
+ * @maxvec: The maximum number of vectors available
+ * @affd: Description of the affinity requirements
*/
-int irq_calc_affinity_vectors(const struct cpumask *affinity, int maxvec)
+int irq_calc_affinity_vectors(int maxvec, const struct irq_affinity *affd)
{
- int cpus, ret;
+ int resv = affd->pre_vectors + affd->post_vectors;
+ int vecs = maxvec - resv;
+ int cpus;

/* Stabilize the cpumasks */
get_online_cpus();
- /* If the supplied affinity mask is NULL, use cpu online mask */
- if (!affinity)
- affinity = cpu_online_mask;
-
- cpus = cpumask_weight(affinity);
- ret = (cpus < maxvec) ? cpus : maxvec;
-
+ cpus = cpumask_weight(cpu_online_mask);
put_online_cpus();
- return ret;
+
+ return min(cpus, vecs) + resv;
}
--
2.1.4

2016-11-09 01:15:53

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 5/7] pci/msi: Provide pci_alloc_irq_vectors_affinity()

This is a variant of pci_alloc_irq_vectors() that allows passing a
struct irq_affinity to provide fine-grained IRQ affinity control.
For now this means being able to exclude vectors at the beginning or
end of the MSI vector space, but it could also be used for any other
quirks needed in the future (e.g. more vectors than CPUs, or excluding
CPUs from the spreading).

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
---
drivers/pci/msi.c | 20 +++++++++++++-------
include/linux/pci.h | 24 +++++++++++++++++++-----
2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 512f388..dd27f73 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1179,11 +1179,12 @@ int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
EXPORT_SYMBOL(pci_enable_msix_range);

/**
- * pci_alloc_irq_vectors - allocate multiple IRQs for a device
+ * pci_alloc_irq_vectors_affinity - allocate multiple IRQs for a device
* @dev: PCI device to operate on
* @min_vecs: minimum number of vectors required (must be >= 1)
* @max_vecs: maximum (desired) number of vectors
* @flags: flags or quirks for the allocation
+ * @affd: optional description of the affinity requirements
*
* Allocate up to @max_vecs interrupt vectors for @dev, using MSI-X or MSI
* vectors if available, and fall back to a single legacy vector
@@ -1195,15 +1196,20 @@ EXPORT_SYMBOL(pci_enable_msix_range);
* To get the Linux IRQ number used for a vector that can be passed to
* request_irq() use the pci_irq_vector() helper.
*/
-int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
- unsigned int max_vecs, unsigned int flags)
+int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
+ unsigned int max_vecs, unsigned int flags,
+ const struct irq_affinity *affd)
{
static const struct irq_affinity msi_default_affd;
- const struct irq_affinity *affd = NULL;
int vecs = -ENOSPC;

- if (flags & PCI_IRQ_AFFINITY)
- affd = &msi_default_affd;
+ if (flags & PCI_IRQ_AFFINITY) {
+ if (!affd)
+ affd = &msi_default_affd;
+ } else {
+ if (WARN_ON(affd))
+ affd = NULL;
+ }

if (flags & PCI_IRQ_MSIX) {
vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
@@ -1226,7 +1232,7 @@ int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,

return vecs;
}
-EXPORT_SYMBOL(pci_alloc_irq_vectors);
+EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);

/**
* pci_free_irq_vectors - free previously allocated IRQs for a device
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0e49f70..7090f5f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -244,6 +244,7 @@ struct pci_cap_saved_state {
struct pci_cap_saved_data cap;
};

+struct irq_affinity;
struct pcie_link_state;
struct pci_vpd;
struct pci_sriov;
@@ -1310,8 +1311,10 @@ static inline int pci_enable_msix_exact(struct pci_dev *dev,
return rc;
return 0;
}
-int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
- unsigned int max_vecs, unsigned int flags);
+int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
+ unsigned int max_vecs, unsigned int flags,
+ const struct irq_affinity *affd);
+
void pci_free_irq_vectors(struct pci_dev *dev);
int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
const struct cpumask *pci_irq_get_affinity(struct pci_dev *pdev, int vec);
@@ -1339,14 +1342,17 @@ static inline int pci_enable_msix_range(struct pci_dev *dev,
static inline int pci_enable_msix_exact(struct pci_dev *dev,
struct msix_entry *entries, int nvec)
{ return -ENOSYS; }
-static inline int pci_alloc_irq_vectors(struct pci_dev *dev,
- unsigned int min_vecs, unsigned int max_vecs,
- unsigned int flags)
+
+static inline int
+pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
+ unsigned int max_vecs, unsigned int flags,
+ const struct irq_affinity *aff_desc)
{
if (min_vecs > 1)
return -EINVAL;
return 1;
}
+
static inline void pci_free_irq_vectors(struct pci_dev *dev)
{
}
@@ -1364,6 +1370,14 @@ static inline const struct cpumask *pci_irq_get_affinity(struct pci_dev *pdev,
}
#endif

+static inline int
+pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
+ unsigned int max_vecs, unsigned int flags)
+{
+ return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
+ NULL);
+}
+
#ifdef CONFIG_PCIEPORTBUS
extern bool pcie_ports_disabled;
extern bool pcie_ports_auto;
--
2.1.4

2016-11-09 01:15:50

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 1/7] genirq/affinity: Introduce struct irq_affinity

Some drivers (various network and RDMA adapter for example) have a MSI-X
vector layout where most of the vectors are used for I/O queues and should
have CPU affinity assigned to them, but some (usually 1 but sometimes more)
at the beginning or end are used for low-performance admin or configuration
work and should not have any explicit affinity assigned to them.

This adds a new irq_affinity structure, which will be passed through a
variant of pci_irq_alloc_vectors that allows to specify these
requirements (and is extensible to any future quirks in that area) so that
the core IRQ affinity algorithm can take this quirks into account.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
---
include/linux/interrupt.h | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 72f0721..6b52686 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -232,6 +232,18 @@ struct irq_affinity_notify {
void (*release)(struct kref *ref);
};

+/**
+ * struct irq_affinity - Description for automatic irq affinity assignements
+ * @pre_vectors: Don't apply affinity to @pre_vectors at beginning of
+ * the MSI(-X) vector space
+ * @post_vectors: Don't apply affinity to @post_vectors at end of
+ * the MSI(-X) vector space
+ */
+struct irq_affinity {
+ int pre_vectors;
+ int post_vectors;
+};
+
#if defined(CONFIG_SMP)

extern cpumask_var_t irq_default_affinity;
--
2.1.4

2016-11-09 01:17:07

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 4/7] PCI/MSI: Propagate IRQ affinity description through the MSI code

No API change yet, just pass it down all the way from
pci_alloc_irq_vectors() to the core MSI code.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
---
drivers/pci/msi.c | 66 +++++++++++++++++++++++++++----------------------------
1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index f4a108b..512f388 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -551,15 +551,14 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
}

static struct msi_desc *
-msi_setup_entry(struct pci_dev *dev, int nvec, bool affinity)
+msi_setup_entry(struct pci_dev *dev, int nvec, const struct irq_affinity *affd)
{
- static const struct irq_affinity default_affd;
struct cpumask *masks = NULL;
struct msi_desc *entry;
u16 control;

- if (affinity) {
- masks = irq_create_affinity_masks(nvec, &default_affd);
+ if (affd) {
+ masks = irq_create_affinity_masks(nvec, affd);
if (!masks)
pr_err("Unable to allocate affinity masks, ignoring\n");
}
@@ -619,7 +618,8 @@ static int msi_verify_entries(struct pci_dev *dev)
* an error, and a positive return value indicates the number of interrupts
* which could have been allocated.
*/
-static int msi_capability_init(struct pci_dev *dev, int nvec, bool affinity)
+static int msi_capability_init(struct pci_dev *dev, int nvec,
+ const struct irq_affinity *affd)
{
struct msi_desc *entry;
int ret;
@@ -627,7 +627,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec, bool affinity)

pci_msi_set_enable(dev, 0); /* Disable MSI during set up */

- entry = msi_setup_entry(dev, nvec, affinity);
+ entry = msi_setup_entry(dev, nvec, affd);
if (!entry)
return -ENOMEM;

@@ -691,15 +691,14 @@ static void __iomem *msix_map_region(struct pci_dev *dev, unsigned nr_entries)

static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
struct msix_entry *entries, int nvec,
- bool affinity)
+ const struct irq_affinity *affd)
{
- static const struct irq_affinity default_affd;
struct cpumask *curmsk, *masks = NULL;
struct msi_desc *entry;
int ret, i;

- if (affinity) {
- masks = irq_create_affinity_masks(nvec, &default_affd);
+ if (affd) {
+ masks = irq_create_affinity_masks(nvec, affd);
if (!masks)
pr_err("Unable to allocate affinity masks, ignoring\n");
}
@@ -755,14 +754,14 @@ static void msix_program_entries(struct pci_dev *dev,
* @dev: pointer to the pci_dev data structure of MSI-X device function
* @entries: pointer to an array of struct msix_entry entries
* @nvec: number of @entries
- * @affinity: flag to indicate cpu irq affinity mask should be set
+ * @affd: Optional pointer to enable automatic affinity assignement
*
* Setup the MSI-X capability structure of device function with a
* single MSI-X irq. A return of zero indicates the successful setup of
* requested MSI-X entries with allocated irqs or non-zero for otherwise.
**/
static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
- int nvec, bool affinity)
+ int nvec, const struct irq_affinity *affd)
{
int ret;
u16 control;
@@ -777,7 +776,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
if (!base)
return -ENOMEM;

- ret = msix_setup_entries(dev, base, entries, nvec, affinity);
+ ret = msix_setup_entries(dev, base, entries, nvec, affd);
if (ret)
return ret;

@@ -958,7 +957,7 @@ int pci_msix_vec_count(struct pci_dev *dev)
EXPORT_SYMBOL(pci_msix_vec_count);

static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
- int nvec, bool affinity)
+ int nvec, const struct irq_affinity *affd)
{
int nr_entries;
int i, j;
@@ -990,7 +989,7 @@ static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
dev_info(&dev->dev, "can't enable MSI-X (MSI IRQ already assigned)\n");
return -EINVAL;
}
- return msix_capability_init(dev, entries, nvec, affinity);
+ return msix_capability_init(dev, entries, nvec, affd);
}

/**
@@ -1010,7 +1009,7 @@ static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
**/
int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
{
- return __pci_enable_msix(dev, entries, nvec, false);
+ return __pci_enable_msix(dev, entries, nvec, NULL);
}
EXPORT_SYMBOL(pci_enable_msix);

@@ -1061,10 +1060,8 @@ int pci_msi_enabled(void)
EXPORT_SYMBOL(pci_msi_enabled);

static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
- unsigned int flags)
+ const struct irq_affinity *affd)
{
- static const struct irq_affinity default_affd;
- bool affinity = flags & PCI_IRQ_AFFINITY;
int nvec;
int rc;

@@ -1093,13 +1090,13 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
nvec = maxvec;

for (;;) {
- if (affinity) {
- nvec = irq_calc_affinity_vectors(nvec, &default_affd);
+ if (affd) {
+ nvec = irq_calc_affinity_vectors(nvec, affd);
if (nvec < minvec)
return -ENOSPC;
}

- rc = msi_capability_init(dev, nvec, affinity);
+ rc = msi_capability_init(dev, nvec, affd);
if (rc == 0)
return nvec;

@@ -1126,29 +1123,27 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
**/
int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec)
{
- return __pci_enable_msi_range(dev, minvec, maxvec, 0);
+ return __pci_enable_msi_range(dev, minvec, maxvec, NULL);
}
EXPORT_SYMBOL(pci_enable_msi_range);

static int __pci_enable_msix_range(struct pci_dev *dev,
- struct msix_entry *entries, int minvec, int maxvec,
- unsigned int flags)
+ struct msix_entry *entries, int minvec,
+ int maxvec, const struct irq_affinity *affd)
{
- static const struct irq_affinity default_affd;
- bool affinity = flags & PCI_IRQ_AFFINITY;
int rc, nvec = maxvec;

if (maxvec < minvec)
return -ERANGE;

for (;;) {
- if (affinity) {
- nvec = irq_calc_affinity_vectors(nvec, &default_affd);
+ if (affd) {
+ nvec = irq_calc_affinity_vectors(nvec, affd);
if (nvec < minvec)
return -ENOSPC;
}

- rc = __pci_enable_msix(dev, entries, nvec, affinity);
+ rc = __pci_enable_msix(dev, entries, nvec, affd);
if (rc == 0)
return nvec;

@@ -1179,7 +1174,7 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
int minvec, int maxvec)
{
- return __pci_enable_msix_range(dev, entries, minvec, maxvec, 0);
+ return __pci_enable_msix_range(dev, entries, minvec, maxvec, NULL);
}
EXPORT_SYMBOL(pci_enable_msix_range);

@@ -1203,17 +1198,22 @@ EXPORT_SYMBOL(pci_enable_msix_range);
int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
unsigned int max_vecs, unsigned int flags)
{
+ static const struct irq_affinity msi_default_affd;
+ const struct irq_affinity *affd = NULL;
int vecs = -ENOSPC;

+ if (flags & PCI_IRQ_AFFINITY)
+ affd = &msi_default_affd;
+
if (flags & PCI_IRQ_MSIX) {
vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
- flags);
+ affd);
if (vecs > 0)
return vecs;
}

if (flags & PCI_IRQ_MSI) {
- vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, flags);
+ vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
if (vecs > 0)
return vecs;
}
--
2.1.4

2016-11-09 01:17:10

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 3/7] genirq/affinity: Handle pre/post vectors in irq_create_affinity_masks()

Only calculate the affinity for the main I/O vectors, and skip the
pre or post vectors specified by struct irq_affinity.

Also remove the irq_affinity cpumask argument that has never been used.
If we ever need it in the future we can pass it through struct
irq_affinity.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
---
drivers/pci/msi.c | 6 ++++--
include/linux/interrupt.h | 4 ++--
kernel/irq/affinity.c | 46 +++++++++++++++++++++++++---------------------
3 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index dad2da7..f4a108b 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -553,12 +553,13 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
static struct msi_desc *
msi_setup_entry(struct pci_dev *dev, int nvec, bool affinity)
{
+ static const struct irq_affinity default_affd;
struct cpumask *masks = NULL;
struct msi_desc *entry;
u16 control;

if (affinity) {
- masks = irq_create_affinity_masks(dev->irq_affinity, nvec);
+ masks = irq_create_affinity_masks(nvec, &default_affd);
if (!masks)
pr_err("Unable to allocate affinity masks, ignoring\n");
}
@@ -692,12 +693,13 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
struct msix_entry *entries, int nvec,
bool affinity)
{
+ static const struct irq_affinity default_affd;
struct cpumask *curmsk, *masks = NULL;
struct msi_desc *entry;
int ret, i;

if (affinity) {
- masks = irq_create_affinity_masks(dev->irq_affinity, nvec);
+ masks = irq_create_affinity_masks(nvec, &default_affd);
if (!masks)
pr_err("Unable to allocate affinity masks, ignoring\n");
}
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 9081f23b..53144e7 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -290,7 +290,7 @@ extern int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m);
extern int
irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);

-struct cpumask *irq_create_affinity_masks(const struct cpumask *affinity, int nvec);
+struct cpumask *irq_create_affinity_masks(int nvec, const struct irq_affinity *affd);
int irq_calc_affinity_vectors(int maxvec, const struct irq_affinity *affd);

#else /* CONFIG_SMP */
@@ -325,7 +325,7 @@ irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
}

static inline struct cpumask *
-irq_create_affinity_masks(const struct cpumask *affinity, int nvec)
+irq_create_affinity_masks(int nvec, const struct irq_affinity *affd)
{
return NULL;
}
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 8d92597..17360bd 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -51,16 +51,16 @@ static int get_nodes_in_cpumask(const struct cpumask *mask, nodemask_t *nodemsk)

/**
* irq_create_affinity_masks - Create affinity masks for multiqueue spreading
- * @affinity: The affinity mask to spread. If NULL cpu_online_mask
- * is used
- * @nvecs: The number of vectors
+ * @nvecs: The total number of vectors
+ * @affd: Description of the affinity requirements
*
* Returns the masks pointer or NULL if allocation failed.
*/
-struct cpumask *irq_create_affinity_masks(const struct cpumask *affinity,
- int nvec)
+struct cpumask *
+irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
{
- int n, nodes, vecs_per_node, cpus_per_vec, extra_vecs, curvec = 0;
+ int n, nodes, vecs_per_node, cpus_per_vec, extra_vecs, curvec;
+ int affv = nvecs - affd->pre_vectors - affd->post_vectors;
nodemask_t nodemsk = NODE_MASK_NONE;
struct cpumask *masks;
cpumask_var_t nmsk;
@@ -68,46 +68,46 @@ struct cpumask *irq_create_affinity_masks(const struct cpumask *affinity,
if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
return NULL;

- masks = kzalloc(nvec * sizeof(*masks), GFP_KERNEL);
+ masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
if (!masks)
goto out;

+ /* Fill out vectors at the beginning that don't need affinity */
+ for (curvec = 0; curvec < affd->pre_vectors; curvec++)
+ cpumask_copy(masks + curvec, cpu_possible_mask);
+
/* Stabilize the cpumasks */
get_online_cpus();
- /* If the supplied affinity mask is NULL, use cpu online mask */
- if (!affinity)
- affinity = cpu_online_mask;
-
- nodes = get_nodes_in_cpumask(affinity, &nodemsk);
+ nodes = get_nodes_in_cpumask(cpu_online_mask, &nodemsk);

/*
* If the number of nodes in the mask is less than or equal the
* number of vectors we just spread the vectors across the nodes.
*/
- if (nvec <= nodes) {
+ if (affv <= nodes) {
for_each_node_mask(n, nodemsk) {
cpumask_copy(masks + curvec, cpumask_of_node(n));
- if (++curvec == nvec)
+ if (++curvec == affv)
break;
}
- goto outonl;
+ goto done;
}

/* Spread the vectors per node */
- vecs_per_node = nvec / nodes;
+ vecs_per_node = affv / nodes;
/* Account for rounding errors */
- extra_vecs = nvec - (nodes * vecs_per_node);
+ extra_vecs = affv - (nodes * vecs_per_node);

for_each_node_mask(n, nodemsk) {
int ncpus, v, vecs_to_assign = vecs_per_node;

/* Get the cpus on this node which are in the mask */
- cpumask_and(nmsk, affinity, cpumask_of_node(n));
+ cpumask_and(nmsk, cpu_online_mask, cpumask_of_node(n));

/* Calculate the number of cpus per vector */
ncpus = cpumask_weight(nmsk);

- for (v = 0; curvec < nvec && v < vecs_to_assign; curvec++, v++) {
+ for (v = 0; curvec < affv && v < vecs_to_assign; curvec++, v++) {
cpus_per_vec = ncpus / vecs_to_assign;

/* Account for extra vectors to compensate rounding errors */
@@ -119,12 +119,16 @@ struct cpumask *irq_create_affinity_masks(const struct cpumask *affinity,
irq_spread_init_one(masks + curvec, nmsk, cpus_per_vec);
}

- if (curvec >= nvec)
+ if (curvec >= affv)
break;
}

-outonl:
+done:
put_online_cpus();
+
+ /* Fill out vectors at the end that don't need affinity */
+ for (; curvec < nvecs; curvec++)
+ cpumask_copy(masks + curvec, cpu_possible_mask);
out:
free_cpumask_var(nmsk);
return masks;
--
2.1.4

2016-11-09 01:39:40

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 5/7] pci/msi: Provide pci_alloc_irq_vectors_affinity()

And the subject for this should be PCI/MSI like for the others, sorry.

2016-11-09 04:38:17

by Jens Axboe

[permalink] [raw]
Subject: Re: support for partial irq affinity assignment V3

On 11/08/2016 06:15 PM, Christoph Hellwig wrote:
> This series adds support for automatic interrupt assignment to devices
> that have a few vectors that are set aside for admin or config purposes
> and thus should not fall into the general per-cpu assginment pool.
>
> The first patch adds that support to the core IRQ and PCI/msi code,
> and the second is a small tweak to a block layer helper to make use
> of it. I'd love to have both go into the same tree so that consumers
> of this (e.g. the virtio, scsi and rdma trees) only need to pull in
> one of these trees as dependency.

Series looks good to me, you can add my Acked-by to all of them.

--
Jens Axboe

2016-11-09 07:04:36

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH 2/7] genirq/affinity: Handle pre/post vectors in irq_calc_affinity_vectors()

On 11/09/2016 02:15 AM, Christoph Hellwig wrote:
> Only calculate the affinity for the main I/O vectors, and skip the
> pre or post vectors specified by struct irq_affinity.
>
> Also remove the irq_affinity cpumask argument that has never been used.
> If we ever need it in the future we can pass it through struct
> irq_affinity.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/pci/msi.c | 8 ++++----
> include/linux/interrupt.h | 4 ++--
> kernel/irq/affinity.c | 24 ++++++++++--------------
> 3 files changed, 16 insertions(+), 20 deletions(-)
>
Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: F. Imend?rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG N?rnberg)

2016-11-09 07:05:11

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH 3/7] genirq/affinity: Handle pre/post vectors in irq_create_affinity_masks()

On 11/09/2016 02:15 AM, Christoph Hellwig wrote:
> Only calculate the affinity for the main I/O vectors, and skip the
> pre or post vectors specified by struct irq_affinity.
>
> Also remove the irq_affinity cpumask argument that has never been used.
> If we ever need it in the future we can pass it through struct
> irq_affinity.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> Acked-by: Bjorn Helgaas <[email protected]>
> ---
> drivers/pci/msi.c | 6 ++++--
> include/linux/interrupt.h | 4 ++--
> kernel/irq/affinity.c | 46 +++++++++++++++++++++++++---------------------
> 3 files changed, 31 insertions(+), 25 deletions(-)
>
Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: F. Imend?rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG N?rnberg)

Subject: [tip:irq/core] genirq/affinity: Handle pre/post vectors in irq_calc_affinity_vectors()

Commit-ID: 212bd846223c718b6577d4df16fd8d05a55ad914
Gitweb: http://git.kernel.org/tip/212bd846223c718b6577d4df16fd8d05a55ad914
Author: Christoph Hellwig <[email protected]>
AuthorDate: Tue, 8 Nov 2016 17:15:02 -0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 9 Nov 2016 08:25:08 +0100

genirq/affinity: Handle pre/post vectors in irq_calc_affinity_vectors()

Only calculate the affinity for the main I/O vectors, and skip the pre or
post vectors specified by struct irq_affinity.

Also remove the irq_affinity cpumask argument that has never been used. If
we ever need it in the future we can pass it through struct irq_affinity.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Acked-by: Jens Axboe <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
drivers/pci/msi.c | 8 ++++----
include/linux/interrupt.h | 4 ++--
kernel/irq/affinity.c | 24 ++++++++++--------------
3 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index ad70507..dad2da7 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1061,6 +1061,7 @@ EXPORT_SYMBOL(pci_msi_enabled);
static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
unsigned int flags)
{
+ static const struct irq_affinity default_affd;
bool affinity = flags & PCI_IRQ_AFFINITY;
int nvec;
int rc;
@@ -1091,8 +1092,7 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,

for (;;) {
if (affinity) {
- nvec = irq_calc_affinity_vectors(dev->irq_affinity,
- nvec);
+ nvec = irq_calc_affinity_vectors(nvec, &default_affd);
if (nvec < minvec)
return -ENOSPC;
}
@@ -1132,6 +1132,7 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
struct msix_entry *entries, int minvec, int maxvec,
unsigned int flags)
{
+ static const struct irq_affinity default_affd;
bool affinity = flags & PCI_IRQ_AFFINITY;
int rc, nvec = maxvec;

@@ -1140,8 +1141,7 @@ static int __pci_enable_msix_range(struct pci_dev *dev,

for (;;) {
if (affinity) {
- nvec = irq_calc_affinity_vectors(dev->irq_affinity,
- nvec);
+ nvec = irq_calc_affinity_vectors(nvec, &default_affd);
if (nvec < minvec)
return -ENOSPC;
}
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 6b52686..9081f23 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -291,7 +291,7 @@ extern int
irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);

struct cpumask *irq_create_affinity_masks(const struct cpumask *affinity, int nvec);
-int irq_calc_affinity_vectors(const struct cpumask *affinity, int maxvec);
+int irq_calc_affinity_vectors(int maxvec, const struct irq_affinity *affd);

#else /* CONFIG_SMP */

@@ -331,7 +331,7 @@ irq_create_affinity_masks(const struct cpumask *affinity, int nvec)
}

static inline int
-irq_calc_affinity_vectors(const struct cpumask *affinity, int maxvec)
+irq_calc_affinity_vectors(int maxvec, const struct irq_affinity *affd)
{
return maxvec;
}
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 17f51d63..8d92597 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -131,24 +131,20 @@ out:
}

/**
- * irq_calc_affinity_vectors - Calculate to optimal number of vectors for a given affinity mask
- * @affinity: The affinity mask to spread. If NULL cpu_online_mask
- * is used
- * @maxvec: The maximum number of vectors available
+ * irq_calc_affinity_vectors - Calculate the optimal number of vectors
+ * @maxvec: The maximum number of vectors available
+ * @affd: Description of the affinity requirements
*/
-int irq_calc_affinity_vectors(const struct cpumask *affinity, int maxvec)
+int irq_calc_affinity_vectors(int maxvec, const struct irq_affinity *affd)
{
- int cpus, ret;
+ int resv = affd->pre_vectors + affd->post_vectors;
+ int vecs = maxvec - resv;
+ int cpus;

/* Stabilize the cpumasks */
get_online_cpus();
- /* If the supplied affinity mask is NULL, use cpu online mask */
- if (!affinity)
- affinity = cpu_online_mask;
-
- cpus = cpumask_weight(affinity);
- ret = (cpus < maxvec) ? cpus : maxvec;
-
+ cpus = cpumask_weight(cpu_online_mask);
put_online_cpus();
- return ret;
+
+ return min(cpus, vecs) + resv;
}

Subject: [tip:irq/core] genirq/affinity: Introduce struct irq_affinity

Commit-ID: 20e407e195b29a4f5a18d713a61f54a75f992bd5
Gitweb: http://git.kernel.org/tip/20e407e195b29a4f5a18d713a61f54a75f992bd5
Author: Christoph Hellwig <[email protected]>
AuthorDate: Tue, 8 Nov 2016 17:15:01 -0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 9 Nov 2016 08:25:08 +0100

genirq/affinity: Introduce struct irq_affinity

Some drivers (various network and RDMA adapter for example) have a MSI-X
vector layout where most of the vectors are used for I/O queues and should
have CPU affinity assigned to them, but some (usually 1 but sometimes more)
at the beginning or end are used for low-performance admin or configuration
work and should not have any explicit affinity assigned to them.

Add a new irq_affinity structure, which will be passed through a variant of
pci_irq_alloc_vectors that allows to specify these requirements (and is
extensible to any future quirks in that area) so that the core IRQ affinity
algorithm can take this quirks into account.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Acked-by: Jens Axboe <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
include/linux/interrupt.h | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 72f0721..6b52686 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -232,6 +232,18 @@ struct irq_affinity_notify {
void (*release)(struct kref *ref);
};

+/**
+ * struct irq_affinity - Description for automatic irq affinity assignements
+ * @pre_vectors: Don't apply affinity to @pre_vectors at beginning of
+ * the MSI(-X) vector space
+ * @post_vectors: Don't apply affinity to @post_vectors at end of
+ * the MSI(-X) vector space
+ */
+struct irq_affinity {
+ int pre_vectors;
+ int post_vectors;
+};
+
#if defined(CONFIG_SMP)

extern cpumask_var_t irq_default_affinity;

Subject: [tip:irq/core] genirq/affinity: Handle pre/post vectors in irq_create_affinity_masks()

Commit-ID: 67c93c218dc5d1b45d547771f1fdb44a381e1faf
Gitweb: http://git.kernel.org/tip/67c93c218dc5d1b45d547771f1fdb44a381e1faf
Author: Christoph Hellwig <[email protected]>
AuthorDate: Tue, 8 Nov 2016 17:15:03 -0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 9 Nov 2016 08:25:09 +0100

genirq/affinity: Handle pre/post vectors in irq_create_affinity_masks()

Only calculate the affinity for the main I/O vectors, and skip the
pre or post vectors specified by struct irq_affinity.

Also remove the irq_affinity cpumask argument that has never been used.
If we ever need it in the future we can pass it through struct
irq_affinity.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Acked-by: Jens Axboe <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
drivers/pci/msi.c | 6 ++++--
include/linux/interrupt.h | 4 ++--
kernel/irq/affinity.c | 46 +++++++++++++++++++++++++---------------------
3 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index dad2da7..f4a108b 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -553,12 +553,13 @@ error_attrs:
static struct msi_desc *
msi_setup_entry(struct pci_dev *dev, int nvec, bool affinity)
{
+ static const struct irq_affinity default_affd;
struct cpumask *masks = NULL;
struct msi_desc *entry;
u16 control;

if (affinity) {
- masks = irq_create_affinity_masks(dev->irq_affinity, nvec);
+ masks = irq_create_affinity_masks(nvec, &default_affd);
if (!masks)
pr_err("Unable to allocate affinity masks, ignoring\n");
}
@@ -692,12 +693,13 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
struct msix_entry *entries, int nvec,
bool affinity)
{
+ static const struct irq_affinity default_affd;
struct cpumask *curmsk, *masks = NULL;
struct msi_desc *entry;
int ret, i;

if (affinity) {
- masks = irq_create_affinity_masks(dev->irq_affinity, nvec);
+ masks = irq_create_affinity_masks(nvec, &default_affd);
if (!masks)
pr_err("Unable to allocate affinity masks, ignoring\n");
}
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 9081f23..53144e7 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -290,7 +290,7 @@ extern int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m);
extern int
irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);

-struct cpumask *irq_create_affinity_masks(const struct cpumask *affinity, int nvec);
+struct cpumask *irq_create_affinity_masks(int nvec, const struct irq_affinity *affd);
int irq_calc_affinity_vectors(int maxvec, const struct irq_affinity *affd);

#else /* CONFIG_SMP */
@@ -325,7 +325,7 @@ irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
}

static inline struct cpumask *
-irq_create_affinity_masks(const struct cpumask *affinity, int nvec)
+irq_create_affinity_masks(int nvec, const struct irq_affinity *affd)
{
return NULL;
}
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 8d92597..17360bd 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -51,16 +51,16 @@ static int get_nodes_in_cpumask(const struct cpumask *mask, nodemask_t *nodemsk)

/**
* irq_create_affinity_masks - Create affinity masks for multiqueue spreading
- * @affinity: The affinity mask to spread. If NULL cpu_online_mask
- * is used
- * @nvecs: The number of vectors
+ * @nvecs: The total number of vectors
+ * @affd: Description of the affinity requirements
*
* Returns the masks pointer or NULL if allocation failed.
*/
-struct cpumask *irq_create_affinity_masks(const struct cpumask *affinity,
- int nvec)
+struct cpumask *
+irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
{
- int n, nodes, vecs_per_node, cpus_per_vec, extra_vecs, curvec = 0;
+ int n, nodes, vecs_per_node, cpus_per_vec, extra_vecs, curvec;
+ int affv = nvecs - affd->pre_vectors - affd->post_vectors;
nodemask_t nodemsk = NODE_MASK_NONE;
struct cpumask *masks;
cpumask_var_t nmsk;
@@ -68,46 +68,46 @@ struct cpumask *irq_create_affinity_masks(const struct cpumask *affinity,
if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
return NULL;

- masks = kzalloc(nvec * sizeof(*masks), GFP_KERNEL);
+ masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
if (!masks)
goto out;

+ /* Fill out vectors at the beginning that don't need affinity */
+ for (curvec = 0; curvec < affd->pre_vectors; curvec++)
+ cpumask_copy(masks + curvec, cpu_possible_mask);
+
/* Stabilize the cpumasks */
get_online_cpus();
- /* If the supplied affinity mask is NULL, use cpu online mask */
- if (!affinity)
- affinity = cpu_online_mask;
-
- nodes = get_nodes_in_cpumask(affinity, &nodemsk);
+ nodes = get_nodes_in_cpumask(cpu_online_mask, &nodemsk);

/*
* If the number of nodes in the mask is less than or equal the
* number of vectors we just spread the vectors across the nodes.
*/
- if (nvec <= nodes) {
+ if (affv <= nodes) {
for_each_node_mask(n, nodemsk) {
cpumask_copy(masks + curvec, cpumask_of_node(n));
- if (++curvec == nvec)
+ if (++curvec == affv)
break;
}
- goto outonl;
+ goto done;
}

/* Spread the vectors per node */
- vecs_per_node = nvec / nodes;
+ vecs_per_node = affv / nodes;
/* Account for rounding errors */
- extra_vecs = nvec - (nodes * vecs_per_node);
+ extra_vecs = affv - (nodes * vecs_per_node);

for_each_node_mask(n, nodemsk) {
int ncpus, v, vecs_to_assign = vecs_per_node;

/* Get the cpus on this node which are in the mask */
- cpumask_and(nmsk, affinity, cpumask_of_node(n));
+ cpumask_and(nmsk, cpu_online_mask, cpumask_of_node(n));

/* Calculate the number of cpus per vector */
ncpus = cpumask_weight(nmsk);

- for (v = 0; curvec < nvec && v < vecs_to_assign; curvec++, v++) {
+ for (v = 0; curvec < affv && v < vecs_to_assign; curvec++, v++) {
cpus_per_vec = ncpus / vecs_to_assign;

/* Account for extra vectors to compensate rounding errors */
@@ -119,12 +119,16 @@ struct cpumask *irq_create_affinity_masks(const struct cpumask *affinity,
irq_spread_init_one(masks + curvec, nmsk, cpus_per_vec);
}

- if (curvec >= nvec)
+ if (curvec >= affv)
break;
}

-outonl:
+done:
put_online_cpus();
+
+ /* Fill out vectors at the end that don't need affinity */
+ for (; curvec < nvecs; curvec++)
+ cpumask_copy(masks + curvec, cpu_possible_mask);
out:
free_cpumask_var(nmsk);
return masks;

Subject: [tip:irq/core] PCI/MSI: Propagate IRQ affinity description through the MSI code

Commit-ID: 61e1c5905290efe48bacda5e342d4af4cb1b923b
Gitweb: http://git.kernel.org/tip/61e1c5905290efe48bacda5e342d4af4cb1b923b
Author: Christoph Hellwig <[email protected]>
AuthorDate: Tue, 8 Nov 2016 17:15:04 -0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 9 Nov 2016 08:25:09 +0100

PCI/MSI: Propagate IRQ affinity description through the MSI code

No API change yet, just pass it down all the way from
pci_alloc_irq_vectors() to the core MSI code.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Acked-by: Jens Axboe <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
drivers/pci/msi.c | 66 +++++++++++++++++++++++++++----------------------------
1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index f4a108b..512f388 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -551,15 +551,14 @@ error_attrs:
}

static struct msi_desc *
-msi_setup_entry(struct pci_dev *dev, int nvec, bool affinity)
+msi_setup_entry(struct pci_dev *dev, int nvec, const struct irq_affinity *affd)
{
- static const struct irq_affinity default_affd;
struct cpumask *masks = NULL;
struct msi_desc *entry;
u16 control;

- if (affinity) {
- masks = irq_create_affinity_masks(nvec, &default_affd);
+ if (affd) {
+ masks = irq_create_affinity_masks(nvec, affd);
if (!masks)
pr_err("Unable to allocate affinity masks, ignoring\n");
}
@@ -619,7 +618,8 @@ static int msi_verify_entries(struct pci_dev *dev)
* an error, and a positive return value indicates the number of interrupts
* which could have been allocated.
*/
-static int msi_capability_init(struct pci_dev *dev, int nvec, bool affinity)
+static int msi_capability_init(struct pci_dev *dev, int nvec,
+ const struct irq_affinity *affd)
{
struct msi_desc *entry;
int ret;
@@ -627,7 +627,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec, bool affinity)

pci_msi_set_enable(dev, 0); /* Disable MSI during set up */

- entry = msi_setup_entry(dev, nvec, affinity);
+ entry = msi_setup_entry(dev, nvec, affd);
if (!entry)
return -ENOMEM;

@@ -691,15 +691,14 @@ static void __iomem *msix_map_region(struct pci_dev *dev, unsigned nr_entries)

static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
struct msix_entry *entries, int nvec,
- bool affinity)
+ const struct irq_affinity *affd)
{
- static const struct irq_affinity default_affd;
struct cpumask *curmsk, *masks = NULL;
struct msi_desc *entry;
int ret, i;

- if (affinity) {
- masks = irq_create_affinity_masks(nvec, &default_affd);
+ if (affd) {
+ masks = irq_create_affinity_masks(nvec, affd);
if (!masks)
pr_err("Unable to allocate affinity masks, ignoring\n");
}
@@ -755,14 +754,14 @@ static void msix_program_entries(struct pci_dev *dev,
* @dev: pointer to the pci_dev data structure of MSI-X device function
* @entries: pointer to an array of struct msix_entry entries
* @nvec: number of @entries
- * @affinity: flag to indicate cpu irq affinity mask should be set
+ * @affd: Optional pointer to enable automatic affinity assignement
*
* Setup the MSI-X capability structure of device function with a
* single MSI-X irq. A return of zero indicates the successful setup of
* requested MSI-X entries with allocated irqs or non-zero for otherwise.
**/
static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
- int nvec, bool affinity)
+ int nvec, const struct irq_affinity *affd)
{
int ret;
u16 control;
@@ -777,7 +776,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
if (!base)
return -ENOMEM;

- ret = msix_setup_entries(dev, base, entries, nvec, affinity);
+ ret = msix_setup_entries(dev, base, entries, nvec, affd);
if (ret)
return ret;

@@ -958,7 +957,7 @@ int pci_msix_vec_count(struct pci_dev *dev)
EXPORT_SYMBOL(pci_msix_vec_count);

static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
- int nvec, bool affinity)
+ int nvec, const struct irq_affinity *affd)
{
int nr_entries;
int i, j;
@@ -990,7 +989,7 @@ static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
dev_info(&dev->dev, "can't enable MSI-X (MSI IRQ already assigned)\n");
return -EINVAL;
}
- return msix_capability_init(dev, entries, nvec, affinity);
+ return msix_capability_init(dev, entries, nvec, affd);
}

/**
@@ -1010,7 +1009,7 @@ static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
**/
int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
{
- return __pci_enable_msix(dev, entries, nvec, false);
+ return __pci_enable_msix(dev, entries, nvec, NULL);
}
EXPORT_SYMBOL(pci_enable_msix);

@@ -1061,10 +1060,8 @@ int pci_msi_enabled(void)
EXPORT_SYMBOL(pci_msi_enabled);

static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
- unsigned int flags)
+ const struct irq_affinity *affd)
{
- static const struct irq_affinity default_affd;
- bool affinity = flags & PCI_IRQ_AFFINITY;
int nvec;
int rc;

@@ -1093,13 +1090,13 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
nvec = maxvec;

for (;;) {
- if (affinity) {
- nvec = irq_calc_affinity_vectors(nvec, &default_affd);
+ if (affd) {
+ nvec = irq_calc_affinity_vectors(nvec, affd);
if (nvec < minvec)
return -ENOSPC;
}

- rc = msi_capability_init(dev, nvec, affinity);
+ rc = msi_capability_init(dev, nvec, affd);
if (rc == 0)
return nvec;

@@ -1126,29 +1123,27 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
**/
int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec)
{
- return __pci_enable_msi_range(dev, minvec, maxvec, 0);
+ return __pci_enable_msi_range(dev, minvec, maxvec, NULL);
}
EXPORT_SYMBOL(pci_enable_msi_range);

static int __pci_enable_msix_range(struct pci_dev *dev,
- struct msix_entry *entries, int minvec, int maxvec,
- unsigned int flags)
+ struct msix_entry *entries, int minvec,
+ int maxvec, const struct irq_affinity *affd)
{
- static const struct irq_affinity default_affd;
- bool affinity = flags & PCI_IRQ_AFFINITY;
int rc, nvec = maxvec;

if (maxvec < minvec)
return -ERANGE;

for (;;) {
- if (affinity) {
- nvec = irq_calc_affinity_vectors(nvec, &default_affd);
+ if (affd) {
+ nvec = irq_calc_affinity_vectors(nvec, affd);
if (nvec < minvec)
return -ENOSPC;
}

- rc = __pci_enable_msix(dev, entries, nvec, affinity);
+ rc = __pci_enable_msix(dev, entries, nvec, affd);
if (rc == 0)
return nvec;

@@ -1179,7 +1174,7 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
int minvec, int maxvec)
{
- return __pci_enable_msix_range(dev, entries, minvec, maxvec, 0);
+ return __pci_enable_msix_range(dev, entries, minvec, maxvec, NULL);
}
EXPORT_SYMBOL(pci_enable_msix_range);

@@ -1203,17 +1198,22 @@ EXPORT_SYMBOL(pci_enable_msix_range);
int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
unsigned int max_vecs, unsigned int flags)
{
+ static const struct irq_affinity msi_default_affd;
+ const struct irq_affinity *affd = NULL;
int vecs = -ENOSPC;

+ if (flags & PCI_IRQ_AFFINITY)
+ affd = &msi_default_affd;
+
if (flags & PCI_IRQ_MSIX) {
vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
- flags);
+ affd);
if (vecs > 0)
return vecs;
}

if (flags & PCI_IRQ_MSI) {
- vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, flags);
+ vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
if (vecs > 0)
return vecs;
}

Subject: [tip:irq/core] PCI/MSI: Provide pci_alloc_irq_vectors_affinity()

Commit-ID: 402723ad5c625ee052432698ae5e56b02d38d4ec
Gitweb: http://git.kernel.org/tip/402723ad5c625ee052432698ae5e56b02d38d4ec
Author: Christoph Hellwig <[email protected]>
AuthorDate: Tue, 8 Nov 2016 17:15:05 -0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 9 Nov 2016 08:25:10 +0100

PCI/MSI: Provide pci_alloc_irq_vectors_affinity()

This is a variant of pci_alloc_irq_vectors() that allows passing a struct
irq_affinity to provide fine-grained IRQ affinity control.

For now this means being able to exclude vectors at the beginning or end of
the MSI vector space, but it could also be used for any other quirks needed
in the future (e.g. more vectors than CPUs, or excluding CPUs from the
spreading).

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Acked-by: Jens Axboe <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
drivers/pci/msi.c | 20 +++++++++++++-------
include/linux/pci.h | 24 +++++++++++++++++++-----
2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 512f388..dd27f73 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1179,11 +1179,12 @@ int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
EXPORT_SYMBOL(pci_enable_msix_range);

/**
- * pci_alloc_irq_vectors - allocate multiple IRQs for a device
+ * pci_alloc_irq_vectors_affinity - allocate multiple IRQs for a device
* @dev: PCI device to operate on
* @min_vecs: minimum number of vectors required (must be >= 1)
* @max_vecs: maximum (desired) number of vectors
* @flags: flags or quirks for the allocation
+ * @affd: optional description of the affinity requirements
*
* Allocate up to @max_vecs interrupt vectors for @dev, using MSI-X or MSI
* vectors if available, and fall back to a single legacy vector
@@ -1195,15 +1196,20 @@ EXPORT_SYMBOL(pci_enable_msix_range);
* To get the Linux IRQ number used for a vector that can be passed to
* request_irq() use the pci_irq_vector() helper.
*/
-int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
- unsigned int max_vecs, unsigned int flags)
+int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
+ unsigned int max_vecs, unsigned int flags,
+ const struct irq_affinity *affd)
{
static const struct irq_affinity msi_default_affd;
- const struct irq_affinity *affd = NULL;
int vecs = -ENOSPC;

- if (flags & PCI_IRQ_AFFINITY)
- affd = &msi_default_affd;
+ if (flags & PCI_IRQ_AFFINITY) {
+ if (!affd)
+ affd = &msi_default_affd;
+ } else {
+ if (WARN_ON(affd))
+ affd = NULL;
+ }

if (flags & PCI_IRQ_MSIX) {
vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
@@ -1226,7 +1232,7 @@ int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,

return vecs;
}
-EXPORT_SYMBOL(pci_alloc_irq_vectors);
+EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);

/**
* pci_free_irq_vectors - free previously allocated IRQs for a device
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0e49f70..7090f5f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -244,6 +244,7 @@ struct pci_cap_saved_state {
struct pci_cap_saved_data cap;
};

+struct irq_affinity;
struct pcie_link_state;
struct pci_vpd;
struct pci_sriov;
@@ -1310,8 +1311,10 @@ static inline int pci_enable_msix_exact(struct pci_dev *dev,
return rc;
return 0;
}
-int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
- unsigned int max_vecs, unsigned int flags);
+int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
+ unsigned int max_vecs, unsigned int flags,
+ const struct irq_affinity *affd);
+
void pci_free_irq_vectors(struct pci_dev *dev);
int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
const struct cpumask *pci_irq_get_affinity(struct pci_dev *pdev, int vec);
@@ -1339,14 +1342,17 @@ static inline int pci_enable_msix_range(struct pci_dev *dev,
static inline int pci_enable_msix_exact(struct pci_dev *dev,
struct msix_entry *entries, int nvec)
{ return -ENOSYS; }
-static inline int pci_alloc_irq_vectors(struct pci_dev *dev,
- unsigned int min_vecs, unsigned int max_vecs,
- unsigned int flags)
+
+static inline int
+pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
+ unsigned int max_vecs, unsigned int flags,
+ const struct irq_affinity *aff_desc)
{
if (min_vecs > 1)
return -EINVAL;
return 1;
}
+
static inline void pci_free_irq_vectors(struct pci_dev *dev)
{
}
@@ -1364,6 +1370,14 @@ static inline const struct cpumask *pci_irq_get_affinity(struct pci_dev *pdev,
}
#endif

+static inline int
+pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
+ unsigned int max_vecs, unsigned int flags)
+{
+ return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
+ NULL);
+}
+
#ifdef CONFIG_PCIEPORTBUS
extern bool pcie_ports_disabled;
extern bool pcie_ports_auto;

Subject: [tip:irq/core] PCI: Remove the irq_affinity mask from struct pci_dev

Commit-ID: 0cf71b04467bc34063cecae577f12481da6cc565
Gitweb: http://git.kernel.org/tip/0cf71b04467bc34063cecae577f12481da6cc565
Author: Christoph Hellwig <[email protected]>
AuthorDate: Tue, 8 Nov 2016 17:15:06 -0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 9 Nov 2016 08:25:10 +0100

PCI: Remove the irq_affinity mask from struct pci_dev

This has never been used, and now is totally unreferenced. Nuke it.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Acked-by: Jens Axboe <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
include/linux/pci.h | 1 -
1 file changed, 1 deletion(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7090f5f..f2ba6ac 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -333,7 +333,6 @@ struct pci_dev {
* directly, use the values stored here. They might be different!
*/
unsigned int irq;
- struct cpumask *irq_affinity;
struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */

bool match_driver; /* Skip attaching driver */

2016-11-09 07:54:11

by Thomas Gleixner

[permalink] [raw]
Subject: Re: support for partial irq affinity assignment V3

On Tue, 8 Nov 2016, Jens Axboe wrote:

> On 11/08/2016 06:15 PM, Christoph Hellwig wrote:
> > This series adds support for automatic interrupt assignment to devices
> > that have a few vectors that are set aside for admin or config purposes
> > and thus should not fall into the general per-cpu assginment pool.
> >
> > The first patch adds that support to the core IRQ and PCI/msi code,
> > and the second is a small tweak to a block layer helper to make use
> > of it. I'd love to have both go into the same tree so that consumers
> > of this (e.g. the virtio, scsi and rdma trees) only need to pull in
> > one of these trees as dependency.
>
> Series looks good to me, you can add my Acked-by to all of them.

It's available from

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/for-block

for you to pull into the block tree so you can apply the block changes.

Thanks,

tglx

2016-11-09 15:10:34

by Christoph Hellwig

[permalink] [raw]
Subject: Re: support for partial irq affinity assignment V3

On Wed, Nov 09, 2016 at 08:51:35AM +0100, Thomas Gleixner wrote:
> It's available from
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/for-block
>
> for you to pull into the block tree so you can apply the block changes.

I don't actually need them in the block tree, but in the SCSI tree..
So the block tree is a bad place for this, that's why I initially
wanted you to pick it up. But thinking about it the SCSI tree might
a better idea as we could avoid a non-bisectability with that.

I'll coordinate it with Jens and Martin and we'll leave you off the
hook. Thanks for all the help!

2016-11-09 15:11:42

by Thomas Gleixner

[permalink] [raw]
Subject: Re: support for partial irq affinity assignment V3

On Wed, 9 Nov 2016, Christoph Hellwig wrote:
> On Wed, Nov 09, 2016 at 08:51:35AM +0100, Thomas Gleixner wrote:
> > It's available from
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/for-block
> >
> > for you to pull into the block tree so you can apply the block changes.
>
> I don't actually need them in the block tree, but in the SCSI tree..
> So the block tree is a bad place for this, that's why I initially
> wanted you to pick it up. But thinking about it the SCSI tree might
> a better idea as we could avoid a non-bisectability with that.

Right.

> I'll coordinate it with Jens and Martin and we'll leave you off the
> hook. Thanks for all the help!

You're welcome!

tglx