2018-11-02 15:01:22

by Ming Lei

Subject: [PATCH 0/4] irq: fix support for allocating sets of IRQs

Hi Jens,

As I mentioned, there are at least two issues in the patch 'irq: add
support for allocating (and affinitizing) sets of IRQs':

1) It is wrong to pass 'mask + usedvec' to irq_build_affinity_masks().

2) All possible CPUs should be spread in a two-stage way on each set of
IRQs.

The fix isn't trivial, so I introduce two extra patches as preparation;
with them, the final implementation is cleaner.
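
To make issue 2) concrete, here is a minimal, illustrative sketch of the
intended per-set spreading; two_stage_spread() is a hypothetical stand-in
for the real two-stage helper, not code from this series:

	/* hypothetical helper: full two-stage spread of one set */
	void two_stage_spread(struct cpumask *masks, int firstvec,
			      int numvecs);

	/*
	 * Illustrative sketch only: each set gets its own full two-stage
	 * spread (present CPUs first, then the remaining possible CPUs),
	 * and the spread wraps around at the first vector of that set.
	 */
	static void spread_sets(struct cpumask *masks, const int *set_sizes,
				int nr_sets, int pre_vectors)
	{
		int i, curvec = pre_vectors;

		for (i = 0; i < nr_sets; i++) {
			/* curvec is both the start and the wrap point */
			two_stage_spread(masks, curvec, set_sizes[i]);
			curvec += set_sizes[i];
		}
	}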

The patchset is against the mq-maps branch of the block tree; feel free
to integrate it into the multiple queue maps patchset.

Thanks,
Ming


Jens Axboe (1):
irq: add support for allocating (and affinitizing) sets of IRQs

Ming Lei (3):
Revert "irq: add support for allocating (and affinitizing) sets of
IRQs"
irq: move 2-stage irq spread into one helper
irq: pass first vector to __irq_build_affinity_masks

kernel/irq/affinity.c | 119 +++++++++++++++++++++++++++++++-------------------
1 file changed, 75 insertions(+), 44 deletions(-)

Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Hannes Reinecke <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Sagi Grimberg <[email protected]>

--
2.9.5



2018-11-02 15:01:48

by Ming Lei

Subject: [PATCH 2/4] irq: move 2-stage irq spread into one helper

No functional change; this prepares for a following patch that adds
support for allocating (and affinitizing) sets of IRQs.

Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Hannes Reinecke <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
kernel/irq/affinity.c | 92 +++++++++++++++++++++++++++++++--------------------
1 file changed, 56 insertions(+), 36 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f4f29b9d90ee..a16b601604aa 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -94,7 +94,7 @@ static int get_nodes_in_cpumask(cpumask_var_t *node_to_cpumask,
return nodes;
}

-static int irq_build_affinity_masks(const struct irq_affinity *affd,
+static int __irq_build_affinity_masks(const struct irq_affinity *affd,
int startvec, int numvecs,
cpumask_var_t *node_to_cpumask,
const struct cpumask *cpu_mask,
@@ -166,6 +166,58 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,
return done;
}

+/*
+ * build affinity in two stages:
+ * 1) spread present CPUs on these vectors
+ * 2) spread other possible CPUs on these vectors
+ */
+static int irq_build_affinity_masks(const struct irq_affinity *affd,
+ int startvec, int numvecs,
+ cpumask_var_t *node_to_cpumask,
+ struct cpumask *masks)
+{
+ int curvec = startvec, usedvecs = -1;
+ cpumask_var_t nmsk, npresmsk;
+
+ if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
+ return usedvecs;
+
+ if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
+ goto fail;
+
+ /* Stabilize the cpumasks */
+ get_online_cpus();
+ build_node_to_cpumask(node_to_cpumask);
+
+ /* Spread on present CPUs starting from affd->pre_vectors */
+ usedvecs = __irq_build_affinity_masks(affd, curvec, numvecs,
+ node_to_cpumask, cpu_present_mask,
+ nmsk, masks);
+
+ /*
+ * Spread on non present CPUs starting from the next vector to be
+ * handled. If the spreading of present CPUs already exhausted the
+ * vector space, assign the non present CPUs to the already spread
+ * out vectors.
+ */
+ if (usedvecs >= numvecs)
+ curvec = affd->pre_vectors;
+ else
+ curvec = affd->pre_vectors + usedvecs;
+ cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
+ usedvecs += __irq_build_affinity_masks(affd, curvec, numvecs,
+ node_to_cpumask, npresmsk,
+ nmsk, masks);
+ put_online_cpus();
+
+ free_cpumask_var(npresmsk);
+
+ fail:
+ free_cpumask_var(nmsk);
+
+ return usedvecs;
+}
+
/**
* irq_create_affinity_masks - Create affinity masks for multiqueue spreading
* @nvecs: The total number of vectors
@@ -178,7 +230,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
{
int affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
int curvec, usedvecs;
- cpumask_var_t nmsk, npresmsk, *node_to_cpumask;
+ cpumask_var_t *node_to_cpumask;
struct cpumask *masks = NULL;

/*
@@ -188,15 +240,9 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
if (nvecs == affd->pre_vectors + affd->post_vectors)
return NULL;

- if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
- return NULL;
-
- if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
- goto outcpumsk;
-
node_to_cpumask = alloc_node_to_cpumask();
if (!node_to_cpumask)
- goto outnpresmsk;
+ return NULL;

masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
if (!masks)
@@ -206,30 +252,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
for (curvec = 0; curvec < affd->pre_vectors; curvec++)
cpumask_copy(masks + curvec, irq_default_affinity);

- /* Stabilize the cpumasks */
- get_online_cpus();
- build_node_to_cpumask(node_to_cpumask);
-
- /* Spread on present CPUs starting from affd->pre_vectors */
usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
- node_to_cpumask, cpu_present_mask,
- nmsk, masks);
-
- /*
- * Spread on non present CPUs starting from the next vector to be
- * handled. If the spreading of present CPUs already exhausted the
- * vector space, assign the non present CPUs to the already spread
- * out vectors.
- */
- if (usedvecs >= affvecs)
- curvec = affd->pre_vectors;
- else
- curvec = affd->pre_vectors + usedvecs;
- cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
- usedvecs += irq_build_affinity_masks(affd, curvec, affvecs,
- node_to_cpumask, npresmsk,
- nmsk, masks);
- put_online_cpus();
+ node_to_cpumask, masks);

/* Fill out vectors at the end that don't need affinity */
if (usedvecs >= affvecs)
@@ -241,10 +265,6 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)

outnodemsk:
free_node_to_cpumask(node_to_cpumask);
-outnpresmsk:
- free_cpumask_var(npresmsk);
-outcpumsk:
- free_cpumask_var(nmsk);
return masks;
}

--
2.9.5


2018-11-02 15:03:21

by Ming Lei

Subject: [PATCH 3/4] irq: pass first vector to __irq_build_affinity_masks

No functional change; this prepares for a following patch that adds
support for allocating (and affinitizing) sets of IRQs, in which each
set of IRQs needs a full two-stage spread, with the first vector
pointing to the first vector of that set.
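
To see why the first vector matters, consider a hypothetical layout
(the numbers are made up for illustration):

	pre_vectors = 2, sets = {4, 3}

	vector:  0  1 | 2  3  4  5 | 6  7  8
	         pre  |   set 0    |  set 1

While spreading set 1, curvec has to wrap around to firstvec == 6
rather than to affd->pre_vectors == 2; otherwise the spread for set 1
would spill into set 0's vectors.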

Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Hannes Reinecke <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
kernel/irq/affinity.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index a16b601604aa..9c74f21ab10e 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -95,14 +95,14 @@ static int get_nodes_in_cpumask(cpumask_var_t *node_to_cpumask,
}

static int __irq_build_affinity_masks(const struct irq_affinity *affd,
- int startvec, int numvecs,
+ int startvec, int numvecs, int firstvec,
cpumask_var_t *node_to_cpumask,
const struct cpumask *cpu_mask,
struct cpumask *nmsk,
struct cpumask *masks)
{
int n, nodes, cpus_per_vec, extra_vecs, done = 0;
- int last_affv = affd->pre_vectors + numvecs;
+ int last_affv = firstvec + numvecs;
int curvec = startvec;
nodemask_t nodemsk = NODE_MASK_NONE;

@@ -121,7 +121,7 @@ static int __irq_build_affinity_masks(const struct irq_affinity *affd,
if (++done == numvecs)
break;
if (++curvec == last_affv)
- curvec = affd->pre_vectors;
+ curvec = firstvec;
}
goto out;
}
@@ -130,7 +130,7 @@ static int __irq_build_affinity_masks(const struct irq_affinity *affd,
int ncpus, v, vecs_to_assign, vecs_per_node;

/* Spread the vectors per node */
- vecs_per_node = (numvecs - (curvec - affd->pre_vectors)) / nodes;
+ vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;

/* Get the cpus on this node which are in the mask */
cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
@@ -158,7 +158,7 @@ static int __irq_build_affinity_masks(const struct irq_affinity *affd,
if (done >= numvecs)
break;
if (curvec >= last_affv)
- curvec = affd->pre_vectors;
+ curvec = firstvec;
--nodes;
}

@@ -191,8 +191,8 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,

/* Spread on present CPUs starting from affd->pre_vectors */
usedvecs = __irq_build_affinity_masks(affd, curvec, numvecs,
- node_to_cpumask, cpu_present_mask,
- nmsk, masks);
+ affd->pre_vectors, node_to_cpumask,
+ cpu_present_mask, nmsk, masks);

/*
* Spread on non present CPUs starting from the next vector to be
@@ -206,8 +206,8 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,
curvec = affd->pre_vectors + usedvecs;
cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
usedvecs += __irq_build_affinity_masks(affd, curvec, numvecs,
- node_to_cpumask, npresmsk,
- nmsk, masks);
+ affd->pre_vectors, node_to_cpumask, npresmsk,
+ nmsk, masks);
put_online_cpus();

free_cpumask_var(npresmsk);
--
2.9.5


2018-11-02 15:03:23

by Ming Lei

Subject: [PATCH 4/4] irq: add support for allocating (and affinitizing) sets of IRQs

From: Jens Axboe <[email protected]>

A driver may need to allocate multiple sets of MSI/MSI-X
interrupts, and have them appropriately affinitized. Add support for
defining a number of sets in the irq_affinity structure, of varying
sizes, and get each set affinitized correctly across the machine.
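
As a usage sketch (the device and set sizes below are hypothetical, not
taken from any driver in this series), a driver describes its sets and
then requests an exact vector count, since a range is rejected when
sets are passed in:

	/* Hypothetical driver: e.g. 8 read queues + 4 write queues */
	static int example_setup_irqs(struct pci_dev *pdev)
	{
		static int sets[] = { 8, 4 };
		struct irq_affinity affd = {
			.pre_vectors	= 1,	/* e.g. one admin vector */
			.nr_sets	= ARRAY_SIZE(sets),
			.sets		= sets,
		};
		int nvecs = 1 + 8 + 4;	/* pre_vectors + sum of sets */

		/* minvec must equal maxvec when affd->nr_sets is set */
		return pci_alloc_irq_vectors_affinity(pdev, nvecs, nvecs,
						      PCI_IRQ_MSIX, &affd);
	}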

Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
drivers/pci/msi.c | 14 ++++++++++
include/linux/interrupt.h | 4 +++
kernel/irq/affinity.c | 71 ++++++++++++++++++++++++++++++++++-------------
3 files changed, 70 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index af24ed50a245..265ed3e4c920 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1036,6 +1036,13 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
if (maxvec < minvec)
return -ERANGE;

+ /*
+ * If the caller is passing in sets, we can't support a range of
+ * vectors. The caller needs to handle that.
+ */
+ if (affd && affd->nr_sets && minvec != maxvec)
+ return -EINVAL;
+
if (WARN_ON_ONCE(dev->msi_enabled))
return -EINVAL;

@@ -1087,6 +1094,13 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
if (maxvec < minvec)
return -ERANGE;

+ /*
+ * If the caller is passing in sets, we can't support a range of
+ * vectors. The caller needs to handle that.
+ */
+ if (affd && affd->nr_sets && minvec != maxvec)
+ return -EINVAL;
+
if (WARN_ON_ONCE(dev->msix_enabled))
return -EINVAL;

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 1d6711c28271..ca397ff40836 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -247,10 +247,14 @@ struct irq_affinity_notify {
* the MSI(-X) vector space
* @post_vectors: Don't apply affinity to @post_vectors at end of
* the MSI(-X) vector space
+ * @nr_sets: Length of the passed in *sets array
+ * @sets: Array holding the size of each affinitized set
*/
struct irq_affinity {
int pre_vectors;
int post_vectors;
+ int nr_sets;
+ int *sets;
};

#if defined(CONFIG_SMP)
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 9c74f21ab10e..d49d3bff702c 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -172,26 +172,28 @@ static int __irq_build_affinity_masks(const struct irq_affinity *affd,
* 2) spread other possible CPUs on these vectors
*/
static int irq_build_affinity_masks(const struct irq_affinity *affd,
- int startvec, int numvecs,
+ int startvec, int numvecs, int firstvec,
cpumask_var_t *node_to_cpumask,
struct cpumask *masks)
{
- int curvec = startvec, usedvecs = -1;
+ int curvec = startvec, nr_present, nr_others;
+ int ret = -ENOMEM;
cpumask_var_t nmsk, npresmsk;

if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
- return usedvecs;
+ return ret;

if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
goto fail;

+ ret = 0;
/* Stabilize the cpumasks */
get_online_cpus();
build_node_to_cpumask(node_to_cpumask);

/* Spread on present CPUs starting from affd->pre_vectors */
- usedvecs = __irq_build_affinity_masks(affd, curvec, numvecs,
- affd->pre_vectors, node_to_cpumask,
+ nr_present = __irq_build_affinity_masks(affd, curvec, numvecs,
+ firstvec, node_to_cpumask,
cpu_present_mask, nmsk, masks);

/*
@@ -200,22 +202,24 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,
* vector space, assign the non present CPUs to the already spread
* out vectors.
*/
- if (usedvecs >= numvecs)
- curvec = affd->pre_vectors;
+ if (nr_present >= numvecs)
+ curvec = firstvec;
else
- curvec = affd->pre_vectors + usedvecs;
+ curvec = firstvec + nr_present;
cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
- usedvecs += __irq_build_affinity_masks(affd, curvec, numvecs,
- affd->pre_vectors, node_to_cpumask, npresmsk,
+ nr_others = __irq_build_affinity_masks(affd, curvec, numvecs,
+ firstvec, node_to_cpumask, npresmsk,
nmsk, masks);
put_online_cpus();

+ if (nr_present < numvecs)
+ WARN_ON(nr_present + nr_others < numvecs);
+
free_cpumask_var(npresmsk);

fail:
free_cpumask_var(nmsk);
-
- return usedvecs;
+ return ret;
}

/**
@@ -232,6 +236,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
int curvec, usedvecs;
cpumask_var_t *node_to_cpumask;
struct cpumask *masks = NULL;
+ int i, nr_sets;

/*
* If there aren't any vectors left after applying the pre/post
@@ -252,8 +257,28 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
for (curvec = 0; curvec < affd->pre_vectors; curvec++)
cpumask_copy(masks + curvec, irq_default_affinity);

- usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
- node_to_cpumask, masks);
+ /*
+ * Spread on present CPUs starting from affd->pre_vectors. If we
+ * have multiple sets, build each set's affinity mask separately.
+ */
+ nr_sets = affd->nr_sets;
+ if (!nr_sets)
+ nr_sets = 1;
+
+ for (i = 0, usedvecs = 0; i < nr_sets; i++) {
+ int this_vecs = affd->sets ? affd->sets[i] : affvecs;
+ int ret;
+
+ ret = irq_build_affinity_masks(affd, curvec, this_vecs,
+ curvec, node_to_cpumask, masks);
+ if (ret) {
+ kfree(masks);
+ masks = NULL;
+ goto outnodemsk;
+ }
+ curvec += this_vecs;
+ usedvecs += this_vecs;
+ }

/* Fill out vectors at the end that don't need affinity */
if (usedvecs >= affvecs)
@@ -278,13 +303,21 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
{
int resv = affd->pre_vectors + affd->post_vectors;
int vecs = maxvec - resv;
- int ret;
+ int set_vecs;

if (resv > minvec)
return 0;

- get_online_cpus();
- ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
- put_online_cpus();
- return ret;
+ if (affd->nr_sets) {
+ int i;
+
+ for (i = 0, set_vecs = 0; i < affd->nr_sets; i++)
+ set_vecs += affd->sets[i];
+ } else {
+ get_online_cpus();
+ set_vecs = cpumask_weight(cpu_possible_mask);
+ put_online_cpus();
+ }
+
+ return resv + min(set_vecs, vecs);
}
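
A worked example of the new irq_calc_affinity_vectors() logic, with
hypothetical numbers: for pre_vectors = 1, post_vectors = 0,
maxvec = 16 and sets = {8, 4}, we get resv = 1, vecs = 15 and
set_vecs = 8 + 4 = 12, so the function returns 1 + min(12, 15) = 13.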
--
2.9.5


2018-11-03 21:22:31

by Jens Axboe

Subject: Re: [PATCH 0/4] irq: fix support for allocating sets of IRQs

On 11/2/18 8:59 AM, Ming Lei wrote:
> Hi Jens,
>
> As I mentioned, there are at least two issues in the patch 'irq: add
> support for allocating (and affinitizing) sets of IRQs':
>
> 1) It is wrong to pass 'mask + usedvec' to irq_build_affinity_masks().
>
> 2) All possible CPUs should be spread in a two-stage way on each set of
> IRQs.
>
> The fix isn't trivial, so I introduce two extra patches as preparation;
> with them, the final implementation is cleaner.
>
> The patchset is against the mq-maps branch of the block tree; feel free
> to integrate it into the multiple queue maps patchset.

Thanks Ming, I ran this through my testing, and I ended up with the
same maps and affinities for all the cases I cared about. I'm going
to drop my initial version and add these three.

--
Jens Axboe


2018-11-04 12:32:02

by Thomas Gleixner

Subject: Re: [PATCH 0/4] irq: fix support for allocating sets of IRQs

Jens,

On Sat, 3 Nov 2018, Jens Axboe wrote:

> On 11/2/18 8:59 AM, Ming Lei wrote:
> > Hi Jens,
> >
> > As I mentioned, there are at least two issues in the patch 'irq: add
> > support for allocating (and affinitizing) sets of IRQs':
> >
> > 1) It is wrong to pass 'mask + usedvec' to irq_build_affinity_masks().
> >
> > 2) All possible CPUs should be spread in a two-stage way on each set of
> > IRQs.
> >
> > The fix isn't trivial, so I introduce two extra patches as preparation;
> > with them, the final implementation is cleaner.
> >
> > The patchset is against the mq-maps branch of the block tree; feel free
> > to integrate it into the multiple queue maps patchset.
>
> Thanks Ming, I ran this through my testing, and I ended up with the
> same maps and affinities for all the cases I cared about. I'm going
> to drop my initial version and add these three.

So I assume that I can pick up Ming's series instead.

There is another patch pending affecting the irq affinity spreading. Can
you folks please have a look at it?

https://lkml.kernel.org/r/[email protected]

Thanks,

tglx



2018-11-04 19:26:14

by Jens Axboe

Subject: Re: [PATCH 0/4] irq: fix support for allocating sets of IRQs

On 11/4/18 5:02 AM, Thomas Gleixner wrote:
> Jens,
>
> On Sat, 3 Nov 2018, Jens Axboe wrote:
>
>> On 11/2/18 8:59 AM, Ming Lei wrote:
>>> Hi Jens,
>>>
>>> As I mentioned, there are at least two issues in the patch 'irq: add
>>> support for allocating (and affinitizing) sets of IRQs':
>>>
>>> 1) It is wrong to pass 'mask + usedvec' to irq_build_affinity_masks().
>>>
>>> 2) All possible CPUs should be spread in a two-stage way on each set of
>>> IRQs.
>>>
>>> The fix isn't trivial, so I introduce two extra patches as preparation;
>>> with them, the final implementation is cleaner.
>>>
>>> The patchset is against the mq-maps branch of the block tree; feel free
>>> to integrate it into the multiple queue maps patchset.
>>
>> Thanks Ming, I ran this through my testing, and I ended up with the
>> same maps and affinities for all the cases I cared about. I'm going
>> to drop my initial version and add these three.
>
> So I assume that I can pick up Ming's series instead.

Yes, let's do that.

> There is another patch pending affecting the irq affinity spreading. Can
> you folks please have a look at it?
>
> https://lkml.kernel.org/r/[email protected]

I'll give that a look and test. Thanks for the heads-up.

--
Jens Axboe


2018-11-04 20:18:12

by Thomas Gleixner

Subject: Re: [PATCH 0/4] irq: fix support for allocating sets of IRQs

On Sun, 4 Nov 2018, Jens Axboe wrote:

Cc'ing Long with a hopefully working E-Mail address. The previous one
bounced because I stupidly copied the wrong one...

> On 11/4/18 5:02 AM, Thomas Gleixner wrote:
> > Jens,
> >
> > On Sat, 3 Nov 2018, Jens Axboe wrote:
> >
> >> On 11/2/18 8:59 AM, Ming Lei wrote:
> >>> Hi Jens,
> >>>
> >>> As I mentioned, there are at least two issues in the patch 'irq: add
> >>> support for allocating (and affinitizing) sets of IRQs':
> >>>
> >>> 1) It is wrong to pass 'mask + usedvec' to irq_build_affinity_masks().
> >>>
> >>> 2) All possible CPUs should be spread in a two-stage way on each set of
> >>> IRQs.
> >>>
> >>> The fix isn't trivial, so I introduce two extra patches as preparation;
> >>> with them, the final implementation is cleaner.
> >>>
> >>> The patchset is against the mq-maps branch of the block tree; feel free
> >>> to integrate it into the multiple queue maps patchset.
> >>
> >> Thanks Ming, I ran this through my testing, and I ended up with the
> >> same maps and affinities for all the cases I cared about. I'm going
> >> to drop my initial version and add these three.
> >
> > So I assume that I can pick up Ming's series instead.
>
> Yes, let's do that.
>
> > There is another patch pending affecting the irq affinity spreading. Can
> > you folks please have a look at it?
> >
> > https://lkml.kernel.org/r/[email protected]
>
> I'll give that a look and test. Thanks for the heads-up.
>
> --
> Jens Axboe
>
>

2018-11-05 02:19:18

by Ming Lei

Subject: Re: [PATCH 0/4] irq: fix support for allocating sets of IRQs

On Sun, Nov 04, 2018 at 01:02:18PM +0100, Thomas Gleixner wrote:
> Jens,
>
> On Sat, 3 Nov 2018, Jens Axboe wrote:
>
> > On 11/2/18 8:59 AM, Ming Lei wrote:
> > > Hi Jens,
> > >
> > > As I mentioned, there are at least two issues in the patch 'irq: add
> > > support for allocating (and affinitizing) sets of IRQs':
> > >
> > > 1) It is wrong to pass 'mask + usedvec' to irq_build_affinity_masks().
> > >
> > > 2) All possible CPUs should be spread in a two-stage way on each set of
> > > IRQs.
> > >
> > > The fix isn't trivial, so I introduce two extra patches as preparation;
> > > with them, the final implementation is cleaner.
> > >
> > > The patchset is against the mq-maps branch of the block tree; feel free
> > > to integrate it into the multiple queue maps patchset.
> >
> > Thanks Ming, I ran this through my testing, and I ended up with the
> > same maps and affinities for all the cases I cared about. I'm going
> > to drop my initial version and add these three.
>
> So I assume that I can pick up Ming's series instead.
>
> There is another patch pending affecting the irq affinity spreading. Can
> you folks please have a look at it?
>
> https://lkml.kernel.org/r/[email protected]

This patch looks fine.

It ensures that all CPUs are covered in the irq affinity masks when the
required number of vectors is <= the number of NUMA nodes.

Reviewed-by: Ming Lei <[email protected]>


Thanks,
Ming

Subject: [tip:irq/core] genirq/affinity: Move two stage affinity spreading into a helper function

Commit-ID: 5c903e108d0b005cf59904ca3520934fca4b9439
Gitweb: https://git.kernel.org/tip/5c903e108d0b005cf59904ca3520934fca4b9439
Author: Ming Lei <[email protected]>
AuthorDate: Fri, 2 Nov 2018 22:59:49 +0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 5 Nov 2018 12:16:26 +0100

genirq/affinity: Move two stage affinity spreading into a helper function

No functional change. Prepares for supporting allocating and affinitizing
interrupt sets.

[ tglx: Minor changelog tweaks ]

Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: [email protected]
Cc: Hannes Reinecke <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
kernel/irq/affinity.c | 92 +++++++++++++++++++++++++++++++--------------------
1 file changed, 56 insertions(+), 36 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index e12cdf637c71..2f9812b6035e 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -94,7 +94,7 @@ static int get_nodes_in_cpumask(cpumask_var_t *node_to_cpumask,
return nodes;
}

-static int irq_build_affinity_masks(const struct irq_affinity *affd,
+static int __irq_build_affinity_masks(const struct irq_affinity *affd,
int startvec, int numvecs,
cpumask_var_t *node_to_cpumask,
const struct cpumask *cpu_mask,
@@ -165,6 +165,58 @@ out:
return done;
}

+/*
+ * build affinity in two stages:
+ * 1) spread present CPUs on these vectors
+ * 2) spread other possible CPUs on these vectors
+ */
+static int irq_build_affinity_masks(const struct irq_affinity *affd,
+ int startvec, int numvecs,
+ cpumask_var_t *node_to_cpumask,
+ struct cpumask *masks)
+{
+ int curvec = startvec, usedvecs = -1;
+ cpumask_var_t nmsk, npresmsk;
+
+ if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
+ return usedvecs;
+
+ if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
+ goto fail;
+
+ /* Stabilize the cpumasks */
+ get_online_cpus();
+ build_node_to_cpumask(node_to_cpumask);
+
+ /* Spread on present CPUs starting from affd->pre_vectors */
+ usedvecs = __irq_build_affinity_masks(affd, curvec, numvecs,
+ node_to_cpumask, cpu_present_mask,
+ nmsk, masks);
+
+ /*
+ * Spread on non present CPUs starting from the next vector to be
+ * handled. If the spreading of present CPUs already exhausted the
+ * vector space, assign the non present CPUs to the already spread
+ * out vectors.
+ */
+ if (usedvecs >= numvecs)
+ curvec = affd->pre_vectors;
+ else
+ curvec = affd->pre_vectors + usedvecs;
+ cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
+ usedvecs += __irq_build_affinity_masks(affd, curvec, numvecs,
+ node_to_cpumask, npresmsk,
+ nmsk, masks);
+ put_online_cpus();
+
+ free_cpumask_var(npresmsk);
+
+ fail:
+ free_cpumask_var(nmsk);
+
+ return usedvecs;
+}
+
/**
* irq_create_affinity_masks - Create affinity masks for multiqueue spreading
* @nvecs: The total number of vectors
@@ -177,7 +229,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
{
int affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
int curvec, usedvecs;
- cpumask_var_t nmsk, npresmsk, *node_to_cpumask;
+ cpumask_var_t *node_to_cpumask;
struct cpumask *masks = NULL;

/*
@@ -187,15 +239,9 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
if (nvecs == affd->pre_vectors + affd->post_vectors)
return NULL;

- if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
- return NULL;
-
- if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
- goto outcpumsk;
-
node_to_cpumask = alloc_node_to_cpumask();
if (!node_to_cpumask)
- goto outnpresmsk;
+ return NULL;

masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
if (!masks)
@@ -205,30 +251,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
for (curvec = 0; curvec < affd->pre_vectors; curvec++)
cpumask_copy(masks + curvec, irq_default_affinity);

- /* Stabilize the cpumasks */
- get_online_cpus();
- build_node_to_cpumask(node_to_cpumask);
-
- /* Spread on present CPUs starting from affd->pre_vectors */
usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
- node_to_cpumask, cpu_present_mask,
- nmsk, masks);
-
- /*
- * Spread on non present CPUs starting from the next vector to be
- * handled. If the spreading of present CPUs already exhausted the
- * vector space, assign the non present CPUs to the already spread
- * out vectors.
- */
- if (usedvecs >= affvecs)
- curvec = affd->pre_vectors;
- else
- curvec = affd->pre_vectors + usedvecs;
- cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
- usedvecs += irq_build_affinity_masks(affd, curvec, affvecs,
- node_to_cpumask, npresmsk,
- nmsk, masks);
- put_online_cpus();
+ node_to_cpumask, masks);

/* Fill out vectors at the end that don't need affinity */
if (usedvecs >= affvecs)
@@ -240,10 +264,6 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)

outnodemsk:
free_node_to_cpumask(node_to_cpumask);
-outnpresmsk:
- free_cpumask_var(npresmsk);
-outcpumsk:
- free_cpumask_var(nmsk);
return masks;
}


2018-11-05 11:25:21

by Thomas Gleixner

Subject: Re: [PATCH 0/4] irq: fix support for allocating sets of IRQs

Jens,

On Sun, 4 Nov 2018, Thomas Gleixner wrote:
> On Sun, 4 Nov 2018, Jens Axboe wrote:
> > On 11/4/18 5:02 AM, Thomas Gleixner wrote:
> > > So I assume that I can pick up Ming's series instead.
> >
> > Yes, let's do that.
> >
> > > There is another patch pending affecting the irq affinity spreading. Can
> > > you folks please have a look at it?
> > >
> > > https://lkml.kernel.org/r/[email protected]
> >
> > I'll give that a look and test. Thanks for the heads-up.

I picked the series from Ming and the patch from Long and merged them
into tip/irq/core.

The result is available for you to pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/for-block

That branch is immutable now, and I'll keep it around in case incremental
updates are necessary before the merge window.

Thanks,

tglx

Subject: [tip:irq/core] genirq/affinity: Pass first vector to __irq_build_affinity_masks()

Commit-ID: 060746d9e394084b7401e7532f2de528ecbfb521
Gitweb: https://git.kernel.org/tip/060746d9e394084b7401e7532f2de528ecbfb521
Author: Ming Lei <[email protected]>
AuthorDate: Fri, 2 Nov 2018 22:59:50 +0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 5 Nov 2018 12:16:26 +0100

genirq/affinity: Pass first vector to __irq_build_affinity_masks()

No functional change.

Prepares for support of allocating and affinitizing sets of interrupts, in
which each set of interrupts needs a full two-stage spreading. The first
vector argument is necessary for this so the affinitizing starts from the
first vector of each set.

[ tglx: Minor changelog tweaks ]

Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: [email protected]
Cc: Hannes Reinecke <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
kernel/irq/affinity.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 2f9812b6035e..e028b773e38a 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -95,14 +95,14 @@ static int get_nodes_in_cpumask(cpumask_var_t *node_to_cpumask,
}

static int __irq_build_affinity_masks(const struct irq_affinity *affd,
- int startvec, int numvecs,
+ int startvec, int numvecs, int firstvec,
cpumask_var_t *node_to_cpumask,
const struct cpumask *cpu_mask,
struct cpumask *nmsk,
struct cpumask *masks)
{
int n, nodes, cpus_per_vec, extra_vecs, done = 0;
- int last_affv = affd->pre_vectors + numvecs;
+ int last_affv = firstvec + numvecs;
int curvec = startvec;
nodemask_t nodemsk = NODE_MASK_NONE;

@@ -119,7 +119,7 @@ static int __irq_build_affinity_masks(const struct irq_affinity *affd,
for_each_node_mask(n, nodemsk) {
cpumask_or(masks + curvec, masks + curvec, node_to_cpumask[n]);
if (++curvec == last_affv)
- curvec = affd->pre_vectors;
+ curvec = firstvec;
}
done = numvecs;
goto out;
@@ -129,7 +129,7 @@ static int __irq_build_affinity_masks(const struct irq_affinity *affd,
int ncpus, v, vecs_to_assign, vecs_per_node;

/* Spread the vectors per node */
- vecs_per_node = (numvecs - (curvec - affd->pre_vectors)) / nodes;
+ vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;

/* Get the cpus on this node which are in the mask */
cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
@@ -157,7 +157,7 @@ static int __irq_build_affinity_masks(const struct irq_affinity *affd,
if (done >= numvecs)
break;
if (curvec >= last_affv)
- curvec = affd->pre_vectors;
+ curvec = firstvec;
--nodes;
}

@@ -190,8 +190,9 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,

/* Spread on present CPUs starting from affd->pre_vectors */
usedvecs = __irq_build_affinity_masks(affd, curvec, numvecs,
- node_to_cpumask, cpu_present_mask,
- nmsk, masks);
+ affd->pre_vectors,
+ node_to_cpumask,
+ cpu_present_mask, nmsk, masks);

/*
* Spread on non present CPUs starting from the next vector to be
@@ -205,8 +206,9 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,
curvec = affd->pre_vectors + usedvecs;
cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
usedvecs += __irq_build_affinity_masks(affd, curvec, numvecs,
- node_to_cpumask, npresmsk,
- nmsk, masks);
+ affd->pre_vectors,
+ node_to_cpumask, npresmsk,
+ nmsk, masks);
put_online_cpus();

free_cpumask_var(npresmsk);

Subject: [tip:irq/core] genirq/affinity: Add support for allocating interrupt sets

Commit-ID: 6da4b3ab9a6e9b1b5f90322ab3fa3a7dd18edb19
Gitweb: https://git.kernel.org/tip/6da4b3ab9a6e9b1b5f90322ab3fa3a7dd18edb19
Author: Jens Axboe <[email protected]>
AuthorDate: Fri, 2 Nov 2018 22:59:51 +0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 5 Nov 2018 12:16:27 +0100

genirq/affinity: Add support for allocating interrupt sets

A driver may need to allocate multiple sets of MSI/MSI-X interrupts,
and have them appropriately affinitized.

Add support for defining a number of sets in the irq_affinity structure, of
varying sizes, and get each set affinitized correctly across the machine.

[ tglx: Minor changelog tweaks ]

Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

---
drivers/pci/msi.c | 14 +++++++++
include/linux/interrupt.h | 4 +++
kernel/irq/affinity.c | 77 +++++++++++++++++++++++++++++++++--------------
3 files changed, 72 insertions(+), 23 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index af24ed50a245..265ed3e4c920 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1036,6 +1036,13 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
if (maxvec < minvec)
return -ERANGE;

+ /*
+ * If the caller is passing in sets, we can't support a range of
+ * vectors. The caller needs to handle that.
+ */
+ if (affd && affd->nr_sets && minvec != maxvec)
+ return -EINVAL;
+
if (WARN_ON_ONCE(dev->msi_enabled))
return -EINVAL;

@@ -1087,6 +1094,13 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
if (maxvec < minvec)
return -ERANGE;

+ /*
+ * If the caller is passing in sets, we can't support a range of
+ * vectors. The caller needs to handle that.
+ */
+ if (affd && affd->nr_sets && minvec != maxvec)
+ return -EINVAL;
+
if (WARN_ON_ONCE(dev->msix_enabled))
return -EINVAL;

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 1d6711c28271..ca397ff40836 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -247,10 +247,14 @@ struct irq_affinity_notify {
* the MSI(-X) vector space
* @post_vectors: Don't apply affinity to @post_vectors at end of
* the MSI(-X) vector space
+ * @nr_sets: Length of the passed in *sets array
+ * @sets: Array holding the size of each affinitized set
*/
struct irq_affinity {
int pre_vectors;
int post_vectors;
+ int nr_sets;
+ int *sets;
};

#if defined(CONFIG_SMP)
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index e028b773e38a..08c904eb7279 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -171,28 +171,29 @@ out:
* 2) spread other possible CPUs on these vectors
*/
static int irq_build_affinity_masks(const struct irq_affinity *affd,
- int startvec, int numvecs,
+ int startvec, int numvecs, int firstvec,
cpumask_var_t *node_to_cpumask,
struct cpumask *masks)
{
- int curvec = startvec, usedvecs = -1;
+ int curvec = startvec, nr_present, nr_others;
+ int ret = -ENOMEM;
cpumask_var_t nmsk, npresmsk;

if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
- return usedvecs;
+ return ret;

if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
goto fail;

+ ret = 0;
/* Stabilize the cpumasks */
get_online_cpus();
build_node_to_cpumask(node_to_cpumask);

/* Spread on present CPUs starting from affd->pre_vectors */
- usedvecs = __irq_build_affinity_masks(affd, curvec, numvecs,
- affd->pre_vectors,
- node_to_cpumask,
- cpu_present_mask, nmsk, masks);
+ nr_present = __irq_build_affinity_masks(affd, curvec, numvecs,
+ firstvec, node_to_cpumask,
+ cpu_present_mask, nmsk, masks);

/*
* Spread on non present CPUs starting from the next vector to be
@@ -200,23 +201,24 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,
* vector space, assign the non present CPUs to the already spread
* out vectors.
*/
- if (usedvecs >= numvecs)
- curvec = affd->pre_vectors;
+ if (nr_present >= numvecs)
+ curvec = firstvec;
else
- curvec = affd->pre_vectors + usedvecs;
+ curvec = firstvec + nr_present;
cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
- usedvecs += __irq_build_affinity_masks(affd, curvec, numvecs,
- affd->pre_vectors,
- node_to_cpumask, npresmsk,
- nmsk, masks);
+ nr_others = __irq_build_affinity_masks(affd, curvec, numvecs,
+ firstvec, node_to_cpumask,
+ npresmsk, nmsk, masks);
put_online_cpus();

+ if (nr_present < numvecs)
+ WARN_ON(nr_present + nr_others < numvecs);
+
free_cpumask_var(npresmsk);

fail:
free_cpumask_var(nmsk);
-
- return usedvecs;
+ return ret;
}

/**
@@ -233,6 +235,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
int curvec, usedvecs;
cpumask_var_t *node_to_cpumask;
struct cpumask *masks = NULL;
+ int i, nr_sets;

/*
* If there aren't any vectors left after applying the pre/post
@@ -253,8 +256,28 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
for (curvec = 0; curvec < affd->pre_vectors; curvec++)
cpumask_copy(masks + curvec, irq_default_affinity);

- usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
- node_to_cpumask, masks);
+ /*
+ * Spread on present CPUs starting from affd->pre_vectors. If we
+ * have multiple sets, build each set's affinity mask separately.
+ */
+ nr_sets = affd->nr_sets;
+ if (!nr_sets)
+ nr_sets = 1;
+
+ for (i = 0, usedvecs = 0; i < nr_sets; i++) {
+ int this_vecs = affd->sets ? affd->sets[i] : affvecs;
+ int ret;
+
+ ret = irq_build_affinity_masks(affd, curvec, this_vecs,
+ curvec, node_to_cpumask, masks);
+ if (ret) {
+ kfree(masks);
+ masks = NULL;
+ goto outnodemsk;
+ }
+ curvec += this_vecs;
+ usedvecs += this_vecs;
+ }

/* Fill out vectors at the end that don't need affinity */
if (usedvecs >= affvecs)
@@ -279,13 +302,21 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
{
int resv = affd->pre_vectors + affd->post_vectors;
int vecs = maxvec - resv;
- int ret;
+ int set_vecs;

if (resv > minvec)
return 0;

- get_online_cpus();
- ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
- put_online_cpus();
- return ret;
+ if (affd->nr_sets) {
+ int i;
+
+ for (i = 0, set_vecs = 0; i < affd->nr_sets; i++)
+ set_vecs += affd->sets[i];
+ } else {
+ get_online_cpus();
+ set_vecs = cpumask_weight(cpu_possible_mask);
+ put_online_cpus();
+ }
+
+ return resv + min(set_vecs, vecs);
}

2018-11-06 03:03:19

by Jens Axboe

Subject: Re: [PATCH 0/4] irq: fix support for allocating sets of IRQs

On 11/5/18 4:24 AM, Thomas Gleixner wrote:
> Jens,
>
> On Sun, 4 Nov 2018, Thomas Gleixner wrote:
>> On Sun, 4 Nov 2018, Jens Axboe wrote:
>>> On 11/4/18 5:02 AM, Thomas Gleixner wrote:
>>>> So I assume that I can pick up Ming's series instead.
>>>
>>> Yes, let's do that.
>>>
>>>> There is another patch pending affecting the irq affinity spreading. Can
>>>> you folks please have a look at it?
>>>>
>>>> https://lkml.kernel.org/r/[email protected]
>>>
>>> I'll give that a look and test. Thanks for the heads-up.
>
> I picked the series from Ming and the patch from Long and merged them
> into tip/irq/core.
>
> The result is available for you to pull from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/for-block
>
> That branch is immutable now, and I'll keep it around in case incremental
> updates are necessary before the merge window.

Great, thanks! I'll pull that in as a base.

--
Jens Axboe