2019-08-09 10:24:31

by Ming Lei

Subject: [PATCH 0/2] genirq/affinity: two improvements on __irq_build_affinity_masks()

Hi,

The 1st patch makes __irq_build_affinity_masks() more reliable, for
example by guaranteeing that all nodes are covered in the spread.

The 2nd patch spreads vectors on each node according to the ratio of
that node's CPU count to the number of all remaining CPUs, which makes
the vector assignment fairer. It also fixes the warning reported by
Jon Derrick.

Please review & comment!

Ming Lei (2):
genirq/affinity: improve __irq_build_affinity_masks()
genirq/affinity: spread vectors on node according to nr_cpu ratio

kernel/irq/affinity.c | 46 +++++++++++++++++++++++++++++++------------
1 file changed, 33 insertions(+), 13 deletions(-)

Cc: Christoph Hellwig <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: [email protected],
Cc: Jon Derrick <[email protected]>

--
2.20.1


2019-08-09 10:24:58

by Ming Lei

Subject: [PATCH 1/2] genirq/affinity: improve __irq_build_affinity_masks()

One invariant of __irq_build_affinity_masks() is that all CPUs in the
specified masks (cpu_mask AND node_to_cpumask for each node) should be
covered during the spread. Even after all requested vectors have been
assigned, we still need to spread vectors among the remaining CPUs. A
similar policy is already applied in the 'numvecs <= nodes' case.

So remove the following check inside the loop:

if (done >= numvecs)
break;

Meanwhile, assign at least one vector to each remaining node once
'numvecs' vectors have been spread.

Also, if the specified cpumask for a NUMA node is empty, simply do not
spread vectors on that node.

Cc: Christoph Hellwig <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: [email protected],
Cc: Jon Derrick <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
kernel/irq/affinity.c | 33 +++++++++++++++++++++------------
1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 6fef48033f96..bc3652a2c61b 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -129,21 +129,32 @@ static int __irq_build_affinity_masks(unsigned int startvec,
for_each_node_mask(n, nodemsk) {
unsigned int ncpus, v, vecs_to_assign, vecs_per_node;

- /* Spread the vectors per node */
- vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;
-
/* Get the cpus on this node which are in the mask */
cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
-
- /* Calculate the number of cpus per vector */
ncpus = cpumask_weight(nmsk);
+ if (!ncpus)
+ continue;
+
+ /*
+ * Calculate the number of cpus per vector
+ *
+ * Spread the vectors evenly per node. If the requested
+ * vector number has been reached, simply allocate one
+ * vector for each remaining node so that all nodes can
+ * be covered
+ */
+ if (numvecs > done)
+ vecs_per_node = max_t(unsigned,
+ (numvecs - done) / nodes, 1);
+ else
+ vecs_per_node = 1;
+
vecs_to_assign = min(vecs_per_node, ncpus);

/* Account for rounding errors */
extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);

- for (v = 0; curvec < last_affv && v < vecs_to_assign;
- curvec++, v++) {
+ for (v = 0; v < vecs_to_assign; v++) {
cpus_per_vec = ncpus / vecs_to_assign;

/* Account for extra vectors to compensate rounding errors */
@@ -153,16 +164,14 @@ static int __irq_build_affinity_masks(unsigned int startvec,
}
irq_spread_init_one(&masks[curvec].mask, nmsk,
cpus_per_vec);
+ if (++curvec >= last_affv)
+ curvec = firstvec;
}

done += v;
- if (done >= numvecs)
- break;
- if (curvec >= last_affv)
- curvec = firstvec;
--nodes;
}
- return done;
+ return done < numvecs ? done : numvecs;
}

/*
--
2.20.1
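
The reworked loop can be illustrated with a small standalone userspace
program. This is only a sketch, not kernel code, and the node CPU counts
and vector count below are made-up: with 3 vectors and 4 non-empty nodes,
every node still receives a vector, the node with an empty mask is
skipped, and the vector index wraps back to 'firstvec' once the vector
range is exhausted.

/*
 * Standalone userspace sketch of the reworked spreading loop (not kernel
 * code).  Nodes with an empty mask are skipped, the vector index wraps
 * inside the assignment loop, and nodes visited after 'numvecs' vectors
 * have been handed out still get one (shared) vector each.
 */
#include <stdio.h>

#define NR_NODES        5

int main(void)
{
        const unsigned int node_cpus[NR_NODES] = { 4, 0, 2, 2, 2 };
        const unsigned int numvecs = 3; /* firstvec == 0, last_affv == 3 */
        unsigned int curvec = 0, done = 0, nodes = 4; /* nodes with CPUs */

        for (unsigned int n = 0; n < NR_NODES; n++) {
                unsigned int ncpus = node_cpus[n], vecs_per_node, vecs_to_assign;

                if (!ncpus)             /* empty mask: skip this node */
                        continue;

                if (numvecs > done) {
                        vecs_per_node = (numvecs - done) / nodes;
                        if (vecs_per_node < 1)
                                vecs_per_node = 1;
                } else {
                        vecs_per_node = 1;      /* still cover remaining nodes */
                }

                vecs_to_assign = vecs_per_node < ncpus ? vecs_per_node : ncpus;

                for (unsigned int v = 0; v < vecs_to_assign; v++) {
                        printf("node %u -> vector %u\n", n, curvec);
                        if (++curvec >= numvecs)
                                curvec = 0;     /* wrap instead of breaking out */
                }

                done += vecs_to_assign;
                --nodes;
        }

        /* the kernel function returns min(done, numvecs) */
        printf("done = %u\n", done < numvecs ? done : numvecs);
        return 0;
}

Running it prints one "node -> vector" line per assignment and a final
'done' value capped at numvecs, with the last node sharing vector 0.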

2019-08-09 10:26:35

by Ming Lei

Subject: [PATCH 2/2] genirq/affinity: spread vectors on node according to nr_cpu ratio

Now __irq_build_affinity_masks() spreads vectors evenly per node, but
not all vectors may get spread when the NUMA nodes have different CPU
counts, in which case the following warning in irq_build_affinity_masks()
can be triggered:

if (nr_present < numvecs)
WARN_ON(nr_present + nr_others < numvecs);

Improve the current spreading algorithm by assigning vectors to each
node according to the ratio of that node's nr_cpus to the number of all
remaining un-spread CPUs.

This also fixes the reported warning.

Cc: Christoph Hellwig <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: [email protected],
Cc: Jon Derrick <[email protected]>
Reported-by: Jon Derrick <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
kernel/irq/affinity.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index bc3652a2c61b..76f3d1b27d00 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -106,6 +106,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
unsigned int last_affv = firstvec + numvecs;
unsigned int curvec = startvec;
nodemask_t nodemsk = NODE_MASK_NONE;
+ unsigned remaining_cpus = 0;

if (!cpumask_weight(cpu_mask))
return 0;
@@ -126,6 +127,11 @@ static int __irq_build_affinity_masks(unsigned int startvec,
return numvecs;
}

+ for_each_node_mask(n, nodemsk) {
+ cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
+ remaining_cpus += cpumask_weight(nmsk);
+ }
+
for_each_node_mask(n, nodemsk) {
unsigned int ncpus, v, vecs_to_assign, vecs_per_node;

@@ -135,17 +141,22 @@ static int __irq_build_affinity_masks(unsigned int startvec,
if (!ncpus)
continue;

+ if (remaining_cpus == 0)
+ break;
+
/*
* Calculate the number of cpus per vector
*
- * Spread the vectors evenly per node. If the requested
- * vector number has been reached, simply allocate one
- * vector for each remaining node so that all nodes can
- * be covered
+ * Spread the vectors among CPUs on this node according
+ * to the ratio of 'ncpus' to 'remaining_cpus'. If the
+ * requested vector number has been reached, simply
+ * spread one vector for each remaining node so that all
+ * nodes can be covered
*/
if (numvecs > done)
vecs_per_node = max_t(unsigned,
- (numvecs - done) / nodes, 1);
+ (numvecs - done) * ncpus /
+ remaining_cpus, 1);
else
vecs_per_node = 1;

@@ -169,7 +180,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
}

done += v;
- --nodes;
+ remaining_cpus -= ncpus;
}
return done < numvecs ? done : numvecs;
}
--
2.20.1
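
To see how the ratio-based budget avoids the warning, here is a small
userspace sketch (not kernel code) comparing the per-node budget
arithmetic of patch 1 (even split across remaining nodes) and patch 2
(split by the ratio of the node's CPUs to the remaining CPUs). The
topology is made-up: one node with 30 CPUs, one with 2 CPUs, 16 vectors
to spread, with the larger node visited first (the visiting order is
what makes the even split fall short).

/*
 * Userspace sketch (not kernel code) comparing the per-node vector budget
 * of patch 1 (even split across remaining nodes) with patch 2 (split by
 * the ratio of the node's CPUs to the remaining un-spread CPUs).
 */
#include <stdio.h>

#define NR_NODES        2

static const unsigned int node_cpus[NR_NODES] = { 30, 2 };

static unsigned int max_u(unsigned int a, unsigned int b)
{
        return a > b ? a : b;
}

static unsigned int min_u(unsigned int a, unsigned int b)
{
        return a < b ? a : b;
}

/* patch 1: vecs_per_node = (numvecs - done) / remaining nodes */
static unsigned int spread_even(unsigned int numvecs)
{
        unsigned int done = 0, nodes = NR_NODES;

        for (unsigned int n = 0; n < NR_NODES; n++) {
                unsigned int vecs_per_node = numvecs > done ?
                        max_u((numvecs - done) / nodes, 1) : 1;

                done += min_u(vecs_per_node, node_cpus[n]);
                --nodes;
        }
        return min_u(done, numvecs);
}

/* patch 2: vecs_per_node = (numvecs - done) * ncpus / remaining CPUs */
static unsigned int spread_ratio(unsigned int numvecs)
{
        unsigned int done = 0, remaining_cpus = 0;

        for (unsigned int n = 0; n < NR_NODES; n++)
                remaining_cpus += node_cpus[n];

        for (unsigned int n = 0; n < NR_NODES; n++) {
                unsigned int ncpus = node_cpus[n];
                unsigned int vecs_per_node = numvecs > done ?
                        max_u((numvecs - done) * ncpus / remaining_cpus, 1) : 1;

                done += min_u(vecs_per_node, ncpus);
                remaining_cpus -= ncpus;
        }
        return min_u(done, numvecs);
}

int main(void)
{
        unsigned int numvecs = 16;

        printf("even  split: %u of %u vectors assigned\n",
               spread_even(numvecs), numvecs);
        printf("ratio split: %u of %u vectors assigned\n",
               spread_ratio(numvecs), numvecs);
        return 0;
}

With the even split the two nodes are budgeted 8 and 2 vectors, so only
10 of the 16 get assigned and the WARN_ON() condition is met; with the
ratio split they get 15 and 1, so all 16 vectors are assigned.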

2019-08-09 14:45:57

by Keith Busch

Subject: Re: [PATCH 1/2] genirq/affinity: improve __irq_build_affinity_masks()

On Fri, Aug 09, 2019 at 06:23:09PM +0800, Ming Lei wrote:
> One invariant of __irq_build_affinity_masks() is that all CPUs in the
> specified masks( cpu_mask AND node_to_cpumask for each node) should be
> covered during the spread. Even though all requested vectors have been
> reached, we still need to spread vectors among left CPUs. The similar
> policy has been taken in case of 'numvecs <= nodes'.
>
> So remove the following check inside the loop:
>
> if (done >= numvecs)
> break;
>
> Meantime assign at least 1 vector for left nodes if 'numvecs' vectors
> have been spread.
>
> Also, if the specified cpumask for one numa node is empty, simply not
> spread vectors on this node.
>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: [email protected],
> Cc: Jon Derrick <[email protected]>
> Signed-off-by: Ming Lei <[email protected]>
> ---
> kernel/irq/affinity.c | 33 +++++++++++++++++++++------------
> 1 file changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index 6fef48033f96..bc3652a2c61b 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -129,21 +129,32 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> for_each_node_mask(n, nodemsk) {
> unsigned int ncpus, v, vecs_to_assign, vecs_per_node;
>
> - /* Spread the vectors per node */
> - vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;
> -
> /* Get the cpus on this node which are in the mask */
> cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
> -
> - /* Calculate the number of cpus per vector */
> ncpus = cpumask_weight(nmsk);
> + if (!ncpus)
> + continue;

This shouldn't be possible, right? The nodemsk we're looping wouldn't
have had that node set if no CPUs intersect the node_to_cpumask for
that node, so the resulting cpumask should always have a non-zero weight.

> @@ -153,16 +164,14 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> }
> irq_spread_init_one(&masks[curvec].mask, nmsk,
> cpus_per_vec);
> + if (++curvec >= last_affv)
> + curvec = firstvec;

I'm not so sure about wrapping the vector to share it across nodes. We
have enough vectors in this path to ensure each compute node can have
a unique one, and it's much cheaper to share these within nodes than
across them.

> }
>
> done += v;
> - if (done >= numvecs)
> - break;
> - if (curvec >= last_affv)
> - curvec = firstvec;
> --nodes;
> }
> - return done;
> + return done < numvecs ? done : numvecs;
> }

2019-08-09 23:07:12

by Ming Lei

Subject: Re: [PATCH 1/2] genirq/affinity: improve __irq_build_affinity_masks()

On Fri, Aug 9, 2019 at 10:44 PM Keith Busch <[email protected]> wrote:
>
> On Fri, Aug 09, 2019 at 06:23:09PM +0800, Ming Lei wrote:
> > One invariant of __irq_build_affinity_masks() is that all CPUs in the
> > specified masks( cpu_mask AND node_to_cpumask for each node) should be
> > covered during the spread. Even though all requested vectors have been
> > reached, we still need to spread vectors among left CPUs. The similar
> > policy has been taken in case of 'numvecs <= nodes'.
> >
> > So remove the following check inside the loop:
> >
> > if (done >= numvecs)
> > break;
> >
> > Meantime assign at least 1 vector for left nodes if 'numvecs' vectors
> > have been spread.
> >
> > Also, if the specified cpumask for one numa node is empty, simply not
> > spread vectors on this node.
> >
> > Cc: Christoph Hellwig <[email protected]>
> > Cc: Keith Busch <[email protected]>
> > Cc: [email protected],
> > Cc: Jon Derrick <[email protected]>
> > Signed-off-by: Ming Lei <[email protected]>
> > ---
> > kernel/irq/affinity.c | 33 +++++++++++++++++++++------------
> > 1 file changed, 21 insertions(+), 12 deletions(-)
> >
> > diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> > index 6fef48033f96..bc3652a2c61b 100644
> > --- a/kernel/irq/affinity.c
> > +++ b/kernel/irq/affinity.c
> > @@ -129,21 +129,32 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> > for_each_node_mask(n, nodemsk) {
> > unsigned int ncpus, v, vecs_to_assign, vecs_per_node;
> >
> > - /* Spread the vectors per node */
> > - vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;
> > -
> > /* Get the cpus on this node which are in the mask */
> > cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
> > -
> > - /* Calculate the number of cpus per vector */
> > ncpus = cpumask_weight(nmsk);
> > + if (!ncpus)
> > + continue;
>
> This shouldn't be possible, right? The nodemsk we're looping wouldn't
> have had that node set if no CPUs intersect the node_to_cpu_mask for
> that node, so the resulting cpumask should always have a non-zero weight.

cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);

It is often true; see the following cases:

1) all CPUs in one node are not present, so the node contributes nothing
to the spread over present CPUs

OR

2) all CPUs in one node are present, so the node contributes nothing to
the later spread over non-present CPUs

>
> > @@ -153,16 +164,14 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> > }
> > irq_spread_init_one(&masks[curvec].mask, nmsk,
> > cpus_per_vec);
> > + if (++curvec >= last_affv)
> > + curvec = firstvec;
>
> I'm not so sure about wrapping the vector to share it across nodes. We

The wrapping is always there; it is not added by this patch.

Most of the time it won't happen, since we spread vectors over the
remaining (un-assigned) nodes. It only happens when there are remaining
nodes that have not been spread yet, and we have to make sure all nodes
are covered.

A similar policy is applied on the 'numvecs <= nodes' branch too.

> have enough vectors in this path to ensure each compute node can have
> a unique one, and it's much cheaper to share these within nodes than
> across them.

The patch just moves the wrapping from outside the loop into the loop,
so that all 'extra_vecs' can be covered, because extra_vecs (which is
ncpus % vecs_to_assign) is always < vecs_to_assign.

What matters is that the following check is removed:

if (done >= numvecs)
break;

so that all nodes can be covered.

Thanks,
Ming Lei

2019-08-12 05:08:19

by Ming Lei

Subject: Re: [PATCH 1/2] genirq/affinity: improve __irq_build_affinity_masks()

On Sat, Aug 10, 2019 at 7:05 AM Ming Lei <[email protected]> wrote:
>
> On Fri, Aug 9, 2019 at 10:44 PM Keith Busch <[email protected]> wrote:
> >
> > On Fri, Aug 09, 2019 at 06:23:09PM +0800, Ming Lei wrote:
> > > One invariant of __irq_build_affinity_masks() is that all CPUs in the
> > > specified masks( cpu_mask AND node_to_cpumask for each node) should be
> > > covered during the spread. Even though all requested vectors have been
> > > reached, we still need to spread vectors among left CPUs. The similar
> > > policy has been taken in case of 'numvecs <= nodes'.
> > >
> > > So remove the following check inside the loop:
> > >
> > > if (done >= numvecs)
> > > break;
> > >
> > > Meantime assign at least 1 vector for left nodes if 'numvecs' vectors
> > > have been spread.
> > >
> > > Also, if the specified cpumask for one numa node is empty, simply not
> > > spread vectors on this node.
> > >
> > > Cc: Christoph Hellwig <[email protected]>
> > > Cc: Keith Busch <[email protected]>
> > > Cc: [email protected],
> > > Cc: Jon Derrick <[email protected]>
> > > Signed-off-by: Ming Lei <[email protected]>
> > > ---
> > > kernel/irq/affinity.c | 33 +++++++++++++++++++++------------
> > > 1 file changed, 21 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> > > index 6fef48033f96..bc3652a2c61b 100644
> > > --- a/kernel/irq/affinity.c
> > > +++ b/kernel/irq/affinity.c
> > > @@ -129,21 +129,32 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> > > for_each_node_mask(n, nodemsk) {
> > > unsigned int ncpus, v, vecs_to_assign, vecs_per_node;
> > >
> > > - /* Spread the vectors per node */
> > > - vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;
> > > -
> > > /* Get the cpus on this node which are in the mask */
> > > cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
> > > -
> > > - /* Calculate the number of cpus per vector */
> > > ncpus = cpumask_weight(nmsk);
> > > + if (!ncpus)
> > > + continue;
> >
> > This shouldn't be possible, right? The nodemsk we're looping wouldn't
> > have had that node set if no CPUs intersect the node_to_cpu_mask for
> > that node, so the resulting cpumask should always have a non-zero weight.
>
> cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
>
> It is often true, see the following cases:
>
> 1) all CPUs in one node are not present
>
> OR
>
> 2) all CPUs in one node are present
>
> >
> > > @@ -153,16 +164,14 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> > > }
> > > irq_spread_init_one(&masks[curvec].mask, nmsk,
> > > cpus_per_vec);
> > > + if (++curvec >= last_affv)
> > > + curvec = firstvec;
> >
> > I'm not so sure about wrapping the vector to share it across nodes. We
>
> The wrapping is always there, not added by this patch.

We could avoid the wrapping completely given 'numvecs' > 'nodes'; it can
be done by sorting the nodes by nr_cpus. I will do it in V2.


Thanks,
Ming Lei
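
For illustration only (this is not the actual V2 change), one possible
shape of the 'sort nodes by nr_cpus' idea is sketched below in plain C:
visit the nodes in ascending CPU count, so the small nodes consume their
few vectors first and the leftover budget flows to the larger nodes
handled later, which is how the wrapping could be avoided when
'numvecs' > 'nodes'. The struct name and the CPU counts are hypothetical.

/*
 * Hypothetical userspace sketch: order nodes by ascending CPU count
 * before spreading vectors.  Not the actual V2 implementation.
 */
#include <stdio.h>
#include <stdlib.h>

#define NR_NODES        3

static const unsigned int node_cpus[NR_NODES] = { 24, 2, 6 };

struct node_info {
        unsigned int id;
        unsigned int ncpus;
};

static int ncpus_cmp(const void *a, const void *b)
{
        const struct node_info *na = a, *nb = b;

        return na->ncpus < nb->ncpus ? -1 : na->ncpus > nb->ncpus;
}

int main(void)
{
        struct node_info nodes[NR_NODES];

        for (unsigned int n = 0; n < NR_NODES; n++) {
                nodes[n].id = n;
                nodes[n].ncpus = node_cpus[n];
        }

        /* sort nodes by ascending CPU count before spreading */
        qsort(nodes, NR_NODES, sizeof(nodes[0]), ncpus_cmp);

        for (unsigned int i = 0; i < NR_NODES; i++)
                printf("spread node %u (%u CPUs)\n",
                       nodes[i].id, nodes[i].ncpus);
        return 0;
}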