2021-08-14 12:38:27

by Ming Lei

Subject: [PATCH 0/7] genirq/affinity: abstract new API from managed irq affinity spread

Hello,

irq_build_affinity_masks() actually groups CPUs evenly for each managed
irq vector according to NUMA and CPU locality, so it is reasonable to abstract
a generic API for grouping CPUs evenly; the idea was suggested by Thomas
Gleixner.

group_cpus_evenly() is abstracted and moved into lib/, so blk-mq can reuse
it to build its default queue mapping.

Comments are welcome!

Since RFC:
- remove RFC
- rebase on -next tree


Ming Lei (7):
genirq/affinity: remove the 'firstvec' parameter from
irq_build_affinity_masks
genirq/affinity: pass affinity managed mask array to
irq_build_affinity_masks
genirq/affinity: don't pass irq_affinity_desc array to
irq_build_affinity_masks
genirq/affinity: rename irq_build_affinity_masks as group_cpus_evenly
genirq/affinity: move group_cpus_evenly() into lib/
lib/group_cpus: allow to group cpus in case of !CONFIG_SMP
blk-mq: build default queue map via group_cpus_evenly()

block/blk-mq-cpumap.c | 64 ++----
include/linux/group_cpus.h | 28 +++
kernel/irq/affinity.c | 404 +-----------------------------------
lib/Makefile | 2 +
lib/group_cpus.c | 413 +++++++++++++++++++++++++++++++++++++
5 files changed, 465 insertions(+), 446 deletions(-)
create mode 100644 include/linux/group_cpus.h
create mode 100644 lib/group_cpus.c

--
2.31.1


2021-08-14 12:38:27

by Ming Lei

Subject: [PATCH 3/7] genirq/affinity: don't pass irq_affinity_desc array to irq_build_affinity_masks

Prepare for abstracting irq_build_affinity_masks() into a public helper
for assigning all CPUs evenly into several groups. Don't pass the
irq_affinity_desc array to irq_build_affinity_masks(); instead, return
a cpumask array, storing each assigned group in one element of
the array.

This helps to provide a generic interface for grouping all CPUs evenly
from a NUMA and CPU locality viewpoint. The cost is one extra allocation
in irq_build_affinity_masks(), which should be fine since it is done via
GFP_KERNEL and irq_build_affinity_masks() is called very infrequently.

Signed-off-by: Ming Lei <[email protected]>
---
kernel/irq/affinity.c | 34 ++++++++++++++++++++++++----------
1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 0bc83d57cb34..aef12ec05dcf 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -249,7 +249,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
cpumask_var_t *node_to_cpumask,
const struct cpumask *cpu_mask,
struct cpumask *nmsk,
- struct irq_affinity_desc *masks)
+ struct cpumask *masks)
{
unsigned int i, n, nodes, cpus_per_vec, extra_vecs, done = 0;
unsigned int last_affv = numvecs;
@@ -268,7 +268,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
*/
if (numvecs <= nodes) {
for_each_node_mask(n, nodemsk) {
- cpumask_or(&masks[curvec].mask, &masks[curvec].mask,
+ cpumask_or(&masks[curvec], &masks[curvec],
node_to_cpumask[n]);
if (++curvec == last_affv)
curvec = 0;
@@ -320,7 +320,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
*/
if (curvec >= last_affv)
curvec = 0;
- irq_spread_init_one(&masks[curvec].mask, nmsk,
+ irq_spread_init_one(&masks[curvec], nmsk,
cpus_per_vec);
}
done += nv->nvectors;
@@ -334,16 +334,16 @@ static int __irq_build_affinity_masks(unsigned int startvec,
* 1) spread present CPU on these vectors
* 2) spread other possible CPUs on these vectors
*/
-static int irq_build_affinity_masks(unsigned int numvecs,
- struct irq_affinity_desc *masks)
+static struct cpumask *irq_build_affinity_masks(unsigned int numvecs)
{
unsigned int curvec = 0, nr_present = 0, nr_others = 0;
cpumask_var_t *node_to_cpumask;
cpumask_var_t nmsk, npresmsk;
int ret = -ENOMEM;
+ struct cpumask *masks = NULL;

if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
- return ret;
+ return NULL;

if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
goto fail_nmsk;
@@ -352,6 +352,10 @@ static int irq_build_affinity_masks(unsigned int numvecs,
if (!node_to_cpumask)
goto fail_npresmsk;

+ masks = kcalloc(numvecs, sizeof(*masks), GFP_KERNEL);
+ if (!masks)
+ goto fail_node_to_cpumask;
+
/* Stabilize the cpumasks */
cpus_read_lock();
build_node_to_cpumask(node_to_cpumask);
@@ -385,6 +389,7 @@ static int irq_build_affinity_masks(unsigned int numvecs,
if (ret >= 0)
WARN_ON(nr_present + nr_others < numvecs);

+ fail_node_to_cpumask:
free_node_to_cpumask(node_to_cpumask);

fail_npresmsk:
@@ -392,7 +397,11 @@ static int irq_build_affinity_masks(unsigned int numvecs,

fail_nmsk:
free_cpumask_var(nmsk);
- return ret < 0 ? ret : 0;
+ if (ret < 0) {
+ kfree(masks);
+ return NULL;
+ }
+ return masks;
}

static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
@@ -456,13 +465,18 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
*/
for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
unsigned int this_vecs = affd->set_size[i];
- int ret;
+ int j;
+ struct cpumask *result = irq_build_affinity_masks(this_vecs);

- ret = irq_build_affinity_masks(this_vecs, &masks[curvec]);
- if (ret) {
+ if (!result) {
kfree(masks);
return NULL;
}
+
+ for (j = 0; j < this_vecs; j++)
+ cpumask_copy(&masks[curvec + j].mask, &result[j]);
+ kfree(result);
+
curvec += this_vecs;
usedvecs += this_vecs;
}
--
2.31.1

2021-08-14 12:38:27

by Ming Lei

Subject: [PATCH 1/7] genirq/affinity: remove the 'firstvec' parameter from irq_build_affinity_masks

The 'firstvec' parameter is always the same as the 'startvec' parameter,
so use 'startvec' directly inside irq_build_affinity_masks().

Signed-off-by: Ming Lei <[email protected]>
---
kernel/irq/affinity.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f7ff8919dc9b..856ab6d39c05 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -336,10 +336,10 @@ static int __irq_build_affinity_masks(unsigned int startvec,
* 2) spread other possible CPUs on these vectors
*/
static int irq_build_affinity_masks(unsigned int startvec, unsigned int numvecs,
- unsigned int firstvec,
struct irq_affinity_desc *masks)
{
unsigned int curvec = startvec, nr_present = 0, nr_others = 0;
+ unsigned int firstvec = startvec;
cpumask_var_t *node_to_cpumask;
cpumask_var_t nmsk, npresmsk;
int ret = -ENOMEM;
@@ -462,8 +462,7 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
unsigned int this_vecs = affd->set_size[i];
int ret;

- ret = irq_build_affinity_masks(curvec, this_vecs,
- curvec, masks);
+ ret = irq_build_affinity_masks(curvec, this_vecs, masks);
if (ret) {
kfree(masks);
return NULL;
--
2.31.1

2021-08-14 12:38:27

by Ming Lei

Subject: [PATCH 4/7] genirq/affinity: rename irq_build_affinity_masks as group_cpus_evenly

Map each irq vector into a group, so the algorithm can be abstracted for
the generic use case.

Rename irq_build_affinity_masks() as group_cpus_evenly(), so the API can
be reused by blk-mq to build its default queue mapping.

No functional change, just rename 'vector' as 'group'.

Signed-off-by: Ming Lei <[email protected]>
---
kernel/irq/affinity.c | 241 +++++++++++++++++++++---------------------
1 file changed, 121 insertions(+), 120 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index aef12ec05dcf..ad0ce4b5a28e 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -9,13 +9,13 @@
#include <linux/cpu.h>
#include <linux/sort.h>

-static void irq_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
- unsigned int cpus_per_vec)
+static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
+ unsigned int cpus_per_grp)
{
const struct cpumask *siblmsk;
int cpu, sibl;

- for ( ; cpus_per_vec > 0; ) {
+ for ( ; cpus_per_grp > 0; ) {
cpu = cpumask_first(nmsk);

/* Should not happen, but I'm too lazy to think about it */
@@ -24,18 +24,18 @@ static void irq_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,

cpumask_clear_cpu(cpu, nmsk);
cpumask_set_cpu(cpu, irqmsk);
- cpus_per_vec--;
+ cpus_per_grp--;

/* If the cpu has siblings, use them first */
siblmsk = topology_sibling_cpumask(cpu);
- for (sibl = -1; cpus_per_vec > 0; ) {
+ for (sibl = -1; cpus_per_grp > 0; ) {
sibl = cpumask_next(sibl, siblmsk);
if (sibl >= nr_cpu_ids)
break;
if (!cpumask_test_and_clear_cpu(sibl, nmsk))
continue;
cpumask_set_cpu(sibl, irqmsk);
- cpus_per_vec--;
+ cpus_per_grp--;
}
}
}
@@ -95,48 +95,48 @@ static int get_nodes_in_cpumask(cpumask_var_t *node_to_cpumask,
return nodes;
}

-struct node_vectors {
+struct node_groups {
unsigned id;

union {
- unsigned nvectors;
+ unsigned ngroups;
unsigned ncpus;
};
};

static int ncpus_cmp_func(const void *l, const void *r)
{
- const struct node_vectors *ln = l;
- const struct node_vectors *rn = r;
+ const struct node_groups *ln = l;
+ const struct node_groups *rn = r;

return ln->ncpus - rn->ncpus;
}

/*
- * Allocate vector number for each node, so that for each node:
+ * Allocate group number for each node, so that for each node:
*
* 1) the allocated number is >= 1
*
- * 2) the allocated numbver is <= active CPU number of this node
+ * 2) the allocated number is <= active CPU number of this node
*
- * The actual allocated total vectors may be less than @numvecs when
- * active total CPU number is less than @numvecs.
+ * The actual allocated total groups may be less than @numgrps when
+ * active total CPU number is less than @numgrps.
*
* Active CPUs means the CPUs in '@cpu_mask AND @node_to_cpumask[]'
* for each node.
*/
-static void alloc_nodes_vectors(unsigned int numvecs,
- cpumask_var_t *node_to_cpumask,
- const struct cpumask *cpu_mask,
- const nodemask_t nodemsk,
- struct cpumask *nmsk,
- struct node_vectors *node_vectors)
+static void alloc_nodes_groups(unsigned int numgrps,
+ cpumask_var_t *node_to_cpumask,
+ const struct cpumask *cpu_mask,
+ const nodemask_t nodemsk,
+ struct cpumask *nmsk,
+ struct node_groups *node_groups)
{
unsigned n, remaining_ncpus = 0;

for (n = 0; n < nr_node_ids; n++) {
- node_vectors[n].id = n;
- node_vectors[n].ncpus = UINT_MAX;
+ node_groups[n].id = n;
+ node_groups[n].ncpus = UINT_MAX;
}

for_each_node_mask(n, nodemsk) {
@@ -148,61 +148,61 @@ static void alloc_nodes_vectors(unsigned int numvecs,
if (!ncpus)
continue;
remaining_ncpus += ncpus;
- node_vectors[n].ncpus = ncpus;
+ node_groups[n].ncpus = ncpus;
}

- numvecs = min_t(unsigned, remaining_ncpus, numvecs);
+ numgrps = min_t(unsigned, remaining_ncpus, numgrps);

- sort(node_vectors, nr_node_ids, sizeof(node_vectors[0]),
+ sort(node_groups, nr_node_ids, sizeof(node_groups[0]),
ncpus_cmp_func, NULL);

/*
- * Allocate vectors for each node according to the ratio of this
- * node's nr_cpus to remaining un-assigned ncpus. 'numvecs' is
+ * Allocate groups for each node according to the ratio of this
+ * node's nr_cpus to remaining un-assigned ncpus. 'numgrps' is
* bigger than number of active numa nodes. Always start the
* allocation from the node with minimized nr_cpus.
*
* This way guarantees that each active node gets allocated at
- * least one vector, and the theory is simple: over-allocation
- * is only done when this node is assigned by one vector, so
- * other nodes will be allocated >= 1 vector, since 'numvecs' is
+ * least one group, and the theory is simple: over-allocation
+ * is only done when this node is assigned by one group, so
+ * other nodes will be allocated >= 1 groups, since 'numgrps' is
* bigger than number of numa nodes.
*
- * One perfect invariant is that number of allocated vectors for
+ * One perfect invariant is that number of allocated groups for
* each node is <= CPU count of this node:
*
* 1) suppose there are two nodes: A and B
* ncpu(X) is CPU count of node X
- * vecs(X) is the vector count allocated to node X via this
+ * grps(X) is the group count allocated to node X via this
* algorithm
*
* ncpu(A) <= ncpu(B)
* ncpu(A) + ncpu(B) = N
- * vecs(A) + vecs(B) = V
+ * grps(A) + grps(B) = G
*
- * vecs(A) = max(1, round_down(V * ncpu(A) / N))
- * vecs(B) = V - vecs(A)
+ * grps(A) = max(1, round_down(G * ncpu(A) / N))
+ * grps(B) = G - grps(A)
*
- * both N and V are integer, and 2 <= V <= N, suppose
- * V = N - delta, and 0 <= delta <= N - 2
+ * both N and G are integer, and 2 <= G <= N, suppose
+ * G = N - delta, and 0 <= delta <= N - 2
*
- * 2) obviously vecs(A) <= ncpu(A) because:
+ * 2) obviously grps(A) <= ncpu(A) because:
*
- * if vecs(A) is 1, then vecs(A) <= ncpu(A) given
+ * if grps(A) is 1, then grps(A) <= ncpu(A) given
* ncpu(A) >= 1
*
* otherwise,
- * vecs(A) <= V * ncpu(A) / N <= ncpu(A), given V <= N
+ * grps(A) <= G * ncpu(A) / N <= ncpu(A), given G <= N
*
- * 3) prove how vecs(B) <= ncpu(B):
+ * 3) prove how grps(B) <= ncpu(B):
*
- * if round_down(V * ncpu(A) / N) == 0, vecs(B) won't be
- * over-allocated, so vecs(B) <= ncpu(B),
+ * if round_down(G * ncpu(A) / N) == 0, vecs(B) won't be
+ * over-allocated, so grps(B) <= ncpu(B),
*
* otherwise:
*
- * vecs(A) =
- * round_down(V * ncpu(A) / N) =
+ * grps(A) =
+ * round_down(G * ncpu(A) / N) =
* round_down((N - delta) * ncpu(A) / N) =
* round_down((N * ncpu(A) - delta * ncpu(A)) / N) >=
* round_down((N * ncpu(A) - delta * N) / N) =
@@ -210,52 +210,50 @@ static void alloc_nodes_vectors(unsigned int numvecs,
*
* then:
*
- * vecs(A) - V >= ncpu(A) - delta - V
+ * grps(A) - G >= ncpu(A) - delta - G
* =>
- * V - vecs(A) <= V + delta - ncpu(A)
+ * G - grps(A) <= G + delta - ncpu(A)
* =>
- * vecs(B) <= N - ncpu(A)
+ * grps(B) <= N - ncpu(A)
* =>
- * vecs(B) <= cpu(B)
+ * grps(B) <= cpu(B)
*
* For nodes >= 3, it can be thought as one node and another big
* node given that is exactly what this algorithm is implemented,
- * and we always re-calculate 'remaining_ncpus' & 'numvecs', and
- * finally for each node X: vecs(X) <= ncpu(X).
+ * and we always re-calculate 'remaining_ncpus' & 'numgrps', and
+ * finally for each node X: grps(X) <= ncpu(X).
*
*/
for (n = 0; n < nr_node_ids; n++) {
- unsigned nvectors, ncpus;
+ unsigned ngroups, ncpus;

- if (node_vectors[n].ncpus == UINT_MAX)
+ if (node_groups[n].ncpus == UINT_MAX)
continue;

- WARN_ON_ONCE(numvecs == 0);
+ WARN_ON_ONCE(numgrps == 0);

- ncpus = node_vectors[n].ncpus;
- nvectors = max_t(unsigned, 1,
- numvecs * ncpus / remaining_ncpus);
- WARN_ON_ONCE(nvectors > ncpus);
+ ncpus = node_groups[n].ncpus;
+ ngroups = max_t(unsigned, 1,
+ numgrps * ncpus / remaining_ncpus);
+ WARN_ON_ONCE(ngroups > ncpus);

- node_vectors[n].nvectors = nvectors;
+ node_groups[n].ngroups = ngroups;

remaining_ncpus -= ncpus;
- numvecs -= nvectors;
+ numgrps -= ngroups;
}
}

-static int __irq_build_affinity_masks(unsigned int startvec,
- unsigned int numvecs,
- cpumask_var_t *node_to_cpumask,
- const struct cpumask *cpu_mask,
- struct cpumask *nmsk,
- struct cpumask *masks)
+static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
+ cpumask_var_t *node_to_cpumask,
+ const struct cpumask *cpu_mask,
+ struct cpumask *nmsk, struct cpumask *masks)
{
- unsigned int i, n, nodes, cpus_per_vec, extra_vecs, done = 0;
- unsigned int last_affv = numvecs;
- unsigned int curvec = startvec;
+ unsigned int i, n, nodes, cpus_per_grp, extra_grps, done = 0;
+ unsigned int last_grp = numgrps;
+ unsigned int curgrp = startgrp;
nodemask_t nodemsk = NODE_MASK_NONE;
- struct node_vectors *node_vectors;
+ struct node_groups *node_groups;

if (!cpumask_weight(cpu_mask))
return 0;
@@ -264,33 +262,33 @@ static int __irq_build_affinity_masks(unsigned int startvec,

/*
* If the number of nodes in the mask is greater than or equal the
- * number of vectors we just spread the vectors across the nodes.
+ * number of groups we just spread the groups across the nodes.
*/
- if (numvecs <= nodes) {
+ if (numgrps <= nodes) {
for_each_node_mask(n, nodemsk) {
- cpumask_or(&masks[curvec], &masks[curvec],
+ cpumask_or(&masks[curgrp], &masks[curgrp],
node_to_cpumask[n]);
- if (++curvec == last_affv)
- curvec = 0;
+ if (++curgrp == last_grp)
+ curgrp = 0;
}
- return numvecs;
+ return numgrps;
}

- node_vectors = kcalloc(nr_node_ids,
- sizeof(struct node_vectors),
+ node_groups = kcalloc(nr_node_ids,
+ sizeof(struct node_groups),
GFP_KERNEL);
- if (!node_vectors)
+ if (!node_groups)
return -ENOMEM;

- /* allocate vector number for each node */
- alloc_nodes_vectors(numvecs, node_to_cpumask, cpu_mask,
- nodemsk, nmsk, node_vectors);
+ /* allocate group number for each node */
+ alloc_nodes_groups(numgrps, node_to_cpumask, cpu_mask,
+ nodemsk, nmsk, node_groups);

for (i = 0; i < nr_node_ids; i++) {
unsigned int ncpus, v;
- struct node_vectors *nv = &node_vectors[i];
+ struct node_groups *nv = &node_groups[i];

- if (nv->nvectors == UINT_MAX)
+ if (nv->ngroups == UINT_MAX)
continue;

/* Get the cpus on this node which are in the mask */
@@ -299,44 +297,47 @@ static int __irq_build_affinity_masks(unsigned int startvec,
if (!ncpus)
continue;

- WARN_ON_ONCE(nv->nvectors > ncpus);
+ WARN_ON_ONCE(nv->ngroups > ncpus);

/* Account for rounding errors */
- extra_vecs = ncpus - nv->nvectors * (ncpus / nv->nvectors);
+ extra_grps = ncpus - nv->ngroups * (ncpus / nv->ngroups);

- /* Spread allocated vectors on CPUs of the current node */
- for (v = 0; v < nv->nvectors; v++, curvec++) {
- cpus_per_vec = ncpus / nv->nvectors;
+ /* Spread allocated groups on CPUs of the current node */
+ for (v = 0; v < nv->ngroups; v++, curgrp++) {
+ cpus_per_grp = ncpus / nv->ngroups;

- /* Account for extra vectors to compensate rounding errors */
- if (extra_vecs) {
- cpus_per_vec++;
- --extra_vecs;
+ /* Account for extra groups to compensate rounding errors */
+ if (extra_grps) {
+ cpus_per_grp++;
+ --extra_grps;
}

/*
- * wrapping has to be considered given 'startvec'
+ * wrapping has to be considered given 'startgrp'
* may start anywhere
*/
- if (curvec >= last_affv)
- curvec = 0;
- irq_spread_init_one(&masks[curvec], nmsk,
- cpus_per_vec);
+ if (curgrp >= last_grp)
+ curgrp = 0;
+ grp_spread_init_one(&masks[curgrp], nmsk,
+ cpus_per_grp);
}
- done += nv->nvectors;
+ done += nv->ngroups;
}
- kfree(node_vectors);
+ kfree(node_groups);
return done;
}

/*
- * build affinity in two stages:
- * 1) spread present CPU on these vectors
- * 2) spread other possible CPUs on these vectors
+ * build affinity in two stages for each group, and try to put close CPUs
+ * in viewpoint of CPU and NUMA locality into same group, and we run
+ * two-stage grouping:
+ *
+ * 1) allocate present CPUs on these groups evenly first
+ * 2) allocate other possible CPUs on these groups evenly
*/
-static struct cpumask *irq_build_affinity_masks(unsigned int numvecs)
+static struct cpumask *group_cpus_evenly(unsigned int numgrps)
{
- unsigned int curvec = 0, nr_present = 0, nr_others = 0;
+ unsigned int curgrp = 0, nr_present = 0, nr_others = 0;
cpumask_var_t *node_to_cpumask;
cpumask_var_t nmsk, npresmsk;
int ret = -ENOMEM;
@@ -352,7 +353,7 @@ static struct cpumask *irq_build_affinity_masks(unsigned int numvecs)
if (!node_to_cpumask)
goto fail_npresmsk;

- masks = kcalloc(numvecs, sizeof(*masks), GFP_KERNEL);
+ masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
if (!masks)
goto fail_node_to_cpumask;

@@ -360,26 +361,26 @@ static struct cpumask *irq_build_affinity_masks(unsigned int numvecs)
cpus_read_lock();
build_node_to_cpumask(node_to_cpumask);

- /* Spread on present CPUs starting from affd->pre_vectors */
- ret = __irq_build_affinity_masks(curvec, numvecs, node_to_cpumask,
- cpu_present_mask, nmsk, masks);
+ /* grouping present CPUs first */
+ ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
+ cpu_present_mask, nmsk, masks);
if (ret < 0)
goto fail_build_affinity;
nr_present = ret;

/*
- * Spread on non present CPUs starting from the next vector to be
- * handled. If the spreading of present CPUs already exhausted the
- * vector space, assign the non present CPUs to the already spread
- * out vectors.
+ * Allocate non present CPUs starting from the next group to be
+ * handled. If the grouping of present CPUs already exhausted the
+ * group space, assign the non present CPUs to the already
+ * allocated out groups.
*/
- if (nr_present >= numvecs)
- curvec = 0;
+ if (nr_present >= numgrps)
+ curgrp = 0;
else
- curvec = nr_present;
+ curgrp = nr_present;
cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
- ret = __irq_build_affinity_masks(curvec, numvecs, node_to_cpumask,
- npresmsk, nmsk, masks);
+ ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
+ npresmsk, nmsk, masks);
if (ret >= 0)
nr_others = ret;

@@ -387,7 +388,7 @@ static struct cpumask *irq_build_affinity_masks(unsigned int numvecs)
cpus_read_unlock();

if (ret >= 0)
- WARN_ON(nr_present + nr_others < numvecs);
+ WARN_ON(nr_present + nr_others < numgrps);

fail_node_to_cpumask:
free_node_to_cpumask(node_to_cpumask);
@@ -466,7 +467,7 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
unsigned int this_vecs = affd->set_size[i];
int j;
- struct cpumask *result = irq_build_affinity_masks(this_vecs);
+ struct cpumask *result = group_cpus_evenly(this_vecs);

if (!result) {
kfree(masks);
--
2.31.1

2021-08-14 12:40:07

by Ming Lei

Subject: [PATCH 6/7] lib/group_cpus: allow to group cpus in case of !CONFIG_SMP

Allow group_cpus_evenly() to be called in the case of !CONFIG_SMP by simply
assigning all CPUs to the 1st group.

Signed-off-by: Ming Lei <[email protected]>
---
include/linux/group_cpus.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/include/linux/group_cpus.h b/include/linux/group_cpus.h
index e42807ec61f6..79e5cc15bd96 100644
--- a/include/linux/group_cpus.h
+++ b/include/linux/group_cpus.h
@@ -9,6 +9,20 @@
#include <linux/kernel.h>
#include <linux/cpu.h>

+#ifdef CONFIG_SMP
struct cpumask *group_cpus_evenly(unsigned int numgrps);
+#else
+static inline struct cpumask *group_cpus_evenly(unsigned int numgrps)
+{
+ struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
+
+ if (!masks)
+ return NULL;
+
+ /* assign all CPUs(cpu 0) to the 1st group only */
+ cpumask_copy(&masks[0], cpu_possible_mask);
+ return masks;
+}
+#endif

#endif
--
2.31.1

2021-08-14 12:40:08

by Ming Lei

Subject: [PATCH 5/7] genirq/affinity: move group_cpus_evenly() into lib/

group_cpus_evenly() has become a generic helper which can be used by
other subsystems, so move it into lib/.
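
A minimal sketch of a hypothetical consumer outside genirq, assuming only
the interface exported below (the returned array holds one cpumask per
group and is freed with kfree(); the function and its parameters are made
up for illustration):

#include <linux/group_cpus.h>
#include <linux/slab.h>

static int assign_cpus_to_queues(unsigned int *cpu_to_queue,
				 unsigned int nr_queues)
{
	struct cpumask *masks = group_cpus_evenly(nr_queues);
	unsigned int grp, cpu;

	if (!masks)
		return -ENOMEM;

	/* each group becomes one queue; CPUs in a group share NUMA/CPU locality */
	for (grp = 0; grp < nr_queues; grp++)
		for_each_cpu(cpu, &masks[grp])
			cpu_to_queue[cpu] = grp;

	kfree(masks);
	return 0;
}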

Signed-off-by: Ming Lei <[email protected]>
---
include/linux/group_cpus.h | 14 ++
kernel/irq/affinity.c | 398 +----------------------------------
lib/Makefile | 2 +
lib/group_cpus.c | 413 +++++++++++++++++++++++++++++++++++++
4 files changed, 430 insertions(+), 397 deletions(-)
create mode 100644 include/linux/group_cpus.h
create mode 100644 lib/group_cpus.c

diff --git a/include/linux/group_cpus.h b/include/linux/group_cpus.h
new file mode 100644
index 000000000000..e42807ec61f6
--- /dev/null
+++ b/include/linux/group_cpus.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2016 Thomas Gleixner.
+ * Copyright (C) 2016-2017 Christoph Hellwig.
+ */
+
+#ifndef __LINUX_GROUP_CPUS_H
+#define __LINUX_GROUP_CPUS_H
+#include <linux/kernel.h>
+#include <linux/cpu.h>
+
+struct cpumask *group_cpus_evenly(unsigned int numgrps);
+
+#endif
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index ad0ce4b5a28e..44a4eba80315 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -7,403 +7,7 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/cpu.h>
-#include <linux/sort.h>
-
-static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
- unsigned int cpus_per_grp)
-{
- const struct cpumask *siblmsk;
- int cpu, sibl;
-
- for ( ; cpus_per_grp > 0; ) {
- cpu = cpumask_first(nmsk);
-
- /* Should not happen, but I'm too lazy to think about it */
- if (cpu >= nr_cpu_ids)
- return;
-
- cpumask_clear_cpu(cpu, nmsk);
- cpumask_set_cpu(cpu, irqmsk);
- cpus_per_grp--;
-
- /* If the cpu has siblings, use them first */
- siblmsk = topology_sibling_cpumask(cpu);
- for (sibl = -1; cpus_per_grp > 0; ) {
- sibl = cpumask_next(sibl, siblmsk);
- if (sibl >= nr_cpu_ids)
- break;
- if (!cpumask_test_and_clear_cpu(sibl, nmsk))
- continue;
- cpumask_set_cpu(sibl, irqmsk);
- cpus_per_grp--;
- }
- }
-}
-
-static cpumask_var_t *alloc_node_to_cpumask(void)
-{
- cpumask_var_t *masks;
- int node;
-
- masks = kcalloc(nr_node_ids, sizeof(cpumask_var_t), GFP_KERNEL);
- if (!masks)
- return NULL;
-
- for (node = 0; node < nr_node_ids; node++) {
- if (!zalloc_cpumask_var(&masks[node], GFP_KERNEL))
- goto out_unwind;
- }
-
- return masks;
-
-out_unwind:
- while (--node >= 0)
- free_cpumask_var(masks[node]);
- kfree(masks);
- return NULL;
-}
-
-static void free_node_to_cpumask(cpumask_var_t *masks)
-{
- int node;
-
- for (node = 0; node < nr_node_ids; node++)
- free_cpumask_var(masks[node]);
- kfree(masks);
-}
-
-static void build_node_to_cpumask(cpumask_var_t *masks)
-{
- int cpu;
-
- for_each_possible_cpu(cpu)
- cpumask_set_cpu(cpu, masks[cpu_to_node(cpu)]);
-}
-
-static int get_nodes_in_cpumask(cpumask_var_t *node_to_cpumask,
- const struct cpumask *mask, nodemask_t *nodemsk)
-{
- int n, nodes = 0;
-
- /* Calculate the number of nodes in the supplied affinity mask */
- for_each_node(n) {
- if (cpumask_intersects(mask, node_to_cpumask[n])) {
- node_set(n, *nodemsk);
- nodes++;
- }
- }
- return nodes;
-}
-
-struct node_groups {
- unsigned id;
-
- union {
- unsigned ngroups;
- unsigned ncpus;
- };
-};
-
-static int ncpus_cmp_func(const void *l, const void *r)
-{
- const struct node_groups *ln = l;
- const struct node_groups *rn = r;
-
- return ln->ncpus - rn->ncpus;
-}
-
-/*
- * Allocate group number for each node, so that for each node:
- *
- * 1) the allocated number is >= 1
- *
- * 2) the allocated number is <= active CPU number of this node
- *
- * The actual allocated total groups may be less than @numgrps when
- * active total CPU number is less than @numgrps.
- *
- * Active CPUs means the CPUs in '@cpu_mask AND @node_to_cpumask[]'
- * for each node.
- */
-static void alloc_nodes_groups(unsigned int numgrps,
- cpumask_var_t *node_to_cpumask,
- const struct cpumask *cpu_mask,
- const nodemask_t nodemsk,
- struct cpumask *nmsk,
- struct node_groups *node_groups)
-{
- unsigned n, remaining_ncpus = 0;
-
- for (n = 0; n < nr_node_ids; n++) {
- node_groups[n].id = n;
- node_groups[n].ncpus = UINT_MAX;
- }
-
- for_each_node_mask(n, nodemsk) {
- unsigned ncpus;
-
- cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
- ncpus = cpumask_weight(nmsk);
-
- if (!ncpus)
- continue;
- remaining_ncpus += ncpus;
- node_groups[n].ncpus = ncpus;
- }
-
- numgrps = min_t(unsigned, remaining_ncpus, numgrps);
-
- sort(node_groups, nr_node_ids, sizeof(node_groups[0]),
- ncpus_cmp_func, NULL);
-
- /*
- * Allocate groups for each node according to the ratio of this
- * node's nr_cpus to remaining un-assigned ncpus. 'numgrps' is
- * bigger than number of active numa nodes. Always start the
- * allocation from the node with minimized nr_cpus.
- *
- * This way guarantees that each active node gets allocated at
- * least one group, and the theory is simple: over-allocation
- * is only done when this node is assigned by one group, so
- * other nodes will be allocated >= 1 groups, since 'numgrps' is
- * bigger than number of numa nodes.
- *
- * One perfect invariant is that number of allocated groups for
- * each node is <= CPU count of this node:
- *
- * 1) suppose there are two nodes: A and B
- * ncpu(X) is CPU count of node X
- * grps(X) is the group count allocated to node X via this
- * algorithm
- *
- * ncpu(A) <= ncpu(B)
- * ncpu(A) + ncpu(B) = N
- * grps(A) + grps(B) = G
- *
- * grps(A) = max(1, round_down(G * ncpu(A) / N))
- * grps(B) = G - grps(A)
- *
- * both N and G are integer, and 2 <= G <= N, suppose
- * G = N - delta, and 0 <= delta <= N - 2
- *
- * 2) obviously grps(A) <= ncpu(A) because:
- *
- * if grps(A) is 1, then grps(A) <= ncpu(A) given
- * ncpu(A) >= 1
- *
- * otherwise,
- * grps(A) <= G * ncpu(A) / N <= ncpu(A), given G <= N
- *
- * 3) prove how grps(B) <= ncpu(B):
- *
- * if round_down(G * ncpu(A) / N) == 0, vecs(B) won't be
- * over-allocated, so grps(B) <= ncpu(B),
- *
- * otherwise:
- *
- * grps(A) =
- * round_down(G * ncpu(A) / N) =
- * round_down((N - delta) * ncpu(A) / N) =
- * round_down((N * ncpu(A) - delta * ncpu(A)) / N) >=
- * round_down((N * ncpu(A) - delta * N) / N) =
- * cpu(A) - delta
- *
- * then:
- *
- * grps(A) - G >= ncpu(A) - delta - G
- * =>
- * G - grps(A) <= G + delta - ncpu(A)
- * =>
- * grps(B) <= N - ncpu(A)
- * =>
- * grps(B) <= cpu(B)
- *
- * For nodes >= 3, it can be thought as one node and another big
- * node given that is exactly what this algorithm is implemented,
- * and we always re-calculate 'remaining_ncpus' & 'numgrps', and
- * finally for each node X: grps(X) <= ncpu(X).
- *
- */
- for (n = 0; n < nr_node_ids; n++) {
- unsigned ngroups, ncpus;
-
- if (node_groups[n].ncpus == UINT_MAX)
- continue;
-
- WARN_ON_ONCE(numgrps == 0);
-
- ncpus = node_groups[n].ncpus;
- ngroups = max_t(unsigned, 1,
- numgrps * ncpus / remaining_ncpus);
- WARN_ON_ONCE(ngroups > ncpus);
-
- node_groups[n].ngroups = ngroups;
-
- remaining_ncpus -= ncpus;
- numgrps -= ngroups;
- }
-}
-
-static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
- cpumask_var_t *node_to_cpumask,
- const struct cpumask *cpu_mask,
- struct cpumask *nmsk, struct cpumask *masks)
-{
- unsigned int i, n, nodes, cpus_per_grp, extra_grps, done = 0;
- unsigned int last_grp = numgrps;
- unsigned int curgrp = startgrp;
- nodemask_t nodemsk = NODE_MASK_NONE;
- struct node_groups *node_groups;
-
- if (!cpumask_weight(cpu_mask))
- return 0;
-
- nodes = get_nodes_in_cpumask(node_to_cpumask, cpu_mask, &nodemsk);
-
- /*
- * If the number of nodes in the mask is greater than or equal the
- * number of groups we just spread the groups across the nodes.
- */
- if (numgrps <= nodes) {
- for_each_node_mask(n, nodemsk) {
- cpumask_or(&masks[curgrp], &masks[curgrp],
- node_to_cpumask[n]);
- if (++curgrp == last_grp)
- curgrp = 0;
- }
- return numgrps;
- }
-
- node_groups = kcalloc(nr_node_ids,
- sizeof(struct node_groups),
- GFP_KERNEL);
- if (!node_groups)
- return -ENOMEM;
-
- /* allocate group number for each node */
- alloc_nodes_groups(numgrps, node_to_cpumask, cpu_mask,
- nodemsk, nmsk, node_groups);
-
- for (i = 0; i < nr_node_ids; i++) {
- unsigned int ncpus, v;
- struct node_groups *nv = &node_groups[i];
-
- if (nv->ngroups == UINT_MAX)
- continue;
-
- /* Get the cpus on this node which are in the mask */
- cpumask_and(nmsk, cpu_mask, node_to_cpumask[nv->id]);
- ncpus = cpumask_weight(nmsk);
- if (!ncpus)
- continue;
-
- WARN_ON_ONCE(nv->ngroups > ncpus);
-
- /* Account for rounding errors */
- extra_grps = ncpus - nv->ngroups * (ncpus / nv->ngroups);
-
- /* Spread allocated groups on CPUs of the current node */
- for (v = 0; v < nv->ngroups; v++, curgrp++) {
- cpus_per_grp = ncpus / nv->ngroups;
-
- /* Account for extra groups to compensate rounding errors */
- if (extra_grps) {
- cpus_per_grp++;
- --extra_grps;
- }
-
- /*
- * wrapping has to be considered given 'startgrp'
- * may start anywhere
- */
- if (curgrp >= last_grp)
- curgrp = 0;
- grp_spread_init_one(&masks[curgrp], nmsk,
- cpus_per_grp);
- }
- done += nv->ngroups;
- }
- kfree(node_groups);
- return done;
-}
-
-/*
- * build affinity in two stages for each group, and try to put close CPUs
- * in viewpoint of CPU and NUMA locality into same group, and we run
- * two-stage grouping:
- *
- * 1) allocate present CPUs on these groups evenly first
- * 2) allocate other possible CPUs on these groups evenly
- */
-static struct cpumask *group_cpus_evenly(unsigned int numgrps)
-{
- unsigned int curgrp = 0, nr_present = 0, nr_others = 0;
- cpumask_var_t *node_to_cpumask;
- cpumask_var_t nmsk, npresmsk;
- int ret = -ENOMEM;
- struct cpumask *masks = NULL;
-
- if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
- return NULL;
-
- if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
- goto fail_nmsk;
-
- node_to_cpumask = alloc_node_to_cpumask();
- if (!node_to_cpumask)
- goto fail_npresmsk;
-
- masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
- if (!masks)
- goto fail_node_to_cpumask;
-
- /* Stabilize the cpumasks */
- cpus_read_lock();
- build_node_to_cpumask(node_to_cpumask);
-
- /* grouping present CPUs first */
- ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
- cpu_present_mask, nmsk, masks);
- if (ret < 0)
- goto fail_build_affinity;
- nr_present = ret;
-
- /*
- * Allocate non present CPUs starting from the next group to be
- * handled. If the grouping of present CPUs already exhausted the
- * group space, assign the non present CPUs to the already
- * allocated out groups.
- */
- if (nr_present >= numgrps)
- curgrp = 0;
- else
- curgrp = nr_present;
- cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
- ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
- npresmsk, nmsk, masks);
- if (ret >= 0)
- nr_others = ret;
-
- fail_build_affinity:
- cpus_read_unlock();
-
- if (ret >= 0)
- WARN_ON(nr_present + nr_others < numgrps);
-
- fail_node_to_cpumask:
- free_node_to_cpumask(node_to_cpumask);
-
- fail_npresmsk:
- free_cpumask_var(npresmsk);
-
- fail_nmsk:
- free_cpumask_var(nmsk);
- if (ret < 0) {
- kfree(masks);
- return NULL;
- }
- return masks;
-}
+#include <linux/group_cpus.h>

static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
{
diff --git a/lib/Makefile b/lib/Makefile
index 5efd1b435a37..ff1cbe4958a1 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -338,6 +338,8 @@ obj-$(CONFIG_SBITMAP) += sbitmap.o

obj-$(CONFIG_PARMAN) += parman.o

+obj-$(CONFIG_SMP) += group_cpus.o
+
# GCC library routines
obj-$(CONFIG_GENERIC_LIB_ASHLDI3) += ashldi3.o
obj-$(CONFIG_GENERIC_LIB_ASHRDI3) += ashrdi3.o
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
new file mode 100644
index 000000000000..c36fa67f8671
--- /dev/null
+++ b/lib/group_cpus.c
@@ -0,0 +1,413 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2016 Thomas Gleixner.
+ * Copyright (C) 2016-2017 Christoph Hellwig.
+ */
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/cpu.h>
+#include <linux/sort.h>
+
+static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
+ unsigned int cpus_per_grp)
+{
+ const struct cpumask *siblmsk;
+ int cpu, sibl;
+
+ for ( ; cpus_per_grp > 0; ) {
+ cpu = cpumask_first(nmsk);
+
+ /* Should not happen, but I'm too lazy to think about it */
+ if (cpu >= nr_cpu_ids)
+ return;
+
+ cpumask_clear_cpu(cpu, nmsk);
+ cpumask_set_cpu(cpu, irqmsk);
+ cpus_per_grp--;
+
+ /* If the cpu has siblings, use them first */
+ siblmsk = topology_sibling_cpumask(cpu);
+ for (sibl = -1; cpus_per_grp > 0; ) {
+ sibl = cpumask_next(sibl, siblmsk);
+ if (sibl >= nr_cpu_ids)
+ break;
+ if (!cpumask_test_and_clear_cpu(sibl, nmsk))
+ continue;
+ cpumask_set_cpu(sibl, irqmsk);
+ cpus_per_grp--;
+ }
+ }
+}
+
+static cpumask_var_t *alloc_node_to_cpumask(void)
+{
+ cpumask_var_t *masks;
+ int node;
+
+ masks = kcalloc(nr_node_ids, sizeof(cpumask_var_t), GFP_KERNEL);
+ if (!masks)
+ return NULL;
+
+ for (node = 0; node < nr_node_ids; node++) {
+ if (!zalloc_cpumask_var(&masks[node], GFP_KERNEL))
+ goto out_unwind;
+ }
+
+ return masks;
+
+out_unwind:
+ while (--node >= 0)
+ free_cpumask_var(masks[node]);
+ kfree(masks);
+ return NULL;
+}
+
+static void free_node_to_cpumask(cpumask_var_t *masks)
+{
+ int node;
+
+ for (node = 0; node < nr_node_ids; node++)
+ free_cpumask_var(masks[node]);
+ kfree(masks);
+}
+
+static void build_node_to_cpumask(cpumask_var_t *masks)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu)
+ cpumask_set_cpu(cpu, masks[cpu_to_node(cpu)]);
+}
+
+static int get_nodes_in_cpumask(cpumask_var_t *node_to_cpumask,
+ const struct cpumask *mask, nodemask_t *nodemsk)
+{
+ int n, nodes = 0;
+
+ /* Calculate the number of nodes in the supplied affinity mask */
+ for_each_node(n) {
+ if (cpumask_intersects(mask, node_to_cpumask[n])) {
+ node_set(n, *nodemsk);
+ nodes++;
+ }
+ }
+ return nodes;
+}
+
+struct node_groups {
+ unsigned id;
+
+ union {
+ unsigned ngroups;
+ unsigned ncpus;
+ };
+};
+
+static int ncpus_cmp_func(const void *l, const void *r)
+{
+ const struct node_groups *ln = l;
+ const struct node_groups *rn = r;
+
+ return ln->ncpus - rn->ncpus;
+}
+
+/*
+ * Allocate group number for each node, so that for each node:
+ *
+ * 1) the allocated number is >= 1
+ *
+ * 2) the allocated number is <= active CPU number of this node
+ *
+ * The actual allocated total groups may be less than @numgrps when
+ * active total CPU number is less than @numgrps.
+ *
+ * Active CPUs means the CPUs in '@cpu_mask AND @node_to_cpumask[]'
+ * for each node.
+ */
+static void alloc_nodes_groups(unsigned int numgrps,
+ cpumask_var_t *node_to_cpumask,
+ const struct cpumask *cpu_mask,
+ const nodemask_t nodemsk,
+ struct cpumask *nmsk,
+ struct node_groups *node_groups)
+{
+ unsigned n, remaining_ncpus = 0;
+
+ for (n = 0; n < nr_node_ids; n++) {
+ node_groups[n].id = n;
+ node_groups[n].ncpus = UINT_MAX;
+ }
+
+ for_each_node_mask(n, nodemsk) {
+ unsigned ncpus;
+
+ cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
+ ncpus = cpumask_weight(nmsk);
+
+ if (!ncpus)
+ continue;
+ remaining_ncpus += ncpus;
+ node_groups[n].ncpus = ncpus;
+ }
+
+ numgrps = min_t(unsigned, remaining_ncpus, numgrps);
+
+ sort(node_groups, nr_node_ids, sizeof(node_groups[0]),
+ ncpus_cmp_func, NULL);
+
+ /*
+ * Allocate groups for each node according to the ratio of this
+ * node's nr_cpus to remaining un-assigned ncpus. 'numgrps' is
+ * bigger than number of active numa nodes. Always start the
+ * allocation from the node with minimized nr_cpus.
+ *
+ * This way guarantees that each active node gets allocated at
+ * least one group, and the theory is simple: over-allocation
+ * is only done when this node is assigned by one group, so
+ * other nodes will be allocated >= 1 groups, since 'numgrps' is
+ * bigger than number of numa nodes.
+ *
+ * One perfect invariant is that number of allocated groups for
+ * each node is <= CPU count of this node:
+ *
+ * 1) suppose there are two nodes: A and B
+ * ncpu(X) is CPU count of node X
+ * grps(X) is the group count allocated to node X via this
+ * algorithm
+ *
+ * ncpu(A) <= ncpu(B)
+ * ncpu(A) + ncpu(B) = N
+ * grps(A) + grps(B) = G
+ *
+ * grps(A) = max(1, round_down(G * ncpu(A) / N))
+ * grps(B) = G - grps(A)
+ *
+ * both N and G are integer, and 2 <= G <= N, suppose
+ * G = N - delta, and 0 <= delta <= N - 2
+ *
+ * 2) obviously grps(A) <= ncpu(A) because:
+ *
+ * if grps(A) is 1, then grps(A) <= ncpu(A) given
+ * ncpu(A) >= 1
+ *
+ * otherwise,
+ * grps(A) <= G * ncpu(A) / N <= ncpu(A), given G <= N
+ *
+ * 3) prove how grps(B) <= ncpu(B):
+ *
+ * if round_down(G * ncpu(A) / N) == 0, vecs(B) won't be
+ * over-allocated, so grps(B) <= ncpu(B),
+ *
+ * otherwise:
+ *
+ * grps(A) =
+ * round_down(G * ncpu(A) / N) =
+ * round_down((N - delta) * ncpu(A) / N) =
+ * round_down((N * ncpu(A) - delta * ncpu(A)) / N) >=
+ * round_down((N * ncpu(A) - delta * N) / N) =
+ * cpu(A) - delta
+ *
+ * then:
+ *
+ * grps(A) - G >= ncpu(A) - delta - G
+ * =>
+ * G - grps(A) <= G + delta - ncpu(A)
+ * =>
+ * grps(B) <= N - ncpu(A)
+ * =>
+ * grps(B) <= cpu(B)
+ *
+ * For nodes >= 3, it can be thought as one node and another big
+ * node given that is exactly what this algorithm is implemented,
+ * and we always re-calculate 'remaining_ncpus' & 'numgrps', and
+ * finally for each node X: grps(X) <= ncpu(X).
+ *
+ */
+ for (n = 0; n < nr_node_ids; n++) {
+ unsigned ngroups, ncpus;
+
+ if (node_groups[n].ncpus == UINT_MAX)
+ continue;
+
+ WARN_ON_ONCE(numgrps == 0);
+
+ ncpus = node_groups[n].ncpus;
+ ngroups = max_t(unsigned, 1,
+ numgrps * ncpus / remaining_ncpus);
+ WARN_ON_ONCE(ngroups > ncpus);
+
+ node_groups[n].ngroups = ngroups;
+
+ remaining_ncpus -= ncpus;
+ numgrps -= ngroups;
+ }
+}
+
+static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
+ cpumask_var_t *node_to_cpumask,
+ const struct cpumask *cpu_mask,
+ struct cpumask *nmsk, struct cpumask *masks)
+{
+ unsigned int i, n, nodes, cpus_per_grp, extra_grps, done = 0;
+ unsigned int last_grp = numgrps;
+ unsigned int curgrp = startgrp;
+ nodemask_t nodemsk = NODE_MASK_NONE;
+ struct node_groups *node_groups;
+
+ if (!cpumask_weight(cpu_mask))
+ return 0;
+
+ nodes = get_nodes_in_cpumask(node_to_cpumask, cpu_mask, &nodemsk);
+
+ /*
+ * If the number of nodes in the mask is greater than or equal the
+ * number of groups we just spread the groups across the nodes.
+ */
+ if (numgrps <= nodes) {
+ for_each_node_mask(n, nodemsk) {
+ cpumask_or(&masks[curgrp], &masks[curgrp],
+ node_to_cpumask[n]);
+ if (++curgrp == last_grp)
+ curgrp = 0;
+ }
+ return numgrps;
+ }
+
+ node_groups = kcalloc(nr_node_ids,
+ sizeof(struct node_groups),
+ GFP_KERNEL);
+ if (!node_groups)
+ return -ENOMEM;
+
+ /* allocate group number for each node */
+ alloc_nodes_groups(numgrps, node_to_cpumask, cpu_mask,
+ nodemsk, nmsk, node_groups);
+
+ for (i = 0; i < nr_node_ids; i++) {
+ unsigned int ncpus, v;
+ struct node_groups *nv = &node_groups[i];
+
+ if (nv->ngroups == UINT_MAX)
+ continue;
+
+ /* Get the cpus on this node which are in the mask */
+ cpumask_and(nmsk, cpu_mask, node_to_cpumask[nv->id]);
+ ncpus = cpumask_weight(nmsk);
+ if (!ncpus)
+ continue;
+
+ WARN_ON_ONCE(nv->ngroups > ncpus);
+
+ /* Account for rounding errors */
+ extra_grps = ncpus - nv->ngroups * (ncpus / nv->ngroups);
+
+ /* Spread allocated groups on CPUs of the current node */
+ for (v = 0; v < nv->ngroups; v++, curgrp++) {
+ cpus_per_grp = ncpus / nv->ngroups;
+
+ /* Account for extra groups to compensate rounding errors */
+ if (extra_grps) {
+ cpus_per_grp++;
+ --extra_grps;
+ }
+
+ /*
+ * wrapping has to be considered given 'startgrp'
+ * may start anywhere
+ */
+ if (curgrp >= last_grp)
+ curgrp = 0;
+ grp_spread_init_one(&masks[curgrp], nmsk,
+ cpus_per_grp);
+ }
+ done += nv->ngroups;
+ }
+ kfree(node_groups);
+ return done;
+}
+
+/**
+ * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
+ * @numgrps: number of groups
+ *
+ * Return: cpumask array if successful, NULL otherwise. And each element
+ * includes CPUs assigned to this group
+ *
+ * Try to put close CPUs from viewpoint of CPU and NUMA locality into
+ * same group, and run two-stage grouping:
+ * 1) allocate present CPUs on these groups evenly first
+ * 2) allocate other possible CPUs on these groups evenly
+ *
+ * We guarantee in the resulted grouping that all CPUs are covered, and
+ * no same CPU is assigned to different groups
+ */
+struct cpumask *group_cpus_evenly(unsigned int numgrps)
+{
+ unsigned int curgrp = 0, nr_present = 0, nr_others = 0;
+ cpumask_var_t *node_to_cpumask;
+ cpumask_var_t nmsk, npresmsk;
+ int ret = -ENOMEM;
+ struct cpumask *masks = NULL;
+
+ if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
+ return NULL;
+
+ if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
+ goto fail_nmsk;
+
+ node_to_cpumask = alloc_node_to_cpumask();
+ if (!node_to_cpumask)
+ goto fail_npresmsk;
+
+ masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
+ if (!masks)
+ goto fail_node_to_cpumask;
+
+ /* Stabilize the cpumasks */
+ cpus_read_lock();
+ build_node_to_cpumask(node_to_cpumask);
+
+ /* grouping present CPUs first */
+ ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
+ cpu_present_mask, nmsk, masks);
+ if (ret < 0)
+ goto fail_build_affinity;
+ nr_present = ret;
+
+ /*
+ * Allocate non present CPUs starting from the next group to be
+ * handled. If the grouping of present CPUs already exhausted the
+ * group space, assign the non present CPUs to the already
+ * allocated out groups.
+ */
+ if (nr_present >= numgrps)
+ curgrp = 0;
+ else
+ curgrp = nr_present;
+ cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
+ ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
+ npresmsk, nmsk, masks);
+ if (ret >= 0)
+ nr_others = ret;
+
+ fail_build_affinity:
+ cpus_read_unlock();
+
+ if (ret >= 0)
+ WARN_ON(nr_present + nr_others < numgrps);
+
+ fail_node_to_cpumask:
+ free_node_to_cpumask(node_to_cpumask);
+
+ fail_npresmsk:
+ free_cpumask_var(npresmsk);
+
+ fail_nmsk:
+ free_cpumask_var(nmsk);
+ if (ret < 0) {
+ kfree(masks);
+ return NULL;
+ }
+ return masks;
+}
+EXPORT_SYMBOL_GPL(group_cpus_evenly);
--
2.31.1

2021-08-14 12:40:16

by Ming Lei

Subject: [PATCH 7/7] blk-mq: build default queue map via group_cpus_evenly()

The default queue mapping builder blk_mq_map_queues() doesn't take NUMA
topology into account, so the resulting mapping can be pretty bad, since CPUs
belonging to different NUMA nodes may be assigned to the same queue. It is
observed that IOPS drops by ~30% when running two jobs on the same hctx
of null_blk from two CPUs belonging to two different NUMA nodes, compared
with running them from the same NUMA node.

Address the issue by reusing group_cpus_evenly(), which groups CPUs
according to CPU/NUMA locality.

Lots of drivers may benefit from the change, such as nvme pci poll,
nvme tcp, ...
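
As a hedged illustration of the effect (hypothetical topology, not measured
data from this patch): with node 0 = CPU 0-3, node 1 = CPU 4-7 and
qmap->nr_queues = 2, the new mapping is expected to look roughly like:

	/*
	 * masks = group_cpus_evenly(2);
	 *   masks[0] ~ {0,1,2,3}  ->  mq_map[0..3] = queue_offset + 0
	 *   masks[1] ~ {4,5,6,7}  ->  mq_map[4..7] = queue_offset + 1
	 *
	 * i.e. each queue is served by CPUs of a single NUMA node, while the
	 * old round-robin mapping could put CPUs of both nodes on one queue.
	 */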

Signed-off-by: Ming Lei <[email protected]>
---
block/blk-mq-cpumap.c | 64 +++++++++----------------------------------
1 file changed, 13 insertions(+), 51 deletions(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 3db84d3197f1..5f183f52626c 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -10,67 +10,29 @@
#include <linux/mm.h>
#include <linux/smp.h>
#include <linux/cpu.h>
+#include <linux/group_cpus.h>

#include <linux/blk-mq.h>
#include "blk.h"
#include "blk-mq.h"

-static int queue_index(struct blk_mq_queue_map *qmap,
- unsigned int nr_queues, const int q)
-{
- return qmap->queue_offset + (q % nr_queues);
-}
-
-static int get_first_sibling(unsigned int cpu)
-{
- unsigned int ret;
-
- ret = cpumask_first(topology_sibling_cpumask(cpu));
- if (ret < nr_cpu_ids)
- return ret;
-
- return cpu;
-}
-
int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
{
- unsigned int *map = qmap->mq_map;
- unsigned int nr_queues = qmap->nr_queues;
- unsigned int cpu, first_sibling, q = 0;
-
- for_each_possible_cpu(cpu)
- map[cpu] = -1;
+ const struct cpumask *masks;
+ unsigned int queue, cpu;

- /*
- * Spread queues among present CPUs first for minimizing
- * count of dead queues which are mapped by all un-present CPUs
- */
- for_each_present_cpu(cpu) {
- if (q >= nr_queues)
- break;
- map[cpu] = queue_index(qmap, nr_queues, q++);
- }
+ masks = group_cpus_evenly(qmap->nr_queues);
+ if (!masks)
+ goto fallback;

- for_each_possible_cpu(cpu) {
- if (map[cpu] != -1)
- continue;
- /*
- * First do sequential mapping between CPUs and queues.
- * In case we still have CPUs to map, and we have some number of
- * threads per cores then map sibling threads to the same queue
- * for performance optimizations.
- */
- if (q < nr_queues) {
- map[cpu] = queue_index(qmap, nr_queues, q++);
- } else {
- first_sibling = get_first_sibling(cpu);
- if (first_sibling == cpu)
- map[cpu] = queue_index(qmap, nr_queues, q++);
- else
- map[cpu] = map[first_sibling];
- }
+ for (queue = 0; queue < qmap->nr_queues; queue++) {
+ for_each_cpu(cpu, &masks[queue])
+ qmap->mq_map[cpu] = qmap->queue_offset + queue;
}
-
+ return 0;
+ fallback:
+ for_each_possible_cpu(cpu)
+ qmap->mq_map[cpu] = qmap->queue_offset;
return 0;
}
EXPORT_SYMBOL_GPL(blk_mq_map_queues);
--
2.31.1

2021-08-14 16:33:10

by kernel test robot

Subject: Re: [PATCH 7/7] blk-mq: build default queue map via group_cpus_evenly()

Hi Ming,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/irq/core]
[also build test ERROR on next-20210813]
[cannot apply to block/for-next linux/master linus/master v5.14-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 04c2721d3530f0723b4c922a8fa9f26b202a20de
config: arc-randconfig-r043-20210814 (attached as .config)
compiler: arceb-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/46b1d0ed609db266f6f18e7156c4f294bf6c4502
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
git checkout 46b1d0ed609db266f6f18e7156c4f294bf6c4502
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All error/warnings (new ones prefixed by >>):

In file included from block/blk-mq-cpumap.c:13:
include/linux/group_cpus.h: In function 'group_cpus_evenly':
>> include/linux/group_cpus.h:17:33: error: implicit declaration of function 'kcalloc'; did you mean 'kvcalloc'? [-Werror=implicit-function-declaration]
17 | struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
| ^~~~~~~
| kvcalloc
>> include/linux/group_cpus.h:17:33: warning: initialization of 'struct cpumask *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
In file included from include/linux/genhd.h:16,
from include/linux/blkdev.h:8,
from include/linux/blk-mq.h:5,
from block/blk-mq-cpumap.c:15:
include/linux/slab.h: At top level:
>> include/linux/slab.h:658:21: error: conflicting types for 'kcalloc'; have 'void *(size_t, size_t, gfp_t)' {aka 'void *(unsigned int, unsigned int, unsigned int)'}
658 | static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
| ^~~~~~~
In file included from block/blk-mq-cpumap.c:13:
include/linux/group_cpus.h:17:33: note: previous implicit declaration of 'kcalloc' with type 'int()'
17 | struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
| ^~~~~~~
cc1: some warnings being treated as errors


vim +17 include/linux/group_cpus.h

759f72186bfdd5 Ming Lei 2021-08-14 11
5cd330f089b089 Ming Lei 2021-08-14 12 #ifdef CONFIG_SMP
759f72186bfdd5 Ming Lei 2021-08-14 13 struct cpumask *group_cpus_evenly(unsigned int numgrps);
5cd330f089b089 Ming Lei 2021-08-14 14 #else
5cd330f089b089 Ming Lei 2021-08-14 15 static inline struct cpumask *group_cpus_evenly(unsigned int numgrps)
5cd330f089b089 Ming Lei 2021-08-14 16 {
5cd330f089b089 Ming Lei 2021-08-14 @17 struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
5cd330f089b089 Ming Lei 2021-08-14 18
5cd330f089b089 Ming Lei 2021-08-14 19 if (!masks)
5cd330f089b089 Ming Lei 2021-08-14 20 return NULL;
5cd330f089b089 Ming Lei 2021-08-14 21
5cd330f089b089 Ming Lei 2021-08-14 22 /* assign all CPUs(cpu 0) to the 1st group only */
5cd330f089b089 Ming Lei 2021-08-14 23 cpumask_copy(&masks[0], cpu_possible_mask);
5cd330f089b089 Ming Lei 2021-08-14 24 return masks;
5cd330f089b089 Ming Lei 2021-08-14 25 }
5cd330f089b089 Ming Lei 2021-08-14 26 #endif
759f72186bfdd5 Ming Lei 2021-08-14 27

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2021-08-14 17:03:06

by kernel test robot

Subject: Re: [PATCH 5/7] genirq/affinity: move group_cpus_evenly() into lib/

Hi Ming,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/irq/core]
[also build test WARNING on next-20210813]
[cannot apply to block/for-next linux/master linus/master v5.14-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 04c2721d3530f0723b4c922a8fa9f26b202a20de
config: hexagon-randconfig-r041-20210814 (attached as .config)
compiler: clang version 12.0.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/759f72186bfdd5c3ba8b53ac0749cf7ba930012c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
git checkout 759f72186bfdd5c3ba8b53ac0749cf7ba930012c
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=hexagon

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> lib/group_cpus.c:344:17: warning: no previous prototype for function 'group_cpus_evenly' [-Wmissing-prototypes]
struct cpumask *group_cpus_evenly(unsigned int numgrps)
^
lib/group_cpus.c:344:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
struct cpumask *group_cpus_evenly(unsigned int numgrps)
^
static
1 warning generated.


vim +/group_cpus_evenly +344 lib/group_cpus.c

328
329 /**
330 * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
331 * @numgrps: number of groups
332 *
333 * Return: cpumask array if successful, NULL otherwise. And each element
334 * includes CPUs assigned to this group
335 *
336 * Try to put close CPUs from viewpoint of CPU and NUMA locality into
337 * same group, and run two-stage grouping:
338 * 1) allocate present CPUs on these groups evenly first
339 * 2) allocate other possible CPUs on these groups evenly
340 *
341 * We guarantee in the resulted grouping that all CPUs are covered, and
342 * no same CPU is assigned to different groups
343 */
> 344 struct cpumask *group_cpus_evenly(unsigned int numgrps)

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2021-08-14 17:18:40

by kernel test robot

Subject: Re: [PATCH 5/7] genirq/affinity: move group_cpus_evenly() into lib/

Hi Ming,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/irq/core]
[also build test WARNING on next-20210813]
[cannot apply to block/for-next linux/master linus/master v5.14-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 04c2721d3530f0723b4c922a8fa9f26b202a20de
config: arc-randconfig-r016-20210814 (attached as .config)
compiler: arc-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/759f72186bfdd5c3ba8b53ac0749cf7ba930012c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
git checkout 759f72186bfdd5c3ba8b53ac0749cf7ba930012c
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> lib/group_cpus.c:344:17: warning: no previous prototype for 'group_cpus_evenly' [-Wmissing-prototypes]
344 | struct cpumask *group_cpus_evenly(unsigned int numgrps)
| ^~~~~~~~~~~~~~~~~


vim +/group_cpus_evenly +344 lib/group_cpus.c

328
329 /**
330 * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
331 * @numgrps: number of groups
332 *
333 * Return: cpumask array if successful, NULL otherwise. And each element
334 * includes CPUs assigned to this group
335 *
336 * Try to put close CPUs from viewpoint of CPU and NUMA locality into
337 * same group, and run two-stage grouping:
338 * 1) allocate present CPUs on these groups evenly first
339 * 2) allocate other possible CPUs on these groups evenly
340 *
341 * We guarantee in the resulted grouping that all CPUs are covered, and
342 * no same CPU is assigned to different groups
343 */
> 344 struct cpumask *group_cpus_evenly(unsigned int numgrps)

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2021-08-14 20:52:18

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 7/7] blk-mq: build default queue map via group_cpus_evenly()

Hi Ming,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/irq/core]
[also build test ERROR on next-20210813]
[cannot apply to block/for-next linux/master linus/master v5.14-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 04c2721d3530f0723b4c922a8fa9f26b202a20de
config: riscv-buildonly-randconfig-r005-20210814 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 1f7b25ea76a925aca690da28de9d78db7ca99d0c)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/46b1d0ed609db266f6f18e7156c4f294bf6c4502
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
git checkout 46b1d0ed609db266f6f18e7156c4f294bf6c4502
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All error/warnings (new ones prefixed by >>):

In file included from block/blk-mq-cpumap.c:13:
>> include/linux/group_cpus.h:17:26: error: implicit declaration of function 'kcalloc' [-Werror,-Wimplicit-function-declaration]
struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
^
include/linux/group_cpus.h:17:26: note: did you mean 'kvcalloc'?
include/linux/mm.h:827:21: note: 'kvcalloc' declared here
static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
^
In file included from block/blk-mq-cpumap.c:13:
>> include/linux/group_cpus.h:17:18: warning: incompatible integer to pointer conversion initializing 'struct cpumask *' with an expression of type 'int' [-Wint-conversion]
struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from block/blk-mq-cpumap.c:15:
In file included from include/linux/blk-mq.h:5:
In file included from include/linux/blkdev.h:8:
In file included from include/linux/genhd.h:16:
>> include/linux/slab.h:658:21: error: static declaration of 'kcalloc' follows non-static declaration
static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
^
include/linux/group_cpus.h:17:26: note: previous implicit declaration is here
struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
^
In file included from block/blk-mq-cpumap.c:15:
In file included from include/linux/blk-mq.h:5:
In file included from include/linux/blkdev.h:18:
In file included from include/linux/bio.h:8:
In file included from include/linux/highmem.h:10:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/riscv/include/asm/io.h:136:
include/asm-generic/io.h:464:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __raw_readb(PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:36:51: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
^
In file included from block/blk-mq-cpumap.c:15:
In file included from include/linux/blk-mq.h:5:
In file included from include/linux/blkdev.h:18:
In file included from include/linux/bio.h:8:
In file included from include/linux/highmem.h:10:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/riscv/include/asm/io.h:136:
include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:34:51: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
^
In file included from block/blk-mq-cpumap.c:15:
In file included from include/linux/blk-mq.h:5:
In file included from include/linux/blkdev.h:18:
In file included from include/linux/bio.h:8:
In file included from include/linux/highmem.h:10:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/riscv/include/asm/io.h:136:
include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writeb(value, PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:1024:55: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
return (port > MMIO_UPPER_LIMIT) ? NULL : PCI_IOBASE + port;
~~~~~~~~~~ ^
8 warnings and 2 errors generated.


vim +/kcalloc +17 include/linux/group_cpus.h

759f72186bfdd5 Ming Lei 2021-08-14 11
5cd330f089b089 Ming Lei 2021-08-14 12 #ifdef CONFIG_SMP
759f72186bfdd5 Ming Lei 2021-08-14 13 struct cpumask *group_cpus_evenly(unsigned int numgrps);
5cd330f089b089 Ming Lei 2021-08-14 14 #else
5cd330f089b089 Ming Lei 2021-08-14 15 static inline struct cpumask *group_cpus_evenly(unsigned int numgrps)
5cd330f089b089 Ming Lei 2021-08-14 16 {
5cd330f089b089 Ming Lei 2021-08-14 @17 struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
5cd330f089b089 Ming Lei 2021-08-14 18
5cd330f089b089 Ming Lei 2021-08-14 19 if (!masks)
5cd330f089b089 Ming Lei 2021-08-14 20 return NULL;
5cd330f089b089 Ming Lei 2021-08-14 21
5cd330f089b089 Ming Lei 2021-08-14 22 /* assign all CPUs(cpu 0) to the 1st group only */
5cd330f089b089 Ming Lei 2021-08-14 23 cpumask_copy(&masks[0], cpu_possible_mask);
5cd330f089b089 Ming Lei 2021-08-14 24 return masks;
5cd330f089b089 Ming Lei 2021-08-14 25 }
5cd330f089b089 Ming Lei 2021-08-14 26 #endif
759f72186bfdd5 Ming Lei 2021-08-14 27

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2021-08-16 01:08:23

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH 5/7] genirq/affinity: move group_cpus_evenly() into lib/

Hello,

On Sun, Aug 15, 2021 at 01:01:07AM +0800, kernel test robot wrote:
> Hi Ming,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on tip/irq/core]
> [also build test WARNING on next-20210813]
> [cannot apply to block/for-next linux/master linus/master v5.14-rc5]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/0day-ci/linux/commits/Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
> base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 04c2721d3530f0723b4c922a8fa9f26b202a20de
> config: hexagon-randconfig-r041-20210814 (attached as .config)
> compiler: clang version 12.0.0
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://github.com/0day-ci/linux/commit/759f72186bfdd5c3ba8b53ac0749cf7ba930012c
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
> git checkout 759f72186bfdd5c3ba8b53ac0749cf7ba930012c
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=hexagon
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <[email protected]>
>
> All warnings (new ones prefixed by >>):
>
> >> lib/group_cpus.c:344:17: warning: no previous prototype for function 'group_cpus_evenly' [-Wmissing-prototypes]
> struct cpumask *group_cpus_evenly(unsigned int numgrps)
> ^
> lib/group_cpus.c:344:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
> struct cpumask *group_cpus_evenly(unsigned int numgrps)
> ^
> static
> 1 warning generated.
>
>
> vim +/group_cpus_evenly +344 lib/group_cpus.c
>
> 328
> 329 /**
> 330 * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
> 331 * @numgrps: number of groups
> 332 *
> 333 * Return: cpumask array if successful, NULL otherwise. And each element
> 334 * includes CPUs assigned to this group
> 335 *
> 336 * Try to put close CPUs from viewpoint of CPU and NUMA locality into
> 337 * same group, and run two-stage grouping:
> 338 * 1) allocate present CPUs on these groups evenly first
> 339 * 2) allocate other possible CPUs on these groups evenly
> 340 *
> 341 * We guarantee in the resulted grouping that all CPUs are covered, and
> 342 * no same CPU is assigned to different groups
> 343 */
> > 344 struct cpumask *group_cpus_evenly(unsigned int numgrps)

But the above symbol is exported via EXPORT_SYMBOL_GPL(). In the current
kernel tree we usually keep such exported symbols global, or has there been
some change in kernel coding style recently?



Thanks,
Ming

2021-08-16 07:28:10

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH 7/7] blk-mq: build default queue map via group_cpus_evenly()

Hello,

On Sun, Aug 15, 2021 at 04:49:25AM +0800, kernel test robot wrote:
> Hi Ming,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on tip/irq/core]
> [also build test ERROR on next-20210813]
> [cannot apply to block/for-next linux/master linus/master v5.14-rc5]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/0day-ci/linux/commits/Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
> base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 04c2721d3530f0723b4c922a8fa9f26b202a20de
> config: riscv-buildonly-randconfig-r005-20210814 (attached as .config)
> compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 1f7b25ea76a925aca690da28de9d78db7ca99d0c)
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://github.com/0day-ci/linux/commit/46b1d0ed609db266f6f18e7156c4f294bf6c4502
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
> git checkout 46b1d0ed609db266f6f18e7156c4f294bf6c4502
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <[email protected]>
>
> All error/warnings (new ones prefixed by >>):
>
> In file included from block/blk-mq-cpumap.c:13:
> >> include/linux/group_cpus.h:17:26: error: implicit declaration of function 'kcalloc' [-Werror,-Wimplicit-function-declaration]
> struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
> ^
> include/linux/group_cpus.h:17:26: note: did you mean 'kvcalloc'?
> include/linux/mm.h:827:21: note: 'kvcalloc' declared here
> static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
> ^
> In file included from block/blk-mq-cpumap.c:13:
> >> include/linux/group_cpus.h:17:18: warning: incompatible integer to pointer conversion initializing 'struct cpumask *' with an expression of type 'int' [-Wint-conversion]
> struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
> ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Will fix it in the next version; it can be done by including <linux/slab.h>
in include/linux/group_cpus.h.
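
For reference, a minimal sketch of what the fixed !CONFIG_SMP stub might look
like. The header-guard name and the other includes are assumptions for
illustration; only the added <linux/slab.h> include is the change described
above, and the stub body is the one quoted in the robot report:

	/* include/linux/group_cpus.h -- sketch of the !CONFIG_SMP stub after the fix */
	#ifndef __LINUX_GROUP_CPUS_H	/* guard name assumed for illustration */
	#define __LINUX_GROUP_CPUS_H

	#include <linux/kernel.h>
	#include <linux/cpu.h>
	#include <linux/slab.h>		/* added: provides kcalloc() and GFP_KERNEL */

	#ifdef CONFIG_SMP
	struct cpumask *group_cpus_evenly(unsigned int numgrps);
	#else
	static inline struct cpumask *group_cpus_evenly(unsigned int numgrps)
	{
		struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);

		if (!masks)
			return NULL;

		/* assign all CPUs (cpu 0) to the 1st group only */
		cpumask_copy(&masks[0], cpu_possible_mask);
		return masks;
	}
	#endif

	#endif /* __LINUX_GROUP_CPUS_H */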


Thanks,
Ming

2021-08-16 10:09:16

by Chen, Rong A

[permalink] [raw]
Subject: Re: [PATCH 5/7] genirq/affinity: move group_cpus_evenly() into lib/


Hi Ming,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/irq/core]
[also build test WARNING on next-20210813]
[cannot apply to block/for-next linux/master linus/master v5.14-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 04c2721d3530f0723b4c922a8fa9f26b202a20de
config: x86_64-randconfig-c001-20210814 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 1f7b25ea76a925aca690da28de9d78db7ca99d0c)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# https://github.com/0day-ci/linux/commit/759f72186bfdd5c3ba8b53ac0749cf7ba930012c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Ming-Lei/genirq-affinity-abstract-new-API-from-managed-irq-affinity-spread/20210814-203741
git checkout 759f72186bfdd5c3ba8b53ac0749cf7ba930012c
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 clang-analyzer

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>


clang-analyzer warnings: (new ones prefixed by >>)
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
7 warnings generated.
Suppressed 7 warnings (6 in non-user code, 1 with check filters).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
5 warnings generated.
Suppressed 5 warnings (5 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
5 warnings generated.
Suppressed 5 warnings (5 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
5 warnings generated.
Suppressed 5 warnings (5 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
5 warnings generated.
Suppressed 5 warnings (5 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
3 warnings generated.
Suppressed 3 warnings (3 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
7 warnings generated.
lib/glob.c:48:32: warning: Assigned value is garbage or undefined [clang-analyzer-core.uninitialized.Assign]
char const *back_pat = NULL, *back_str = back_str;
^ ~~~~~~~~
lib/glob.c:48:32: note: Assigned value is garbage or undefined
char const *back_pat = NULL, *back_str = back_str;
^ ~~~~~~~~
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
7 warnings generated.
lib/strnlen_user.c:34:2: warning: Value stored to 'src' is never read [clang-analyzer-deadcode.DeadStores]
src -= align;
^ ~~~~~
lib/strnlen_user.c:34:2: note: Value stored to 'src' is never read
src -= align;
^ ~~~~~
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
7 warnings generated.
Suppressed 7 warnings (7 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
7 warnings generated.
Suppressed 7 warnings (7 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
7 warnings generated.
lib/oid_registry.c:149:3: warning: Value stored to 'num' is never read [clang-analyzer-deadcode.DeadStores]
num = 0;
^ ~
lib/oid_registry.c:149:3: note: Value stored to 'num' is never read
num = 0;
^ ~
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
9 warnings generated.
Suppressed 9 warnings (2 in non-user code, 7 with check filters).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
7 warnings generated.
>> lib/group_cpus.c:236:22: warning: Division by zero [clang-analyzer-core.DivideZero]
numgrps * ncpus / remaining_ncpus);
^
lib/group_cpus.c:352:2: note: Taking false branch
if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
^
lib/group_cpus.c:355:2: note: Taking false branch
if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
^
lib/group_cpus.c:359:7: note: 'node_to_cpumask' is non-null
if (!node_to_cpumask)
^~~~~~~~~~~~~~~
lib/group_cpus.c:359:2: note: Taking false branch
if (!node_to_cpumask)
^
lib/group_cpus.c:363:6: note: Assuming 'masks' is non-null
if (!masks)
^~~~~~
lib/group_cpus.c:363:2: note: Taking false branch
if (!masks)
^
lib/group_cpus.c:371:8: note: Calling '__group_cpus_evenly'
ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lib/group_cpus.c:257:6: note: Assuming the condition is false
if (!cpumask_weight(cpu_mask))
^~~~~~~~~~~~~~~~~~~~~~~~~
lib/group_cpus.c:257:2: note: Taking false branch
if (!cpumask_weight(cpu_mask))
^
lib/group_cpus.c:266:6: note: Assuming 'numgrps' is > 'nodes'
if (numgrps <= nodes) {
^~~~~~~~~~~~~~~~
lib/group_cpus.c:266:2: note: Taking false branch
if (numgrps <= nodes) {
^
lib/group_cpus.c:279:6: note: Assuming 'node_groups' is non-null
if (!node_groups)
^~~~~~~~~~~~
lib/group_cpus.c:279:2: note: Taking false branch
if (!node_groups)
^
lib/group_cpus.c:283:2: note: Calling 'alloc_nodes_groups'
alloc_nodes_groups(numgrps, node_to_cpumask, cpu_mask,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lib/group_cpus.c:134:14: note: 'remaining_ncpus' initialized to 0
unsigned n, remaining_ncpus = 0;
^~~~~~~~~~~~~~~
lib/group_cpus.c:136:2: note: Loop condition is true. Entering loop body
for (n = 0; n < nr_node_ids; n++) {
^
lib/group_cpus.c:136:2: note: Loop condition is false. Execution continues on line 141
lib/group_cpus.c:141:2: note: Taking false branch
for_each_node_mask(n, nodemsk) {
^
include/linux/nodemask.h:384:2: note: expanded from macro 'for_each_node_mask'
if (!nodes_empty(mask)) \
^
lib/group_cpus.c:153:12: note: '__UNIQUE_ID___x401' is < '__UNIQUE_ID___y402'
numgrps = min_t(unsigned, remaining_ncpus, numgrps);
^
include/linux/minmax.h:104:27: note: expanded from macro 'min_t'
#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/minmax.h:38:3: note: expanded from macro '__careful_cmp'
__cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/minmax.h:33:3: note: expanded from macro '__cmp_once'
__cmp(unique_x, unique_y, op); })
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/minmax.h:28:26: note: expanded from macro '__cmp'
#define __cmp(x, y, op) ((x) op (y) ? (x) : (y))
^~~
lib/group_cpus.c:153:12: note: '?' condition is true
numgrps = min_t(unsigned, remaining_ncpus, numgrps);
^
include/linux/minmax.h:104:27: note: expanded from macro 'min_t'
#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
^
include/linux/minmax.h:38:3: note: expanded from macro '__careful_cmp'
__cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op))
^
include/linux/minmax.h:33:3: note: expanded from macro '__cmp_once'
__cmp(unique_x, unique_y, op); })
^
include/linux/minmax.h:28:26: note: expanded from macro '__cmp'
#define __cmp(x, y, op) ((x) op (y) ? (x) : (y))
^
lib/group_cpus.c:226:2: note: Loop condition is true. Entering loop body
for (n = 0; n < nr_node_ids; n++) {
^
lib/group_cpus.c:229:7: note: Assuming the condition is false
if (node_groups[n].ncpus == UINT_MAX)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lib/group_cpus.c:229:3: note: Taking false branch
if (node_groups[n].ncpus == UINT_MAX)
^
lib/group_cpus.c:232:3: note: Taking true branch
WARN_ON_ONCE(numgrps == 0);
^
include/asm-generic/bug.h:105:2: note: expanded from macro 'WARN_ON_ONCE'

vim +236 lib/group_cpus.c

   113
   114  /*
   115   * Allocate group number for each node, so that for each node:
   116   *
   117   * 1) the allocated number is >= 1
   118   *
   119   * 2) the allocated number is <= active CPU number of this node
   120   *
   121   * The actual allocated total groups may be less than @numgrps when
   122   * active total CPU number is less than @numgrps.
   123   *
   124   * Active CPUs means the CPUs in '@cpu_mask AND @node_to_cpumask[]'
   125   * for each node.
   126   */
   127  static void alloc_nodes_groups(unsigned int numgrps,
   128                                 cpumask_var_t *node_to_cpumask,
   129                                 const struct cpumask *cpu_mask,
   130                                 const nodemask_t nodemsk,
   131                                 struct cpumask *nmsk,
   132                                 struct node_groups *node_groups)
   133  {
   134          unsigned n, remaining_ncpus = 0;
   135
   136          for (n = 0; n < nr_node_ids; n++) {
   137                  node_groups[n].id = n;
   138                  node_groups[n].ncpus = UINT_MAX;
   139          }
   140
   141          for_each_node_mask(n, nodemsk) {
   142                  unsigned ncpus;
   143
   144                  cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
   145                  ncpus = cpumask_weight(nmsk);
   146
   147                  if (!ncpus)
   148                          continue;
   149                  remaining_ncpus += ncpus;
   150                  node_groups[n].ncpus = ncpus;
   151          }
   152
   153          numgrps = min_t(unsigned, remaining_ncpus, numgrps);
   154
   155          sort(node_groups, nr_node_ids, sizeof(node_groups[0]),
   156               ncpus_cmp_func, NULL);
   157
   158          /*
   159           * Allocate groups for each node according to the ratio of this
   160           * node's nr_cpus to remaining un-assigned ncpus. 'numgrps' is
   161           * bigger than number of active numa nodes. Always start the
   162           * allocation from the node with minimized nr_cpus.
   163           *
   164           * This way guarantees that each active node gets allocated at
   165           * least one group, and the theory is simple: over-allocation
   166           * is only done when this node is assigned by one group, so
   167           * other nodes will be allocated >= 1 groups, since 'numgrps' is
   168           * bigger than number of numa nodes.
   169           *
   170           * One perfect invariant is that number of allocated groups for
   171           * each node is <= CPU count of this node:
   172           *
   173           * 1) suppose there are two nodes: A and B
   174           *    ncpu(X) is CPU count of node X
   175           *    grps(X) is the group count allocated to node X via this
   176           *    algorithm
   177           *
   178           *    ncpu(A) <= ncpu(B)
   179           *    ncpu(A) + ncpu(B) = N
   180           *    grps(A) + grps(B) = G
   181           *
   182           *    grps(A) = max(1, round_down(G * ncpu(A) / N))
   183           *    grps(B) = G - grps(A)
   184           *
   185           *    both N and G are integer, and 2 <= G <= N, suppose
   186           *    G = N - delta, and 0 <= delta <= N - 2
   187           *
   188           * 2) obviously grps(A) <= ncpu(A) because:
   189           *
   190           *    if grps(A) is 1, then grps(A) <= ncpu(A) given
   191           *    ncpu(A) >= 1
   192           *
   193           *    otherwise,
   194           *            grps(A) <= G * ncpu(A) / N <= ncpu(A), given G <= N
   195           *
   196           * 3) prove how grps(B) <= ncpu(B):
   197           *
   198           *    if round_down(G * ncpu(A) / N) == 0, vecs(B) won't be
   199           *    over-allocated, so grps(B) <= ncpu(B),
   200           *
   201           *    otherwise:
   202           *
   203           *    grps(A) =
   204           *            round_down(G * ncpu(A) / N) =
   205           *            round_down((N - delta) * ncpu(A) / N) =
   206           *            round_down((N * ncpu(A) - delta * ncpu(A)) / N) >=
   207           *            round_down((N * ncpu(A) - delta * N) / N) =
   208           *            cpu(A) - delta
   209           *
   210           *    then:
   211           *
   212           *    grps(A) - G >= ncpu(A) - delta - G
   213           *    =>
   214           *    G - grps(A) <= G + delta - ncpu(A)
   215           *    =>
   216           *    grps(B) <= N - ncpu(A)
   217           *    =>
   218           *    grps(B) <= cpu(B)
   219           *
   220           * For nodes >= 3, it can be thought as one node and another big
   221           * node given that is exactly what this algorithm is implemented,
   222           * and we always re-calculate 'remaining_ncpus' & 'numgrps', and
   223           * finally for each node X: grps(X) <= ncpu(X).
   224           *
   225           */
   226          for (n = 0; n < nr_node_ids; n++) {
   227                  unsigned ngroups, ncpus;
   228
   229                  if (node_groups[n].ncpus == UINT_MAX)
   230                          continue;
   231
   232                  WARN_ON_ONCE(numgrps == 0);
   233
   234                  ncpus = node_groups[n].ncpus;
   235                  ngroups = max_t(unsigned, 1,
 > 236                                  numgrps * ncpus / remaining_ncpus);
   237                  WARN_ON_ONCE(ngroups > ncpus);
   238
   239                  node_groups[n].ngroups = ngroups;
   240
   241                  remaining_ncpus -= ncpus;
   242                  numgrps -= ngroups;
   243          }
   244  }
   245
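For readers following the trace above: the flagged division at line 236 only
sees remaining_ncpus == 0 if the for_each_node_mask() loop at line 141 never
finds a CPU from @cpu_mask, which the caller already rules out by checking
cpumask_weight(cpu_mask) (line 257 in the trace), so this looks like a false
positive. If one wanted to silence it defensively, a sketch of a guard (not
part of the posted series) could be placed after the accumulation loop:

	/* sketch: after the loop at lines 141-151, before computing ratios */
	if (WARN_ON_ONCE(remaining_ncpus == 0))
		return;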
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2021-08-17 04:50:39

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/7] genirq/affinity: remove the 'firstvec' parameter from irq_build_affinity_masks

On Sat, Aug 14, 2021 at 08:35:26PM +0800, Ming Lei wrote:
> The 'firstvec' parameter is always same with the parameter of
> 'startvec', so use 'startvec' directly inside irq_build_affinity_masks().
>
> Signed-off-by: Ming Lei <[email protected]>

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2021-08-17 04:51:48

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/7] genirq/affinity: don't pass irq_affinity_desc array to irq_build_affinity_masks

On Sat, Aug 14, 2021 at 08:35:28PM +0800, Ming Lei wrote:
> Prepare for abstracting irq_build_affinity_masks() into one public helper
> for assigning all CPUs evenly into several groups. Don't passing

s/passing/pass/

> irq_affinity_desc array to irq_build_affinity_masks, instead returning

s/returning/return/

> one cpumask array by storing each assigned group into one element of

s/one/a/

> the array.
>
> This way helps us to provide generic interface for grouping all CPUs

s/way //

Otherwise looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2021-08-17 04:54:31

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 4/7] genirq/affinity: rename irq_build_affinity_masks as group_cpus_evenly

s/as/to/ in the subjects.

On Sat, Aug 14, 2021 at 08:35:29PM +0800, Ming Lei wrote:
> Map irq vector into group, so we can abstract the algorithm for generic
> use case.

s/vector/vectors/

Reviewed-by: Christoph Hellwig <[email protected]>

2021-08-17 04:55:34

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 6/7] lib/group_cpus: allow to group cpus in case of !CONFIG_SMP

On Sat, Aug 14, 2021 at 08:35:31PM +0800, Ming Lei wrote:
> Allows group_cpus_evenly() to be called in case of !CONFIG_SMP by simply
> assigning all CPUs into the 1st group.

Looks good, but almost too large for an inline function.
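
One possible alternative, purely illustrative and not what this series does,
would be to keep only the declaration in the header and move the !CONFIG_SMP
fallback out of line, assuming lib/group_cpus.o is then built unconditionally
(lib-y += group_cpus.o):

	/* lib/group_cpus.c -- illustrative out-of-line variant of the !SMP stub */
	#include <linux/slab.h>
	#include <linux/cpumask.h>
	#include <linux/export.h>
	#include <linux/group_cpus.h>

	#ifndef CONFIG_SMP
	struct cpumask *group_cpus_evenly(unsigned int numgrps)
	{
		struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);

		if (!masks)
			return NULL;

		/* all possible CPUs go into the first group */
		cpumask_copy(&masks[0], cpu_possible_mask);
		return masks;
	}
	EXPORT_SYMBOL_GPL(group_cpus_evenly);
	#endif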

2021-08-17 04:56:35

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 5/7] genirq/affinity: move group_cpus_evenly() into lib/

On Mon, Aug 16, 2021 at 09:04:21AM +0800, Ming Lei wrote:
> But the above symbol is exported via EXPORT_SYMBOL_GPL(), in current
> kernel tree, we usually keep such exported symbol as global, or is there
> some change in kernel coding style recently?

This is about prototypes. You need to include group_cpus.h in
group_cpus.c so that the prototype is visible at the implementation site.
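
Concretely, that boils down to one added line near the top of the file (a
sketch; the surrounding includes are elided):

	#include <linux/group_cpus.h>	/* makes the group_cpus_evenly() prototype
					 * visible at the definition site, which is
					 * what -Wmissing-prototypes is asking for */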

2021-08-18 08:39:24

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH 4/7] genirq/affinity: rename irq_build_affinity_masks as group_cpus_evenly

On Tue, Aug 17, 2021 at 06:50:27AM +0200, Christoph Hellwig wrote:
> s/as/to/ in the subjects.
>
> On Sat, Aug 14, 2021 at 08:35:29PM +0800, Ming Lei wrote:
> > Map irq vector into group, so we can abstract the algorithm for generic
> > use case.
>
> s/vector/vectors/

One group is actually abstracted from one irq vector, and it can represent an
irq vector, a blk-mq hw queue, or something else. Currently genirq/affinity
spreads vectors across all possible CPUs; since this patch we spread groups
among all possible CPUs evenly.
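
As a rough illustration of that reuse, a sketch of how a consumer like the
blk-mq default queue map could consume the groups (field names follow
struct blk_mq_queue_map; error handling and the exact blk-mq-cpumap.c diff
are omitted, so treat this as an assumption-laden outline):

	struct cpumask *masks = group_cpus_evenly(qmap->nr_queues);
	unsigned int queue, cpu;

	if (!masks)
		return -ENOMEM;	/* or fall back to a trivial mapping */

	/* every CPU in group 'queue' is mapped to hw queue 'queue' */
	for (queue = 0; queue < qmap->nr_queues; queue++) {
		for_each_cpu(cpu, &masks[queue])
			qmap->mq_map[cpu] = qmap->queue_offset + queue;
	}
	kfree(masks);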

Thanks,
Ming