2019-07-25 21:26:35

by Daniel Jordan

Subject: [RFC 0/9] padata: use unbound workqueues for parallel jobs

Padata binds the parallel part of a job to a single CPU. Though the
serial parts rely on per-CPU queues for correct ordering, such binding
isn't necessary for the parallel part, and performance improves when
the job runs locally on NUMA machines and the scheduler is free to
pick the CPU within a node on a busy system.

This series makes parallel padata jobs run on unbound workqueues.
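
As a minimal sketch of the core change (the real version is in patches
7-8; these lines are adapted from their hunks):

    /* Before: parallel works ran on a bound, max_active=1 workqueue. */
    wq = alloc_workqueue("%s_parallel",
                         WQ_MEM_RECLAIM | WQ_CPU_INTENSIVE, 1, name);
    queue_work_on(target_cpu, wq, &queue->work);

    /* After: an unbound workqueue lets the scheduler place the worker,
     * subject to the instance's parallel cpumask. */
    wq = alloc_workqueue("%s_parallel", WQ_UNBOUND, 0, name);
    queue_work(wq, &queue->work);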

Any feedback is welcome, but I'm particularly hoping for comments on the
high-level approach.

Thanks,
Daniel


Patch Description
----- -----------

1 Make a padata instance allocate its workqueue internally.

2 Unconfine some recently-confined workqueue interfaces.

3-6 Address a recursive CPU hotplug locking issue.

    padata_alloc* requires its callers to hold the CPU hotplug lock, but
    allocating an unbound workqueue and calling apply_workqueue_attrs also
    take it. Fix by removing the requirement on padata_alloc*'s callers
    (patch 6's changelog shows the recursive call chain).

7-8 Add a second workqueue for each padata instance that's dedicated to
parallel jobs.

9 Small cleanup.


Performance
-----------

Measurements are from a 2-socket, 20-core, 40-CPU Xeon server.

For repeatability, modprobe was bound to a CPU and the serial cpumasks
for both pencrypt and pdecrypt were also restricted to a CPU different
from modprobe's.

# modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
# modprobe tcrypt mode=211 sec=1
# modprobe tcrypt mode=215 sec=1
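
The binding and cpumask restriction might be done along these lines (a
sketch, not the exact commands used; the sysfs paths belong to padata's
pcrypt instances and the CPU choices here are arbitrary):

# echo 2 > /sys/kernel/pcrypt/pencrypt/serial_cpumask  # serial work on CPU 1
# echo 2 > /sys/kernel/pcrypt/pdecrypt/serial_cpumask
# taskset -c 0 modprobe tcrypt mode=211 sec=1          # modprobe on CPU 0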

Busy system (tcrypt run while 10 stress-ng tasks were burning 100% CPU)

                                base               test
                           --------------     ---------------
speedup  key_sz  blk_sz    ops/sec  stdev     ops/sec   stdev

(pcrypt(rfc4106-gcm-aesni)) encryption (tcrypt mode=211)

 117.2x     160      16        960     30      112555   24775
 135.1x     160      64        845    246      114145   25124
 113.2x     160     256        993     17      112395   24714
 111.3x     160     512       1000      0      111252   23755
 110.0x     160    1024        983     16      108153   22374
 104.2x     160    2048        985     22      102563   20530
  98.5x     160    4096        998      3       98346   18777
  86.2x     160    8192       1000      0       86173   14480

(pcrypt(rfc4106-gcm-aesni)) decryption (tcrypt mode=211)

 127.2x     160      16        997      5      126834   24244
 128.4x     160      64       1000      0      128438   23261
 127.6x     160     256        992      7      126627   23493
 124.0x     160     512       1000      0      123958   22746
 122.8x     160    1024        989     20      121372   22632
 112.8x     160    2048        998      3      112602   18287
 106.9x     160    4096        994     10      106255   16111
  91.7x     160    8192       1000      0       91742   11670

multibuffer (pcrypt(rfc4106-gcm-aesni)) encryption (tcrypt mode=215)

 242.2x     160      16       2363    141      572189   16846
 242.1x     160      64       2397    151      580424   11923
 231.1x     160     256       2472     21      571387   16364
 237.6x     160     512       2429     24      577264    8692
 238.3x     160    1024       2384     97      568155    6621
 216.3x     160    2048       2453     74      530627    3480
 209.2x     160    4096       2381    206      498192   19177
 176.5x     160    8192       2323    157      410013    9903

multibuffer (pcrypt(rfc4106-gcm-aesni)) decryption (tcrypt mode=215)

 220.3x     160      16       2341    228      515733   91317
 216.6x     160      64       2467     33      534381  101262
 217.7x     160     256       2451     45      533443   85418
 213.8x     160     512       2485     26      531293   83767
 211.0x     160    1024       2472     28      521677   80339
 200.8x     160    2048       2459     67      493808   63587
 188.8x     160    4096       2491      9      470325   58055
 159.9x     160    8192       2459     51      393147   25756

Idle system (tcrypt run by itself)

                                base               test
                           --------------     ---------------
speedup  key_sz  blk_sz    ops/sec  stdev     ops/sec   stdev

(pcrypt(rfc4106-gcm-aesni)) encryption (tcrypt mode=211)

   2.5x     160      16      63412  43075      161615    1034
   4.1x     160      64      39554  24006      161653     981
   6.0x     160     256      26504   1436      160110    1158
   6.2x     160     512      25500     40      157018     951
   5.9x     160    1024      25777   1094      151852     915
   5.8x     160    2048      24653    218      143756     508
   5.6x     160    4096      24333     20      136752     548
   5.0x     160    8192      23310     15      117660     481

(pcrypt(rfc4106-gcm-aesni)) decryption (tcrypt mode=211)

   2.4x     160      16      53471  48279      128047   31328
   3.4x     160      64      37712  20855      128187   31074
   4.5x     160     256      27911   4378      126430   31084
   4.9x     160     512      25346    175      123870   29099
   3.1x     160    1024      38452  23118      120817   26846
   4.7x     160    2048      24612    187      115036   23942
   4.5x     160    4096      24217    114      109583   21559
   4.2x     160    8192      23144    108       96850   16686

multibuffer (pcrypt(rfc4106-gcm-aesni)) encryption (tcrypt mode=215)

   1.0x     160      16     412157   3855      426973    1591
   1.0x     160      64     412600   4410      431920    4224
   1.1x     160     256     410352   3254      453691   17831
   1.2x     160     512     406293   4948      473491   39818
   1.2x     160    1024     395123   7804      478539   27660
   1.2x     160    2048     385144   7601      453720   17579
   1.2x     160    4096     371989   3631      449923   15331
   1.2x     160    8192     346723   1617      399824   18559

multibuffer (pcrypt(rfc4106-gcm-aesni)) decryption (tcrypt mode=215)

   1.1x     160      16     407317   1487      452619   14404
   1.1x     160      64     411821   4261      464059   23541
   1.2x     160     256     408941   4945      477483   36576
   1.2x     160     512     406451    611      472661   11038
   1.2x     160    1024     394813   2667      456357   11452
   1.2x     160    2048     390291   4175      448928    8957
   1.2x     160    4096     371904   1068      449344   14225
   1.2x     160    8192     344227   1973      404397   19540


Series based on recent mainline plus these two patches:
https://lore.kernel.org/linux-crypto/[email protected]/
https://lore.kernel.org/linux-crypto/[email protected]/

Daniel Jordan (9):
padata: allocate workqueue internally
workqueue: unconfine alloc/apply/free_workqueue_attrs()
workqueue: require CPU hotplug read exclusion for
apply_workqueue_attrs
padata: make padata_do_parallel find alternate callback CPU
pcrypt: remove padata cpumask notifier
padata, pcrypt: take CPU hotplug lock internally in
padata_alloc_possible
padata: use separate workqueues for parallel and serial work
padata: unbind parallel jobs from specific CPUs
padata: remove cpu_index from the parallel_queue

 Documentation/padata.txt  |  12 +--
 crypto/pcrypt.c           | 167 ++++-------------------------------
 include/linux/padata.h    |  17 ++--
 include/linux/workqueue.h |   4 +
 kernel/padata.c           | 180 ++++++++++++++++++++++----------------
 kernel/workqueue.c        |  25 ++++--
 6 files changed, 159 insertions(+), 246 deletions(-)

--
2.22.0



2019-07-25 21:27:06

by Daniel Jordan

Subject: [RFC 8/9] padata: unbind parallel jobs from specific CPUs

Padata binds the parallel part of a job to a single CPU. Though the
serial parts rely on per-CPU queues for correct ordering, such binding
isn't necessary for the parallel part, and it's beneficial to run the
job locally on NUMA machines and let the scheduler pick the CPU within
a node on a busy system.

So, make the parallel workqueue unbound.

Update the parallel workqueue's cpumask when the instance's parallel
cpumask changes.

Now that parallel jobs no longer run on max_active=1 workqueues, two or
more parallel works that hash to the same CPU may run simultaneously,
finish out of order, and so be serialized out of order. Prevent this by
keeping the works sorted on the reorder list by sequence number and
teaching padata_get_next about it.

The ENODATA case in padata_get_next no longer makes sense because
parallel jobs aren't bound to specific CPUs. The EINPROGRESS case takes
care of the scenario where a parallel job is potentially running on the
same CPU as padata_get_next.
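
For example (hypothetical numbers): with five CPUs in the parallel
cpumask, jobs with seq_nr 2 and seq_nr 7 hash to the same CPU. If the
seq_nr 7 job finishes first, padata_do_serial inserts it in seq_nr
order on that CPU's reorder list, and padata_get_next returns
-EINPROGRESS (pd->processed is still 2) until the seq_nr 2 job arrives
and is serialized first.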

Signed-off-by: Daniel Jordan <[email protected]>
---
 include/linux/padata.h |  4 +-
 kernel/padata.c        | 97 +++++++++++++++++++++++-------------------
 2 files changed, 57 insertions(+), 44 deletions(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index e7978f8942ca..cc420064186f 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -35,6 +35,7 @@ struct padata_priv {
struct parallel_data *pd;
int cb_cpu;
int cpu;
+ unsigned int seq_nr;
int info;
void (*parallel)(struct padata_priv *padata);
void (*serial)(struct padata_priv *padata);
@@ -104,7 +105,7 @@ struct padata_cpumask {
* @squeue: percpu padata queues used for serialuzation.
* @reorder_objects: Number of objects waiting in the reorder queues.
* @refcnt: Number of objects holding a reference on this parallel_data.
- * @max_seq_nr: Maximal used sequence number.
+ * @processed: Number of already processed objects.
* @cpu: Next CPU to be processed.
* @cpumask: The cpumasks in use for parallel and serial workers.
* @reorder_work: work struct for reordering.
@@ -117,6 +118,7 @@ struct parallel_data {
atomic_t reorder_objects;
atomic_t refcnt;
atomic_t seq_nr;
+ unsigned int processed;
int cpu;
struct padata_cpumask cpumask;
struct work_struct reorder_work;
diff --git a/kernel/padata.c b/kernel/padata.c
index 44c647f7300e..09a7dbdd9678 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -46,18 +46,13 @@ static int padata_index_to_cpu(struct parallel_data *pd, int cpu_index)
return target_cpu;
}

-static int padata_cpu_hash(struct parallel_data *pd)
+static int padata_cpu_hash(struct parallel_data *pd, unsigned int seq_nr)
{
- unsigned int seq_nr;
- int cpu_index;
-
/*
* Hash the sequence numbers to the cpus by taking
* seq_nr mod. number of cpus in use.
*/
-
- seq_nr = atomic_inc_return(&pd->seq_nr);
- cpu_index = seq_nr % cpumask_weight(pd->cpumask.pcpu);
+ int cpu_index = seq_nr % cpumask_weight(pd->cpumask.pcpu);

return padata_index_to_cpu(pd, cpu_index);
}
@@ -144,7 +139,8 @@ int padata_do_parallel(struct padata_instance *pinst,
padata->pd = pd;
padata->cb_cpu = *cb_cpu;

- target_cpu = padata_cpu_hash(pd);
+ padata->seq_nr = atomic_inc_return(&pd->seq_nr);
+ target_cpu = padata_cpu_hash(pd, padata->seq_nr);
padata->cpu = target_cpu;
queue = per_cpu_ptr(pd->pqueue, target_cpu);

@@ -152,7 +148,7 @@ int padata_do_parallel(struct padata_instance *pinst,
list_add_tail(&padata->list, &queue->parallel.list);
spin_unlock(&queue->parallel.lock);

- queue_work_on(target_cpu, pinst->parallel_wq, &queue->work);
+ queue_work(pinst->parallel_wq, &queue->work);

out:
rcu_read_unlock_bh();
@@ -172,9 +168,6 @@ EXPORT_SYMBOL(padata_do_parallel);
* -EINPROGRESS, if the next object that needs serialization will
* be parallel processed by another cpu and is not yet present in
* the cpu's reorder queue.
- *
- * -ENODATA, if this cpu has to do the parallel processing for
- * the next object.
*/
static struct padata_priv *padata_get_next(struct parallel_data *pd)
{
@@ -191,22 +184,25 @@ static struct padata_priv *padata_get_next(struct parallel_data *pd)
padata = list_entry(reorder->list.next,
struct padata_priv, list);

- list_del_init(&padata->list);
- atomic_dec(&pd->reorder_objects);
+ /*
+ * The check fails in the unlikely event that two or more
+ * parallel jobs have hashed to the same CPU and one of the
+ * later ones finishes first.
+ */
+ if (padata->seq_nr == pd->processed) {
+ list_del_init(&padata->list);
+ atomic_dec(&pd->reorder_objects);

- pd->cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, -1,
- false);
+ ++pd->processed;
+ pd->cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, -1,
+ false);

- spin_unlock(&reorder->lock);
- goto out;
+ spin_unlock(&reorder->lock);
+ goto out;
+ }
}
spin_unlock(&reorder->lock);

- if (__this_cpu_read(pd->pqueue->cpu_index) == next_queue->cpu_index) {
- padata = ERR_PTR(-ENODATA);
- goto out;
- }
-
padata = ERR_PTR(-EINPROGRESS);
out:
return padata;
@@ -244,16 +240,6 @@ static void padata_reorder(struct parallel_data *pd)
if (PTR_ERR(padata) == -EINPROGRESS)
break;

- /*
- * This cpu has to do the parallel processing of the next
- * object. It's waiting in the cpu's parallelization queue,
- * so exit immediately.
- */
- if (PTR_ERR(padata) == -ENODATA) {
- spin_unlock_bh(&pd->lock);
- return;
- }
-
cb_cpu = padata->cb_cpu;
squeue = per_cpu_ptr(pd->squeue, cb_cpu);

@@ -332,9 +318,14 @@ void padata_do_serial(struct padata_priv *padata)
struct parallel_data *pd = padata->pd;
struct padata_parallel_queue *pqueue = per_cpu_ptr(pd->pqueue,
padata->cpu);
+ struct padata_priv *cur;

spin_lock(&pqueue->reorder.lock);
- list_add_tail(&padata->list, &pqueue->reorder.list);
+ /* Sort in ascending order of sequence number. */
+ list_for_each_entry_reverse(cur, &pqueue->reorder.list, list)
+ if (cur->seq_nr < padata->seq_nr)
+ break;
+ list_add(&padata->list, &cur->list);
atomic_inc(&pd->reorder_objects);
spin_unlock(&pqueue->reorder.lock);

@@ -353,17 +344,36 @@ static int padata_setup_cpumasks(struct parallel_data *pd,
const struct cpumask *pcpumask,
const struct cpumask *cbcpumask)
{
- if (!alloc_cpumask_var(&pd->cpumask.pcpu, GFP_KERNEL))
- return -ENOMEM;
+ struct workqueue_attrs *attrs;
+ int err = -ENOMEM;

+ if (!alloc_cpumask_var(&pd->cpumask.pcpu, GFP_KERNEL))
+ goto out;
cpumask_and(pd->cpumask.pcpu, pcpumask, cpu_online_mask);
- if (!alloc_cpumask_var(&pd->cpumask.cbcpu, GFP_KERNEL)) {
- free_cpumask_var(pd->cpumask.pcpu);
- return -ENOMEM;
- }

+ if (!alloc_cpumask_var(&pd->cpumask.cbcpu, GFP_KERNEL))
+ goto free_pcpu_mask;
cpumask_and(pd->cpumask.cbcpu, cbcpumask, cpu_online_mask);
+
+ attrs = alloc_workqueue_attrs();
+ if (!attrs)
+ goto free_cbcpu_mask;
+
+ /* Restrict parallel_wq workers to pd->cpumask.pcpu. */
+ cpumask_copy(attrs->cpumask, pd->cpumask.pcpu);
+ err = apply_workqueue_attrs(pd->pinst->parallel_wq, attrs);
+ free_workqueue_attrs(attrs);
+ if (err < 0)
+ goto free_cbcpu_mask;
+
return 0;
+
+free_cbcpu_mask:
+ free_cpumask_var(pd->cpumask.cbcpu);
+free_pcpu_mask:
+ free_cpumask_var(pd->cpumask.pcpu);
+out:
+ return err;
}

static void __padata_list_init(struct padata_list *pd_list)
@@ -429,6 +439,8 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
pd->squeue = alloc_percpu(struct padata_serial_queue);
if (!pd->squeue)
goto err_free_pqueue;
+
+ pd->pinst = pinst;
if (padata_setup_cpumasks(pd, pcpumask, cbcpumask) < 0)
goto err_free_squeue;

@@ -437,7 +449,6 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
atomic_set(&pd->seq_nr, -1);
atomic_set(&pd->reorder_objects, 0);
atomic_set(&pd->refcnt, 0);
- pd->pinst = pinst;
spin_lock_init(&pd->lock);
pd->cpu = cpumask_first(pcpumask);
INIT_WORK(&pd->reorder_work, invoke_padata_reorder);
@@ -968,8 +979,8 @@ static struct padata_instance *padata_alloc(const char *name,
if (!pinst)
goto err;

- pinst->parallel_wq = alloc_workqueue("%s_parallel", WQ_MEM_RECLAIM |
- WQ_CPU_INTENSIVE, 1, name);
+ pinst->parallel_wq = alloc_workqueue("%s_parallel", WQ_UNBOUND, 0,
+ name);
if (!pinst->parallel_wq)
goto err_free_inst;

--
2.22.0


2019-07-25 21:27:58

by Daniel Jordan

Subject: [RFC 6/9] padata, pcrypt: take CPU hotplug lock internally in padata_alloc_possible

With pcrypt's cpumask no longer used, take the CPU hotplug lock inside
padata_alloc_possible.

Useful later in the series for avoiding nested acquisition of the CPU
hotplug lock in padata when padata_alloc_possible is allocating an
unbound workqueue.

Without this patch, the following nested acquisition would occur later
in the series:

pcrypt_init_padata
    get_online_cpus
    padata_alloc_possible
        padata_alloc
            alloc_workqueue(WQ_UNBOUND)   // later in the series
                alloc_and_link_pwqs
                    apply_wqattrs_lock
                        get_online_cpus   // recursive rwsem acquisition
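
With this patch, the order becomes (a sketch, assuming the later
WQ_UNBOUND conversion):

pcrypt_init_padata                        // no longer takes the lock
    padata_alloc_possible
        padata_alloc
            alloc_workqueue(WQ_UNBOUND)   // takes and releases the lock
                                          // on its own
            get_online_cpus               // taken internally, no recursion
            ...
            put_online_cpus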

Signed-off-by: Daniel Jordan <[email protected]>
---
 crypto/pcrypt.c |  4 ----
 kernel/padata.c | 17 +++++++++--------
 2 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
index 2ec36e6a132f..543792e0ebf0 100644
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -308,8 +308,6 @@ static int pcrypt_init_padata(struct padata_instance **pinst, const char *name)
{
int ret = -ENOMEM;

- get_online_cpus();
-
*pinst = padata_alloc_possible(name);
if (!*pinst)
return ret;
@@ -318,8 +316,6 @@ static int pcrypt_init_padata(struct padata_instance **pinst, const char *name)
if (ret)
padata_free(*pinst);

- put_online_cpus();
-
return ret;
}

diff --git a/kernel/padata.c b/kernel/padata.c
index 9d9e7f5e89cb..82edd5f88a32 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -955,8 +955,6 @@ static struct kobj_type padata_attr_type = {
* @name: used to identify the instance
* @pcpumask: cpumask that will be used for padata parallelization
* @cbcpumask: cpumask that will be used for padata serialization
- *
- * Must be called from a cpus_read_lock() protected region
*/
static struct padata_instance *padata_alloc(const char *name,
const struct cpumask *pcpumask,
@@ -974,11 +972,13 @@ static struct padata_instance *padata_alloc(const char *name,
if (!pinst->wq)
goto err_free_inst;

+ get_online_cpus();
+
if (!alloc_cpumask_var(&pinst->cpumask.pcpu, GFP_KERNEL))
- goto err_free_wq;
+ goto err_put_cpus;
if (!alloc_cpumask_var(&pinst->cpumask.cbcpu, GFP_KERNEL)) {
free_cpumask_var(pinst->cpumask.pcpu);
- goto err_free_wq;
+ goto err_put_cpus;
}
if (!padata_validate_cpumask(pinst, pcpumask) ||
!padata_validate_cpumask(pinst, cbcpumask))
@@ -1002,12 +1002,16 @@ static struct padata_instance *padata_alloc(const char *name,
#ifdef CONFIG_HOTPLUG_CPU
cpuhp_state_add_instance_nocalls_cpuslocked(hp_online, &pinst->node);
#endif
+
+ put_online_cpus();
+
return pinst;

err_free_masks:
free_cpumask_var(pinst->cpumask.pcpu);
free_cpumask_var(pinst->cpumask.cbcpu);
-err_free_wq:
+err_put_cpus:
+ put_online_cpus();
destroy_workqueue(pinst->wq);
err_free_inst:
kfree(pinst);
@@ -1021,12 +1025,9 @@ static struct padata_instance *padata_alloc(const char *name,
* parallel workers.
*
* @name: used to identify the instance
- *
- * Must be called from a cpus_read_lock() protected region
*/
struct padata_instance *padata_alloc_possible(const char *name)
{
- lockdep_assert_cpus_held();
return padata_alloc(name, cpu_possible_mask, cpu_possible_mask);
}
EXPORT_SYMBOL(padata_alloc_possible);
--
2.22.0


2019-07-25 21:27:59

by Daniel Jordan

Subject: [RFC 4/9] padata: make padata_do_parallel find alternate callback CPU

padata_do_parallel currently returns -EINVAL if the callback CPU isn't
in the callback cpumask.

pcrypt tries to prevent this situation by keeping its own callback
cpumask in sync with padata's and checks that the callback CPU it passes
to padata is valid. Make padata handle this instead.

padata_do_parallel now takes a pointer to the callback CPU and updates
it for the caller if an alternate CPU is used. Overall behavior in
terms of which callback CPUs are chosen stays the same.

Prepares for removal of the padata cpumask notifier in pcrypt, which
will fix a lockdep complaint about nested acquisition of the CPU hotplug
lock later in the series.
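
A minimal caller sketch under the new interface (taken from the pcrypt
hunks below, with error handling elided):

    /* padata may rewrite ctx->cb_cpu if it isn't in cpumask.cbcpu;
     * if no fallback CPU can be found, -EINVAL comes back. */
    err = padata_do_parallel(pencrypt.pinst, padata, &ctx->cb_cpu);
    if (!err)
        return -EINPROGRESS;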

Signed-off-by: Daniel Jordan <[email protected]>
---
 crypto/pcrypt.c        | 33 ++-------------------------------
 include/linux/padata.h |  2 +-
 kernel/padata.c        | 27 ++++++++++++++++++++-------
 3 files changed, 23 insertions(+), 39 deletions(-)

diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
index d67293063c7f..efca962ab12a 100644
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -57,35 +57,6 @@ struct pcrypt_aead_ctx {
unsigned int cb_cpu;
};

-static int pcrypt_do_parallel(struct padata_priv *padata, unsigned int *cb_cpu,
- struct padata_pcrypt *pcrypt)
-{
- unsigned int cpu_index, cpu, i;
- struct pcrypt_cpumask *cpumask;
-
- cpu = *cb_cpu;
-
- rcu_read_lock_bh();
- cpumask = rcu_dereference_bh(pcrypt->cb_cpumask);
- if (cpumask_test_cpu(cpu, cpumask->mask))
- goto out;
-
- if (!cpumask_weight(cpumask->mask))
- goto out;
-
- cpu_index = cpu % cpumask_weight(cpumask->mask);
-
- cpu = cpumask_first(cpumask->mask);
- for (i = 0; i < cpu_index; i++)
- cpu = cpumask_next(cpu, cpumask->mask);
-
- *cb_cpu = cpu;
-
-out:
- rcu_read_unlock_bh();
- return padata_do_parallel(pcrypt->pinst, padata, cpu);
-}
-
static int pcrypt_aead_setkey(struct crypto_aead *parent,
const u8 *key, unsigned int keylen)
{
@@ -157,7 +128,7 @@ static int pcrypt_aead_encrypt(struct aead_request *req)
req->cryptlen, req->iv);
aead_request_set_ad(creq, req->assoclen);

- err = pcrypt_do_parallel(padata, &ctx->cb_cpu, &pencrypt);
+ err = padata_do_parallel(pencrypt.pinst, padata, &ctx->cb_cpu);
if (!err)
return -EINPROGRESS;

@@ -199,7 +170,7 @@ static int pcrypt_aead_decrypt(struct aead_request *req)
req->cryptlen, req->iv);
aead_request_set_ad(creq, req->assoclen);

- err = pcrypt_do_parallel(padata, &ctx->cb_cpu, &pdecrypt);
+ err = padata_do_parallel(pdecrypt.pinst, padata, &ctx->cb_cpu);
if (!err)
return -EINPROGRESS;

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 839d9319920a..f7851f8e2190 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -154,7 +154,7 @@ struct padata_instance {
extern struct padata_instance *padata_alloc_possible(const char *name);
extern void padata_free(struct padata_instance *pinst);
extern int padata_do_parallel(struct padata_instance *pinst,
- struct padata_priv *padata, int cb_cpu);
+ struct padata_priv *padata, int *cb_cpu);
extern void padata_do_serial(struct padata_priv *padata);
extern int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
cpumask_var_t cpumask);
diff --git a/kernel/padata.c b/kernel/padata.c
index 5ae815adf0de..9d9e7f5e89cb 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -94,17 +94,19 @@ static void padata_parallel_worker(struct work_struct *parallel_work)
*
* @pinst: padata instance
* @padata: object to be parallelized
- * @cb_cpu: cpu the serialization callback function will run on,
- * must be in the serial cpumask of padata(i.e. cpumask.cbcpu).
+ * @cb_cpu: pointer to the CPU that the serialization callback function should
+ * run on. If it's not in the serial cpumask of @pinst
+ * (i.e. cpumask.cbcpu), this function selects a fallback CPU and if
+ * none found, returns -EINVAL.
*
* The parallelization callback function will run with BHs off.
* Note: Every object which is parallelized by padata_do_parallel
* must be seen by padata_do_serial.
*/
int padata_do_parallel(struct padata_instance *pinst,
- struct padata_priv *padata, int cb_cpu)
+ struct padata_priv *padata, int *cb_cpu)
{
- int target_cpu, err;
+ int i, cpu, cpu_index, target_cpu, err;
struct padata_parallel_queue *queue;
struct parallel_data *pd;

@@ -116,8 +118,19 @@ int padata_do_parallel(struct padata_instance *pinst,
if (!(pinst->flags & PADATA_INIT) || pinst->flags & PADATA_INVALID)
goto out;

- if (!cpumask_test_cpu(cb_cpu, pd->cpumask.cbcpu))
- goto out;
+ if (!cpumask_test_cpu(*cb_cpu, pd->cpumask.cbcpu)) {
+ if (!cpumask_weight(pd->cpumask.cbcpu))
+ goto out;
+
+ /* Select an alternate fallback CPU and notify the caller. */
+ cpu_index = *cb_cpu % cpumask_weight(pd->cpumask.cbcpu);
+
+ cpu = cpumask_first(pd->cpumask.cbcpu);
+ for (i = 0; i < cpu_index; i++)
+ cpu = cpumask_next(cpu, pd->cpumask.cbcpu);
+
+ *cb_cpu = cpu;
+ }

err = -EBUSY;
if ((pinst->flags & PADATA_RESET))
@@ -129,7 +142,7 @@ int padata_do_parallel(struct padata_instance *pinst,
err = 0;
atomic_inc(&pd->refcnt);
padata->pd = pd;
- padata->cb_cpu = cb_cpu;
+ padata->cb_cpu = *cb_cpu;

target_cpu = padata_cpu_hash(pd);
padata->cpu = target_cpu;
--
2.22.0


2019-07-25 21:28:09

by Daniel Jordan

Subject: [RFC 9/9] padata: remove cpu_index from the parallel_queue

With the removal of the ENODATA case from padata_get_next, the cpu_index
field is no longer useful, so it can go away.

Signed-off-by: Daniel Jordan <[email protected]>
---
 include/linux/padata.h |  2 --
 kernel/padata.c        | 13 ++-----------
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index cc420064186f..a39c7b9cec3c 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -75,14 +75,12 @@ struct padata_serial_queue {
* @swork: work struct for serialization.
* @work: work struct for parallelization.
* @num_obj: Number of objects that are processed by this cpu.
- * @cpu_index: Index of the cpu.
*/
struct padata_parallel_queue {
struct padata_list parallel;
struct padata_list reorder;
struct work_struct work;
atomic_t num_obj;
- int cpu_index;
};

/**
diff --git a/kernel/padata.c b/kernel/padata.c
index 09a7dbdd9678..b5236fd84c45 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -399,21 +399,12 @@ static void padata_init_squeues(struct parallel_data *pd)
/* Initialize all percpu queues used by parallel workers */
static void padata_init_pqueues(struct parallel_data *pd)
{
- int cpu_index, cpu;
+ int cpu;
struct padata_parallel_queue *pqueue;

- cpu_index = 0;
- for_each_possible_cpu(cpu) {
+ for_each_cpu(cpu, pd->cpumask.pcpu) {
pqueue = per_cpu_ptr(pd->pqueue, cpu);

- if (!cpumask_test_cpu(cpu, pd->cpumask.pcpu)) {
- pqueue->cpu_index = -1;
- continue;
- }
-
- pqueue->cpu_index = cpu_index;
- cpu_index++;
-
__padata_list_init(&pqueue->reorder);
__padata_list_init(&pqueue->parallel);
INIT_WORK(&pqueue->work, padata_parallel_worker);
--
2.22.0