2019-11-19 05:18:40

by Herbert Xu

[permalink] [raw]
Subject: [PATCH] padata: Remove broken queue flushing

The function padata_flush_queues is fundamentally broken because
it cannot force padata users to complete the request that is
underway. IOW padata has to passively wait for the completion
of any outstanding work.

As it stands flushing is used in two places. Its use in padata_stop
is simply unnecessary because nothing depends on the queues to
be flushed afterwards.

The other use in padata_replace is more substantial as we depend
on it to free the old pd structure. This patch instead uses the
pd->refcnt to dynamically free the pd structure once all requests
are complete.

Fixes: 2b73b07ab8a4 ("padata: Flush the padata queues actively")
Cc: <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

diff --git a/kernel/padata.c b/kernel/padata.c
index c3fec1413295..da56a235a255 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -35,6 +35,8 @@

#define MAX_OBJ_NUM 1000

+static void padata_free_pd(struct parallel_data *pd);
+
static int padata_index_to_cpu(struct parallel_data *pd, int cpu_index)
{
int cpu, target_cpu;
@@ -283,6 +285,7 @@ static void padata_serial_worker(struct work_struct *serial_work)
struct padata_serial_queue *squeue;
struct parallel_data *pd;
LIST_HEAD(local_list);
+ int cnt;

local_bh_disable();
squeue = container_of(serial_work, struct padata_serial_queue, work);
@@ -292,6 +295,8 @@ static void padata_serial_worker(struct work_struct *serial_work)
list_replace_init(&squeue->serial.list, &local_list);
spin_unlock(&squeue->serial.lock);

+ cnt = 0;
+
while (!list_empty(&local_list)) {
struct padata_priv *padata;

@@ -301,9 +306,12 @@ static void padata_serial_worker(struct work_struct *serial_work)
list_del_init(&padata->list);

padata->serial(padata);
- atomic_dec(&pd->refcnt);
+ cnt++;
}
local_bh_enable();
+
+ if (atomic_sub_and_test(cnt, &pd->refcnt))
+ padata_free_pd(pd);
}

/**
@@ -440,7 +448,7 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
padata_init_squeues(pd);
atomic_set(&pd->seq_nr, -1);
atomic_set(&pd->reorder_objects, 0);
- atomic_set(&pd->refcnt, 0);
+ atomic_set(&pd->refcnt, 1);
spin_lock_init(&pd->lock);
pd->cpu = cpumask_first(pd->cpumask.pcpu);
INIT_WORK(&pd->reorder_work, invoke_padata_reorder);
@@ -466,29 +474,6 @@ static void padata_free_pd(struct parallel_data *pd)
kfree(pd);
}

-/* Flush all objects out of the padata queues. */
-static void padata_flush_queues(struct parallel_data *pd)
-{
- int cpu;
- struct padata_parallel_queue *pqueue;
- struct padata_serial_queue *squeue;
-
- for_each_cpu(cpu, pd->cpumask.pcpu) {
- pqueue = per_cpu_ptr(pd->pqueue, cpu);
- flush_work(&pqueue->work);
- }
-
- if (atomic_read(&pd->reorder_objects))
- padata_reorder(pd);
-
- for_each_cpu(cpu, pd->cpumask.cbcpu) {
- squeue = per_cpu_ptr(pd->squeue, cpu);
- flush_work(&squeue->work);
- }
-
- BUG_ON(atomic_read(&pd->refcnt) != 0);
-}
-
static void __padata_start(struct padata_instance *pinst)
{
pinst->flags |= PADATA_INIT;
@@ -502,10 +487,6 @@ static void __padata_stop(struct padata_instance *pinst)
pinst->flags &= ~PADATA_INIT;

synchronize_rcu();
-
- get_online_cpus();
- padata_flush_queues(pinst->pd);
- put_online_cpus();
}

/* Replace the internal control structure with a new one. */
@@ -526,8 +507,8 @@ static void padata_replace(struct padata_instance *pinst,
if (!cpumask_equal(pd_old->cpumask.cbcpu, pd_new->cpumask.cbcpu))
notification_mask |= PADATA_CPU_SERIAL;

- padata_flush_queues(pd_old);
- padata_free_pd(pd_old);
+ if (atomic_dec_and_test(&pd_old->refcnt))
+ padata_free_pd(pd_old);

if (notification_mask)
blocking_notifier_call_chain(&pinst->cpumask_change_notifier,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


2019-11-19 19:24:50

by Daniel Jordan

[permalink] [raw]
Subject: Re: [PATCH] padata: Remove broken queue flushing

On Tue, Nov 19, 2019 at 01:17:31PM +0800, Herbert Xu wrote:
> The function padata_flush_queues is fundamentally broken because
> it cannot force padata users to complete the request that is
> underway. IOW padata has to passively wait for the completion
> of any outstanding work.
>
> As it stands flushing is used in two places. Its use in padata_stop
> is simply unnecessary because nothing depends on the queues to
> be flushed afterwards.
>
> The other use in padata_replace is more substantial as we depend
> on it to free the old pd structure. This patch instead uses the
> pd->refcnt to dynamically free the pd structure once all requests
> are complete.

__padata_free unconditionally frees pd, so a padata job might choke on it
later. padata_do_parallel calls seem safe because they use RCU, but it seems
possible that a job could call padata_do_serial after the instance is gone.

Best idea I can think of now is to indicate the instance has been freed in the
pd before dropping the initial pd ref in __padata_free, and use that to bail
out early from places that touch the instance or its data (workqueues say).
Will think more on this.


(By the way, I was on leave longer than anticipated, so thanks for picking up
my slack on this patch. I plan to repost my other padata fixes soon.)

2019-11-19 22:45:13

by Daniel Jordan

[permalink] [raw]
Subject: Re: [PATCH] padata: Remove broken queue flushing

On Wed, Nov 20, 2019 at 05:53:45AM +0800, Herbert Xu wrote:
> On Tue, Nov 19, 2019 at 02:24:05PM -0500, Daniel Jordan wrote:
> >
> > __padata_free unconditionally frees pd, so a padata job might choke on it
> > later. padata_do_parallel calls seem safe because they use RCU, but it seems
> > possible that a job could call padata_do_serial after the instance is gone.
>
> Actually __padata_free no longer frees the pd after my patch.

I assume you mean the third patch you recently posted, "crypto: pcrypt - Avoid
deadlock by using per-instance padata queues". That's true, the problem is
fixed there, and the bug being present in bisection doesn't seem like enough
justification to implement something short-lived just to prevent it.

> We should also mandate that __padata_free can only be called after
> all jobs have completed. This is already the case with pcrypt.

Makes sense to me, though I don't see how it's enforced now in pcrypt. I'm not
an async crypto person, but wouldn't unloading the pcrypt module when there are
still outstanding async jobs break this rule?

> In fact we should discuss merging padata into pcrypt. It's been
> 10 years and not a single user of padata has emerged in the kernel.

Actually, I'm working on an RFC right now to add more users for padata. It
should be posted in the coming week or two, and I hope it can be part of that
discussion.

2019-11-19 22:50:51

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH] padata: Remove broken queue flushing

On Tue, Nov 19, 2019 at 05:44:32PM -0500, Daniel Jordan wrote:
>
> I assume you mean the third patch you recently posted, "crypto: pcrypt - Avoid
> deadlock by using per-instance padata queues". That's true, the problem is
> fixed there, and the bug being present in bisection doesn't seem like enough
> justification to implement something short-lived just to prevent it.

Right. But as pcrypt is the only user this should still work.

> Makes sense to me, though I don't see how it's enforced now in pcrypt. I'm not
> an async crypto person, but wouldn't unloading the pcrypt module when there are
> still outstanding async jobs break this rule?

It's enforced through module reference counting. While there are
any outstanding requests, there must be allocated crypto tfms. Each
crypto tfm maintains a module reference count on pcrypt which
prevents it from being unloaded.

> Actually, I'm working on an RFC right now to add more users for padata. It
> should be posted in the coming week or two, and I hope it can be part of that
> discussion.

OK.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-11-27 23:37:30

by Daniel Jordan

[permalink] [raw]
Subject: Re: [PATCH] padata: Remove broken queue flushing

On Tue, Nov 19, 2019 at 01:17:31PM +0800, Herbert Xu wrote:
> The function padata_flush_queues is fundamentally broken because
> it cannot force padata users to complete the request that is
> underway. IOW padata has to passively wait for the completion
> of any outstanding work.
>
> As it stands flushing is used in two places. Its use in padata_stop
> is simply unnecessary because nothing depends on the queues to
> be flushed afterwards.
>
> The other use in padata_replace is more substantial as we depend
> on it to free the old pd structure. This patch instead uses the
> pd->refcnt to dynamically free the pd structure once all requests
> are complete.
>
> Fixes: 2b73b07ab8a4 ("padata: Flush the padata queues actively")
> Cc: <[email protected]>
> Signed-off-by: Herbert Xu <[email protected]>

Reviewed-by: Daniel Jordan <[email protected]>