2019-07-11 22:21:51

by Daniel Jordan

Subject: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs

Testing padata with the tcrypt module on a 5.2 kernel...

# modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
# modprobe tcrypt mode=211 sec=1

...produces this splat:

INFO: task modprobe:10075 blocked for more than 120 seconds.
Not tainted 5.2.0-base+ #16
modprobe D 0 10075 10064 0x80004080
Call Trace:
? __schedule+0x4dd/0x610
? ring_buffer_unlock_commit+0x23/0x100
schedule+0x6c/0x90
schedule_timeout+0x3b/0x320
? trace_buffer_unlock_commit_regs+0x4f/0x1f0
wait_for_common+0x160/0x1a0
? wake_up_q+0x80/0x80
{ crypto_wait_req } # entries in braces added by hand
{ do_one_aead_op }
{ test_aead_jiffies }
test_aead_speed.constprop.17+0x681/0xf30 [tcrypt]
do_test+0x4053/0x6a2b [tcrypt]
? 0xffffffffa00f4000
tcrypt_mod_init+0x50/0x1000 [tcrypt]
...

The second modprobe command never finishes because in padata_reorder,
CPU0's load of reorder_objects is executed before the unlocking store in
spin_unlock_bh(pd->lock), causing CPU0 to miss CPU1's increment:

CPU0                                 CPU1

padata_reorder                       padata_do_serial
  LOAD reorder_objects  // 0
                                       INC reorder_objects  // 1
                                       padata_reorder
                                         TRYLOCK pd->lock  // failed
  UNLOCK pd->lock

CPU0 deletes the timer before returning from padata_reorder and since no
other job is submitted to padata, modprobe waits indefinitely.

Add a full barrier to prevent this scenario. The hang was happening
about once every three runs, but now the test has finished successfully
fifty times in a row.

Fixes: 16295bec6398 ("padata: Generic parallelization/serialization interface")
Signed-off-by: Daniel Jordan <[email protected]>
Cc: Andrea Parri <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---

memory-barriers.txt says that a full barrier pairs with a release barrier, but
I'd appreciate a look from someone more familiar with barriers. Thanks.

kernel/padata.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/padata.c b/kernel/padata.c
index 2d2fddbb7a4c..9cffd4c303cb 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -267,7 +267,12 @@ static void padata_reorder(struct parallel_data *pd)
* The next object that needs serialization might have arrived to
* the reorder queues in the meantime, we will be called again
* from the timer function if no one else cares for it.
+ *
+ * Ensure reorder_objects is read after pd->lock is dropped so we see
+ * an increment from another task in padata_do_serial. Pairs with
+ * spin_unlock(&pqueue->reorder.lock) in padata_do_serial.
*/
+ smp_mb();
if (atomic_read(&pd->reorder_objects)
&& !(pinst->flags & PADATA_RESET))
mod_timer(&pd->timer, jiffies + HZ);

base-commit: 0ecfebd2b52404ae0c54a878c872bb93363ada36
--
2.22.0


2019-07-12 10:08:03

by Herbert Xu

Subject: Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs

Daniel Jordan <[email protected]> wrote:
>
> CPU0                                 CPU1
>
> padata_reorder                       padata_do_serial
>   LOAD reorder_objects  // 0
>                                        INC reorder_objects  // 1
>                                        padata_reorder
>                                          TRYLOCK pd->lock  // failed
>   UNLOCK pd->lock

I think this can't happen because CPU1 won't call padata_reorder
at all as it's the wrong CPU so it gets pushed back onto a work
queue which will go back to CPU0.

Steffen, could you please take a look at this as there clearly
is a problem here?

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-07-12 10:10:31

by Steffen Klassert

Subject: Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs

On Fri, Jul 12, 2019 at 06:06:36PM +0800, Herbert Xu wrote:
> Daniel Jordan <[email protected]> wrote:
> >
> > CPU0                                 CPU1
> >
> > padata_reorder                       padata_do_serial
> >   LOAD reorder_objects  // 0
> >                                        INC reorder_objects  // 1
> >                                        padata_reorder
> >                                          TRYLOCK pd->lock  // failed
> >   UNLOCK pd->lock
>
> I think this can't happen because CPU1 won't call padata_reorder
> at all as it's the wrong CPU so it gets pushed back onto a work
> queue which will go back to CPU0.
>
> Steffen, could you please take a look at this as there clearly
> is a problem here?

I'm currently travelling, will have a look at it when I'm back.

2019-07-16 10:05:28

by Herbert Xu

Subject: Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs

On Mon, Jul 15, 2019 at 12:10:46PM -0400, Daniel Jordan wrote:
>
> I've been wrong before plenty of times, and there's nothing preventing this
> from being one of those times :) , but in this case I believe what I'm showing
> is correct.
>
> The padata_do_serial call for a given job ensures padata_reorder runs on the
> CPU that the job hashed to in padata_do_parallel, which is not necessarily the
> same CPU as the one that padata_do_parallel itself ran on.

You're right. I was taking the comment in the code at face value,
never trust comments :)

While looking at the code in question, I think it is seriously
broken. For instance, padata_replace does not deal with async
crypto at all. It would fail miserably if the underlying async
crypto held onto references to the old pd.

So we may have to restrict pcrypt to sync crypto only, which
would obviously mean that it can no longer use aesni. Or we
will have to spend a lot of time to fix this up properly.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-07-16 11:15:01

by Steffen Klassert

Subject: Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs

On Tue, Jul 16, 2019 at 06:04:47PM +0800, Herbert Xu wrote:
> On Mon, Jul 15, 2019 at 12:10:46PM -0400, Daniel Jordan wrote:
> >
> > I've been wrong before plenty of times, and there's nothing preventing this
> > from being one of those times :) , but in this case I believe what I'm showing
> > is correct.
> >
> > The padata_do_serial call for a given job ensures padata_reorder runs on the
> > CPU that the job hashed to in padata_do_parallel, which is not necessarily the
> > same CPU as the one that padata_do_parallel itself ran on.
>
> You're right. I was taking the comment in the code at face value,
> never trust comments :)
>
> While looking at the code in question, I think it is seriously
> broken. For instance, padata_replace does not deal with async
> crypto at all. It would fail miserably if the underlying async
> crypto held onto references to the old pd.

Hm, yes looks like that.

padata_replace should not call padata_free_pd() as long as the
refcount is not zero. Currently padata_flush_queues() will
BUG if there are references left.

Maybe we can fix it if we call padata_free_pd() from
padata_serial_worker() when it sent out the last object.

2019-07-16 12:53:47

by Andrea Parri

Subject: Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs

Hi Daniel,

My two cents (summarizing some findings we discussed privately):


> I think adding the full barrier guarantees the following ordering, and the
> memory model people can correct me if I'm wrong:
>
> CPU21                              CPU22
> ------------------------           --------------------------
> UNLOCK pd->lock
> smp_mb()
> LOAD reorder_objects
>                                    INC reorder_objects
>                                    spin_unlock(&pqueue->reorder.lock) // release barrier
>                                    TRYLOCK pd->lock
>
> So if CPU22 has incremented reorder_objects but CPU21 reads 0 for it, CPU21
> should also have unlocked pd->lock so CPU22 can get it and serialize any
> remaining jobs.

This information inspired me to write down the following litmus test:
(AFAICT, you independently wrote a very similar test, which is indeed
quite reassuring! ;D)

C daniel-padata

{ }

P0(atomic_t *reorder_objects, spinlock_t *pd_lock)
{
	int r0;

	spin_lock(pd_lock);
	spin_unlock(pd_lock);
	smp_mb();
	r0 = atomic_read(reorder_objects);
}

P1(atomic_t *reorder_objects, spinlock_t *pd_lock, spinlock_t *reorder_lock)
{
	int r1;

	spin_lock(reorder_lock);
	atomic_inc(reorder_objects);
	spin_unlock(reorder_lock);
	//smp_mb();
	r1 = spin_trylock(pd_lock);
}

exists (0:r0=0 /\ 1:r1=0)

It seems worth noticing that this test's "exists" clause is satisfiable
according to the (current) memory consistency model. (Informally, this
can be explained by noticing that the RELEASE from the spin_unlock() in
P1 does not provide any order between the atomic increment and the read
part of the spin_trylock() operation.) FWIW, uncommenting the smp_mb()
in P1 would suffice to prevent this clause from being satisfiable; I am
not sure, however, whether this approach is feasible or ideal... (sorry,
I'm definitely not too familiar with this code... ;/)

Thanks,
Andrea

2019-07-16 12:57:43

by Herbert Xu

Subject: [PATCH] padata: Use RCU when fetching pd from do_serial

On Tue, Jul 16, 2019 at 01:14:10PM +0200, Steffen Klassert wrote:
>
> Maybe we can fix it if we call padata_free_pd() from
> padata_serial_worker() when it sent out the last object.

How about using RCU?

We still need to fix up the refcnt if it's supposed to limit the
overall number of outstanding requests.

---8<---
The function padata_do_serial uses parallel_data without obeying
the RCU rules around its life-cycle. This means that a concurrent
padata_replace call can result in a crash.

This patch fixes it by using RCU just as we do in padata_do_parallel.

Fixes: 16295bec6398 ("padata: Generic parallelization/...")
Signed-off-by: Herbert Xu <[email protected]>

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 5d13d25da2c8..952f6514dd72 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -35,7 +35,7 @@
* struct padata_priv - Embedded to the users data structure.
*
* @list: List entry, to attach to the padata lists.
- * @pd: Pointer to the internal control structure.
+ * @inst: Pointer to the overall control structure.
* @cb_cpu: Callback cpu for serializatioon.
* @cpu: Cpu for parallelization.
* @seq_nr: Sequence number of the parallelized data object.
@@ -45,7 +45,7 @@
*/
struct padata_priv {
struct list_head list;
- struct parallel_data *pd;
+ struct padata_instance *inst;
int cb_cpu;
int cpu;
int info;
diff --git a/kernel/padata.c b/kernel/padata.c
index 2d2fddbb7a4c..fb5dd1210d2b 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -128,7 +128,7 @@ int padata_do_parallel(struct padata_instance *pinst,

err = 0;
atomic_inc(&pd->refcnt);
- padata->pd = pd;
+ padata->inst = pinst;
padata->cb_cpu = cb_cpu;

target_cpu = padata_cpu_hash(pd);
@@ -367,7 +368,7 @@ void padata_do_serial(struct padata_priv *padata)
struct parallel_data *pd;
int reorder_via_wq = 0;

- pd = padata->pd;
+ pd = rcu_dereference_bh(padata->inst->pd);

cpu = get_cpu();

--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-07-16 13:10:10

by Herbert Xu

Subject: Re: [PATCH] padata: Use RCU when fetching pd from do_serial

On Tue, Jul 16, 2019 at 08:57:04PM +0800, Herbert Xu wrote:
>
> How about using RCU?
>
> We still need to fix up the refcnt if it's supposed to limit the
> overall number of outstanding requests.

Hmm, it doesn't work because the refcnt is attached to the old
pd. That shouldn't be a problem though as we could simply ignore
the refcnt in padata_flush_queue.

However, I think this leads to another bug in that pcrypt doesn't
support dm-crypt properly. It never does the backlog stuff and
therefore can't guarantee reliable processing which dm-crypt requires.
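
For reference, the caller-side contract in question, sketched with the
standard crypto_wait_req() helper (illustrative fragment, not pcrypt
code):

```
DECLARE_CRYPTO_WAIT(wait);

/* A user such as dm-crypt sets MAY_BACKLOG so that a full queue
 * returns -EBUSY but the request is still processed later, instead
 * of being dropped: */
aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
			  crypto_req_done, &wait);
err = crypto_wait_req(crypto_aead_encrypt(req), &wait);
```

A backend that never exercises the backlog path cannot honour this
guarantee for such callers.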

Is it intentional to only allow pcrypt for IPsec?

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-07-16 13:14:26

by Peter Zijlstra

Subject: Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs

On Tue, Jul 16, 2019 at 02:53:09PM +0200, Andrea Parri wrote:

> C daniel-padata
>
> { }
>
> P0(atomic_t *reorder_objects, spinlock_t *pd_lock)
> {
> 	int r0;
>
> 	spin_lock(pd_lock);
> 	spin_unlock(pd_lock);
> 	smp_mb();
> 	r0 = atomic_read(reorder_objects);
> }
>
> P1(atomic_t *reorder_objects, spinlock_t *pd_lock, spinlock_t *reorder_lock)
> {
> 	int r1;
>
> 	spin_lock(reorder_lock);
> 	atomic_inc(reorder_objects);
> 	spin_unlock(reorder_lock);
> 	//smp_mb();
> 	r1 = spin_trylock(pd_lock);
> }
>
> exists (0:r0=0 /\ 1:r1=0)
>
> It seems worth noticing that this test's "exists" clause is satisfiable
> according to the (current) memory consistency model. (Informally, this
> can be explained by noticing that the RELEASE from the spin_unlock() in
> P1 does not provide any order between the atomic increment and the read
> part of the spin_trylock() operation.) FWIW, uncommenting the smp_mb()
> in P1 would suffice to prevent this clause from being satisfiable; I am
> not sure, however, whether this approach is feasible or ideal... (sorry,
> I'm definitely not too familiar with this code... ;/)

Urgh, that one again.

Yes, you need the smp_mb(); although a whole bunch of architectures can
live without it. IIRC it is part of the eternal RCsc/RCpc debate.

Paul/RCU have their smp_mb__after_unlock_lock() that is about something
similar, although we've so far confined that to the RCU code, because
of how confusing that all is.

2019-07-16 13:15:57

by Peter Zijlstra

Subject: Re: [PATCH] padata: Use RCU when fetching pd from do_serial

On Tue, Jul 16, 2019 at 08:57:04PM +0800, Herbert Xu wrote:
> On Tue, Jul 16, 2019 at 01:14:10PM +0200, Steffen Klassert wrote:
> >
> > Maybe we can fix it if we call padata_free_pd() from
> > padata_serial_worker() when it sent out the last object.
>
> How about using RCU?
>
> We still need to fix up the refcnt if it's supposed to limit the
> overall number of outstanding requests.
>
> ---8<---
> The function padata_do_serial uses parallel_data without obeying
> the RCU rules around its life-cycle. This means that a concurrent
> padata_replace call can result in a crash.
>
> This patch fixes it by using RCU just as we do in padata_do_parallel.
>
> Fixes: 16295bec6398 ("padata: Generic parallelization/...")
> Signed-off-by: Herbert Xu <[email protected]>

> diff --git a/kernel/padata.c b/kernel/padata.c
> index 2d2fddbb7a4c..fb5dd1210d2b 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -128,7 +128,7 @@ int padata_do_parallel(struct padata_instance *pinst,
>
> err = 0;
> atomic_inc(&pd->refcnt);
> - padata->pd = pd;
> + padata->inst = pinst;
> padata->cb_cpu = cb_cpu;
>
> target_cpu = padata_cpu_hash(pd);
> @@ -367,7 +368,7 @@ void padata_do_serial(struct padata_priv *padata)
> struct parallel_data *pd;
> int reorder_via_wq = 0;
>
> - pd = padata->pd;
> + pd = rcu_dereference_bh(padata->inst->pd);
>
> cpu = get_cpu();
>

That's weird for not having a matching assign and lacking comments to
explain that.

2019-07-16 13:19:02

by Herbert Xu

Subject: Re: [PATCH] padata: Use RCU when fetching pd from do_serial

On Tue, Jul 16, 2019 at 03:15:20PM +0200, Peter Zijlstra wrote:
>
> > @@ -367,7 +368,7 @@ void padata_do_serial(struct padata_priv *padata)
> > struct parallel_data *pd;
> > int reorder_via_wq = 0;
> >
> > - pd = padata->pd;
> > + pd = rcu_dereference_bh(padata->inst->pd);
> >
> > cpu = get_cpu();
> >
>
> That's weird for not having a matching assign and lacking comments to
> explain that.

There is a matching rcu_assign_pointer. But we should add some
RCU markers.

Or perhaps you're misreading the level of indirections :)

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-07-16 13:24:28

by Herbert Xu

Subject: [v2 PATCH] padata: Use RCU when fetching pd from do_serial

On Tue, Jul 16, 2019 at 09:09:28PM +0800, Herbert Xu wrote:
>
> Hmm, it doesn't work because the refcnt is attached to the old
> pd. That shouldn't be a problem though as we could simply ignore
> the refcnt in padata_flush_queue.

This should fix it:

---8<---
The function padata_do_serial uses parallel_data without obeying
the RCU rules around its life-cycle. This means that a concurrent
padata_replace call can result in a crash.

This patch fixes it by using RCU just as we do in padata_do_parallel.

As the refcnt may now span two parallel_data structures, this patch
moves it to padata_instance instead. FWIW the refcnt is used to
limit the number of outstanding requests (albeit a soft limit as
we don't do a proper atomic inc and test).

Fixes: 16295bec6398 ("padata: Generic parallelization/...")
Signed-off-by: Herbert Xu <[email protected]>

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 5d13d25da2c8..ce51555cb86c 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -35,7 +35,7 @@
* struct padata_priv - Embedded to the users data structure.
*
* @list: List entry, to attach to the padata lists.
- * @pd: Pointer to the internal control structure.
+ * @inst: Pointer to the overall control structure.
* @cb_cpu: Callback cpu for serializatioon.
* @cpu: Cpu for parallelization.
* @seq_nr: Sequence number of the parallelized data object.
@@ -45,7 +45,7 @@
*/
struct padata_priv {
struct list_head list;
- struct parallel_data *pd;
+ struct padata_instance *inst;
int cb_cpu;
int cpu;
int info;
@@ -120,7 +120,6 @@ struct padata_cpumask {
* @pqueue: percpu padata queues used for parallelization.
* @squeue: percpu padata queues used for serialuzation.
* @reorder_objects: Number of objects waiting in the reorder queues.
- * @refcnt: Number of objects holding a reference on this parallel_data.
* @max_seq_nr: Maximal used sequence number.
* @cpumask: The cpumasks in use for parallel and serial workers.
* @lock: Reorder lock.
@@ -132,7 +131,6 @@ struct parallel_data {
struct padata_parallel_queue __percpu *pqueue;
struct padata_serial_queue __percpu *squeue;
atomic_t reorder_objects;
- atomic_t refcnt;
atomic_t seq_nr;
struct padata_cpumask cpumask;
spinlock_t lock ____cacheline_aligned;
@@ -152,6 +150,7 @@ struct parallel_data {
* or both cpumasks change.
* @kobj: padata instance kernel object.
* @lock: padata instance lock.
+ * @refcnt: Number of objects holding a reference on this padata_instance.
* @flags: padata flags.
*/
struct padata_instance {
@@ -162,6 +161,7 @@ struct padata_instance {
struct blocking_notifier_head cpumask_change_notifier;
struct kobject kobj;
struct mutex lock;
+ atomic_t refcnt;
u8 flags;
#define PADATA_INIT 1
#define PADATA_RESET 2
diff --git a/kernel/padata.c b/kernel/padata.c
index 2d2fddbb7a4c..c86333fa3cec 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -123,12 +123,12 @@ int padata_do_parallel(struct padata_instance *pinst,
if ((pinst->flags & PADATA_RESET))
goto out;

- if (atomic_read(&pd->refcnt) >= MAX_OBJ_NUM)
+ if (atomic_read(&pinst->refcnt) >= MAX_OBJ_NUM)
goto out;

err = 0;
- atomic_inc(&pd->refcnt);
- padata->pd = pd;
+ atomic_inc(&pinst->refcnt);
+ padata->inst = pinst;
padata->cb_cpu = cb_cpu;

target_cpu = padata_cpu_hash(pd);
@@ -347,7 +347,7 @@ static void padata_serial_worker(struct work_struct *serial_work)
list_del_init(&padata->list);

padata->serial(padata);
- atomic_dec(&pd->refcnt);
+ atomic_dec(&pd->pinst->refcnt);
}
local_bh_enable();
}
@@ -367,7 +367,7 @@ void padata_do_serial(struct padata_priv *padata)
struct parallel_data *pd;
int reorder_via_wq = 0;

- pd = padata->pd;
+ pd = rcu_dereference_bh(padata->inst->pd);

cpu = get_cpu();

@@ -489,7 +489,6 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
timer_setup(&pd->timer, padata_reorder_timer, 0);
atomic_set(&pd->seq_nr, -1);
atomic_set(&pd->reorder_objects, 0);
- atomic_set(&pd->refcnt, 0);
pd->pinst = pinst;
spin_lock_init(&pd->lock);

@@ -535,8 +534,6 @@ static void padata_flush_queues(struct parallel_data *pd)
squeue = per_cpu_ptr(pd->squeue, cpu);
flush_work(&squeue->work);
}
-
- BUG_ON(atomic_read(&pd->refcnt) != 0);
}

static void __padata_start(struct padata_instance *pinst)
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-07-16 13:52:50

by Peter Zijlstra

Subject: Re: [PATCH] padata: Use RCU when fetching pd from do_serial

On Tue, Jul 16, 2019 at 09:18:07PM +0800, Herbert Xu wrote:
> On Tue, Jul 16, 2019 at 03:15:20PM +0200, Peter Zijlstra wrote:
> >
> > > @@ -367,7 +368,7 @@ void padata_do_serial(struct padata_priv *padata)
> > > struct parallel_data *pd;
> > > int reorder_via_wq = 0;
> > >
> > > - pd = padata->pd;
> > > + pd = rcu_dereference_bh(padata->inst->pd);
> > >
> > > cpu = get_cpu();
> > >
> >
> > That's weird for not having a matching assign and lacking comments to
> > explain that.
>
> There is a matching rcu_assign_pointer. But we should add some
> RCU markers.
>
> Or perhaps you're misreading the level of indirections :)

Could well be, I'm not much familiar with this code; I'll look more
carefully.

2019-07-16 15:02:25

by Herbert Xu

Subject: Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs

On Tue, Jul 16, 2019 at 02:53:09PM +0200, Andrea Parri wrote:
>
> P1(atomic_t *reorder_objects, spinlock_t *pd_lock, spinlock_t *reorder_lock)
> {
> 	int r1;
>
> 	spin_lock(reorder_lock);
> 	atomic_inc(reorder_objects);
> 	spin_unlock(reorder_lock);
> 	//smp_mb();
> 	r1 = spin_trylock(pd_lock);
> }

Yes we need a matching mb on the other side. However, we can
get away with using smp_mb__after_atomic thanks to the atomic_inc
above.
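
That is, P1 in Andrea's test would become something like (a sketch, to
be checked against the memory model):

```
P1(atomic_t *reorder_objects, spinlock_t *pd_lock, spinlock_t *reorder_lock)
{
	int r1;

	spin_lock(reorder_lock);
	atomic_inc(reorder_objects);
	spin_unlock(reorder_lock);
	smp_mb__after_atomic();
	r1 = spin_trylock(pd_lock);
}
```

smp_mb__after_atomic() after the atomic_inc gives the same full-barrier
ordering as the smp_mb() Andrea suggested, while being a no-op on
architectures where atomic RMW operations already imply a full barrier.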

Daniel, can you please respin the patch with the matching smp_mb?

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-07-16 15:45:35

by Daniel Jordan

Subject: Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs

On 7/16/19 11:01 AM, Herbert Xu wrote:
> On Tue, Jul 16, 2019 at 02:53:09PM +0200, Andrea Parri wrote:
>>
>> P1(atomic_t *reorder_objects, spinlock_t *pd_lock, spinlock_t *reorder_lock)
>> {
>> 	int r1;
>>
>> 	spin_lock(reorder_lock);
>> 	atomic_inc(reorder_objects);
>> 	spin_unlock(reorder_lock);
>> 	//smp_mb();
>> 	r1 = spin_trylock(pd_lock);
>> }
>
> Yes we need a matching mb on the other side. However, we can
> get away with using smp_mb__after_atomic thanks to the atomic_inc
> above.
>
> Daniel, can you please respin the patch with the matching smp_mb?

Sure, Herbert, will do.

Thanks,
Daniel

2019-07-16 16:34:00

by Daniel Jordan

Subject: [PATCH v2] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs

Testing padata with the tcrypt module on a 5.2 kernel...

# modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
# modprobe tcrypt mode=211 sec=1

...produces this splat:

INFO: task modprobe:10075 blocked for more than 120 seconds.
Not tainted 5.2.0-base+ #16
modprobe D 0 10075 10064 0x80004080
Call Trace:
? __schedule+0x4dd/0x610
? ring_buffer_unlock_commit+0x23/0x100
schedule+0x6c/0x90
schedule_timeout+0x3b/0x320
? trace_buffer_unlock_commit_regs+0x4f/0x1f0
wait_for_common+0x160/0x1a0
? wake_up_q+0x80/0x80
{ crypto_wait_req } # entries in braces added by hand
{ do_one_aead_op }
{ test_aead_jiffies }
test_aead_speed.constprop.17+0x681/0xf30 [tcrypt]
do_test+0x4053/0x6a2b [tcrypt]
? 0xffffffffa00f4000
tcrypt_mod_init+0x50/0x1000 [tcrypt]
...

The second modprobe command never finishes because in padata_reorder,
CPU0's load of reorder_objects is executed before the unlocking store in
spin_unlock_bh(pd->lock), causing CPU0 to miss CPU1's increment:

CPU0                                 CPU1

padata_reorder                       padata_do_serial
  LOAD reorder_objects  // 0
                                       INC reorder_objects  // 1
                                       padata_reorder
                                         TRYLOCK pd->lock  // failed
  UNLOCK pd->lock

CPU0 deletes the timer before returning from padata_reorder and since no
other job is submitted to padata, modprobe waits indefinitely.

Add a pair of full barriers to guarantee proper ordering:

CPU0                                 CPU1

padata_reorder                       padata_do_serial
  UNLOCK pd->lock
  smp_mb()
  LOAD reorder_objects
                                       INC reorder_objects
                                       smp_mb__after_atomic()
                                       padata_reorder
                                         TRYLOCK pd->lock

smp_mb__after_atomic is needed so the read part of the trylock operation
comes after the INC, as Andrea points out. Thanks also to Andrea for
help with writing a litmus test.

Fixes: 16295bec6398 ("padata: Generic parallelization/serialization interface")
Signed-off-by: Daniel Jordan <[email protected]>
Cc: Andrea Parri <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
kernel/padata.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/kernel/padata.c b/kernel/padata.c
index 2d2fddbb7a4c..15a8ad63f4ff 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -267,7 +267,12 @@ static void padata_reorder(struct parallel_data *pd)
* The next object that needs serialization might have arrived to
* the reorder queues in the meantime, we will be called again
* from the timer function if no one else cares for it.
+ *
+ * Ensure reorder_objects is read after pd->lock is dropped so we see
+ * an increment from another task in padata_do_serial. Pairs with
+ * smp_mb__after_atomic in padata_do_serial.
*/
+ smp_mb();
if (atomic_read(&pd->reorder_objects)
&& !(pinst->flags & PADATA_RESET))
mod_timer(&pd->timer, jiffies + HZ);
@@ -387,6 +392,13 @@ void padata_do_serial(struct padata_priv *padata)
list_add_tail(&padata->list, &pqueue->reorder.list);
spin_unlock(&pqueue->reorder.lock);

+ /*
+ * Ensure the atomic_inc of reorder_objects above is ordered correctly
+ * with the trylock of pd->lock in padata_reorder. Pairs with smp_mb
+ * in padata_reorder.
+ */
+ smp_mb__after_atomic();
+
put_cpu();

/* If we're running on the wrong CPU, call padata_reorder() via a

base-commit: 0ecfebd2b52404ae0c54a878c872bb93363ada36
--
2.22.0

2019-07-17 08:28:44

by Steffen Klassert

Subject: Re: [PATCH] padata: Use RCU when fetching pd from do_serial

On Tue, Jul 16, 2019 at 09:09:28PM +0800, Herbert Xu wrote:
>
> However, I think this leads to another bug in that pcrypt doesn't
> support dm-crypt properly. It never does the backlog stuff and
> therefore can't guarantee reliable processing which dm-crypt requires.
>
> Is it intentional to only allow pcrypt for IPsec?

I had a patch to support crypto backlog some years ago,
but testing with dm-crypt did not show any performance
improvement. So I decided to just skip that patch because
it added code for no benefit.

2019-07-17 08:37:27

by Steffen Klassert

Subject: Re: [v2 PATCH] padata: Use RCU when fetching pd from do_serial

On Tue, Jul 16, 2019 at 09:23:45PM +0800, Herbert Xu wrote:
> On Tue, Jul 16, 2019 at 09:09:28PM +0800, Herbert Xu wrote:
> >
> > Hmm, it doesn't work because the refcnt is attached to the old
> > pd. That shouldn't be a problem though as we could simply ignore
> > the refcnt in padata_flush_queue.
>
> This should fix it:
>
> ---8<---
> The function padata_do_serial uses parallel_data without obeying
> the RCU rules around its life-cycle. This means that a concurrent
> padata_replace call can result in a crash.
>
> This patch fixes it by using RCU just as we do in padata_do_parallel.

RCU alone won't help because if some object is queued for async
crypto, we left the RCU protected area. I think padata_do_serial
needs to do RCU and should free 'parallel_data' if the flag
PADATA_RESET is set and the refcount goes to zero. padata_replace
should do the same then.

>
> As the refcnt may now span two parallel_data structures, this patch
> moves it to padata_instance instead. FWIW the refcnt is used to
> limit the number of outstanding requests (albeit a soft limit as
> we don't do a proper atomic inc and test).
>
> Fixes: 16295bec6398 ("padata: Generic parallelization/...")
> Signed-off-by: Herbert Xu <[email protected]>
>
> diff --git a/include/linux/padata.h b/include/linux/padata.h

...

> index 5d13d25da2c8..ce51555cb86c 100644
> @@ -367,7 +367,7 @@ void padata_do_serial(struct padata_priv *padata)
> struct parallel_data *pd;
> int reorder_via_wq = 0;
>
> - pd = padata->pd;
> + pd = rcu_dereference_bh(padata->inst->pd);

Why not just

pd = rcu_dereference_bh(padata->pd);

2019-07-17 08:48:33

by Herbert Xu

Subject: Re: [PATCH] padata: Use RCU when fetching pd from do_serial

On Wed, Jul 17, 2019 at 10:28:15AM +0200, Steffen Klassert wrote:
>
> I had a patch to support crypto backlog some years ago,
> but testing with dm-crypt did not show any performance
> improvement. So I decided to just skip that patch because
> it added code for no need.

Well pcrypt is part of the API so it needs to obey the rules.
So even if it wouldn't make sense to use dm-crypt with pcrypt
it still needs to do the right thing.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-07-17 08:51:11

by Herbert Xu

Subject: Re: [v2 PATCH] padata: Use RCU when fetching pd from do_serial

On Wed, Jul 17, 2019 at 10:36:41AM +0200, Steffen Klassert wrote:
>
> > This patch fixes it by using RCU just as we do in padata_do_parallel.
>
> RCU alone won't help because if some object is queued for async
> crypto, we left the RCU protected aera. I think padata_do_serial
> needs to do RCU and should free 'parallel_data' if the flag
> PADATA_RESET is set and the refcount goes to zero. padata_replace
> should do the same then.

Yes this patch doesn't work because you can't just switch over to
the new pd as the old pd will then get stuck due to the missing
entry.

> > index 5d13d25da2c8..ce51555cb86c 100644
> > @@ -367,7 +367,7 @@ void padata_do_serial(struct padata_priv *padata)
> > struct parallel_data *pd;
> > int reorder_via_wq = 0;
> >
> > - pd = padata->pd;
> > + pd = rcu_dereference_bh(padata->inst->pd);
>
> Why not just
>
> pd = rcu_dereference_bh(padata->pd);

I was trying to get the new pd which could only come from the inst.
In any case the whole RCU idea doesn't work so we'll probably do the
refcount idea that you suggested.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-07-17 08:53:56

by Steffen Klassert

Subject: Re: [PATCH] padata: Use RCU when fetching pd from do_serial

On Wed, Jul 17, 2019 at 04:47:40PM +0800, Herbert Xu wrote:
> On Wed, Jul 17, 2019 at 10:28:15AM +0200, Steffen Klassert wrote:
> >
> > I had a patch to support crypto backlog some years ago,
> > but testing with dm-crypt did not show any performance
> > improvement. So I decided to just skip that patch because
> > it added code for no need.
>
> Well pcrypt is part of the API so it needs to obey the rules.
> So even if it wouldn't make sense to use dm-crypt with pcrypt
> it still needs to do the right thing.

Ok, makes sense. The old patch still exists:

https://git.kernel.org/pub/scm/linux/kernel/git/klassert/linux-stk.git/commit/?h=net-next-pcrypt&id=5909a88ccef6f4c78fe9969160155a8f0ce8fee7

Let me see if I can respin it...