2015-05-04 21:33:02

by Rabin Vincent

[permalink] [raw]
Subject: Re: [PATCH 3.14 73/92] dm crypt: fix deadlock when async crypto algorithm returns -EBUSY

On Sat, May 02, 2015 at 09:03:28PM +0200, Greg Kroah-Hartman wrote:
> 3.14-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Ben Collins <[email protected]>
>
> commit 0618764cb25f6fa9fb31152995de42a8a0496475 upstream.

The commit in question was recently merged to mainline and is now being
sent to various stable kernels. But I think the patch is wrong and the
real problem lies in the Freescale CAAM driver which is described as
having being tested with.

commit 0618764cb25f6fa9fb31152995de42a8a0496475
Author: Ben Collins <[email protected]>
Date: Fri Apr 3 16:09:46 2015 +0000

dm crypt: fix deadlock when async crypto algorithm returns -EBUSY

I suspect this doesn't show up for most anyone because software
algorithms typically don't have a sense of being too busy. However,
when working with the Freescale CAAM driver it will return -EBUSY on
occasion under heavy -- which resulted in dm-crypt deadlock.

After checking the logic in some other drivers, the scheme for
crypt_convert() and it's callback, kcryptd_async_done(), were not
correctly laid out to properly handle -EBUSY or -EINPROGRESS.

Fix this by using the completion for both -EBUSY and -EINPROGRESS. Now
crypt_convert()'s use of completion is comparable to
af_alg_wait_for_completion(). Similarly, kcryptd_async_done() follows
the pattern used in af_alg_complete().

Before this fix dm-crypt would lockup within 1-2 minutes running with
the CAAM driver. Fix was regression tested against software algorithms
on PPC32 and x86_64, and things seem perfectly happy there as well.

Signed-off-by: Ben Collins <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Cc: [email protected]

dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG. This means the the crypto
driver should internally backlog requests which arrive when the queue is
full and process them later. Until the crypto hw's queue becomes full,
the driver returns -EINPROGRESS. When the crypto hw's queue if full,
the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is
expected to backlog the request and process it when the hardware has
queue space. At the point when the driver takes the request from the
backlog and starts processing it, it calls the completion function with
a status of -EINPROGRESS. The completion function is called (for a
second time, in the case of backlogged requests) with a status/err of 0
when a request is done.

Crypto drivers for hardware without hardware queueing use the helpers,
crypto_init_queue(), crypto_enqueue_request(), crypto_dequeue_request()
and crypto_get_backlog() helpers to implement this behaviour correctly,
while others implement this behaviour without these helpers (ccp, for
example).

dm-crypt (before this patch) uses this API correctly. It queues up as
many requests as the hw queues will allow (i.e. as long as it gets back
-EINPROGRESS from the request function). Then, when it sees at least
one backlogged request (gets -EBUSY), it waits till that backlogged
request is handled (completion gets called with -EINPROGRESS), and then
continues. The references to af_alg_wait_for_completion() and
af_alg_complete() in the commit message are irrelevant because those
functions only handle one request at a time, unlink dm-crypt.

The problem is that the Freescale CAAM driver, which this problem is
described as being seen on, fails to implement the backlogging behaviour
correctly. In cam_jr_enqueue(), if the hardware queue is full, it
simply returns -EBUSY without backlogging the request. What the
observed deadlock was is not described in the commit message but it is
obviously the wait_for_completion() in crypto_convert() where dm-crypto
would wait for the completion being called with -EINPROGRESS in the case
of backlogged requests. This completion will never be completed due to
the bug in the CAAM driver.

What this patch does is that it makes dm-crypt wait for every request,
even when the driver/hardware queues are not full, which means that
dm-crypt will never see -EBUSY. This means that this patch will cause a
performance regression on all crypto drivers which implement the API
correctly.

So, I think this patch should be reverted in mainline and the stable
kernels where it has been merged, and correct backlog handling should be
implemented in the CAAM driver instead.


2015-05-04 21:40:23

by Ben Collins

[permalink] [raw]
Subject: Re: [PATCH 3.14 73/92] dm crypt: fix deadlock when async crypto algorithm returns -EBUSY


> On May 4, 2015, at 5:32 PM, Rabin Vincent <[email protected]> wrote:
>
> On Sat, May 02, 2015 at 09:03:28PM +0200, Greg Kroah-Hartman wrote:
>> 3.14-stable review patch. If anyone has any objections, please let me know.
>>
>> ------------------
>>
>> From: Ben Collins <[email protected]>
>>
>> commit 0618764cb25f6fa9fb31152995de42a8a0496475 upstream.
>
> The commit in question was recently merged to mainline and is now being
> sent to various stable kernels. But I think the patch is wrong and the
> real problem lies in the Freescale CAAM driver which is described as
> having being tested with.

There may be something quirky with CAAM, yes, but the fact is, the code
in the dm-crypt module was very dissimilar to other drivers performing
the same loop/logic.

I don’t think that this patch, which was deemed correct, should be held
up based on lower level logic “possibly” being partially at fault. On the
contrary, I think errors you point out may just be showing the error in
dm-crypt more easily, rather than being the complete problem.

> commit 0618764cb25f6fa9fb31152995de42a8a0496475
> Author: Ben Collins <[email protected]>
> Date: Fri Apr 3 16:09:46 2015 +0000
>
> dm crypt: fix deadlock when async crypto algorithm returns -EBUSY
>
> I suspect this doesn't show up for most anyone because software
> algorithms typically don't have a sense of being too busy. However,
> when working with the Freescale CAAM driver it will return -EBUSY on
> occasion under heavy -- which resulted in dm-crypt deadlock.
>
> After checking the logic in some other drivers, the scheme for
> crypt_convert() and it's callback, kcryptd_async_done(), were not
> correctly laid out to properly handle -EBUSY or -EINPROGRESS.
>
> Fix this by using the completion for both -EBUSY and -EINPROGRESS. Now
> crypt_convert()'s use of completion is comparable to
> af_alg_wait_for_completion(). Similarly, kcryptd_async_done() follows
> the pattern used in af_alg_complete().
>
> Before this fix dm-crypt would lockup within 1-2 minutes running with
> the CAAM driver. Fix was regression tested against software algorithms
> on PPC32 and x86_64, and things seem perfectly happy there as well.
>
> Signed-off-by: Ben Collins <[email protected]>
> Signed-off-by: Mike Snitzer <[email protected]>
> Cc: [email protected]
>
> dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG. This means the the crypto
> driver should internally backlog requests which arrive when the queue is
> full and process them later. Until the crypto hw's queue becomes full,
> the driver returns -EINPROGRESS. When the crypto hw's queue if full,
> the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is
> expected to backlog the request and process it when the hardware has
> queue space. At the point when the driver takes the request from the
> backlog and starts processing it, it calls the completion function with
> a status of -EINPROGRESS. The completion function is called (for a
> second time, in the case of backlogged requests) with a status/err of 0
> when a request is done.
>
> Crypto drivers for hardware without hardware queueing use the helpers,
> crypto_init_queue(), crypto_enqueue_request(), crypto_dequeue_request()
> and crypto_get_backlog() helpers to implement this behaviour correctly,
> while others implement this behaviour without these helpers (ccp, for
> example).
>
> dm-crypt (before this patch) uses this API correctly. It queues up as
> many requests as the hw queues will allow (i.e. as long as it gets back
> -EINPROGRESS from the request function). Then, when it sees at least
> one backlogged request (gets -EBUSY), it waits till that backlogged
> request is handled (completion gets called with -EINPROGRESS), and then
> continues. The references to af_alg_wait_for_completion() and
> af_alg_complete() in the commit message are irrelevant because those
> functions only handle one request at a time, unlink dm-crypt.
>
> The problem is that the Freescale CAAM driver, which this problem is
> described as being seen on, fails to implement the backlogging behaviour
> correctly. In cam_jr_enqueue(), if the hardware queue is full, it
> simply returns -EBUSY without backlogging the request. What the
> observed deadlock was is not described in the commit message but it is
> obviously the wait_for_completion() in crypto_convert() where dm-crypto
> would wait for the completion being called with -EINPROGRESS in the case
> of backlogged requests. This completion will never be completed due to
> the bug in the CAAM driver.
>
> What this patch does is that it makes dm-crypt wait for every request,
> even when the driver/hardware queues are not full, which means that
> dm-crypt will never see -EBUSY. This means that this patch will cause a
> performance regression on all crypto drivers which implement the API
> correctly.
>
> So, I think this patch should be reverted in mainline and the stable
> kernels where it has been merged, and correct backlog handling should be
> implemented in the CAAM driver instead.

——
Ben Collins
Cyphre Champion
——————————————
VP of Engineering
Servergy, Inc.


Attachments:
signature.asc (832.00 B)
Message signed with OpenPGP using GPGMail

2015-05-05 01:53:34

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 3.14 73/92] dm crypt: fix deadlock when async crypto algorithm returns -EBUSY

On Mon, May 04, 2015 at 11:32:54PM +0200, Rabin Vincent wrote:
> On Sat, May 02, 2015 at 09:03:28PM +0200, Greg Kroah-Hartman wrote:
> > 3.14-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Ben Collins <[email protected]>
> >
> > commit 0618764cb25f6fa9fb31152995de42a8a0496475 upstream.
>
> The commit in question was recently merged to mainline and is now being
> sent to various stable kernels. But I think the patch is wrong and the
> real problem lies in the Freescale CAAM driver which is described as
> having being tested with.

I agree. Please revert this patch.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-05-05 03:22:13

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 3.14 73/92] dm crypt: fix deadlock when async crypto algorithm returns -EBUSY

On Mon, May 04 2015 at 5:32pm -0400,
Rabin Vincent <[email protected]> wrote:

> On Sat, May 02, 2015 at 09:03:28PM +0200, Greg Kroah-Hartman wrote:
> > 3.14-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Ben Collins <[email protected]>
> >
> > commit 0618764cb25f6fa9fb31152995de42a8a0496475 upstream.
>
> The commit in question was recently merged to mainline and is now being
> sent to various stable kernels. But I think the patch is wrong and the
> real problem lies in the Freescale CAAM driver which is described as
> having being tested with.
>
> commit 0618764cb25f6fa9fb31152995de42a8a0496475
> Author: Ben Collins <[email protected]>
> Date: Fri Apr 3 16:09:46 2015 +0000
>
> dm crypt: fix deadlock when async crypto algorithm returns -EBUSY
>
> I suspect this doesn't show up for most anyone because software
> algorithms typically don't have a sense of being too busy. However,
> when working with the Freescale CAAM driver it will return -EBUSY on
> occasion under heavy -- which resulted in dm-crypt deadlock.
>
> After checking the logic in some other drivers, the scheme for
> crypt_convert() and it's callback, kcryptd_async_done(), were not
> correctly laid out to properly handle -EBUSY or -EINPROGRESS.
>
> Fix this by using the completion for both -EBUSY and -EINPROGRESS. Now
> crypt_convert()'s use of completion is comparable to
> af_alg_wait_for_completion(). Similarly, kcryptd_async_done() follows
> the pattern used in af_alg_complete().
>
> Before this fix dm-crypt would lockup within 1-2 minutes running with
> the CAAM driver. Fix was regression tested against software algorithms
> on PPC32 and x86_64, and things seem perfectly happy there as well.
>
> Signed-off-by: Ben Collins <[email protected]>
> Signed-off-by: Mike Snitzer <[email protected]>
> Cc: [email protected]
>
> dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG. This means the the crypto
> driver should internally backlog requests which arrive when the queue is
> full and process them later. Until the crypto hw's queue becomes full,
> the driver returns -EINPROGRESS. When the crypto hw's queue if full,
> the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is
> expected to backlog the request and process it when the hardware has
> queue space. At the point when the driver takes the request from the
> backlog and starts processing it, it calls the completion function with
> a status of -EINPROGRESS. The completion function is called (for a
> second time, in the case of backlogged requests) with a status/err of 0
> when a request is done.
>
> Crypto drivers for hardware without hardware queueing use the helpers,
> crypto_init_queue(), crypto_enqueue_request(), crypto_dequeue_request()
> and crypto_get_backlog() helpers to implement this behaviour correctly,
> while others implement this behaviour without these helpers (ccp, for
> example).
>
> dm-crypt (before this patch) uses this API correctly. It queues up as
> many requests as the hw queues will allow (i.e. as long as it gets back
> -EINPROGRESS from the request function). Then, when it sees at least
> one backlogged request (gets -EBUSY), it waits till that backlogged
> request is handled (completion gets called with -EINPROGRESS), and then
> continues. The references to af_alg_wait_for_completion() and
> af_alg_complete() in the commit message are irrelevant because those
> functions only handle one request at a time, unlink dm-crypt.

OK, I clearly missed the subtleties of the API. dm-crypt.c definitely
needs more documentation related to CRYPTO_TFM_REQ_MAY_BACKLOG and the
implications it has on completions, etc.

Any chance you'd be willing to provide in-code comments in the
appropriate places in dm-crypt.c (after having reverted this patch in
question)?

I'd be happy to take the combination of the revert and documentation in
a single patch and get it to Linus for 4.0-rc3.

> The problem is that the Freescale CAAM driver, which this problem is
> described as being seen on, fails to implement the backlogging behaviour
> correctly. In cam_jr_enqueue(), if the hardware queue is full, it
> simply returns -EBUSY without backlogging the request. What the
> observed deadlock was is not described in the commit message but it is
> obviously the wait_for_completion() in crypto_convert() where dm-crypto
> would wait for the completion being called with -EINPROGRESS in the case
> of backlogged requests. This completion will never be completed due to
> the bug in the CAAM driver.
>
> What this patch does is that it makes dm-crypt wait for every request,
> even when the driver/hardware queues are not full, which means that
> dm-crypt will never see -EBUSY. This means that this patch will cause a
> performance regression on all crypto drivers which implement the API
> correctly.
>
> So, I think this patch should be reverted in mainline and the stable
> kernels where it has been merged, and correct backlog handling should be
> implemented in the CAAM driver instead.

Hopefully it hasn't been merged, Greg?

If it has been merged to stable@ then we'll have to split the revert
patch out from any documentation improvements.

2015-05-05 06:42:44

by Milan Broz

[permalink] [raw]
Subject: Re: [PATCH 3.14 73/92] dm crypt: fix deadlock when async crypto algorithm returns -EBUSY

On 05/05/2015 05:22 AM, Mike Snitzer wrote:
> On Mon, May 04 2015 at 5:32pm -0400,
> Rabin Vincent <[email protected]> wrote:
>
>> On Sat, May 02, 2015 at 09:03:28PM +0200, Greg Kroah-Hartman wrote:
>>> 3.14-stable review patch. If anyone has any objections, please let me know.
>>>
>>> ------------------
>>>
>>> From: Ben Collins <[email protected]>
>>>
>>> commit 0618764cb25f6fa9fb31152995de42a8a0496475 upstream.
>>
>> The commit in question was recently merged to mainline and is now being
>> sent to various stable kernels. But I think the patch is wrong and the
>> real problem lies in the Freescale CAAM driver which is described as
>> having being tested with.
...

>> dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG. This means the the crypto
>> driver should internally backlog requests which arrive when the queue is
>> full and process them later. Until the crypto hw's queue becomes full,
>> the driver returns -EINPROGRESS. When the crypto hw's queue if full,
>> the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is
>> expected to backlog the request and process it when the hardware has
>> queue space. At the point when the driver takes the request from the
>> backlog and starts processing it, it calls the completion function with
>> a status of -EINPROGRESS. The completion function is called (for a
>> second time, in the case of backlogged requests) with a status/err of 0
>> when a request is done.


Mike, I thought there was a discussion with crypto API maintainer before
merging this patch, I think there were some comments that the patch seems
to be not correct...
This API (EBUSY/EINPROGESS codes) use was there since dmcrypt switched
to async crypto API.

There should tests for it (some years ago we tested the async path by forcing
to use cryptd, this was able to clearly simulate the BUSY path as well),
but not sure if it is still possible so easily.

> Any chance you'd be willing to provide in-code comments in the
> appropriate places in dm-crypt.c (after having reverted this patch in
> question)?
>
> I'd be happy to take the combination of the revert and documentation in
> a single patch and get it to Linus for 4.0-rc3.

Please, do not mix revert patches with additions (even if it is just comment),
someone could be even more confused what's going on here.

If nobody volunteers, I'll try to add some comments later to dmcrypt code,
but that is definitely not for stable.

Thanks,
Milan

2015-05-05 12:50:42

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 3.14 73/92] dm crypt: fix deadlock when async crypto algorithm returns -EBUSY

On Tue, May 05 2015 at 2:42am -0400,
Milan Broz <[email protected]> wrote:

> On 05/05/2015 05:22 AM, Mike Snitzer wrote:
> > On Mon, May 04 2015 at 5:32pm -0400,
> > Rabin Vincent <[email protected]> wrote:
> >
> >> On Sat, May 02, 2015 at 09:03:28PM +0200, Greg Kroah-Hartman wrote:
> >>> 3.14-stable review patch. If anyone has any objections, please let me know.
> >>>
> >>> ------------------
> >>>
> >>> From: Ben Collins <[email protected]>
> >>>
> >>> commit 0618764cb25f6fa9fb31152995de42a8a0496475 upstream.
> >>
> >> The commit in question was recently merged to mainline and is now being
> >> sent to various stable kernels. But I think the patch is wrong and the
> >> real problem lies in the Freescale CAAM driver which is described as
> >> having being tested with.
> ...
>
> >> dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG. This means the the crypto
> >> driver should internally backlog requests which arrive when the queue is
> >> full and process them later. Until the crypto hw's queue becomes full,
> >> the driver returns -EINPROGRESS. When the crypto hw's queue if full,
> >> the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is
> >> expected to backlog the request and process it when the hardware has
> >> queue space. At the point when the driver takes the request from the
> >> backlog and starts processing it, it calls the completion function with
> >> a status of -EINPROGRESS. The completion function is called (for a
> >> second time, in the case of backlogged requests) with a status/err of 0
> >> when a request is done.
>
>
> Mike, I thought there was a discussion with crypto API maintainer before
> merging this patch, I think there were some comments that the patch seems
> to be not correct...

I unfortunately didn't cc Herbert on the original v2 patch (but I did
later bounce the mail to him IIRC). Regardless, no I didn't see any
feedback and I really should've been more active in engaging him.

> This API (EBUSY/EINPROGESS codes) use was there since dmcrypt switched
> to async crypto API.
>
> There should tests for it (some years ago we tested the async path by forcing
> to use cryptd, this was able to clearly simulate the BUSY path as well),
> but not sure if it is still possible so easily.
>
> > Any chance you'd be willing to provide in-code comments in the
> > appropriate places in dm-crypt.c (after having reverted this patch in
> > question)?
> >
> > I'd be happy to take the combination of the revert and documentation in
> > a single patch and get it to Linus for 4.0-rc3.
>
> Please, do not mix revert patches with additions (even if it is just comment),
> someone could be even more confused what's going on here.

I was only saying to mix comments and revert if the patch in question
didn't get into stable@ at all.

But safer to just do a pure revert. I get it queued up to send to Linus.

> If nobody volunteers, I'll try to add some comments later to dmcrypt code,
> but that is definitely not for stable.

Yeap, thanks.

2015-05-05 13:16:12

by Rabin Vincent

[permalink] [raw]
Subject: [PATCH] Revert "dm crypt: fix deadlock when async crypto algorithm returns -EBUSY"

This reverts commit 0618764cb25f6fa9fb31152995de42a8a0496475.

The problem which that commit attempts to fix actually lies in the
Freescale CAAM crypto driver.

dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG. This means the the crypto
driver should internally backlog requests which arrive when the queue is
full and process them later. Until the crypto hw's queue becomes full,
the driver returns -EINPROGRESS. When the crypto hw's queue if full,
the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is
expected to backlog the request and process it when the hardware has
queue space. At the point when the driver takes the request from the
backlog and starts processing it, it calls the completion function with
a status of -EINPROGRESS. The completion function is called (for a
second time, in the case of backlogged requests) with a status/err of 0
when a request is done.

Crypto drivers for hardware without hardware queueing use the helpers,
crypto_init_queue(), crypto_enqueue_request(), crypto_dequeue_request()
and crypto_get_backlog() helpers to implement this behaviour correctly,
while others implement this behaviour without these helpers (ccp, for
example).

dm-crypt (before this patch) uses this API correctly. It queues up as
many requests as the hw queues will allow (i.e. as long as it gets back
-EINPROGRESS from the request function). Then, when it sees at least
one backlogged request (gets -EBUSY), it waits till that backlogged
request is handled (completion gets called with -EINPROGRESS), and then
continues. The references to af_alg_wait_for_completion() and
af_alg_complete() in that commit's commit message are irrelevant because
those functions only handle one request at a time, unlink dm-crypt.

The problem is that the Freescale CAAM driver, which that commit
describes as having being tested with, fails to implement the
backlogging behaviour correctly. In cam_jr_enqueue(), if the hardware
queue is full, it simply returns -EBUSY without backlogging the request.
What the observed deadlock was is not described in the commit message
but it is obviously the wait_for_completion() in crypto_convert() where
dm-crypto would wait for the completion being called with -EINPROGRESS
in the case of backlogged requests. This completion will never be
completed due to the bug in the CAAM driver.

What that commit does is that it makes dm-crypt wait for every request,
even when the driver/hardware queues are not full, which means that
dm-crypt will never see -EBUSY. This means that that commit will cause
a performance regression on all crypto drivers which implement the API
correctly.

Revert it. Correct backlog handling should be implemented in the CAAM
driver instead.

Signed-off-by: Rabin Vincent <[email protected]>
---
drivers/md/dm-crypt.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 9eeea19..5503e43 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -925,10 +925,11 @@ static int crypt_convert(struct crypt_config *cc,

switch (r) {
/* async */
- case -EINPROGRESS:
case -EBUSY:
wait_for_completion(&ctx->restart);
reinit_completion(&ctx->restart);
+ /* fall through*/
+ case -EINPROGRESS:
ctx->req = NULL;
ctx->cc_sector++;
continue;
@@ -1345,8 +1346,10 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
struct dm_crypt_io *io = container_of(ctx, struct dm_crypt_io, ctx);
struct crypt_config *cc = io->cc;

- if (error == -EINPROGRESS)
+ if (error == -EINPROGRESS) {
+ complete(&ctx->restart);
return;
+ }

if (!error && cc->iv_gen_ops && cc->iv_gen_ops->post)
error = cc->iv_gen_ops->post(cc, iv_of_dmreq(cc, dmreq), dmreq);
@@ -1357,15 +1360,12 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
crypt_free_req(cc, req_of_dmreq(cc, dmreq), io->base_bio);

if (!atomic_dec_and_test(&ctx->cc_pending))
- goto done;
+ return;

if (bio_data_dir(io->base_bio) == READ)
kcryptd_crypt_read_done(io);
else
kcryptd_crypt_write_io_submit(io, 1);
-done:
- if (!completion_done(&ctx->restart))
- complete(&ctx->restart);
}

static void kcryptd_crypt(struct work_struct *work)
--
1.7.10.4

2015-05-05 13:23:10

by Horia Geantă

[permalink] [raw]
Subject: Re: [PATCH] Revert "dm crypt: fix deadlock when async crypto algorithm returns -EBUSY"

On 5/5/2015 4:15 PM, Rabin Vincent wrote:
> This reverts commit 0618764cb25f6fa9fb31152995de42a8a0496475.
>
> The problem which that commit attempts to fix actually lies in the
> Freescale CAAM crypto driver.
>
> dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG. This means the the crypto
> driver should internally backlog requests which arrive when the queue is
> full and process them later. Until the crypto hw's queue becomes full,
> the driver returns -EINPROGRESS. When the crypto hw's queue if full,
> the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is
> expected to backlog the request and process it when the hardware has
> queue space. At the point when the driver takes the request from the
> backlog and starts processing it, it calls the completion function with
> a status of -EINPROGRESS. The completion function is called (for a
> second time, in the case of backlogged requests) with a status/err of 0
> when a request is done.
>
> Crypto drivers for hardware without hardware queueing use the helpers,
> crypto_init_queue(), crypto_enqueue_request(), crypto_dequeue_request()
> and crypto_get_backlog() helpers to implement this behaviour correctly,
> while others implement this behaviour without these helpers (ccp, for
> example).
>
> dm-crypt (before this patch) uses this API correctly. It queues up as
> many requests as the hw queues will allow (i.e. as long as it gets back
> -EINPROGRESS from the request function). Then, when it sees at least
> one backlogged request (gets -EBUSY), it waits till that backlogged
> request is handled (completion gets called with -EINPROGRESS), and then
> continues. The references to af_alg_wait_for_completion() and
> af_alg_complete() in that commit's commit message are irrelevant because
> those functions only handle one request at a time, unlink dm-crypt.
>
> The problem is that the Freescale CAAM driver, which that commit
> describes as having being tested with, fails to implement the
> backlogging behaviour correctly. In cam_jr_enqueue(), if the hardware
> queue is full, it simply returns -EBUSY without backlogging the request.
> What the observed deadlock was is not described in the commit message
> but it is obviously the wait_for_completion() in crypto_convert() where
> dm-crypto would wait for the completion being called with -EINPROGRESS
> in the case of backlogged requests. This completion will never be
> completed due to the bug in the CAAM driver.
>
> What that commit does is that it makes dm-crypt wait for every request,
> even when the driver/hardware queues are not full, which means that
> dm-crypt will never see -EBUSY. This means that that commit will cause
> a performance regression on all crypto drivers which implement the API
> correctly.
>
> Revert it. Correct backlog handling should be implemented in the CAAM
> driver instead.
>
> Signed-off-by: Rabin Vincent <[email protected]>

Reviewed-by: Horia Geanta <[email protected]>

I confirm CAAM crypto driver currently lacks backlogging support.

Horia