2017-10-25 15:26:53

by Romain Izard

[permalink] [raw]
Subject: "BUG: scheduling while atomic" in atmel-aes on Linux v4.14-rc6

Hello,

While running the kcapi test suite on a SAMA5D2 Xplained board with a
v4.14-rc6 kernel, I encountered the following error:

# kcapi -x 9 -e -c "cbc(aes)" -i 00000000000000000000000000000000 -k 00000000000
0000000000000000000000000000000000000 -p 1b077a6af4b7f98229de786d7516b639
BUG: scheduling while atomic: kcapi/926/0x00000100
CPU: 0 PID: 926 Comm: kcapi Not tainted 4.14.0-rc6 #2
Hardware name: Atmel SAMA5
[<c010caac>] (unwind_backtrace) from [<c010a9d8>] (show_stack+0x10/0x14)
[<c010a9d8>] (show_stack) from [<c013a914>] (__schedule_bug+0x60/0x80)
[<c013a914>] (__schedule_bug) from [<c074ac70>] (__schedule+0x368/0x3fc)
[<c074ac70>] (__schedule) from [<c074ad74>] (schedule+0x40/0xa0)
[<c074ad74>] (schedule) from [<c05cb96c>] (__lock_sock+0x78/0xb0)
[<c05cb96c>] (__lock_sock) from [<c05cddd0>] (lock_sock_nested+0x48/0x50)
[<c05cddd0>] (lock_sock_nested) from [<c030eec8>] (af_alg_async_cb+0x20/0x80)
[<c030eec8>] (af_alg_async_cb) from [<c0573b70>]
(atmel_aes_transfer_complete+0x38/0x68)
[<c0573b70>] (atmel_aes_transfer_complete) from [<c0121258>]
(tasklet_action+0x68/0xb4)
[<c0121258>] (tasklet_action) from [<c010158c>] (__do_softirq+0xc4/0x250)
[<c010158c>] (__do_softirq) from [<c01215f0>] (irq_exit+0xfc/0x130)
[<c01215f0>] (irq_exit) from [<c014a208>] (__handle_domain_irq+0x58/0xa8)
[<c014a208>] (__handle_domain_irq) from [<c010b60c>] (__irq_svc+0x6c/0x90)
[<c010b60c>] (__irq_svc) from [<c0310280>] (skcipher_recvmsg+0x2d8/0x318)
[<c0310280>] (skcipher_recvmsg) from [<c05c8794>] (sock_read_iter+0x88/0xc8)
[<c05c8794>] (sock_read_iter) from [<c01f1940>]
(aio_read.constprop.3+0xcc/0x178)
[<c01f1940>] (aio_read.constprop.3) from [<c01f2968>]
(SyS_io_submit+0x540/0x644)
[<c01f2968>] (SyS_io_submit) from [<c0107480>] (ret_fast_syscall+0x0/0x48)

After bisecting, I determined that it appeared during the 4.14 merge
window, with the following commit:

e870456d8e7c crypto: algif_skcipher - overhaul memory management


Best regards,
--
Romain Izard


2017-10-25 15:59:37

by Stephan Müller

[permalink] [raw]
Subject: Re: "BUG: scheduling while atomic" in atmel-aes on Linux v4.14-rc6

Am Mittwoch, 25. Oktober 2017, 17:26:31 CEST schrieb Romain Izard:

Hi Romain, Herbert,


> Hello,
>
> While running the kcapi test suite on a SAMA5D2 Xplained board with a
> v4.14-rc6 kernel, I encountered the following error:

Thank you for the report.
>
> # kcapi -x 9 -e -c "cbc(aes)" -i 00000000000000000000000000000000 -k
> 00000000000 0000000000000000000000000000000000000 -p
> 1b077a6af4b7f98229de786d7516b639 BUG: scheduling while atomic:
> kcapi/926/0x00000100
> CPU: 0 PID: 926 Comm: kcapi Not tainted 4.14.0-rc6 #2
> Hardware name: Atmel SAMA5
> [<c010caac>] (unwind_backtrace) from [<c010a9d8>] (show_stack+0x10/0x14)
> [<c010a9d8>] (show_stack) from [<c013a914>] (__schedule_bug+0x60/0x80)
> [<c013a914>] (__schedule_bug) from [<c074ac70>] (__schedule+0x368/0x3fc)
> [<c074ac70>] (__schedule) from [<c074ad74>] (schedule+0x40/0xa0)
> [<c074ad74>] (schedule) from [<c05cb96c>] (__lock_sock+0x78/0xb0)
> [<c05cb96c>] (__lock_sock) from [<c05cddd0>] (lock_sock_nested+0x48/0x50)
> [<c05cddd0>] (lock_sock_nested) from [<c030eec8>]
> (af_alg_async_cb+0x20/0x80) [<c030eec8>] (af_alg_async_cb) from
> [<c0573b70>]
> (atmel_aes_transfer_complete+0x38/0x68)
> [<c0573b70>] (atmel_aes_transfer_complete) from [<c0121258>]
> (tasklet_action+0x68/0xb4)
> [<c0121258>] (tasklet_action) from [<c010158c>] (__do_softirq+0xc4/0x250)
> [<c010158c>] (__do_softirq) from [<c01215f0>] (irq_exit+0xfc/0x130)
> [<c01215f0>] (irq_exit) from [<c014a208>] (__handle_domain_irq+0x58/0xa8)
> [<c014a208>] (__handle_domain_irq) from [<c010b60c>] (__irq_svc+0x6c/0x90)
> [<c010b60c>] (__irq_svc) from [<c0310280>] (skcipher_recvmsg+0x2d8/0x318)
> [<c0310280>] (skcipher_recvmsg) from [<c05c8794>] (sock_read_iter+0x88/0xc8)
> [<c05c8794>] (sock_read_iter) from [<c01f1940>]
> (aio_read.constprop.3+0xcc/0x178)
> [<c01f1940>] (aio_read.constprop.3) from [<c01f2968>]
> (SyS_io_submit+0x540/0x644)
> [<c01f2968>] (SyS_io_submit) from [<c0107480>] (ret_fast_syscall+0x0/0x48)
>
> After bisecting, I determined that it appeared during the 4.14 merge
> window, with the following commit:
>
> e870456d8e7c crypto: algif_skcipher - overhaul memory management

I think the culprit is the lock_sock in af_alg_async_cb. When going through
the code protected by the lock, I actually do not see that the struct sock is
actually accessed in any way other than with an atomic operation. Thus, I
would infer that lock/unlock in af_alg_async_cb could be safely removed.

Would you agree, Herbert?


Ciao
Stephan

2017-10-29 20:39:36

by Stephan Müller

[permalink] [raw]
Subject: [PATCH] crypto: AF_ALG - remove locking in async callback

Am Mittwoch, 25. Oktober 2017, 17:26:31 CET schrieb Romain Izard:

Hi Romain,

the patch below should cover the issue you see. Would you mind testing it?

Thanks
Stephan

---8<---

The code paths protected by the socket-lock do not use or modify the
socket in a non-atomic fashion. The actions pertaining the socket do not
even need to be handled as an atomic operation. Thus, the socket-lock
can be safely ignored.

This fixes a bug regarding scheduling in atomic as the callback function
may be invoked in interrupt context.

Fixes: 2d97591ef43d0 ("crypto: af_alg - consolidation of duplicate code")
Reported-by: Romain Izard <[email protected]>
Signed-off-by: Stephan Mueller <[email protected]>
---
crypto/af_alg.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 337cf382718e..a41f08642eee 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -1063,8 +1063,6 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err)
struct kiocb *iocb = areq->iocb;
unsigned int resultlen;

- lock_sock(sk);
-
/* Buffer size written by crypto operation. */
resultlen = areq->outlen;

@@ -1073,8 +1071,6 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err)
__sock_put(sk);

iocb->ki_complete(iocb, err ? err : resultlen, 0);
-
- release_sock(sk);
}
EXPORT_SYMBOL_GPL(af_alg_async_cb);

--
2.13.6

2017-10-30 17:15:53

by Romain Izard

[permalink] [raw]
Subject: Re: [PATCH] crypto: AF_ALG - remove locking in async callback

2017-10-29 21:39 GMT+01:00 Stephan Müller <[email protected]>:
> Am Mittwoch, 25. Oktober 2017, 17:26:31 CET schrieb Romain Izard:
>
> Hi Romain,
>
> the patch below should cover the issue you see. Would you mind testing it?
>
> Thanks
> Stephan
>
> ---8<---
>
> The code paths protected by the socket-lock do not use or modify the
> socket in a non-atomic fashion. The actions pertaining the socket do not
> even need to be handled as an atomic operation. Thus, the socket-lock
> can be safely ignored.
>
> This fixes a bug regarding scheduling in atomic as the callback function
> may be invoked in interrupt context.
>
> Fixes: 2d97591ef43d0 ("crypto: af_alg - consolidation of duplicate code")
> Reported-by: Romain Izard <[email protected]>
> Signed-off-by: Stephan Mueller <[email protected]>

Tested-by: Romain Izard <[email protected]>

The issue observed with atmel-aes is not reproduced anymore.

> ---
> crypto/af_alg.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/crypto/af_alg.c b/crypto/af_alg.c
> index 337cf382718e..a41f08642eee 100644
> --- a/crypto/af_alg.c
> +++ b/crypto/af_alg.c
> @@ -1063,8 +1063,6 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err)
> struct kiocb *iocb = areq->iocb;
> unsigned int resultlen;
>
> - lock_sock(sk);
> -
> /* Buffer size written by crypto operation. */
> resultlen = areq->outlen;
>
> @@ -1073,8 +1071,6 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err)
> __sock_put(sk);
>
> iocb->ki_complete(iocb, err ? err : resultlen, 0);
> -
> - release_sock(sk);
> }
> EXPORT_SYMBOL_GPL(af_alg_async_cb);
>
> --
> 2.13.6
>
>

2017-11-03 13:20:47

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH] crypto: AF_ALG - remove locking in async callback

On Sun, Oct 29, 2017 at 09:39:30PM +0100, Stephan M?ller wrote:
> Am Mittwoch, 25. Oktober 2017, 17:26:31 CET schrieb Romain Izard:
>
> Hi Romain,
>
> the patch below should cover the issue you see. Would you mind testing it?
>
> Thanks
> Stephan
>
> ---8<---
>
> The code paths protected by the socket-lock do not use or modify the
> socket in a non-atomic fashion. The actions pertaining the socket do not
> even need to be handled as an atomic operation. Thus, the socket-lock
> can be safely ignored.

Are you sure about that? In particular is the callback function still
sane without the socket lock if a concurrent recvmsg/sendmsg call is
made?

Your fixes header is wrong too as the locks weren't introduced in that
commit, they just got moved around.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2017-11-03 13:34:47

by Stephan Müller

[permalink] [raw]
Subject: Re: [PATCH] crypto: AF_ALG - remove locking in async callback

Am Freitag, 3. November 2017, 14:20:16 CET schrieb Herbert Xu:

Hi Herbert,

> On Sun, Oct 29, 2017 at 09:39:30PM +0100, Stephan M?ller wrote:
> > Am Mittwoch, 25. Oktober 2017, 17:26:31 CET schrieb Romain Izard:
> >
> > Hi Romain,
> >
> > the patch below should cover the issue you see. Would you mind testing it?
> >
> > Thanks
> > Stephan
> >
> > ---8<---
> >
> > The code paths protected by the socket-lock do not use or modify the
> > socket in a non-atomic fashion. The actions pertaining the socket do not
> > even need to be handled as an atomic operation. Thus, the socket-lock
> > can be safely ignored.
>
> Are you sure about that? In particular is the callback function still
> sane without the socket lock if a concurrent recvmsg/sendmsg call is
> made?

resultlen receives its data from the async_request -> no socket

af_alg_free_areq_sgls(areq) does not require a socket, but it uses the socket
to find the data structures -> I do not see that the socket is operated on
though. The socket will always be alive as the sk_refcnt is not yet
decremented by __sock_put.

sock_kfree_s uses an atomic operation

__sock_put uses an atomic operation

iocb->ki_complete does not use the socket

Where would you think that the lock is needed?
>
> Your fixes header is wrong too as the locks weren't introduced in that
> commit, they just got moved around.

Neither the skcipher_async_cb nor aead_async_cb up to and including 4.13
contain any lock.

Ciao
Stephan

2017-11-06 16:06:13

by Stephan Müller

[permalink] [raw]
Subject: Re: [PATCH] crypto: AF_ALG - remove locking in async callback

Am Freitag, 3. November 2017, 14:20:16 CET schrieb Herbert Xu:

Hi Herbert,

> >
> > The code paths protected by the socket-lock do not use or modify the
> > socket in a non-atomic fashion. The actions pertaining the socket do not
> > even need to be handled as an atomic operation. Thus, the socket-lock
> > can be safely ignored.
>
> Are you sure about that? In particular is the callback function still
> sane without the socket lock if a concurrent recvmsg/sendmsg call is
> made?

I reviewed the code again and I cannot find a reason for keeping the lock. All
we need to ensure is that the socket exists. This is ensured with the refcount
of the socket released by __sock_put().

Would you mind helping me where you think the lock is needed.
>
> Your fixes header is wrong too as the locks weren't introduced in that
> commit, they just got moved around.

Correct, the initial introduction was in
e870456d8e7c8d57c059ea479b5aadbb55ff4c3a (algif_skcipher) and
d887c52d6ae43aeebd249b5f2f1333e60236aa60 (algif_aead)

Thanks a lot.

Ciao
Stephan

2017-11-07 05:23:10

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH] crypto: AF_ALG - remove locking in async callback

On Mon, Nov 06, 2017 at 05:06:09PM +0100, Stephan Mueller wrote:
> Am Freitag, 3. November 2017, 14:20:16 CET schrieb Herbert Xu:
>
> > Are you sure about that? In particular is the callback function still
> > sane without the socket lock if a concurrent recvmsg/sendmsg call is
> > made?
>
> I reviewed the code again and I cannot find a reason for keeping the lock. All
> we need to ensure is that the socket exists. This is ensured with the refcount
> of the socket released by __sock_put().

OK, I can't see why we need a lock there either. However, the call
to __sock_put looks suspicious. Why isn't this using sock_put?

Also the sock_hold on the caller side looks buggy. Surely it needs
to be made before we even call the encrypt/decrypt functions rather
than after it returns EINPROGRESS at which point it may well be too
late?

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2017-11-07 06:19:35

by Stephan Müller

[permalink] [raw]
Subject: Re: [PATCH] crypto: AF_ALG - remove locking in async callback

Am Dienstag, 7. November 2017, 06:22:35 CET schrieb Herbert Xu:

Hi Herbert,

> On Mon, Nov 06, 2017 at 05:06:09PM +0100, Stephan Mueller wrote:
> > Am Freitag, 3. November 2017, 14:20:16 CET schrieb Herbert Xu:
> > > Are you sure about that? In particular is the callback function still
> > > sane without the socket lock if a concurrent recvmsg/sendmsg call is
> > > made?
> >
> > I reviewed the code again and I cannot find a reason for keeping the lock.
> > All we need to ensure is that the socket exists. This is ensured with the
> > refcount of the socket released by __sock_put().
>
> OK, I can't see why we need a lock there either. However, the call
> to __sock_put looks suspicious. Why isn't this using sock_put?

I simply ported the existing code from algif_aead over -- but I think you are
right that sock_put is more appropriate.
>
> Also the sock_hold on the caller side looks buggy. Surely it needs
> to be made before we even call the encrypt/decrypt functions rather
> than after it returns EINPROGRESS at which point it may well be too
> late?

I would concur. The sock_hold would need to be moved from the EINPROGRESS
conditional to before the AIO enc/dec operation is invoked.

Where I am not fully sure is whether af_alg_async_cb is called in any case.
I.e. when we invoke an AIO operation with a cipher that completes
synchronously (e.g. AES-NI), is this callback triggered?

Ciao
Stephan

2017-11-07 06:32:48

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH] crypto: AF_ALG - remove locking in async callback

On Tue, Nov 07, 2017 at 07:19:32AM +0100, Stephan M?ller wrote:
>
> Where I am not fully sure is whether af_alg_async_cb is called in any case.
> I.e. when we invoke an AIO operation with a cipher that completes
> synchronously (e.g. AES-NI), is this callback triggered?

It's the same with any other callback in the crypto API. The
completion function is called if the return value is EINPROGRESS
or EBUSY.

In fact the algif_aead/algif_skcipher code already frees the SG lists
and areq on a sync return. So perhaps you can merge this with the
completion function.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2017-11-07 09:05:52

by Stephan Müller

[permalink] [raw]
Subject: [PATCH v2] crypto: AF_ALG - remove locking in async callback

The code paths protected by the socket-lock do not use or modify the
socket in a non-atomic fashion. The actions pertaining the socket do not
even need to be handled as an atomic operation. Thus, the socket-lock
can be safely ignored.

This fixes a bug regarding scheduling in atomic as the callback function
may be invoked in interrupt context.

In addition, the sock_hold is moved before the AIO encrypt/decrypt
operation to ensure that the socket is always present. This avoids a
tiny race window where the socket is unprotected and yet used by the AIO
operation.

Finally, the release of resources for a crypto operation is moved into a
common function of af_alg_free_resources.

Fixes: e870456d8e7c8 ("crypto: algif_skcipher - overhaul memory management")
Fixes: d887c52d6ae43 ("crypto: algif_aead - overhaul memory management")
Reported-by: Romain Izard <[email protected]>
Signed-off-by: Stephan Mueller <[email protected]>
---
crypto/af_alg.c | 21 ++++++++++++++-------
crypto/algif_aead.c | 23 ++++++++++++-----------
crypto/algif_skcipher.c | 23 ++++++++++++-----------
include/crypto/if_alg.h | 1 +
4 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 85cea9de324a..358749c38894 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -1021,6 +1021,18 @@ ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
EXPORT_SYMBOL_GPL(af_alg_sendpage);

/**
+ * af_alg_free_resources - release resources required for crypto request
+ */
+void af_alg_free_resources(struct af_alg_async_req *areq)
+{
+ struct sock *sk = areq->sk;
+
+ af_alg_free_areq_sgls(areq);
+ sock_kfree_s(sk, areq, areq->areqlen);
+}
+EXPORT_SYMBOL_GPL(af_alg_free_resources);
+
+/**
* af_alg_async_cb - AIO callback handler
*
* This handler cleans up the struct af_alg_async_req upon completion of the
@@ -1036,18 +1048,13 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err)
struct kiocb *iocb = areq->iocb;
unsigned int resultlen;

- lock_sock(sk);
-
/* Buffer size written by crypto operation. */
resultlen = areq->outlen;

- af_alg_free_areq_sgls(areq);
- sock_kfree_s(sk, areq, areq->areqlen);
- __sock_put(sk);
+ af_alg_free_resources(areq);
+ sock_put(sk);

iocb->ki_complete(iocb, err ? err : resultlen, 0);
-
- release_sock(sk);
}
EXPORT_SYMBOL_GPL(af_alg_async_cb);

diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index aacae0837aff..6cdd4fb08335 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -268,12 +268,23 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,

if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) {
/* AIO operation */
+ sock_hold(sk);
areq->iocb = msg->msg_iocb;
aead_request_set_callback(&areq->cra_u.aead_req,
CRYPTO_TFM_REQ_MAY_BACKLOG,
af_alg_async_cb, areq);
err = ctx->enc ? crypto_aead_encrypt(&areq->cra_u.aead_req) :
crypto_aead_decrypt(&areq->cra_u.aead_req);
+
+ /* AIO operation in progress */
+ if (err == -EINPROGRESS) {
+ /* Remember output size that will be generated. */
+ areq->outlen = outlen;
+
+ return -EIOCBQUEUED;
+ }
+
+ sock_put(sk);
} else {
/* Synchronous operation */
aead_request_set_callback(&areq->cra_u.aead_req,
@@ -285,19 +296,9 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,
&ctx->wait);
}

- /* AIO operation in progress */
- if (err == -EINPROGRESS) {
- sock_hold(sk);
-
- /* Remember output size that will be generated. */
- areq->outlen = outlen;
-
- return -EIOCBQUEUED;
- }

free:
- af_alg_free_areq_sgls(areq);
- sock_kfree_s(sk, areq, areq->areqlen);
+ af_alg_free_resources(areq);

return err ? err : outlen;
}
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index 9954b078f0b9..0e0dd44f0545 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -117,6 +117,7 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg,

if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) {
/* AIO operation */
+ sock_hold(sk);
areq->iocb = msg->msg_iocb;
skcipher_request_set_callback(&areq->cra_u.skcipher_req,
CRYPTO_TFM_REQ_MAY_SLEEP,
@@ -124,6 +125,16 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
err = ctx->enc ?
crypto_skcipher_encrypt(&areq->cra_u.skcipher_req) :
crypto_skcipher_decrypt(&areq->cra_u.skcipher_req);
+
+ /* AIO operation in progress */
+ if (err == -EINPROGRESS) {
+ /* Remember output size that will be generated. */
+ areq->outlen = len;
+
+ return -EIOCBQUEUED;
+ }
+
+ sock_put(sk);
} else {
/* Synchronous operation */
skcipher_request_set_callback(&areq->cra_u.skcipher_req,
@@ -136,19 +147,9 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
&ctx->wait);
}

- /* AIO operation in progress */
- if (err == -EINPROGRESS) {
- sock_hold(sk);
-
- /* Remember output size that will be generated. */
- areq->outlen = len;
-
- return -EIOCBQUEUED;
- }

free:
- af_alg_free_areq_sgls(areq);
- sock_kfree_s(sk, areq, areq->areqlen);
+ af_alg_free_resources(areq);

return err ? err : len;
}
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index 6abf0a3604dc..38d9c5861ed8 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -242,6 +242,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
unsigned int ivsize);
ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
int offset, size_t size, int flags);
+void af_alg_free_resources(struct af_alg_async_req *areq);
void af_alg_async_cb(struct crypto_async_request *_req, int err);
unsigned int af_alg_poll(struct file *file, struct socket *sock,
poll_table *wait);
--
2.13.6

2017-11-10 11:11:17

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v2] crypto: AF_ALG - remove locking in async callback

On Tue, Nov 07, 2017 at 10:05:48AM +0100, Stephan M?ller wrote:
>
> if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) {
> /* AIO operation */
> + sock_hold(sk);
> areq->iocb = msg->msg_iocb;
> aead_request_set_callback(&areq->cra_u.aead_req,
> CRYPTO_TFM_REQ_MAY_BACKLOG,
> af_alg_async_cb, areq);
> err = ctx->enc ? crypto_aead_encrypt(&areq->cra_u.aead_req) :
> crypto_aead_decrypt(&areq->cra_u.aead_req);
> +
> + /* AIO operation in progress */
> + if (err == -EINPROGRESS) {
> + /* Remember output size that will be generated. */
> + areq->outlen = outlen;
> +
> + return -EIOCBQUEUED;
> + }

Since we're allowing backlogs, what about EBUSY?

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2017-11-10 12:20:58

by Stephan Müller

[permalink] [raw]
Subject: [PATCH v3] crypto: AF_ALG - remove locking in async callback

The code paths protected by the socket-lock do not use or modify the
socket in a non-atomic fashion. The actions pertaining the socket do not
even need to be handled as an atomic operation. Thus, the socket-lock
can be safely ignored.

This fixes a bug regarding scheduling in atomic as the callback function
may be invoked in interrupt context.

In addition, the sock_hold is moved before the AIO encrypt/decrypt
operation to ensure that the socket is always present. This avoids a
tiny race window where the socket is unprotected and yet used by the AIO
operation.

Finally, the release of resources for a crypto operation is moved into a
common function of af_alg_free_resources.

Fixes: e870456d8e7c8 ("crypto: algif_skcipher - overhaul memory management")
Fixes: d887c52d6ae43 ("crypto: algif_aead - overhaul memory management")
Reported-by: Romain Izard <[email protected]>
Signed-off-by: Stephan Mueller <[email protected]>
---
crypto/af_alg.c | 21 ++++++++++++++-------
crypto/algif_aead.c | 23 ++++++++++++-----------
crypto/algif_skcipher.c | 23 ++++++++++++-----------
include/crypto/if_alg.h | 1 +
4 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 85cea9de324a..358749c38894 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -1021,6 +1021,18 @@ ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
EXPORT_SYMBOL_GPL(af_alg_sendpage);

/**
+ * af_alg_free_resources - release resources required for crypto request
+ */
+void af_alg_free_resources(struct af_alg_async_req *areq)
+{
+ struct sock *sk = areq->sk;
+
+ af_alg_free_areq_sgls(areq);
+ sock_kfree_s(sk, areq, areq->areqlen);
+}
+EXPORT_SYMBOL_GPL(af_alg_free_resources);
+
+/**
* af_alg_async_cb - AIO callback handler
*
* This handler cleans up the struct af_alg_async_req upon completion of the
@@ -1036,18 +1048,13 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err)
struct kiocb *iocb = areq->iocb;
unsigned int resultlen;

- lock_sock(sk);
-
/* Buffer size written by crypto operation. */
resultlen = areq->outlen;

- af_alg_free_areq_sgls(areq);
- sock_kfree_s(sk, areq, areq->areqlen);
- __sock_put(sk);
+ af_alg_free_resources(areq);
+ sock_put(sk);

iocb->ki_complete(iocb, err ? err : resultlen, 0);
-
- release_sock(sk);
}
EXPORT_SYMBOL_GPL(af_alg_async_cb);

diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index aacae0837aff..db9378a45296 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -268,12 +268,23 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,

if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) {
/* AIO operation */
+ sock_hold(sk);
areq->iocb = msg->msg_iocb;
aead_request_set_callback(&areq->cra_u.aead_req,
CRYPTO_TFM_REQ_MAY_BACKLOG,
af_alg_async_cb, areq);
err = ctx->enc ? crypto_aead_encrypt(&areq->cra_u.aead_req) :
crypto_aead_decrypt(&areq->cra_u.aead_req);
+
+ /* AIO operation in progress */
+ if (err == -EINPROGRESS || err == -EBUSY) {
+ /* Remember output size that will be generated. */
+ areq->outlen = outlen;
+
+ return -EIOCBQUEUED;
+ }
+
+ sock_put(sk);
} else {
/* Synchronous operation */
aead_request_set_callback(&areq->cra_u.aead_req,
@@ -285,19 +296,9 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,
&ctx->wait);
}

- /* AIO operation in progress */
- if (err == -EINPROGRESS) {
- sock_hold(sk);
-
- /* Remember output size that will be generated. */
- areq->outlen = outlen;
-
- return -EIOCBQUEUED;
- }

free:
- af_alg_free_areq_sgls(areq);
- sock_kfree_s(sk, areq, areq->areqlen);
+ af_alg_free_resources(areq);

return err ? err : outlen;
}
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index 9954b078f0b9..30cff827dd8f 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -117,6 +117,7 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg,

if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) {
/* AIO operation */
+ sock_hold(sk);
areq->iocb = msg->msg_iocb;
skcipher_request_set_callback(&areq->cra_u.skcipher_req,
CRYPTO_TFM_REQ_MAY_SLEEP,
@@ -124,6 +125,16 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
err = ctx->enc ?
crypto_skcipher_encrypt(&areq->cra_u.skcipher_req) :
crypto_skcipher_decrypt(&areq->cra_u.skcipher_req);
+
+ /* AIO operation in progress */
+ if (err == -EINPROGRESS || err == -EBUSY) {
+ /* Remember output size that will be generated. */
+ areq->outlen = len;
+
+ return -EIOCBQUEUED;
+ }
+
+ sock_put(sk);
} else {
/* Synchronous operation */
skcipher_request_set_callback(&areq->cra_u.skcipher_req,
@@ -136,19 +147,9 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
&ctx->wait);
}

- /* AIO operation in progress */
- if (err == -EINPROGRESS) {
- sock_hold(sk);
-
- /* Remember output size that will be generated. */
- areq->outlen = len;
-
- return -EIOCBQUEUED;
- }

free:
- af_alg_free_areq_sgls(areq);
- sock_kfree_s(sk, areq, areq->areqlen);
+ af_alg_free_resources(areq);

return err ? err : len;
}
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index 6abf0a3604dc..38d9c5861ed8 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -242,6 +242,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
unsigned int ivsize);
ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
int offset, size_t size, int flags);
+void af_alg_free_resources(struct af_alg_async_req *areq);
void af_alg_async_cb(struct crypto_async_request *_req, int err);
unsigned int af_alg_poll(struct file *file, struct socket *sock,
poll_table *wait);
--
2.13.6

2017-11-10 16:51:09

by Romain Izard

[permalink] [raw]
Subject: Re: [PATCH v3] crypto: AF_ALG - remove locking in async callback

2017-11-10 13:20 GMT+01:00 Stephan Müller <[email protected]>:
> The code paths protected by the socket-lock do not use or modify the
> socket in a non-atomic fashion. The actions pertaining the socket do not
> even need to be handled as an atomic operation. Thus, the socket-lock
> can be safely ignored.
>
> This fixes a bug regarding scheduling in atomic as the callback function
> may be invoked in interrupt context.
>
> In addition, the sock_hold is moved before the AIO encrypt/decrypt
> operation to ensure that the socket is always present. This avoids a
> tiny race window where the socket is unprotected and yet used by the AIO
> operation.
>
> Finally, the release of resources for a crypto operation is moved into a
> common function of af_alg_free_resources.
>
> Fixes: e870456d8e7c8 ("crypto: algif_skcipher - overhaul memory management")
> Fixes: d887c52d6ae43 ("crypto: algif_aead - overhaul memory management")
> Reported-by: Romain Izard <[email protected]>
> Signed-off-by: Stephan Mueller <[email protected]>

Tested-by: Romain Izard <[email protected]>

On 4.14-rc8, with some fuzzing when applying the patch.
The deterministic crash is not reproduced.

> ---
> crypto/af_alg.c | 21 ++++++++++++++-------
> crypto/algif_aead.c | 23 ++++++++++++-----------
> crypto/algif_skcipher.c | 23 ++++++++++++-----------
> include/crypto/if_alg.h | 1 +
> 4 files changed, 39 insertions(+), 29 deletions(-)
>
> diff --git a/crypto/af_alg.c b/crypto/af_alg.c
> index 85cea9de324a..358749c38894 100644
> --- a/crypto/af_alg.c
> +++ b/crypto/af_alg.c
> @@ -1021,6 +1021,18 @@ ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
> EXPORT_SYMBOL_GPL(af_alg_sendpage);
>
> /**
> + * af_alg_free_resources - release resources required for crypto request
> + */
> +void af_alg_free_resources(struct af_alg_async_req *areq)
> +{
> + struct sock *sk = areq->sk;
> +
> + af_alg_free_areq_sgls(areq);
> + sock_kfree_s(sk, areq, areq->areqlen);
> +}
> +EXPORT_SYMBOL_GPL(af_alg_free_resources);
> +
> +/**
> * af_alg_async_cb - AIO callback handler
> *
> * This handler cleans up the struct af_alg_async_req upon completion of the
> @@ -1036,18 +1048,13 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err)
> struct kiocb *iocb = areq->iocb;
> unsigned int resultlen;
>
> - lock_sock(sk);
> -
> /* Buffer size written by crypto operation. */
> resultlen = areq->outlen;
>
> - af_alg_free_areq_sgls(areq);
> - sock_kfree_s(sk, areq, areq->areqlen);
> - __sock_put(sk);
> + af_alg_free_resources(areq);
> + sock_put(sk);
>
> iocb->ki_complete(iocb, err ? err : resultlen, 0);
> -
> - release_sock(sk);
> }
> EXPORT_SYMBOL_GPL(af_alg_async_cb);
>
> diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
> index aacae0837aff..db9378a45296 100644
> --- a/crypto/algif_aead.c
> +++ b/crypto/algif_aead.c
> @@ -268,12 +268,23 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,
>
> if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) {
> /* AIO operation */
> + sock_hold(sk);
> areq->iocb = msg->msg_iocb;
> aead_request_set_callback(&areq->cra_u.aead_req,
> CRYPTO_TFM_REQ_MAY_BACKLOG,
> af_alg_async_cb, areq);
> err = ctx->enc ? crypto_aead_encrypt(&areq->cra_u.aead_req) :
> crypto_aead_decrypt(&areq->cra_u.aead_req);
> +
> + /* AIO operation in progress */
> + if (err == -EINPROGRESS || err == -EBUSY) {
> + /* Remember output size that will be generated. */
> + areq->outlen = outlen;
> +
> + return -EIOCBQUEUED;
> + }
> +
> + sock_put(sk);
> } else {
> /* Synchronous operation */
> aead_request_set_callback(&areq->cra_u.aead_req,
> @@ -285,19 +296,9 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,
> &ctx->wait);
> }
>
> - /* AIO operation in progress */
> - if (err == -EINPROGRESS) {
> - sock_hold(sk);
> -
> - /* Remember output size that will be generated. */
> - areq->outlen = outlen;
> -
> - return -EIOCBQUEUED;
> - }
>
> free:
> - af_alg_free_areq_sgls(areq);
> - sock_kfree_s(sk, areq, areq->areqlen);
> + af_alg_free_resources(areq);
>
> return err ? err : outlen;
> }
> diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
> index 9954b078f0b9..30cff827dd8f 100644
> --- a/crypto/algif_skcipher.c
> +++ b/crypto/algif_skcipher.c
> @@ -117,6 +117,7 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
>
> if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) {
> /* AIO operation */
> + sock_hold(sk);
> areq->iocb = msg->msg_iocb;
> skcipher_request_set_callback(&areq->cra_u.skcipher_req,
> CRYPTO_TFM_REQ_MAY_SLEEP,
> @@ -124,6 +125,16 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
> err = ctx->enc ?
> crypto_skcipher_encrypt(&areq->cra_u.skcipher_req) :
> crypto_skcipher_decrypt(&areq->cra_u.skcipher_req);
> +
> + /* AIO operation in progress */
> + if (err == -EINPROGRESS || err == -EBUSY) {
> + /* Remember output size that will be generated. */
> + areq->outlen = len;
> +
> + return -EIOCBQUEUED;
> + }
> +
> + sock_put(sk);
> } else {
> /* Synchronous operation */
> skcipher_request_set_callback(&areq->cra_u.skcipher_req,
> @@ -136,19 +147,9 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
> &ctx->wait);
> }
>
> - /* AIO operation in progress */
> - if (err == -EINPROGRESS) {
> - sock_hold(sk);
> -
> - /* Remember output size that will be generated. */
> - areq->outlen = len;
> -
> - return -EIOCBQUEUED;
> - }
>
> free:
> - af_alg_free_areq_sgls(areq);
> - sock_kfree_s(sk, areq, areq->areqlen);
> + af_alg_free_resources(areq);
>
> return err ? err : len;
> }
> diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
> index 6abf0a3604dc..38d9c5861ed8 100644
> --- a/include/crypto/if_alg.h
> +++ b/include/crypto/if_alg.h
> @@ -242,6 +242,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
> unsigned int ivsize);
> ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
> int offset, size_t size, int flags);
> +void af_alg_free_resources(struct af_alg_async_req *areq);
> void af_alg_async_cb(struct crypto_async_request *_req, int err);
> unsigned int af_alg_poll(struct file *file, struct socket *sock,
> poll_table *wait);
> --
> 2.13.6
>
>

2017-11-24 07:37:39

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v3] crypto: AF_ALG - remove locking in async callback

On Fri, Nov 10, 2017 at 01:20:55PM +0100, Stephan M?ller wrote:
> The code paths protected by the socket-lock do not use or modify the
> socket in a non-atomic fashion. The actions pertaining the socket do not
> even need to be handled as an atomic operation. Thus, the socket-lock
> can be safely ignored.
>
> This fixes a bug regarding scheduling in atomic as the callback function
> may be invoked in interrupt context.
>
> In addition, the sock_hold is moved before the AIO encrypt/decrypt
> operation to ensure that the socket is always present. This avoids a
> tiny race window where the socket is unprotected and yet used by the AIO
> operation.
>
> Finally, the release of resources for a crypto operation is moved into a
> common function of af_alg_free_resources.
>
> Fixes: e870456d8e7c8 ("crypto: algif_skcipher - overhaul memory management")
> Fixes: d887c52d6ae43 ("crypto: algif_aead - overhaul memory management")
> Reported-by: Romain Izard <[email protected]>
> Signed-off-by: Stephan Mueller <[email protected]>

Patch applied. Thanks.
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2017-11-24 16:04:22

by Stephan Müller

[permalink] [raw]
Subject: Re: [PATCH v3] crypto: AF_ALG - remove locking in async callback

Am Freitag, 24. November 2017, 08:37:39 CET schrieb Herbert Xu:

Hi Herbert,

> On Fri, Nov 10, 2017 at 01:20:55PM +0100, Stephan M?ller wrote:
> > The code paths protected by the socket-lock do not use or modify the
> > socket in a non-atomic fashion. The actions pertaining the socket do not
> > even need to be handled as an atomic operation. Thus, the socket-lock
> > can be safely ignored.
> >
> > This fixes a bug regarding scheduling in atomic as the callback function
> > may be invoked in interrupt context.
> >
> > In addition, the sock_hold is moved before the AIO encrypt/decrypt
> > operation to ensure that the socket is always present. This avoids a
> > tiny race window where the socket is unprotected and yet used by the AIO
> > operation.
> >
> > Finally, the release of resources for a crypto operation is moved into a
> > common function of af_alg_free_resources.
> >
> > Fixes: e870456d8e7c8 ("crypto: algif_skcipher - overhaul memory
> > management") Fixes: d887c52d6ae43 ("crypto: algif_aead - overhaul memory
> > management") Reported-by: Romain Izard <[email protected]>
> > Signed-off-by: Stephan Mueller <[email protected]>
>
> Patch applied. Thanks.

Thanks a lot.

Would it make sense to feed it to stable?

Ciao
Stephan

2017-11-24 17:37:17

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v3] crypto: AF_ALG - remove locking in async callback

On Fri, 24 Nov 2017 17:04:19 +0100
Stephan Mueller <[email protected]> wrote:

> Am Freitag, 24. November 2017, 08:37:39 CET schrieb Herbert Xu:
>
> Hi Herbert,
>
> > On Fri, Nov 10, 2017 at 01:20:55PM +0100, Stephan M?ller wrote:
> > > The code paths protected by the socket-lock do not use or modify the
> > > socket in a non-atomic fashion. The actions pertaining the socket do not
> > > even need to be handled as an atomic operation. Thus, the socket-lock
> > > can be safely ignored.
> > >
> > > This fixes a bug regarding scheduling in atomic as the callback function
> > > may be invoked in interrupt context.
> > >
> > > In addition, the sock_hold is moved before the AIO encrypt/decrypt
> > > operation to ensure that the socket is always present. This avoids a
> > > tiny race window where the socket is unprotected and yet used by the AIO
> > > operation.
> > >
> > > Finally, the release of resources for a crypto operation is moved into a
> > > common function of af_alg_free_resources.
> > >
> > > Fixes: e870456d8e7c8 ("crypto: algif_skcipher - overhaul memory
> > > management") Fixes: d887c52d6ae43 ("crypto: algif_aead - overhaul memory
> > > management") Reported-by: Romain Izard <[email protected]>
> > > Signed-off-by: Stephan Mueller <[email protected]>
> >
> > Patch applied. Thanks.
>
> Thanks a lot.
>
> Would it make sense to feed it to stable?
>
> Ciao
> Stephan
My view would be definitely. Ran into this precise issue whilst testing
a new driver 4.14 today...

Jonathan

2017-11-25 00:18:08

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v3] crypto: AF_ALG - remove locking in async callback

On Fri, Nov 24, 2017 at 05:04:19PM +0100, Stephan Mueller wrote:
>
> Would it make sense to feed it to stable?

I already added stable Cc's so it should go in automatically once
it's pushed to Linus.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt