2023-09-22 21:40:42

by Mikulas Patocka

Subject: [PATCH v2] qat: fix deadlock in backlog processing

Hi


On Fri, 22 Sep 2023, Giovanni Cabiddu wrote:

> Hi Mikulas,
>
> many thanks for reporting this issue and finding a solution.
>
> On Thu, Sep 21, 2023 at 10:53:55PM +0200, Mikulas Patocka wrote:
> > I was evaluating whether it is feasible to use QAT with dm-crypt (the
> > answer is that it is not - QAT is slower than AES-NI for this type of
> > workload; QAT starts to be beneficial for encryption requests longer than
> > 64k).
> Correct. Is there anything that we can do to batch requests in a single
> call?

Ask Herbert Xu. I think it would complicate the design of crypto API.

> Some time ago there was some work done to build a geniv template cipher
> and optimize dm-crypt to encrypt larger block sizes in a single call,
> see [1][2]. Don't know if that work was completed.
>
> >And I got some deadlocks.
> Ouch!
>
> > The reason for the deadlocks is this: suppose that one of the "if"
> > conditions in "qat_alg_send_message_maybacklog" is true and we jump to the
> > "enqueue" label. At this point, an interrupt comes in and clears all
> > pending messages. Now, the interrupt returns, we grab backlog->lock, add
> > the message to the backlog, drop backlog->lock - and there is no one to
> > remove the backlogged message out of the list and submit it.
> Makes sense. In my testing I wasn't able to reproduce this condition.

I reproduced it with this:
- Use a system with two Intel(R) Xeon(R) Gold 5420+ processors.
- Use kernel 6.6-rc2.
- Patch the kernel so that dm-crypt uses QAT - that is, in
  drivers/md/dm-crypt.c, replace all occurrences of the string
  "CRYPTO_ALG_ALLOCATES_MEMORY" with "0".
- Use the .config from the RHEL-9.4 beta and compile the kernel.
- On the system, disable hyperthreading with
  "echo off >/sys/devices/system/cpu/smt/control".
- Activate dm-crypt on top of nvme:
  "cryptsetup create cr /dev/nvme3n1 --sector-size=4096"
- Run fio in a loop:
  "while true; do
	fio --ioengine=psync --iodepth=1 --rw=randwrite --direct=1 \
		--end_fsync=1 --bs=64k --numjobs=56 --time_based \
		--runtime=10 --group_reporting --name=job \
		--filename=/dev/mapper/cr
  done"

With this setup, I get a deadlock in a few iterations of fio.
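
To make the race concrete, here is a simplified interleaving of the scenario
described above (an illustrative sketch, not a captured trace; function names
as in qat_algs_send.c):

  submitting task                          completion interrupt
  (qat_alg_send_message_maybacklog)        (qat_alg_send_backlog)
  ---------------------------------        ----------------------
  adf_ring_nearly_full() -> true
  goto enqueue
                                           drains the ring, takes backlog->lock,
                                           finds backlog->list empty, nothing to
                                           resubmit, drops backlog->lock
  spin_lock_bh(&backlog->lock)
  list_add_tail(&req->list, &backlog->list)
  spin_unlock_bh(&backlog->lock)

The request now sits on the backlog list, but the interrupt that was supposed
to flush it has already run, so nothing ever resubmits it.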

> > I fixed it with this patch - with this patch, the test passes and there
> > are no longer any deadlocks. I didn't want to add a spinlock to the hot
> > path, so I take it only if some of the condition suggests that queuing may
> > be required.
> >
> > Signed-off-by: Mikulas Patocka <[email protected]>
> > Cc: [email protected]
> The commit message requires a bit of rework to describe the change.

I improved the message and I am sending a second version of the patch.

> Also, deserves a fixes tag.

"Fixes" tag is for something that worked and that was broken in some
previous commit. A quick search through git shows that QAT backlogging was
broken since the introduction of QAT.

> >
> > ---
> > drivers/crypto/intel/qat/qat_common/qat_algs_send.c | 31 ++++++++++++--------
> > 1 file changed, 20 insertions(+), 11 deletions(-)
> >
> > Index: linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> > +++ linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> > @@ -40,16 +40,6 @@ void qat_alg_send_backlog(struct qat_ins
> > spin_unlock_bh(&backlog->lock);
> > }
> >
> > -static void qat_alg_backlog_req(struct qat_alg_req *req,
> > - struct qat_instance_backlog *backlog)
> > -{
> > - INIT_LIST_HEAD(&req->list);
> Is the initialization of an element no longer needed?

It was never needed. list_add_tail calls __list_add and __list_add
overwrites new->next and new->prev without reading them. So, there's no
need to initialize them.
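
For reference, the relevant helpers look roughly like this (simplified from
include/linux/list.h, with the list-debug checks omitted) - note that
new->next and new->prev are only written, never read:

static inline void __list_add(struct list_head *new,
			      struct list_head *prev,
			      struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
	__list_add(new, head->prev, head);
}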

> > -
> > - spin_lock_bh(&backlog->lock);
> > - list_add_tail(&req->list, &backlog->list);
> > - spin_unlock_bh(&backlog->lock);
> > -}
> > -
> > static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
> > {
> > struct qat_instance_backlog *backlog = req->backlog;
> > @@ -71,8 +61,27 @@ static int qat_alg_send_message_maybackl
> > return -EINPROGRESS;
> >
> > enqueue:
> > - qat_alg_backlog_req(req, backlog);
> > + spin_lock_bh(&backlog->lock);
> > +
> > + /* If any request is already backlogged, then add to backlog list */
> > + if (!list_empty(&backlog->list))
> > + goto enqueue2;
> >
> > + /* If ring is nearly full, then add to backlog list */
> > + if (adf_ring_nearly_full(tx_ring))
> > + goto enqueue2;
> > +
> > + /* If adding request to HW ring fails, then add to backlog list */
> > + if (adf_send_message(tx_ring, fw_req))
> > + goto enqueue2;
> In a nutshell, you are re-doing the same steps taking the backlog lock.
>
> It should be possible to re-write it so that there is a function that
> attempts enqueuing and if it fails, then the same is called again taking
> the lock.
> If you want I can rework it and resubmit.

Yes. Since you prefer it this way, I reworked the patch so that the same
code is executed with or without the spinlock held.

> > +
> > + spin_unlock_bh(&backlog->lock);
> > + return -EINPROGRESS;
> > +
> > +enqueue2:
> > + list_add_tail(&req->list, &backlog->list);
> > +
> > + spin_unlock_bh(&backlog->lock);
> > return -EBUSY;
> > }
>
> [1] https://www.mail-archive.com/[email protected]/msg1276510.html
> [2] https://www.mail-archive.com/[email protected]/msg1428293.html
>
> Regards,
>
> --
> Giovanni
>

From: Mikulas Patocka <[email protected]>
Subject: [PATCH] qat: fix deadlock in backlog processing

I was testing QAT with dm-crypt and I got some deadlocks.

The reason for the deadlocks is this: suppose that one of the "if"
conditions in "qat_alg_send_message_maybacklog" is true and we jump to the
"enqueue" label. At this point, an interrupt comes in and clears all
pending messages. Now, the interrupt returns, we grab backlog->lock, add
the message to the backlog, drop backlog->lock - and there is no one to
remove the backlogged message out of the list and submit it.

In order to fix the bug, we must hold the spinlock backlog->lock when we
perform the test for free space in the ring, so that the test for free
space and the addition of the request to the backlog are atomic and can't
be interrupted by an interrupt. Every completion interrupt calls
qat_alg_send_backlog, which grabs backlog->lock, so holding this spinlock
is sufficient to synchronize with interrupts.

I didn't want to add a spinlock unconditionally to the hot path for
performance reasons, so I take it only if one of the conditions suggests
that queuing may be required.

Signed-off-by: Mikulas Patocka <[email protected]>
Cc: [email protected]

---
drivers/crypto/intel/qat/qat_common/qat_algs_send.c | 23 ++++++++++----------
1 file changed, 12 insertions(+), 11 deletions(-)

Index: linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
===================================================================
--- linux-2.6.orig/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
+++ linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
@@ -40,22 +40,14 @@ void qat_alg_send_backlog(struct qat_ins
spin_unlock_bh(&backlog->lock);
}

-static void qat_alg_backlog_req(struct qat_alg_req *req,
- struct qat_instance_backlog *backlog)
-{
- INIT_LIST_HEAD(&req->list);
-
- spin_lock_bh(&backlog->lock);
- list_add_tail(&req->list, &backlog->list);
- spin_unlock_bh(&backlog->lock);
-}
-
static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
{
struct qat_instance_backlog *backlog = req->backlog;
struct adf_etr_ring_data *tx_ring = req->tx_ring;
u32 *fw_req = req->fw_req;
+ bool locked = false;

+repeat:
/* If any request is already backlogged, then add to backlog list */
if (!list_empty(&backlog->list))
goto enqueue;
@@ -68,11 +60,20 @@ static int qat_alg_send_message_maybackl
if (adf_send_message(tx_ring, fw_req))
goto enqueue;

+ if (unlikely(locked))
+ spin_unlock_bh(&backlog->lock);
return -EINPROGRESS;

enqueue:
- qat_alg_backlog_req(req, backlog);
+ if (!locked) {
+ spin_lock_bh(&backlog->lock);
+ locked = true;
+ goto repeat;
+ }
+
+ list_add_tail(&req->list, &backlog->list);

+ spin_unlock_bh(&backlog->lock);
return -EBUSY;
}



2023-09-29 21:07:25

by Cabiddu, Giovanni

Subject: Re: [PATCH v2] qat: fix deadlock in backlog processing

On Fri, Sep 22, 2023 at 06:34:16PM +0200, Mikulas Patocka wrote:
> On Fri, 22 Sep 2023, Giovanni Cabiddu wrote:
> > Hi Mikulas,
> >
> > many thanks for reporting this issue and finding a solution.
> >
> > On Thu, Sep 21, 2023 at 10:53:55PM +0200, Mikulas Patocka wrote:
> > > I was evaluating whether it is feasible to use QAT with dm-crypt (the
> > > answer is that it is not - QAT is slower than AES-NI for this type of
> > > workload; QAT starts to be beneficial for encryption requests longer than
> > > 64k).
> > Correct. Is there anything that we can do to batch requests in a single
> > call?
>
> Ask Herbert Xu. I think it would complicate the design of crypto API.
>
> > Some time ago there was some work done to build a geniv template cipher
> > and optimize dm-crypt to encrypt larger block sizes in a single call,
> > see [1][2]. Don't know if that work was completed.
> >
> > >And I got some deadlocks.
> > Ouch!
> >
> > > The reason for the deadlocks is this: suppose that one of the "if"
> > > conditions in "qat_alg_send_message_maybacklog" is true and we jump to the
> > > "enqueue" label. At this point, an interrupt comes in and clears all
> > > pending messages. Now, the interrupt returns, we grab backlog->lock, add
> > > the message to the backlog, drop backlog->lock - and there is no one to
> > > remove the backlogged message out of the list and submit it.
> > Makes sense. In my testing I wasn't able to reproduce this condition.
>
> I reproduced it with this:
> - Use a system with two Intel(R) Xeon(R) Gold 5420+ processors.
> - Use kernel 6.6-rc2.
> - Patch the kernel so that dm-crypt uses QAT - that is, in
>   drivers/md/dm-crypt.c, replace all occurrences of the string
>   "CRYPTO_ALG_ALLOCATES_MEMORY" with "0".
> - Use the .config from the RHEL-9.4 beta and compile the kernel.
> - On the system, disable hyperthreading with
>   "echo off >/sys/devices/system/cpu/smt/control".
> - Activate dm-crypt on top of nvme:
>   "cryptsetup create cr /dev/nvme3n1 --sector-size=4096"
> - Run fio in a loop:
>   "while true; do
>	fio --ioengine=psync --iodepth=1 --rw=randwrite --direct=1 \
>		--end_fsync=1 --bs=64k --numjobs=56 --time_based \
>		--runtime=10 --group_reporting --name=job \
>		--filename=/dev/mapper/cr
>   done"
>
> With this setup, I get a deadlock in a few iterations of fio.
>
> > > I fixed it with this patch - with this patch, the test passes and there
> > > are no longer any deadlocks. I didn't want to add a spinlock to the hot
> > > path, so I take it only if some of the condition suggests that queuing may
> > > be required.
> > >
> > > Signed-off-by: Mikulas Patocka <[email protected]>
> > > Cc: [email protected]
> > The commit message requires a bit of rework to describe the change.
>
> I improved the message and I am sending a second version of the patch.
>
> > Also, deserves a fixes tag.
>
> "Fixes" tag is for something that worked and that was broken in some
> previous commit.
That's right.

> A quick search through git shows that QAT backlogging was
> broken since the introduction of QAT.
The driver was moved from drivers/crypto/qat to drivers/crypto/intel/qat;
that's why you see a single patch.
This fixes 386823839732 ("crypto: qat - add backlog mechanism")
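
(If you want to see the pre-move history of the file, something like
"git log --follow -- drivers/crypto/intel/qat/qat_common/qat_algs_send.c"
follows the rename.)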

>
> > >
> > > ---
> > > drivers/crypto/intel/qat/qat_common/qat_algs_send.c | 31 ++++++++++++--------
> > > 1 file changed, 20 insertions(+), 11 deletions(-)
> > >
> > > Index: linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> > > ===================================================================
> > > --- linux-2.6.orig/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> > > +++ linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> > > @@ -40,16 +40,6 @@ void qat_alg_send_backlog(struct qat_ins
> > > spin_unlock_bh(&backlog->lock);
> > > }
> > >
> > > -static void qat_alg_backlog_req(struct qat_alg_req *req,
> > > - struct qat_instance_backlog *backlog)
> > > -{
> > > - INIT_LIST_HEAD(&req->list);
> > Is the initialization of an element no longer needed?
>
> It was never needed. list_add_tail calls __list_add and __list_add
> overwrites new->next and new->prev without reading them. So, there's no
> need to initialize them.
>
> > > -
> > > - spin_lock_bh(&backlog->lock);
> > > - list_add_tail(&req->list, &backlog->list);
> > > - spin_unlock_bh(&backlog->lock);
> > > -}
> > > -
> > > static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
> > > {
> > > struct qat_instance_backlog *backlog = req->backlog;
> > > @@ -71,8 +61,27 @@ static int qat_alg_send_message_maybackl
> > > return -EINPROGRESS;
> > >
> > > enqueue:
> > > - qat_alg_backlog_req(req, backlog);
> > > + spin_lock_bh(&backlog->lock);
> > > +
> > > + /* If any request is already backlogged, then add to backlog list */
> > > + if (!list_empty(&backlog->list))
> > > + goto enqueue2;
> > >
> > > + /* If ring is nearly full, then add to backlog list */
> > > + if (adf_ring_nearly_full(tx_ring))
> > > + goto enqueue2;
> > > +
> > > + /* If adding request to HW ring fails, then add to backlog list */
> > > + if (adf_send_message(tx_ring, fw_req))
> > > + goto enqueue2;
> > In a nutshell, you are re-doing the same steps taking the backlog lock.
> >
> > It should be possible to re-write it so that there is a function that
> > attempts enqueuing and if it fails, then the same is called again taking
> > the lock.
> > If you want I can rework it and resubmit.
>
> Yes. Since you prefer it this way, I reworked the patch so that the same
> code is executed with or without the spinlock held.
>
> > > +
> > > + spin_unlock_bh(&backlog->lock);
> > > + return -EINPROGRESS;
> > > +
> > > +enqueue2:
> > > + list_add_tail(&req->list, &backlog->list);
> > > +
> > > + spin_unlock_bh(&backlog->lock);
> > > return -EBUSY;
> > > }
> >
> > [1] https://www.mail-archive.com/[email protected]/msg1276510.html
> > [2] https://www.mail-archive.com/[email protected]/msg1428293.html
> >
> > Regards,
> >
> > --
> > Giovanni
> >
>
> From: Mikulas Patocka <[email protected]>
> Subject: [PATCH] qat: fix deadlock in backlog processing
crypto: qat - fix ...
>
> I was testing QAT with dm-crypt and I got some deadlocks.
>
> The reason for the deadlocks is this: suppose that one of the "if"
> conditions in "qat_alg_send_message_maybacklog" is true and we jump to the
> "enqueue" label. At this point, an interrupt comes in and clears all
> pending messages. Now, the interrupt returns, we grab backlog->lock, add
> the message to the backlog, drop backlog->lock - and there is no one to
> remove the backlogged message out of the list and submit it.
>
> In order to fix the bug, we must hold the spinlock backlog->lock when we
> perform the test for free space in the ring, so that the test for free
> space and the addition of the request to the backlog are atomic and can't
> be interrupted by an interrupt. Every completion interrupt calls
> qat_alg_send_backlog, which grabs backlog->lock, so holding this spinlock
> is sufficient to synchronize with interrupts.
>
> I didn't want to add a spinlock unconditionally to the hot path for
> performance reasons, so I take it only if one of the conditions suggests
> that queuing may be required.
>
> Signed-off-by: Mikulas Patocka <[email protected]>
> Cc: [email protected]
>
> ---
> drivers/crypto/intel/qat/qat_common/qat_algs_send.c | 23 ++++++++++----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
>
> Index: linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> ===================================================================
> --- linux-2.6.orig/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> +++ linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> @@ -40,22 +40,14 @@ void qat_alg_send_backlog(struct qat_ins
> spin_unlock_bh(&backlog->lock);
> }
>
> -static void qat_alg_backlog_req(struct qat_alg_req *req,
> - struct qat_instance_backlog *backlog)
> -{
> - INIT_LIST_HEAD(&req->list);
> -
> - spin_lock_bh(&backlog->lock);
> - list_add_tail(&req->list, &backlog->list);
> - spin_unlock_bh(&backlog->lock);
> -}
> -
> static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
> {
> struct qat_instance_backlog *backlog = req->backlog;
> struct adf_etr_ring_data *tx_ring = req->tx_ring;
> u32 *fw_req = req->fw_req;
> + bool locked = false;
>
> +repeat:
> /* If any request is already backlogged, then add to backlog list */
> if (!list_empty(&backlog->list))
> goto enqueue;
> @@ -68,11 +60,20 @@ static int qat_alg_send_message_maybackl
> if (adf_send_message(tx_ring, fw_req))
> goto enqueue;
>
> + if (unlikely(locked))
> + spin_unlock_bh(&backlog->lock);
> return -EINPROGRESS;
>
> enqueue:
> - qat_alg_backlog_req(req, backlog);
> + if (!locked) {
> + spin_lock_bh(&backlog->lock);
> + locked = true;
> + goto repeat;
> + }
> +
> + list_add_tail(&req->list, &backlog->list);
>
> + spin_unlock_bh(&backlog->lock);
> return -EBUSY;
> }
>
>
Thanks - when I proposed the rework I was thinking of a solution without
gotos. Here is a draft:
------------8<----------------
.../intel/qat/qat_common/qat_algs_send.c | 40 ++++++++++---------
1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/crypto/intel/qat/qat_common/qat_algs_send.c b/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
index bb80455b3e81..18c6a233ab96 100644
--- a/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
+++ b/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
@@ -40,17 +40,7 @@ void qat_alg_send_backlog(struct qat_instance_backlog *backlog)
spin_unlock_bh(&backlog->lock);
}

-static void qat_alg_backlog_req(struct qat_alg_req *req,
- struct qat_instance_backlog *backlog)
-{
- INIT_LIST_HEAD(&req->list);
-
- spin_lock_bh(&backlog->lock);
- list_add_tail(&req->list, &backlog->list);
- spin_unlock_bh(&backlog->lock);
-}
-
-static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
+static bool qat_alg_try_enqueue(struct qat_alg_req *req)
{
struct qat_instance_backlog *backlog = req->backlog;
struct adf_etr_ring_data *tx_ring = req->tx_ring;
@@ -58,22 +48,36 @@ static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)

/* If any request is already backlogged, then add to backlog list */
if (!list_empty(&backlog->list))
- goto enqueue;
+ return false;

/* If ring is nearly full, then add to backlog list */
if (adf_ring_nearly_full(tx_ring))
- goto enqueue;
+ return false;

/* If adding request to HW ring fails, then add to backlog list */
if (adf_send_message(tx_ring, fw_req))
- goto enqueue;
+ return false;

- return -EINPROGRESS;
+ return true;
+}

-enqueue:
- qat_alg_backlog_req(req, backlog);

- return -EBUSY;
+static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
+{
+ struct qat_instance_backlog *backlog = req->backlog;
+ int ret = -EINPROGRESS;
+
+ if (qat_alg_try_enqueue(req))
+ return ret;
+
+ spin_lock_bh(&backlog->lock);
+ if (!qat_alg_try_enqueue(req)) {
+ list_add_tail(&req->list, &backlog->list);
+ ret = -EBUSY;
+ }
+ spin_unlock_bh(&backlog->lock);
+
+ return ret;
}

int qat_alg_send_message(struct qat_alg_req *req)
--
2.41.0


2023-10-02 09:26:35

by Mikulas Patocka

Subject: Re: [PATCH v2] qat: fix deadlock in backlog processing



On Fri, 29 Sep 2023, Giovanni Cabiddu wrote:

> On Fri, Sep 22, 2023 at 06:34:16PM +0200, Mikulas Patocka wrote:
> >
> > > Also, deserves a fixes tag.
> >
> > "Fixes" tag is for something that worked and that was broken in some
> > previous commit.
> That's right.
>
> > A quick search through git shows that QAT backlogging was
> > broken since the introduction of QAT.
> The driver was moved from drivers/crypto/qat to drivers/crypto/intel/qat;
> that's why you see a single patch.
> This fixes 386823839732 ("crypto: qat - add backlog mechanism")

But before 386823839732 it also didn't work - it returned -EBUSY without
queuing the request and deadlocked.

> Thanks - when I proposed the rework I was thinking of a solution without
> gotos. Here is a draft:

Yes - it is possible to fix it this way.



> ------------8<----------------
> .../intel/qat/qat_common/qat_algs_send.c | 40 ++++++++++---------
> 1 file changed, 22 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/crypto/intel/qat/qat_common/qat_algs_send.c b/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> index bb80455b3e81..18c6a233ab96 100644
> --- a/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> +++ b/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> @@ -40,17 +40,7 @@ void qat_alg_send_backlog(struct qat_instance_backlog *backlog)
> spin_unlock_bh(&backlog->lock);
> }
>
> -static void qat_alg_backlog_req(struct qat_alg_req *req,
> - struct qat_instance_backlog *backlog)
> -{
> - INIT_LIST_HEAD(&req->list);
> -
> - spin_lock_bh(&backlog->lock);
> - list_add_tail(&req->list, &backlog->list);
> - spin_unlock_bh(&backlog->lock);
> -}
> -
> -static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
> +static bool qat_alg_try_enqueue(struct qat_alg_req *req)
> {
> struct qat_instance_backlog *backlog = req->backlog;
> struct adf_etr_ring_data *tx_ring = req->tx_ring;
> @@ -58,22 +48,36 @@ static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
>
> /* If any request is already backlogged, then add to backlog list */
> if (!list_empty(&backlog->list))
> - goto enqueue;
> + return false;
>
> /* If ring is nearly full, then add to backlog list */
> if (adf_ring_nearly_full(tx_ring))
> - goto enqueue;
> + return false;
>
> /* If adding request to HW ring fails, then add to backlog list */
> if (adf_send_message(tx_ring, fw_req))
> - goto enqueue;
> + return false;
>
> - return -EINPROGRESS;
> + return true;
> +}
>
> -enqueue:
> - qat_alg_backlog_req(req, backlog);
>
> - return -EBUSY;
> +static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
> +{
> + struct qat_instance_backlog *backlog = req->backlog;
> + int ret = -EINPROGRESS;
> +
> + if (qat_alg_try_enqueue(req))
> + return ret;
> +
> + spin_lock_bh(&backlog->lock);
> + if (!qat_alg_try_enqueue(req)) {
> + list_add_tail(&req->list, &backlog->list);
> + ret = -EBUSY;
> + }
> + spin_unlock_bh(&backlog->lock);
> +
> + return ret;
> }
>
> int qat_alg_send_message(struct qat_alg_req *req)
> --
> 2.41.0


Reviewed-by: Mikulas Patocka <[email protected]>

Mikulas


2023-10-20 11:36:18

by Cabiddu, Giovanni

Subject: Re: [PATCH v2] qat: fix deadlock in backlog processing

On Mon, Oct 02, 2023 at 11:15:05AM +0200, Mikulas Patocka wrote:
>
>
> On Fri, 29 Sep 2023, Giovanni Cabiddu wrote:
>
> > On Fri, Sep 22, 2023 at 06:34:16PM +0200, Mikulas Patocka wrote:
> > >
> > > > Also, deserves a fixes tag.
> > >
> > > "Fixes" tag is for something that worked and that was broken in some
> > > previous commit.
> > That's right.
> >
> > > A quick search through git shows that QAT backlogging was
> > > broken since the introduction of QAT.
> > The driver was moved from drivers/crypto/qat to drivers/crypto/intel/qat;
> > that's why you see a single patch.
> > This fixes 386823839732 ("crypto: qat - add backlog mechanism")
>
> But before 386823839732 it also didn't work - it returned -EBUSY without
> queuing the request and deadlocked.
>
> > Thanks - when I proposed the rework I was thinking of a solution without
> > gotos. Here is a draft:
>
> Yes - it is possible to fix it this way.
>
>
>
> > ------------8<----------------
> > .../intel/qat/qat_common/qat_algs_send.c | 40 ++++++++++---------
> > 1 file changed, 22 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/crypto/intel/qat/qat_common/qat_algs_send.c b/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> > index bb80455b3e81..18c6a233ab96 100644
> > --- a/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> > +++ b/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> > @@ -40,17 +40,7 @@ void qat_alg_send_backlog(struct qat_instance_backlog *backlog)
> > spin_unlock_bh(&backlog->lock);
> > }
> >
> > -static void qat_alg_backlog_req(struct qat_alg_req *req,
> > - struct qat_instance_backlog *backlog)
> > -{
> > - INIT_LIST_HEAD(&req->list);
> > -
> > - spin_lock_bh(&backlog->lock);
> > - list_add_tail(&req->list, &backlog->list);
> > - spin_unlock_bh(&backlog->lock);
> > -}
> > -
> > -static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
> > +static bool qat_alg_try_enqueue(struct qat_alg_req *req)
> > {
> > struct qat_instance_backlog *backlog = req->backlog;
> > struct adf_etr_ring_data *tx_ring = req->tx_ring;
> > @@ -58,22 +48,36 @@ static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
> >
> > /* If any request is already backlogged, then add to backlog list */
> > if (!list_empty(&backlog->list))
> > - goto enqueue;
> > + return false;
> >
> > /* If ring is nearly full, then add to backlog list */
> > if (adf_ring_nearly_full(tx_ring))
> > - goto enqueue;
> > + return false;
> >
> > /* If adding request to HW ring fails, then add to backlog list */
> > if (adf_send_message(tx_ring, fw_req))
> > - goto enqueue;
> > + return false;
> >
> > - return -EINPROGRESS;
> > + return true;
> > +}
> >
> > -enqueue:
> > - qat_alg_backlog_req(req, backlog);
> >
> > - return -EBUSY;
> > +static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
> > +{
> > + struct qat_instance_backlog *backlog = req->backlog;
> > + int ret = -EINPROGRESS;
> > +
> > + if (qat_alg_try_enqueue(req))
> > + return ret;
> > +
> > + spin_lock_bh(&backlog->lock);
> > + if (!qat_alg_try_enqueue(req)) {
> > + list_add_tail(&req->list, &backlog->list);
> > + ret = -EBUSY;
> > + }
> > + spin_unlock_bh(&backlog->lock);
> > +
> > + return ret;
> > }
> >
> > int qat_alg_send_message(struct qat_alg_req *req)
> > --
> > 2.41.0
>
>
> Reviewed-by: Mikulas Patocka <[email protected]>
Thanks. I'm going to resend it with a slight change to the comments in
the function qat_alg_try_enqueue().

Regards,

--
Giovanni