2023-05-17 19:50:44

by Jeffrey Hugo

[permalink] [raw]
Subject: [PATCH 0/5] accel/qaic fixes for 6.4

During development of new features, we noticed some spots in the code that
could be improved based on review feedback from the initial driver series.

Also two race condition fixes, one found during stress testing and another
via code inspection.

Jeffrey Hugo (1):
accel/qaic: Fix NNC message corruption

Pranjal Ramajor Asha Kanojiya (4):
accel/qaic: Validate user data before grabbing any lock
accel/qaic: Validate if BO is sliced before slicing
accel/qaic: Flush the transfer list again
accel/qaic: Grab ch_lock during QAIC_ATTACH_SLICE_BO

drivers/accel/qaic/qaic_control.c | 41 ++++++++------
drivers/accel/qaic/qaic_data.c | 91 +++++++++++++++----------------
2 files changed, 70 insertions(+), 62 deletions(-)

--
2.40.1



2023-05-17 19:53:32

by Jeffrey Hugo

[permalink] [raw]
Subject: [PATCH 3/5] accel/qaic: Flush the transfer list again

From: Pranjal Ramajor Asha Kanojiya <[email protected]>

Before calling synchronize_srcu() we clear the transfer list, this is to
allow all the QAIC_WAIT_BO callers to exit otherwise the system could
deadlock. There could be a corner case where more elements get added to
transfer list after we have flushed it. Re-flush the transfer list once
all the holders of dbc->ch_lock have completed execution i.e.
synchronize_srcu() is complete.

Fixes: ff13be830333 ("accel/qaic: Add datapath")
Signed-off-by: Pranjal Ramajor Asha Kanojiya <[email protected]>
Reviewed-by: Carl Vanderlip <[email protected]>
Reviewed-by: Jeffrey Hugo <[email protected]>
Signed-off-by: Jeffrey Hugo <[email protected]>
---
drivers/accel/qaic/qaic_data.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/accel/qaic/qaic_data.c b/drivers/accel/qaic/qaic_data.c
index 1285c3dc9aef..8603e99a2a61 100644
--- a/drivers/accel/qaic/qaic_data.c
+++ b/drivers/accel/qaic/qaic_data.c
@@ -1845,6 +1845,11 @@ void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id)
dbc->usr = NULL;
empty_xfer_list(qdev, dbc);
synchronize_srcu(&dbc->ch_lock);
+ /*
+ * Threads holding channel lock, may add more elements in the xfer_list.
+ * Flush out these elements from xfer_list.
+ */
+ empty_xfer_list(qdev, dbc);
}

void release_dbc(struct qaic_device *qdev, u32 dbc_id)
--
2.40.1


2023-05-17 19:54:15

by Jeffrey Hugo

[permalink] [raw]
Subject: [PATCH 2/5] accel/qaic: Validate if BO is sliced before slicing

From: Pranjal Ramajor Asha Kanojiya <[email protected]>

QAIC_ATTACH_SLICE_BO attaches slicing configuration to a BO. Validate if
given BO is already sliced. An already sliced BO cannot be sliced again.

Fixes: ff13be830333 ("accel/qaic: Add datapath")
Signed-off-by: Pranjal Ramajor Asha Kanojiya <[email protected]>
Reviewed-by: Carl Vanderlip <[email protected]>
Reviewed-by: Jeffrey Hugo <[email protected]>
Signed-off-by: Jeffrey Hugo <[email protected]>
---
drivers/accel/qaic/qaic_data.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/accel/qaic/qaic_data.c b/drivers/accel/qaic/qaic_data.c
index 7a4397e3122b..1285c3dc9aef 100644
--- a/drivers/accel/qaic/qaic_data.c
+++ b/drivers/accel/qaic/qaic_data.c
@@ -1001,6 +1001,11 @@ int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data, struct drm_fi

bo = to_qaic_bo(obj);

+ if (bo->sliced) {
+ ret = -EINVAL;
+ goto put_bo;
+ }
+
ret = qaic_prepare_bo(qdev, bo, &args->hdr);
if (ret)
goto put_bo;
--
2.40.1


2023-05-17 19:56:04

by Jeffrey Hugo

[permalink] [raw]
Subject: [PATCH 5/5] accel/qaic: Fix NNC message corruption

If msg_xfer() is unable to queue part of a NNC message because the MHI ring
is full, it will attempt to give the QSM some time to drain the queue.
However, if QSM fails to make any room, msg_xfer() will fail and tell the
caller to try again. This is problematic because part of the message may
have been committed to the ring and there is no mechanism to revoke that
content. This will cause QSM to receive a corrupt message.

The better way to do this is to check if the ring has enough space for the
entire message before committing any of the message. Since msg_xfer() is
under the cntl_mutex no one else can come in and consume the space.

Fixes: 129776ac2e38 ("accel/qaic: Add control path")
Signed-off-by: Jeffrey Hugo <[email protected]>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <[email protected]>
Reviewed-by: Carl Vanderlip <[email protected]>
---
drivers/accel/qaic/qaic_control.c | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/accel/qaic/qaic_control.c b/drivers/accel/qaic/qaic_control.c
index 9e39b1a324f7..5c57f7b4494e 100644
--- a/drivers/accel/qaic/qaic_control.c
+++ b/drivers/accel/qaic/qaic_control.c
@@ -997,14 +997,34 @@ static void *msg_xfer(struct qaic_device *qdev, struct wrapper_list *wrappers, u
struct xfer_queue_elem elem;
struct wire_msg *out_buf;
struct wrapper_msg *w;
+ long ret = -EAGAIN;
+ int xfer_count = 0;
int retry_count;
- long ret;

if (qdev->in_reset) {
mutex_unlock(&qdev->cntl_mutex);
return ERR_PTR(-ENODEV);
}

+ /* Attempt to avoid a partial commit of a message */
+ list_for_each_entry(w, &wrappers->list, list)
+ xfer_count++;
+
+ for (retry_count = 0; retry_count < QAIC_MHI_RETRY_MAX; retry_count++) {
+ if (xfer_count <= mhi_get_free_desc_count(qdev->cntl_ch, DMA_TO_DEVICE)) {
+ ret = 0;
+ break;
+ }
+ msleep_interruptible(QAIC_MHI_RETRY_WAIT_MS);
+ if (signal_pending(current))
+ break;
+ }
+
+ if (ret) {
+ mutex_unlock(&qdev->cntl_mutex);
+ return ERR_PTR(ret);
+ }
+
elem.seq_num = seq_num;
elem.buf = NULL;
init_completion(&elem.xfer_done);
@@ -1038,16 +1058,9 @@ static void *msg_xfer(struct qaic_device *qdev, struct wrapper_list *wrappers, u
list_for_each_entry(w, &wrappers->list, list) {
kref_get(&w->ref_count);
retry_count = 0;
-retry:
ret = mhi_queue_buf(qdev->cntl_ch, DMA_TO_DEVICE, &w->msg, w->len,
list_is_last(&w->list, &wrappers->list) ? MHI_EOT : MHI_CHAIN);
if (ret) {
- if (ret == -EAGAIN && retry_count++ < QAIC_MHI_RETRY_MAX) {
- msleep_interruptible(QAIC_MHI_RETRY_WAIT_MS);
- if (!signal_pending(current))
- goto retry;
- }
-
qdev->cntl_lost_buf = true;
kref_put(&w->ref_count, free_wrapper);
mutex_unlock(&qdev->cntl_mutex);
--
2.40.1


2023-05-23 16:13:29

by Jeffrey Hugo

[permalink] [raw]
Subject: Re: [PATCH 0/5] accel/qaic fixes for 6.4

On 5/17/2023 1:35 PM, Jeffrey Hugo wrote:
> During development of new features, we noticed some spots in the code that
> could be improved based on review feedback from the initial driver series.
>
> Also two race condition fixes, one found during stress testing and another
> via code inspection.
>
> Jeffrey Hugo (1):
> accel/qaic: Fix NNC message corruption
>
> Pranjal Ramajor Asha Kanojiya (4):
> accel/qaic: Validate user data before grabbing any lock
> accel/qaic: Validate if BO is sliced before slicing
> accel/qaic: Flush the transfer list again
> accel/qaic: Grab ch_lock during QAIC_ATTACH_SLICE_BO
>
> drivers/accel/qaic/qaic_control.c | 41 ++++++++------
> drivers/accel/qaic/qaic_data.c | 91 +++++++++++++++----------------
> 2 files changed, 70 insertions(+), 62 deletions(-)
>

Pushed to drm-misc-fixes