2022-05-11 09:29:05

by Li Zhijian

Subject: [PATCH v2 0/2] RDMA/rxe: Fix no completion event issue

Since RXE always accepts an RDMA_WRITE post successfully, we observed
that no more completions are generated after a few incorrect posts,
which blocks the polling side. This is easy to reproduce with the
pattern below.

a. post correct RDMA_WRITE
b. poll completion event
while true {
    c. post incorrect RDMA_WRITE (wrong rkey, for example)
    d. poll completion event  <<<< blocks after 2 incorrect RDMA_WRITE posts
}

Li Zhijian (2):
RDMA/rxe: Update wqe_index for each wqe error completion
RDMA/rxe: Generate error completion for error requester state

drivers/infiniband/sw/rxe/rxe_req.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

--
2.31.1





2022-05-11 09:41:06

by Li Zhijian

Subject: [PATCH v2 1/2] RDMA/rxe: Update wqe_index for each wqe error completion

Previously, if user space kept posting abnormal wqes, queue.prod would
keep increasing while queue.index did not. Once
queue.index == queue.prod in the next round, req_next_wqe() would treat
the queue as empty. In that case, no new completion was generated.

Update wqe_index for each wqe error completion so that req_next_wqe()
can fetch the next wqe properly.

Signed-off-by: Li Zhijian <[email protected]>
---
V2: Fix typos in commit logs

drivers/infiniband/sw/rxe/rxe_req.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index a0d5e57f73c1..8bdd0b6b578f 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -773,6 +773,8 @@ int rxe_requester(void *arg)
if (ah)
rxe_put(ah);
err:
+ /* update wqe_index for each wqe completion */
+ qp->req.wqe_index = queue_next_index(qp->sq.queue, qp->req.wqe_index);
wqe->state = wqe_state_error;
__rxe_do_task(&qp->comp.task);

--
2.31.1
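
The stall described in the commit message above can be illustrated with a
toy ring in plain C. This is only a sketch under simplifying assumptions,
not the rxe code: the names toy_ring, toy_next_index, toy_has_wqe and
toy_posts_until_stall, and the ring size of 4, are all hypothetical. It
assumes every posted wqe fails and that the producer index is always allowed
to advance. The post count at which the toy stalls depends on the chosen
ring size and is not meant to match the "2 incorrect posts" seen in the
cover letter.

```c
#include <stdbool.h>

#define TOY_RING_SIZE 4u /* hypothetical; must be a power of two */

struct toy_ring {
	unsigned int prod;  /* next slot user space fills (like queue.prod) */
	unsigned int index; /* next slot the requester reads (like wqe_index) */
};

/* Advance an index by one slot, wrapping at the ring size. */
static unsigned int toy_next_index(unsigned int i)
{
	return (i + 1) & (TOY_RING_SIZE - 1);
}

/* Same shape as the "index == prod means empty" check in the commit log. */
static bool toy_has_wqe(const struct toy_ring *r)
{
	return r->index != r->prod;
}

/*
 * Post up to max_posts failing wqes, letting the requester handle one wqe
 * after each post. Returns the 1-based number of the first post for which
 * the requester sees an empty queue, or 0 if that never happens.
 */
static int toy_posts_until_stall(int max_posts, bool advance_on_error)
{
	struct toy_ring r = { 0, 0 };
	int i;

	for (i = 1; i <= max_posts; i++) {
		r.prod = toy_next_index(r.prod); /* user space posts a wqe */

		if (!toy_has_wqe(&r))
			return i; /* queue looks empty: no completion */

		/* error path: the patched variant steps past the failed wqe */
		if (advance_on_error)
			r.index = toy_next_index(r.index);
	}
	return 0;
}
```

With advance_on_error set to false the producer eventually wraps around to
the stuck index and the queue looks empty even though a wqe was just
posted. With it set to true the index follows the producer and every post
is seen.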




2022-05-11 10:00:23

by Li Zhijian

Subject: [PATCH v2 2/2] RDMA/rxe: Generate error completion for error requester state

SoftRoCE always returns success when user space posts a new wqe, because
posting usually just enqueues the wqe.

Once the requester state becomes QP_STATE_ERROR, we should generate an
error completion for every subsequent wqe, so that the user can poll the
completion event to check whether the earlier wqe was handled correctly.

Here we check QP_STATE_ERROR after req_next_wqe() so that the completion
can be associated with its wqe.

Signed-off-by: Li Zhijian <[email protected]>
---
drivers/infiniband/sw/rxe/rxe_req.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 8bdd0b6b578f..ed6a486c4343 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -624,7 +624,7 @@ int rxe_requester(void *arg)
rxe_get(qp);

next_wqe:
- if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
+ if (unlikely(!qp->valid))
goto exit;

if (unlikely(qp->req.state == QP_STATE_RESET)) {
@@ -646,6 +646,14 @@ int rxe_requester(void *arg)
if (unlikely(!wqe))
goto exit;

+ if (qp->req.state == QP_STATE_ERROR) {
+ /*
+ * Generate an error completion so that user space is able to
+ * poll this completion.
+ */
+ goto err;
+ }
+
if (wqe->mask & WR_LOCAL_OP_MASK) {
ret = rxe_do_local_ops(qp, wqe);
if (unlikely(ret))
--
2.31.1
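
The effect of moving the QP_STATE_ERROR check can be sketched with a small
stand-alone model in C. It is an illustration under assumptions, not the
rxe requester: toy_qp, toy_result and toy_requester_step are hypothetical
names, the send queue is reduced to a counter, and the "patched" flag only
selects where the error-state check happens.

```c
#include <stdbool.h>

enum toy_state { TOY_STATE_READY, TOY_STATE_ERROR };

enum toy_result {
	TOY_NO_COMPLETION,    /* requester exited without touching a wqe */
	TOY_OK_COMPLETION,    /* wqe handled normally */
	TOY_ERROR_COMPLETION, /* wqe completed in error */
};

struct toy_qp {
	bool valid;
	enum toy_state req_state;
	int queued; /* wqes posted by user space and not yet handled */
};

/*
 * Handle at most one wqe.
 * patched == false: bail out before fetching a wqe when in error state.
 * patched == true:  fetch the wqe first, then complete it in error.
 */
static enum toy_result toy_requester_step(struct toy_qp *qp, bool patched)
{
	if (!qp->valid)
		return TOY_NO_COMPLETION;

	/* old placement of the check: nothing is ever completed */
	if (!patched && qp->req_state == TOY_STATE_ERROR)
		return TOY_NO_COMPLETION;

	/* stands in for req_next_wqe() returning NULL */
	if (qp->queued == 0)
		return TOY_NO_COMPLETION;
	qp->queued--;

	/* new placement: the error completion belongs to the fetched wqe */
	if (qp->req_state == TOY_STATE_ERROR)
		return TOY_ERROR_COMPLETION;

	return TOY_OK_COMPLETION;
}
```

In this model, a queue pair in the error state with two queued wqes yields
no completion at all with the old placement, so a poller would wait
forever. With the new placement each queued wqe produces one error
completion until the queue is drained.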