2022-12-06 13:35:20

by Li Zhijian

Subject: [for-next PATCH 00/10] RDMA/rxe: Add RDMA FLUSH operation

Changes in V7:
- rebase to jgg-for-next (Atomic Write applied)
- patch-09: enable FLUSH QP attribute for supported device only, and remove RVB # Jason
- patch-04: pr_debug -> rxe_dbg_mr # Jason

Changes in V6:
- rebase to v6.1-rc1
- add Yanjun's reviewed-by, except for "Allow registering persistent flag for pmem MR only"
- minimize pmem checking side effects # Jason
- return EOPNOTSUPP if the HCA doesn't support the flush operation

These patches implement a new RDMA opcode, RDMA FLUSH. In IB SPEC 1.5 [1],
two new opcodes, ATOMIC WRITE and RDMA FLUSH, were added in the MEMORY
PLACEMENT EXTENSIONS section.

This patchset makes SoftRoCE (rxe) support the new RDMA FLUSH operation on
the RC service.

pyverbs tests and blktests (NVMe over RDMA and SRP) have been run against
this series.

You can verify the patchset by building and running the rdma_flush example [2]:
server:
$ ./rdma_flush_server -s [server_address] -p [port_number]
client:
$ ./rdma_flush_client -s [server_address] -p [port_number]

The corresponding pyverbs support and tests
(tests.test_qpex.QpExTestCase.test_qp_ex_rc_rdma_flush) have also been
added to rdma-core.
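
For reference, posting a FLUSH from userspace with the extended verbs API
in the rdma-core branch [2] looks roughly like the sketch below. This is a
minimal, illustrative example: the ibv_wr_flush() signature and the
IBV_FLUSH_*/IBV_ACCESS_FLUSH_* names are taken from that branch and may
still change, and all setup and error handling is elided.

    /* qpx: an RC QP created with IBV_QP_INIT_ATTR_SEND_OPS_FLAGS and
     * IBV_QP_EX_WITH_FLUSH; the remote MR is pmem, registered with
     * IBV_ACCESS_FLUSH_PERSISTENT on the responder side.
     */
    ibv_wr_start(qpx);
    qpx->wr_id = 1;
    qpx->wr_flags = IBV_SEND_SIGNALED;
    /* ask the responder to flush len bytes at remote_addr into the
     * persistence domain
     */
    ibv_wr_flush(qpx, rkey, remote_addr, len,
                 IBV_FLUSH_PERSISTENT, IBV_FLUSH_RANGE);
    ret = ibv_wr_complete(qpx);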

[1]: https://www.infinibandta.org/wp-content/uploads/2021/08/IBTA-Overview-of-IBTA-Volume-1-Release-1.5-and-MPE-2021-08-17-Secure.pptx
[2]: https://github.com/zhijianli88/rdma-core/tree/rdma-flush-v5

CC: Xiao Yang <[email protected]>
CC: "Gotou, Yasunori" <[email protected]>
CC: Jason Gunthorpe <[email protected]>
CC: Zhu Yanjun <[email protected]>
CC: Leon Romanovsky <[email protected]>
CC: Bob Pearson <[email protected]>
CC: Mark Bloch <[email protected]>
CC: Wenpeng Liang <[email protected]>
CC: Tom Talpey <[email protected]>
CC: "Gromadzki, Tomasz" <[email protected]>
CC: Dan Williams <[email protected]>
CC: [email protected]
CC: [email protected]

The kernel source is also available at:
https://github.com/zhijianli88/linux/tree/rdma-flush-v7

Change log:
V6: https://lore.kernel.org/lkml/[email protected]/T/
V5: https://lore.kernel.org/lkml/[email protected]/t/
V4:
- rework responder process
- rebase to v5.19+
- remove [7/7] "RDMA/rxe: Add RD FLUSH service support" since RD is not really supported

V3:
- Just a rebase, plus commit log and comment updates
- delete patch-1: "RDMA: mr: Introduce is_pmem", which will be combined into "Allow registering persistent flag for pmem MR only"
- delete patch-7

V2:
RDMA: mr: Introduce is_pmem
check 1st byte to avoid crossing page boundary
new scheme to check is_pmem # Dan

RDMA: Allow registering MR with flush access flags
combine [03/10] "RDMA/rxe: Allow registering FLUSH flags for supported device only" into this patch # Jason
split RDMA_FLUSH to 2 capabilities

RDMA/rxe: Allow registering persistent flag for pmem MR only
update commit message, get rid of confusing ib_check_flush_access_flags() # Tom

RDMA/rxe: Implement RC RDMA FLUSH service in requester side
extend flush to include length field. # Tom and Tomasz

RDMA/rxe: Implement flush execution in responder side
adjust start for WHOLE MR level # Tom
don't support DMA mr for flush # Tom
check flush return value

RDMA/rxe: Enable RDMA FLUSH capability for rxe device
adjust the patch order; move it here from [04/10]

Li Zhijian (10):
RDMA: Extend RDMA user ABI to support flush
RDMA: Extend RDMA kernel verbs ABI to support flush
RDMA/rxe: Extend rxe user ABI to support flush
RDMA/rxe: Allow registering persistent flag for pmem MR only
RDMA/rxe: Extend rxe packet format to support flush
RDMA/rxe: Implement RC RDMA FLUSH service in requester side
RDMA/rxe: Implement flush execution in responder side
RDMA/rxe: Implement flush completion
RDMA/cm: Make QP FLUSHABLE for supported device
RDMA/rxe: Enable RDMA FLUSH capability for rxe device

drivers/infiniband/core/cm.c | 13 +-
drivers/infiniband/sw/rxe/rxe_comp.c | 4 +-
drivers/infiniband/sw/rxe/rxe_hdr.h | 47 +++++++
drivers/infiniband/sw/rxe/rxe_loc.h | 1 +
drivers/infiniband/sw/rxe/rxe_mr.c | 58 ++++++++-
drivers/infiniband/sw/rxe/rxe_opcode.c | 17 +++
drivers/infiniband/sw/rxe/rxe_opcode.h | 14 ++-
drivers/infiniband/sw/rxe/rxe_param.h | 2 +
drivers/infiniband/sw/rxe/rxe_req.c | 15 ++-
drivers/infiniband/sw/rxe/rxe_resp.c | 160 +++++++++++++++++++++---
drivers/infiniband/sw/rxe/rxe_verbs.h | 6 +
include/rdma/ib_pack.h | 3 +
include/rdma/ib_verbs.h | 18 ++-
include/uapi/rdma/ib_user_ioctl_verbs.h | 2 +
include/uapi/rdma/ib_user_verbs.h | 17 +++
include/uapi/rdma/rdma_user_rxe.h | 7 ++
16 files changed, 352 insertions(+), 32 deletions(-)

--
2.31.1


2022-12-06 13:35:28

by Li Zhijian

Subject: [for-next PATCH 06/10] RDMA/rxe: Implement RC RDMA FLUSH service in requester side

Implement FLUSH request operation in the requester.
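
On the wire, the request carries a FLUSH Extended Transport Header (FETH):
a single 32-bit big-endian word holding the selectivity level (SEL) and
placement type (PLT) bits. Below is a sketch of the feth_init() helper
used in this patch, based on the layout that patch 05 of this series adds
to rxe_hdr.h (reproduced here for context only; see that patch for the
authoritative definitions):

    #define FETH_PLT_MASK   (0x0000000f) /* placement type: bits 3..0 */
    #define FETH_SEL_MASK   (0x00000030) /* selectivity level: bits 5..4 */
    #define FETH_SEL_SHIFT  (4U)

    static inline void feth_init(struct rxe_pkt_info *pkt, u8 type, u8 level)
    {
            __be32 *p = (__be32 *)(pkt->hdr +
                            rxe_opcode[pkt->opcode].offset[RXE_FETH]);
            u32 feth = ((level << FETH_SEL_SHIFT) & FETH_SEL_MASK) |
                       (type & FETH_PLT_MASK);

            *p = cpu_to_be32(feth);
    }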

Reviewed-by: Zhu Yanjun <[email protected]>
Signed-off-by: Li Zhijian <[email protected]>
---
V4: Remove flush union for legacy API, add WR_FLUSH_MASK
V3: Fix sparse: incorrect type in assignment; Reported-by: kernel test robot <[email protected]>
V2: extend flush to include length field.
---
drivers/infiniband/sw/rxe/rxe_req.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 2713e9058922..899c8779f800 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -241,6 +241,9 @@ static int next_opcode_rc(struct rxe_qp *qp, u32 opcode, int fits)
IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE :
IB_OPCODE_RC_SEND_FIRST;

+ case IB_WR_FLUSH:
+ return IB_OPCODE_RC_FLUSH;
+
case IB_WR_RDMA_READ:
return IB_OPCODE_RC_RDMA_READ_REQUEST;

@@ -425,11 +428,18 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,

/* init optional headers */
if (pkt->mask & RXE_RETH_MASK) {
- reth_set_rkey(pkt, ibwr->wr.rdma.rkey);
+ if (pkt->mask & RXE_FETH_MASK)
+ reth_set_rkey(pkt, ibwr->wr.flush.rkey);
+ else
+ reth_set_rkey(pkt, ibwr->wr.rdma.rkey);
reth_set_va(pkt, wqe->iova);
reth_set_len(pkt, wqe->dma.resid);
}

+ /* Fill Flush Extension Transport Header */
+ if (pkt->mask & RXE_FETH_MASK)
+ feth_init(pkt, ibwr->wr.flush.type, ibwr->wr.flush.level);
+
if (pkt->mask & RXE_IMMDT_MASK)
immdt_set_imm(pkt, ibwr->ex.imm_data);

@@ -488,6 +498,9 @@ static int finish_packet(struct rxe_qp *qp, struct rxe_av *av,

memset(pad, 0, bth_pad(pkt));
}
+ } else if (pkt->mask & RXE_FLUSH_MASK) {
+ /* oA19-2: shall have no payload. */
+ wqe->dma.resid = 0;
}

if (pkt->mask & RXE_ATOMIC_WRITE_MASK) {
--
2.31.1

2022-12-06 13:37:12

by Li Zhijian

Subject: [for-next PATCH 09/10] RDMA/cm: Make QP FLUSHABLE for supported device

Similar to the RDMA and Atomic QP attributes enabled by default in the CM,
enable the FLUSH attribute for supported devices. This gives applications
built with the rdma_create_ep() and rdma_accept() APIs the FLUSH QP
attribute natively, so that users can request FLUSH operations more simply.

Note that a FLUSH operation requires FLUSH support from the device (HCA),
the memory region (MR), and the QP at the same time, so it is safe to
enable the FLUSH QP attribute by default here.

The FLUSH attribute can be disabled via the modify_qp() interface.
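
For example, a kernel ULP that wants to opt out could clear the FLUSH bits
with ib_modify_qp(); a minimal sketch (error handling elided):

    struct ib_qp_attr attr = {
            /* keep the usual remote access rights, drop the FLUSH bits */
            .qp_access_flags = IB_ACCESS_REMOTE_WRITE |
                               IB_ACCESS_REMOTE_READ |
                               IB_ACCESS_REMOTE_ATOMIC,
    };
    int ret = ib_modify_qp(qp, &attr, IB_QP_ACCESS_FLAGS);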

Signed-off-by: Li Zhijian <[email protected]>
---
V7: enable flush for supported device only # Jason
V5: new patch, inspired by Bob
Signed-off-by: Li Zhijian <[email protected]>
---
drivers/infiniband/core/cm.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 1f9938a2c475..603c0aecc361 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -4094,9 +4094,18 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
*qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS |
IB_QP_PKEY_INDEX | IB_QP_PORT;
qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
- if (cm_id_priv->responder_resources)
+ if (cm_id_priv->responder_resources) {
+ struct ib_device *ib_dev = cm_id_priv->id.device;
+ u64 support_flush = ib_dev->attrs.device_cap_flags &
+ (IB_DEVICE_FLUSH_GLOBAL | IB_DEVICE_FLUSH_PERSISTENT);
+ u32 flushable = support_flush ?
+ (IB_ACCESS_FLUSH_GLOBAL |
+ IB_ACCESS_FLUSH_PERSISTENT) : 0;
+
qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
- IB_ACCESS_REMOTE_ATOMIC;
+ IB_ACCESS_REMOTE_ATOMIC |
+ flushable;
+ }
qp_attr->pkey_index = cm_id_priv->av.pkey_index;
if (cm_id_priv->av.port)
qp_attr->port_num = cm_id_priv->av.port->port_num;
--
2.31.1

2022-12-06 13:52:28

by Li Zhijian

Subject: [for-next PATCH 07/10] RDMA/rxe: Implement flush execution in responder side

Only the requested placement types that also registered in the destination
memory region are acceptable.
Otherwise, responder will also reply NAK "Remote Access Error" if it
found a placement type violation.

Data is persisted via arch_wb_cache_pmem(), which may be architecture
specific.

This commit also adds two helpers to update qp->resp from the incoming
packet.
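
Conceptually, the persistent flush boils down to the following (a
simplified sketch of what rxe_flush_pmem_iova() and process_flush() below
implement):

    /* write back the CPU cache lines covering [va, va + bytes) so the
     * data reaches the persistence domain; arch_wb_cache_pmem() falls
     * back to a no-op on architectures without a pmem API
     */
    arch_wb_cache_pmem(va, bytes);
    /* order the writebacks before the ACK is generated */
    wmb();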

Reviewed-by: Zhu Yanjun <[email protected]>
Signed-off-by: Li Zhijian <[email protected]>
---
v6: call iova_to_vaddr to transform iova
v5: add QP attr check for flush access
rename flush_nvdimm_iova -> rxe_flush_pmem_iova()
v4: add send_read_response_ack and flush resource
---
drivers/infiniband/sw/rxe/rxe_loc.h | 1 +
drivers/infiniband/sw/rxe/rxe_mr.c | 36 ++++++
drivers/infiniband/sw/rxe/rxe_resp.c | 160 ++++++++++++++++++++++----
drivers/infiniband/sw/rxe/rxe_verbs.h | 6 +
4 files changed, 183 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index a22476d27b38..948ce4902b10 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -64,6 +64,7 @@ void rxe_mr_init_dma(int access, struct rxe_mr *mr);
int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
int access, struct rxe_mr *mr);
int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr);
+int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length);
int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
enum rxe_mr_copy_dir dir);
int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 81a438e5010a..072eac4b65d2 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -4,6 +4,8 @@
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*/

+#include <linux/libnvdimm.h>
+
#include "rxe.h"
#include "rxe_loc.h"

@@ -192,6 +194,7 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
mr->offset = ib_umem_offset(umem);
mr->state = RXE_MR_STATE_VALID;
mr->ibmr.type = IB_MR_TYPE_USER;
+ mr->ibmr.page_size = PAGE_SIZE;

return 0;

@@ -295,6 +298,39 @@ void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length)
return addr;
}

+int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length)
+{
+ size_t offset;
+
+ if (length == 0)
+ return 0;
+
+ if (mr->ibmr.type == IB_MR_TYPE_DMA)
+ return -EFAULT;
+
+ offset = (iova - mr->ibmr.iova + mr->offset) & mr->page_mask;
+ while (length > 0) {
+ u8 *va;
+ int bytes;
+
+ bytes = mr->ibmr.page_size - offset;
+ if (bytes > length)
+ bytes = length;
+
+ va = iova_to_vaddr(mr, iova, length);
+ if (!va)
+ return -EFAULT;
+
+ arch_wb_cache_pmem(va, bytes);
+
+ length -= bytes;
+ iova += bytes;
+ offset = 0;
+ }
+
+ return 0;
+}
+
/* copy data from a range (vaddr, vaddr+length-1) to or from
* a mr object starting at iova.
*/
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 12a699ffa323..798fb36ee867 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -23,6 +23,7 @@ enum resp_states {
RESPST_READ_REPLY,
RESPST_ATOMIC_REPLY,
RESPST_ATOMIC_WRITE_REPLY,
+ RESPST_PROCESS_FLUSH,
RESPST_COMPLETE,
RESPST_ACKNOWLEDGE,
RESPST_CLEANUP,
@@ -59,6 +60,7 @@ static char *resp_state_name[] = {
[RESPST_READ_REPLY] = "READ_REPLY",
[RESPST_ATOMIC_REPLY] = "ATOMIC_REPLY",
[RESPST_ATOMIC_WRITE_REPLY] = "ATOMIC_WRITE_REPLY",
+ [RESPST_PROCESS_FLUSH] = "PROCESS_FLUSH",
[RESPST_COMPLETE] = "COMPLETE",
[RESPST_ACKNOWLEDGE] = "ACKNOWLEDGE",
[RESPST_CLEANUP] = "CLEANUP",
@@ -258,19 +260,37 @@ static enum resp_states check_op_seq(struct rxe_qp *qp,
}
}

+static bool check_qp_attr_access(struct rxe_qp *qp,
+ struct rxe_pkt_info *pkt)
+{
+ if (((pkt->mask & RXE_READ_MASK) &&
+ !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_READ)) ||
+ ((pkt->mask & (RXE_WRITE_MASK | RXE_ATOMIC_WRITE_MASK)) &&
+ !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_WRITE)) ||
+ ((pkt->mask & RXE_ATOMIC_MASK) &&
+ !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_ATOMIC)))
+ return false;
+
+ if (pkt->mask & RXE_FLUSH_MASK) {
+ u32 flush_type = feth_plt(pkt);
+
+ if ((flush_type & IB_FLUSH_GLOBAL &&
+ !(qp->attr.qp_access_flags & IB_ACCESS_FLUSH_GLOBAL)) ||
+ (flush_type & IB_FLUSH_PERSISTENT &&
+ !(qp->attr.qp_access_flags & IB_ACCESS_FLUSH_PERSISTENT)))
+ return false;
+ }
+
+ return true;
+}
+
static enum resp_states check_op_valid(struct rxe_qp *qp,
struct rxe_pkt_info *pkt)
{
switch (qp_type(qp)) {
case IB_QPT_RC:
- if (((pkt->mask & RXE_READ_MASK) &&
- !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_READ)) ||
- ((pkt->mask & (RXE_WRITE_MASK | RXE_ATOMIC_WRITE_MASK)) &&
- !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_WRITE)) ||
- ((pkt->mask & RXE_ATOMIC_MASK) &&
- !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_ATOMIC))) {
+ if (!check_qp_attr_access(qp, pkt))
return RESPST_ERR_UNSUPPORTED_OPCODE;
- }

break;

@@ -427,6 +447,23 @@ static enum resp_states check_length(struct rxe_qp *qp,
return RESPST_CHK_RKEY;
}

+static void qp_resp_from_reth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
+{
+ qp->resp.va = reth_va(pkt);
+ qp->resp.offset = 0;
+ qp->resp.rkey = reth_rkey(pkt);
+ qp->resp.resid = reth_len(pkt);
+ qp->resp.length = reth_len(pkt);
+}
+
+static void qp_resp_from_atmeth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
+{
+ qp->resp.va = atmeth_va(pkt);
+ qp->resp.offset = 0;
+ qp->resp.rkey = atmeth_rkey(pkt);
+ qp->resp.resid = sizeof(u64);
+}
+
static enum resp_states check_rkey(struct rxe_qp *qp,
struct rxe_pkt_info *pkt)
{
@@ -438,23 +475,26 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
u32 pktlen;
int mtu = qp->mtu;
enum resp_states state;
- int access;
+ int access = 0;

if (pkt->mask & (RXE_READ_OR_WRITE_MASK | RXE_ATOMIC_WRITE_MASK)) {
- if (pkt->mask & RXE_RETH_MASK) {
- qp->resp.va = reth_va(pkt);
- qp->resp.offset = 0;
- qp->resp.rkey = reth_rkey(pkt);
- qp->resp.resid = reth_len(pkt);
- qp->resp.length = reth_len(pkt);
- }
+ if (pkt->mask & RXE_RETH_MASK)
+ qp_resp_from_reth(qp, pkt);
+
access = (pkt->mask & RXE_READ_MASK) ? IB_ACCESS_REMOTE_READ
: IB_ACCESS_REMOTE_WRITE;
+ } else if (pkt->mask & RXE_FLUSH_MASK) {
+ u32 flush_type = feth_plt(pkt);
+
+ if (pkt->mask & RXE_RETH_MASK)
+ qp_resp_from_reth(qp, pkt);
+
+ if (flush_type & IB_FLUSH_GLOBAL)
+ access |= IB_ACCESS_FLUSH_GLOBAL;
+ if (flush_type & IB_FLUSH_PERSISTENT)
+ access |= IB_ACCESS_FLUSH_PERSISTENT;
} else if (pkt->mask & RXE_ATOMIC_MASK) {
- qp->resp.va = atmeth_va(pkt);
- qp->resp.offset = 0;
- qp->resp.rkey = atmeth_rkey(pkt);
- qp->resp.resid = sizeof(u64);
+ qp_resp_from_atmeth(qp, pkt);
access = IB_ACCESS_REMOTE_ATOMIC;
} else {
return RESPST_EXECUTE;
@@ -501,11 +541,20 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
}
}

+ if (pkt->mask & RXE_FLUSH_MASK) {
+ /* FLUSH MR may not set va or resid
+ * no need to check range since we will flush whole mr
+ */
+ if (feth_sel(pkt) == IB_FLUSH_MR)
+ goto skip_check_range;
+ }
+
if (mr_check_range(mr, va + qp->resp.offset, resid)) {
state = RESPST_ERR_RKEY_VIOLATION;
goto err;
}

+skip_check_range:
if (pkt->mask & (RXE_WRITE_MASK | RXE_ATOMIC_WRITE_MASK)) {
if (resid > mtu) {
if (pktlen != mtu || bth_pad(pkt)) {
@@ -611,11 +660,61 @@ static struct resp_res *rxe_prepare_res(struct rxe_qp *qp,
res->last_psn = pkt->psn;
res->cur_psn = pkt->psn;
break;
+ case RXE_FLUSH_MASK:
+ res->flush.va = qp->resp.va + qp->resp.offset;
+ res->flush.length = qp->resp.length;
+ res->flush.type = feth_plt(pkt);
+ res->flush.level = feth_sel(pkt);
}

return res;
}

+static enum resp_states process_flush(struct rxe_qp *qp,
+ struct rxe_pkt_info *pkt)
+{
+ u64 length, start;
+ struct rxe_mr *mr = qp->resp.mr;
+ struct resp_res *res = qp->resp.res;
+
+ /* oA19-14, oA19-15 */
+ if (res && res->replay)
+ return RESPST_ACKNOWLEDGE;
+ else if (!res) {
+ res = rxe_prepare_res(qp, pkt, RXE_FLUSH_MASK);
+ qp->resp.res = res;
+ }
+
+ if (res->flush.level == IB_FLUSH_RANGE) {
+ start = res->flush.va;
+ length = res->flush.length;
+ } else { /* level == IB_FLUSH_MR */
+ start = mr->ibmr.iova;
+ length = mr->ibmr.length;
+ }
+
+ if (res->flush.type & IB_FLUSH_PERSISTENT) {
+ if (rxe_flush_pmem_iova(mr, start, length))
+ return RESPST_ERR_RKEY_VIOLATION;
+ /* Make data persistent. */
+ wmb();
+ } else if (res->flush.type & IB_FLUSH_GLOBAL) {
+ /* Make data global visibility. */
+ wmb();
+ }
+
+ qp->resp.msn++;
+
+ /* next expected psn, read handles this separately */
+ qp->resp.psn = (pkt->psn + 1) & BTH_PSN_MASK;
+ qp->resp.ack_psn = qp->resp.psn;
+
+ qp->resp.opcode = pkt->opcode;
+ qp->resp.status = IB_WC_SUCCESS;
+
+ return RESPST_ACKNOWLEDGE;
+}
+
/* Guarantee atomicity of atomic operations at the machine level. */
static DEFINE_SPINLOCK(atomic_ops_lock);

@@ -970,6 +1069,8 @@ static enum resp_states execute(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
return RESPST_ATOMIC_REPLY;
} else if (pkt->mask & RXE_ATOMIC_WRITE_MASK) {
return RESPST_ATOMIC_WRITE_REPLY;
+ } else if (pkt->mask & RXE_FLUSH_MASK) {
+ return RESPST_PROCESS_FLUSH;
} else {
/* Unreachable */
WARN_ON_ONCE(1);
@@ -1166,7 +1267,7 @@ static enum resp_states acknowledge(struct rxe_qp *qp,
send_ack(qp, qp->resp.aeth_syndrome, pkt->psn);
else if (pkt->mask & RXE_ATOMIC_MASK)
send_atomic_ack(qp, AETH_ACK_UNLIMITED, pkt->psn);
- else if (pkt->mask & RXE_ATOMIC_WRITE_MASK)
+ else if (pkt->mask & (RXE_FLUSH_MASK | RXE_ATOMIC_WRITE_MASK))
send_read_response_ack(qp, AETH_ACK_UNLIMITED, pkt->psn);
else if (bth_ack(pkt))
send_ack(qp, AETH_ACK_UNLIMITED, pkt->psn);
@@ -1224,6 +1325,22 @@ static enum resp_states duplicate_request(struct rxe_qp *qp,
/* SEND. Ack again and cleanup. C9-105. */
send_ack(qp, AETH_ACK_UNLIMITED, prev_psn);
return RESPST_CLEANUP;
+ } else if (pkt->mask & RXE_FLUSH_MASK) {
+ struct resp_res *res;
+
+ /* Find the operation in our list of responder resources. */
+ res = find_resource(qp, pkt->psn);
+ if (res) {
+ res->replay = 1;
+ res->cur_psn = pkt->psn;
+ qp->resp.res = res;
+ rc = RESPST_PROCESS_FLUSH;
+ goto out;
+ }
+
+ /* Resource not found. Class D error. Drop the request. */
+ rc = RESPST_CLEANUP;
+ goto out;
} else if (pkt->mask & RXE_READ_MASK) {
struct resp_res *res;

@@ -1421,6 +1538,9 @@ int rxe_responder(void *arg)
case RESPST_ATOMIC_WRITE_REPLY:
state = atomic_write_reply(qp, pkt);
break;
+ case RESPST_PROCESS_FLUSH:
+ state = process_flush(qp, pkt);
+ break;
case RESPST_ACKNOWLEDGE:
state = acknowledge(qp, pkt);
break;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 22a299b0a9f0..19ddfa890480 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -165,6 +165,12 @@ struct resp_res {
u64 va;
u32 resid;
} read;
+ struct {
+ u32 length;
+ u64 va;
+ u8 type;
+ u8 level;
+ } flush;
};
};

--
2.31.1

2022-12-10 00:40:24

by Jason Gunthorpe

Subject: Re: [for-next PATCH 00/10] RDMA/rxe: Add RDMA FLUSH operation

On Tue, Dec 06, 2022 at 09:01:51PM +0800, Li Zhijian wrote:

> Li Zhijian (10):
> RDMA: Extend RDMA user ABI to support flush
> RDMA: Extend RDMA kernel verbs ABI to support flush
> RDMA/rxe: Extend rxe user ABI to support flush
> RDMA/rxe: Allow registering persistent flag for pmem MR only
> RDMA/rxe: Extend rxe packet format to support flush
> RDMA/rxe: Implement RC RDMA FLUSH service in requester side
> RDMA/rxe: Implement flush execution in responder side
> RDMA/rxe: Implement flush completion
> RDMA/cm: Make QP FLUSHABLE for supported device
> RDMA/rxe: Enable RDMA FLUSH capability for rxe device

Applied to for-next, thanks

Jason