2022-03-16 07:06:06

by Li Zhijian

Subject: [RFC PATCH v3 0/7] RDMA/rxe: Add RDMA FLUSH operation

Hey folks,

These patches implement a *NEW* RDMA opcode, "RDMA FLUSH".
In IB spec 1.5 [1], two new opcodes, ATOMIC WRITE and RDMA FLUSH, were
added in the MEMORY PLACEMENT EXTENSIONS section.

This patchset makes SoftRoCE support the new RDMA FLUSH operation on RC and RD
services.

You can verify the patchset by building and running the rdma_flush example[2].
server:
$ ./rdma_flush_server -s [server_address] -p [port_number]
client:
$ ./rdma_flush_client -s [server_address] -p [port_number]

- We introduce a new packet format for the FLUSH request.
- We introduce FLUSH placement type attributes for the HCA.
- We introduce FLUSH access flags that users can register with (a registration
  sketch follows the links below).

[1]: https://www.infinibandta.org/wp-content/uploads/2021/08/IBTA-Overview-of-IBTA-Volume-1-Release-1.5-and-MPE-2021-08-17-Secure.pptx
[2]: https://github.com/zhijianli88/rdma-core/tree/rdma-flush
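
As a rough illustration of the last point, this is what MR registration could
look like from user space once the new access flags are plumbed through
rdma-core. The flag name IBV_ACCESS_FLUSH_PERSISTENT and the pmem path are
assumptions based on the fork in [2], not upstream libibverbs:

/* Sketch only: register a DAX-mapped buffer so a peer can later target it
 * with RDMA FLUSH requesting persistence.  IBV_ACCESS_FLUSH_PERSISTENT is
 * an assumed user-space name; the file path below is just an example. */
#include <fcntl.h>
#include <sys/mman.h>
#include <infiniband/verbs.h>

static struct ibv_mr *reg_pmem_mr(struct ibv_pd *pd, size_t len)
{
	int fd = open("/mnt/pmem0/flushbuf", O_RDWR);	/* fsdax-backed file */
	void *buf;

	if (fd < 0)
		return NULL;
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED)
		return NULL;

	/* Persistence may only be requested for pmem-backed MRs (patch 2/7). */
	return ibv_reg_mr(pd, buf, len,
			  IBV_ACCESS_LOCAL_WRITE |
			  IBV_ACCESS_REMOTE_WRITE |
			  IBV_ACCESS_FLUSH_PERSISTENT);
}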

CC: Xiao Yang <[email protected]>
CC: [email protected]
CC: Jason Gunthorpe <[email protected]>
CC: Zhu Yanjun <[email protected]>
CC: Leon Romanovsky <[email protected]>
CC: Bob Pearson <[email protected]>
CC: Mark Bloch <[email protected]>
CC: Wenpeng Liang <[email protected]>
CC: Aharon Landau <[email protected]>
CC: Tom Talpey <[email protected]>
CC: "Gromadzki, Tomasz" <[email protected]>
CC: Dan Williams <[email protected]>
CC: [email protected]
CC: [email protected]

The kernel source can also be found at:
https://github.com/zhijianli88/linux/tree/rdma-flush
Change log
V3:
- Rebase, plus commit log and comment updates
- delete patch-1: "RDMA: mr: Introduce is_pmem"; it is folded into "Allow registering persistent flag for pmem MR only"
- delete patch-7

V2:
RDMA: mr: Introduce is_pmem
check 1st byte to avoid crossing page boundary
new scheme to check is_pmem # Dan

RDMA: Allow registering MR with flush access flags
fold [03/10] "RDMA/rxe: Allow registering FLUSH flags for supported device only" into this patch # Jason
split RDMA_FLUSH to 2 capabilities

RDMA/rxe: Allow registering persistent flag for pmem MR only
update commit message, get rid of confusing ib_check_flush_access_flags() # Tom

RDMA/rxe: Implement RC RDMA FLUSH service in requester side
extend flush to include length field. # Tom and Tomasz

RDMA/rxe: Implement flush execution in responder side
adjust start for WHOLE MR level # Tom
don't support DMA mr for flush # Tom
check flush return value

RDMA/rxe: Enable RDMA FLUSH capability for rxe device
adjust patch's order. move it here from [04/10]

Li Zhijian (7):
RDMA: Allow registering MR with flush access flags
RDMA/rxe: Allow registering persistent flag for pmem MR only
RDMA/rxe: Implement RC RDMA FLUSH service in requester side
RDMA/rxe: Implement flush execution in responder side
RDMA/rxe: Implement flush completion
RDMA/rxe: Enable RDMA FLUSH capability for rxe device
RDMA/rxe: Add RD FLUSH service support

drivers/infiniband/core/uverbs_cmd.c | 17 +++
drivers/infiniband/sw/rxe/rxe_comp.c | 4 +-
drivers/infiniband/sw/rxe/rxe_hdr.h | 48 +++++++++
drivers/infiniband/sw/rxe/rxe_loc.h | 2 +
drivers/infiniband/sw/rxe/rxe_mr.c | 36 ++++++-
drivers/infiniband/sw/rxe/rxe_opcode.c | 35 ++++++
drivers/infiniband/sw/rxe/rxe_opcode.h | 3 +
drivers/infiniband/sw/rxe/rxe_param.h | 4 +-
drivers/infiniband/sw/rxe/rxe_req.c | 15 ++-
drivers/infiniband/sw/rxe/rxe_resp.c | 135 ++++++++++++++++++++++--
include/rdma/ib_pack.h | 3 +
include/rdma/ib_verbs.h | 29 ++++-
include/uapi/rdma/ib_user_ioctl_verbs.h | 2 +
include/uapi/rdma/ib_user_verbs.h | 19 ++++
include/uapi/rdma/rdma_user_rxe.h | 7 ++
15 files changed, 346 insertions(+), 13 deletions(-)

--
2.31.1




2022-03-17 04:34:17

by Li Zhijian

Subject: [RFC PATCH v3 6/7] RDMA/rxe: Enable RDMA FLUSH capability for rxe device

Now we are ready to enable the RDMA FLUSH capability for rxe.
It supports both the Global Visibility and Persistence placement types.

Signed-off-by: Li Zhijian <[email protected]>
---
V2: adjust patch's order. move it here from [04/10]
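
Note for reviewers: once these bits are advertised, a kernel consumer can gate
FLUSH usage on them. A minimal sketch, assuming only the capability flag names
added by this series:

#include <rdma/ib_verbs.h>

/* Sketch only: check the placement-type capabilities a device advertises
 * before attempting a FLUSH that requires persistence or global visibility. */
static bool can_flush_persistent(struct ib_device *dev)
{
	return !!(dev->attrs.device_cap_flags & IB_DEVICE_PLT_PERSISTENT);
}

static bool can_flush_global_visibility(struct ib_device *dev)
{
	return !!(dev->attrs.device_cap_flags & IB_DEVICE_PLT_GLOBAL_VISIBILITY);
}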
---
drivers/infiniband/sw/rxe/rxe_param.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index 918270e34a35..281e1977b147 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -53,7 +53,9 @@ enum rxe_device_param {
| IB_DEVICE_ALLOW_USER_UNREG
| IB_DEVICE_MEM_WINDOW
| IB_DEVICE_MEM_WINDOW_TYPE_2A
- | IB_DEVICE_MEM_WINDOW_TYPE_2B,
+ | IB_DEVICE_MEM_WINDOW_TYPE_2B
+ | IB_DEVICE_PLT_GLOBAL_VISIBILITY
+ | IB_DEVICE_PLT_PERSISTENT,
RXE_MAX_SGE = 32,
RXE_MAX_WQE_SIZE = sizeof(struct rxe_send_wqe) +
sizeof(struct ib_sge) * RXE_MAX_SGE,
--
2.31.1



2022-03-17 04:49:14

by Li Zhijian

Subject: [RFC PATCH v3 2/7] RDMA/rxe: Allow registering persistent flag for pmem MR only

A memory region can support at most two placement types:
IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL_VISIBILITY.

However, the persistent flush flag may only be registered on pmem-backed MRs;
registration on non-pmem memory is rejected.

Signed-off-by: Li Zhijian <[email protected]>
---
V3: combine [RFC PATCH v2 1/9] RDMA: mr: Introduce is_pmem
V2: update commit message, get rid of confusing ib_check_flush_access_flags() # Tom
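
The user-visible effect, roughly (IBV_ACCESS_FLUSH_PERSISTENT is an assumed
user-space name taken from the rdma-core fork referenced in the cover letter):

#include <stdlib.h>
#include <infiniband/verbs.h>

/* Sketch only: with this patch, registering ordinary (non-pmem) memory with
 * the persistent flush flag is rejected by the kernel. */
static int try_register_non_pmem(struct ibv_pd *pd)
{
	void *buf = malloc(4096);
	struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
				       IBV_ACCESS_REMOTE_WRITE |
				       IBV_ACCESS_FLUSH_PERSISTENT);

	/* Expect failure: mr == NULL, errno == EINVAL (memory is not pmem). */
	return mr ? -1 : 0;
}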
---
drivers/infiniband/sw/rxe/rxe_mr.c | 32 ++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 453ef3c9d535..4f5c4af19fe0 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -161,6 +161,28 @@ void rxe_mr_init_dma(struct rxe_pd *pd, int access, struct rxe_mr *mr)
mr->type = IB_MR_TYPE_DMA;
}

+static bool iova_in_pmem(struct rxe_mr *mr, u64 iova, int length)
+{
+ char *vaddr;
+ int is_pmem;
+
+ /* XXX: Should we allow length == 0? */
+ if (length == 0) {
+ return false;
+ }
+ /* check the 1st byte only to avoid crossing page boundary */
+ vaddr = iova_to_vaddr(mr, iova, 1);
+ if (!vaddr) {
+ pr_warn("not a valid iova 0x%llx\n", iova);
+ return false;
+ }
+
+ is_pmem = region_intersects(virt_to_phys(vaddr), 1, IORESOURCE_MEM,
+ IORES_DESC_PERSISTENT_MEMORY);
+
+ return is_pmem == REGION_INTERSECTS;
+}
+
int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
int access, struct rxe_mr *mr)
{
@@ -235,6 +257,16 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
set->va = start;
set->offset = ib_umem_offset(umem);

+ /* iova_in_pmem() must be called after set is updated */
+ if (!iova_in_pmem(mr, iova, length) &&
+ access & IB_ACCESS_FLUSH_PERSISTENT) {
+ pr_warn("Cannot register IB_ACCESS_FLUSH_PERSISTENT for non-pmem memory\n");
+ mr->state = RXE_MR_STATE_INVALID;
+ mr->umem = NULL;
+ err = -EINVAL;
+ goto err_release_umem;
+ }
+
return 0;

err_release_umem:
--
2.31.1



2022-03-17 05:20:15

by Li Zhijian

Subject: [RFC PATCH v3 4/7] RDMA/rxe: Implement flush execution in responder side

In contrast to other opcodes, the FLUSH opcode performs a placement-type
check after the usual sanity checks and before it actually executes the
FLUSH operation.

Data is persisted via arch_wb_cache_pmem(), which is architecture
specific.

Signed-off-by: Li Zhijian <[email protected]>
---
V3: Fix sparse: incorrect type in assignment; Reported-by: kernel test robot <[email protected]>
V2:
# from Tom
- adjust start for WHOLE MR level
- don't support DMA mr for flush
- check flush return value
- FLUSH only requires FLUSH access flags, not READ nor WRITE
---
drivers/infiniband/sw/rxe/rxe_hdr.h | 28 ++++++
drivers/infiniband/sw/rxe/rxe_loc.h | 2 +
drivers/infiniband/sw/rxe/rxe_mr.c | 4 +-
drivers/infiniband/sw/rxe/rxe_resp.c | 135 +++++++++++++++++++++++++--
include/uapi/rdma/ib_user_verbs.h | 10 ++
5 files changed, 171 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_hdr.h b/drivers/infiniband/sw/rxe/rxe_hdr.h
index 8063b5018445..2fe98146130e 100644
--- a/drivers/infiniband/sw/rxe/rxe_hdr.h
+++ b/drivers/infiniband/sw/rxe/rxe_hdr.h
@@ -626,6 +626,34 @@ static inline void feth_init(struct rxe_pkt_info *pkt, u8 type, u8 level)
*p = cpu_to_be32(feth);
}

+static inline u32 __feth_plt(void *arg)
+{
+ __be32 *fethp = arg;
+ u32 feth = be32_to_cpu(*fethp);
+
+ return (feth & FETH_PLT_MASK) >> FETH_PLT_SHIFT;
+}
+
+static inline u32 __feth_sel(void *arg)
+{
+ __be32 *fethp = arg;
+ u32 feth = be32_to_cpu(*fethp);
+
+ return (feth & FETH_SEL_MASK) >> FETH_SEL_SHIFT;
+}
+
+static inline u32 feth_plt(struct rxe_pkt_info *pkt)
+{
+ return __feth_plt(pkt->hdr +
+ rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+}
+
+static inline u32 feth_sel(struct rxe_pkt_info *pkt)
+{
+ return __feth_sel(pkt->hdr +
+ rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+}
+
/******************************************************************************
* Atomic Extended Transport Header
******************************************************************************/
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index b1e174afb1d4..73c39ff11e28 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -80,6 +80,8 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
enum rxe_mr_copy_dir dir);
int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
void *addr, int length, enum rxe_mr_copy_dir dir);
+void lookup_iova(struct rxe_mr *mr, u64 iova, int *m_out, int *n_out,
+ size_t *offset_out);
void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length);
struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key,
enum rxe_mr_lookup_type type);
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 4f5c4af19fe0..28bef8a39cd7 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -297,8 +297,8 @@ int rxe_mr_init_fast(struct rxe_pd *pd, int max_pages, struct rxe_mr *mr)
return err;
}

-static void lookup_iova(struct rxe_mr *mr, u64 iova, int *m_out, int *n_out,
- size_t *offset_out)
+void lookup_iova(struct rxe_mr *mr, u64 iova, int *m_out, int *n_out,
+ size_t *offset_out)
{
struct rxe_map_set *set = mr->cur_map_set;
size_t offset = iova - set->iova + set->offset;
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index e0093fad4e0f..8ad35667a476 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -5,6 +5,7 @@
*/

#include <linux/skbuff.h>
+#include <linux/libnvdimm.h>

#include "rxe.h"
#include "rxe_loc.h"
@@ -19,6 +20,7 @@ enum resp_states {
RESPST_CHK_RESOURCE,
RESPST_CHK_LENGTH,
RESPST_CHK_RKEY,
+ RESPST_CHK_PLT,
RESPST_EXECUTE,
RESPST_READ_REPLY,
RESPST_COMPLETE,
@@ -35,6 +37,7 @@ enum resp_states {
RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
RESPST_ERR_RNR,
RESPST_ERR_RKEY_VIOLATION,
+ RESPST_ERR_PLT_VIOLATION,
RESPST_ERR_INVALIDATE_RKEY,
RESPST_ERR_LENGTH,
RESPST_ERR_CQ_OVERFLOW,
@@ -53,6 +56,7 @@ static char *resp_state_name[] = {
[RESPST_CHK_RESOURCE] = "CHK_RESOURCE",
[RESPST_CHK_LENGTH] = "CHK_LENGTH",
[RESPST_CHK_RKEY] = "CHK_RKEY",
+ [RESPST_CHK_PLT] = "CHK_PLACEMENT_TYPE",
[RESPST_EXECUTE] = "EXECUTE",
[RESPST_READ_REPLY] = "READ_REPLY",
[RESPST_COMPLETE] = "COMPLETE",
@@ -69,6 +73,7 @@ static char *resp_state_name[] = {
[RESPST_ERR_TOO_MANY_RDMA_ATM_REQ] = "ERR_TOO_MANY_RDMA_ATM_REQ",
[RESPST_ERR_RNR] = "ERR_RNR",
[RESPST_ERR_RKEY_VIOLATION] = "ERR_RKEY_VIOLATION",
+ [RESPST_ERR_PLT_VIOLATION] = "ERR_PLACEMENT_TYPE_VIOLATION",
[RESPST_ERR_INVALIDATE_RKEY] = "ERR_INVALIDATE_RKEY_VIOLATION",
[RESPST_ERR_LENGTH] = "ERR_LENGTH",
[RESPST_ERR_CQ_OVERFLOW] = "ERR_CQ_OVERFLOW",
@@ -400,6 +405,24 @@ static enum resp_states check_length(struct rxe_qp *qp,
}
}

+static enum resp_states check_placement_type(struct rxe_qp *qp,
+ struct rxe_pkt_info *pkt)
+{
+ struct rxe_mr *mr = qp->resp.mr;
+ u32 plt = feth_plt(pkt);
+
+ if ((plt & IB_EXT_PLT_GLB_VIS &&
+ !(mr->access & IB_ACCESS_FLUSH_GLOBAL_VISIBILITY)) ||
+ (plt & IB_EXT_PLT_PERSIST &&
+ !(mr->access & IB_ACCESS_FLUSH_PERSISTENT))) {
+ pr_info("Target MR didn't support this placement type, registered flag: %x, requested flag: %x\n",
+ (mr->access & IB_ACCESS_FLUSHABLE) >> 8, plt);
+ return RESPST_ERR_PLT_VIOLATION;
+ }
+
+ return RESPST_EXECUTE;
+}
+
static enum resp_states check_rkey(struct rxe_qp *qp,
struct rxe_pkt_info *pkt)
{
@@ -413,7 +436,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
enum resp_states state;
int access;

- if (pkt->mask & RXE_READ_OR_WRITE_MASK) {
+ if (pkt->mask & (RXE_READ_OR_WRITE_MASK | RXE_FLUSH_MASK)) {
if (pkt->mask & RXE_RETH_MASK) {
qp->resp.va = reth_va(pkt);
qp->resp.offset = 0;
@@ -421,8 +444,12 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
qp->resp.resid = reth_len(pkt);
qp->resp.length = reth_len(pkt);
}
- access = (pkt->mask & RXE_READ_MASK) ? IB_ACCESS_REMOTE_READ
- : IB_ACCESS_REMOTE_WRITE;
+ if (pkt->mask & RXE_FLUSH_MASK)
+ access = IB_ACCESS_FLUSHABLE;
+ else if (pkt->mask & RXE_READ_MASK)
+ access = IB_ACCESS_REMOTE_READ;
+ else
+ access = IB_ACCESS_REMOTE_WRITE;
} else if (pkt->mask & RXE_ATOMIC_MASK) {
qp->resp.va = atmeth_va(pkt);
qp->resp.offset = 0;
@@ -434,8 +461,10 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
}

/* A zero-byte op is not required to set an addr or rkey. */
+ /* RXE_FETH_MASK carries a zero-byte payload */
if ((pkt->mask & RXE_READ_OR_WRITE_MASK) &&
(pkt->mask & RXE_RETH_MASK) &&
+ !(pkt->mask & RXE_FETH_MASK) &&
reth_len(pkt) == 0) {
return RESPST_EXECUTE;
}
@@ -503,7 +532,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
WARN_ON_ONCE(qp->resp.mr);

qp->resp.mr = mr;
- return RESPST_EXECUTE;
+ return pkt->mask & RXE_FETH_MASK ? RESPST_CHK_PLT : RESPST_EXECUTE;

err:
if (mr)
@@ -549,6 +578,93 @@ static enum resp_states write_data_in(struct rxe_qp *qp,
return rc;
}

+static int nvdimm_flush_iova(struct rxe_mr *mr, u64 iova, int length)
+{
+ int err;
+ int bytes;
+ u8 *va;
+ struct rxe_map **map;
+ struct rxe_phys_buf *buf;
+ int m;
+ int i;
+ size_t offset;
+
+ if (length == 0)
+ return 0;
+
+ if (mr->type == IB_MR_TYPE_DMA) {
+ err = -EFAULT;
+ goto err1;
+ }
+
+ err = mr_check_range(mr, iova, length);
+ if (err) {
+ err = -EFAULT;
+ goto err1;
+ }
+
+ lookup_iova(mr, iova, &m, &i, &offset);
+
+ map = mr->cur_map_set->map + m;
+ buf = map[0]->buf + i;
+
+ while (length > 0) {
+ va = (u8 *)(uintptr_t)buf->addr + offset;
+ bytes = buf->size - offset;
+
+ if (bytes > length)
+ bytes = length;
+
+ arch_wb_cache_pmem(va, bytes);
+
+ length -= bytes;
+
+ offset = 0;
+ buf++;
+ i++;
+
+ if (i == RXE_BUF_PER_MAP) {
+ i = 0;
+ map++;
+ buf = map[0]->buf;
+ }
+ }
+
+ return 0;
+
+err1:
+ return err;
+}
+
+static enum resp_states process_flush(struct rxe_qp *qp,
+ struct rxe_pkt_info *pkt)
+{
+ u64 length, start;
+ u32 sel = feth_sel(pkt);
+ u32 plt = feth_plt(pkt);
+ struct rxe_mr *mr = qp->resp.mr;
+
+ if (sel == IB_EXT_SEL_MR_RANGE) {
+ start = qp->resp.va;
+ length = qp->resp.length;
+ } else { /* sel == IB_EXT_SEL_MR_WHOLE */
+ start = mr->cur_map_set->iova;
+ length = mr->cur_map_set->length;
+ }
+
+ if (plt & IB_EXT_PLT_PERSIST) {
+ if (nvdimm_flush_iova(mr, start, length))
+ return RESPST_ERR_RKEY_VIOLATION;
+ wmb();
+ } else if (plt & IB_EXT_PLT_GLB_VIS)
+ wmb();
+
+ /* Prepare RDMA READ response of zero */
+ qp->resp.resid = 0;
+
+ return RESPST_READ_REPLY;
+}
+
/* Guarantee atomicity of atomic operations at the machine level. */
static DEFINE_SPINLOCK(atomic_ops_lock);

@@ -801,6 +917,8 @@ static enum resp_states execute(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
err = process_atomic(qp, pkt);
if (err)
return err;
+ } else if (pkt->mask & RXE_FLUSH_MASK) {
+ return process_flush(qp, pkt);
} else {
/* Unreachable */
WARN_ON_ONCE(1);
@@ -1061,7 +1179,7 @@ static enum resp_states duplicate_request(struct rxe_qp *qp,
/* SEND. Ack again and cleanup. C9-105. */
send_ack(qp, pkt, AETH_ACK_UNLIMITED, prev_psn);
return RESPST_CLEANUP;
- } else if (pkt->mask & RXE_READ_MASK) {
+ } else if (pkt->mask & RXE_READ_MASK || pkt->mask & RXE_FLUSH_MASK) {
struct resp_res *res;

res = find_resource(qp, pkt->psn);
@@ -1100,7 +1218,7 @@ static enum resp_states duplicate_request(struct rxe_qp *qp,
/* Reset the resource, except length. */
res->read.va_org = iova;
res->read.va = iova;
- res->read.resid = resid;
+ res->read.resid = pkt->mask & RXE_FLUSH_MASK ? 0 : resid;

/* Replay the RDMA read reply. */
qp->resp.res = res;
@@ -1247,6 +1365,9 @@ int rxe_responder(void *arg)
case RESPST_CHK_RKEY:
state = check_rkey(qp, pkt);
break;
+ case RESPST_CHK_PLT:
+ state = check_placement_type(qp, pkt);
+ break;
case RESPST_EXECUTE:
state = execute(qp, pkt);
break;
@@ -1301,6 +1422,8 @@ int rxe_responder(void *arg)
break;

case RESPST_ERR_RKEY_VIOLATION:
+ /* oA19-13 8 */
+ case RESPST_ERR_PLT_VIOLATION:
if (qp_type(qp) == IB_QPT_RC) {
/* Class C */
do_class_ac_error(qp, AETH_NAK_REM_ACC_ERR,
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index c4131913ef6a..69a04bb828a0 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -105,6 +105,16 @@ enum {
IB_USER_VERBS_EX_CMD_MODIFY_CQ
};

+enum ib_ext_placement_type {
+ IB_EXT_PLT_GLB_VIS = 1 << 0,
+ IB_EXT_PLT_PERSIST = 1 << 1,
+};
+
+enum ib_ext_selectivity_level {
+ IB_EXT_SEL_MR_RANGE = 0, /* select a MR range */
+ IB_EXT_SEL_MR_WHOLE, /* select the whole MR */
+};
+
/*
* Make sure that all structs defined in this file remain laid out so
* that they pack the same way on 32-bit and 64-bit architectures (to
--
2.31.1



2022-03-17 06:15:15

by Li Zhijian

Subject: [RFC PATCH v3 7/7] RDMA/rxe: Add RD FLUSH service support

Since XRC has not been supported by the rxe, XRC FLUSH will not be
supported until rxe implements XRC service.

Signed-off-by: Li Zhijian <[email protected]>
---
I have not set up an RD environment to test this protocol
---
drivers/infiniband/sw/rxe/rxe_opcode.c | 20 ++++++++++++++++++++
include/rdma/ib_pack.h | 1 +
2 files changed, 21 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c b/drivers/infiniband/sw/rxe/rxe_opcode.c
index adea6c16dfb5..3d86129558f7 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.c
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.c
@@ -922,6 +922,26 @@ struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
RXE_RDETH_BYTES,
}
},
+ [IB_OPCODE_RD_RDMA_FLUSH] = {
+ .name = "IB_OPCODE_RD_RDMA_FLUSH",
+ .mask = RXE_RDETH_MASK | RXE_FETH_MASK | RXE_RETH_MASK |
+ RXE_FLUSH_MASK | RXE_START_MASK |
+ RXE_END_MASK | RXE_REQ_MASK,
+ .length = RXE_BTH_BYTES + RXE_FETH_BYTES + RXE_RETH_BYTES,
+ .offset = {
+ [RXE_BTH] = 0,
+ [RXE_RDETH] = RXE_BTH_BYTES,
+ [RXE_FETH] = RXE_BTH_BYTES +
+ RXE_RDETH_BYTES,
+ [RXE_RETH] = RXE_BTH_BYTES +
+ RXE_RDETH_BYTES +
+ RXE_FETH_BYTES,
+ [RXE_PAYLOAD] = RXE_BTH_BYTES +
+ RXE_RDETH_BYTES +
+ RXE_FETH_BYTES +
+ RXE_RETH_BYTES,
+ }
+ },

/* UD */
[IB_OPCODE_UD_SEND_ONLY] = {
diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index d19edb502de6..40568a33ead8 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -151,6 +151,7 @@ enum {
IB_OPCODE(RD, ATOMIC_ACKNOWLEDGE),
IB_OPCODE(RD, COMPARE_SWAP),
IB_OPCODE(RD, FETCH_ADD),
+ IB_OPCODE(RD, RDMA_FLUSH),

/* UD */
IB_OPCODE(UD, SEND_ONLY),
--
2.31.1



2022-03-25 19:52:38

by Li Zhijian

Subject: Re: [RFC PATCH v3 0/7] RDMA/rxe: Add RDMA FLUSH operation

kindly ping


On 15/03/2022 18:18, Li Zhijian wrote:
> Hey folks,
>
> These patches are going to implement a *NEW* RDMA opcode "RDMA FLUSH".
> In IB SPEC 1.5[1], 2 new opcodes, ATOMIC WRITE and RDMA FLUSH were
> added in the MEMORY PLACEMENT EXTENSIONS section.
>
> This patchset makes SoftRoCE support new RDMA FLUSH on RC and RD service.
>
> You can verify the patchset by building and running the rdma_flush example[2].
> server:
> $ ./rdma_flush_server -s [server_address] -p [port_number]
> client:
> $ ./rdma_flush_client -s [server_address] -p [port_number]
>
> - We introduce new packet format for FLUSH request.
> - We introduce FLUSH placement type attributes to HCA
> - We introduce FLUSH access flags that users are able to register with
>
> [1]: https://www.infinibandta.org/wp-content/uploads/2021/08/IBTA-Overview-of-IBTA-Volume-1-Release-1.5-and-MPE-2021-08-17-Secure.pptx
> [2]: https://github.com/zhijianli88/rdma-core/tree/rdma-flush
>
> CC: Xiao Yang <[email protected]>
> CC: [email protected]
> CC: Jason Gunthorpe <[email protected]>
> CC: Zhu Yanjun <[email protected]
> CC: Leon Romanovsky <[email protected]>
> CC: Bob Pearson <[email protected]>
> CC: Mark Bloch <[email protected]>
> CC: Wenpeng Liang <[email protected]>
> CC: Aharon Landau <[email protected]>
> CC: Tom Talpey <[email protected]>
> CC: "Gromadzki, Tomasz" <[email protected]>
> CC: Dan Williams <[email protected]>
> CC: [email protected]
> CC: [email protected]
>
> Can also access the kernel source in:
> https://github.com/zhijianli88/linux/tree/rdma-flush
> Changes log
> V3:
> - Just rebase and commit log and comment updates
> - delete patch-1: "RDMA: mr: Introduce is_pmem", which will be combined into "Allow registering persistent flag for pmem MR only"
> - delete patch-7
>
> V2:
> RDMA: mr: Introduce is_pmem
> check 1st byte to avoid crossing page boundary
> new scheme to check is_pmem # Dan
>
> RDMA: Allow registering MR with flush access flags
> combine with [03/10] RDMA/rxe: Allow registering FLUSH flags for supported device only to this patch # Jason
> split RDMA_FLUSH to 2 capabilities
>
> RDMA/rxe: Allow registering persistent flag for pmem MR only
> update commit message, get rid of confusing ib_check_flush_access_flags() # Tom
>
> RDMA/rxe: Implement RC RDMA FLUSH service in requester side
> extend flush to include length field. # Tom and Tomasz
>
> RDMA/rxe: Implement flush execution in responder side
> adjust start for WHOLE MR level # Tom
> don't support DMA mr for flush # Tom
> check flush return value
>
> RDMA/rxe: Enable RDMA FLUSH capability for rxe device
> adjust patch's order. move it here from [04/10]
>
> Li Zhijian (7):
> RDMA: Allow registering MR with flush access flags
> RDMA/rxe: Allow registering persistent flag for pmem MR only
> RDMA/rxe: Implement RC RDMA FLUSH service in requester side
> RDMA/rxe: Implement flush execution in responder side
> RDMA/rxe: Implement flush completion
> RDMA/rxe: Enable RDMA FLUSH capability for rxe device
> RDMA/rxe: Add RD FLUSH service support
>
> drivers/infiniband/core/uverbs_cmd.c | 17 +++
> drivers/infiniband/sw/rxe/rxe_comp.c | 4 +-
> drivers/infiniband/sw/rxe/rxe_hdr.h | 48 +++++++++
> drivers/infiniband/sw/rxe/rxe_loc.h | 2 +
> drivers/infiniband/sw/rxe/rxe_mr.c | 36 ++++++-
> drivers/infiniband/sw/rxe/rxe_opcode.c | 35 ++++++
> drivers/infiniband/sw/rxe/rxe_opcode.h | 3 +
> drivers/infiniband/sw/rxe/rxe_param.h | 4 +-
> drivers/infiniband/sw/rxe/rxe_req.c | 15 ++-
> drivers/infiniband/sw/rxe/rxe_resp.c | 135 ++++++++++++++++++++++--
> include/rdma/ib_pack.h | 3 +
> include/rdma/ib_verbs.h | 29 ++++-
> include/uapi/rdma/ib_user_ioctl_verbs.h | 2 +
> include/uapi/rdma/ib_user_verbs.h | 19 ++++
> include/uapi/rdma/rdma_user_rxe.h | 7 ++
> 15 files changed, 346 insertions(+), 13 deletions(-)
>

2022-07-04 14:00:14

by Jason Gunthorpe

Subject: Re: [RFC PATCH v3 0/7] RDMA/rxe: Add RDMA FLUSH operation

On Tue, Mar 15, 2022 at 06:18:38PM +0800, Li Zhijian wrote:
> Hey folks,
>
> These patches are going to implement a *NEW* RDMA opcode "RDMA FLUSH".
> In IB SPEC 1.5[1], 2 new opcodes, ATOMIC WRITE and RDMA FLUSH were
> added in the MEMORY PLACEMENT EXTENSIONS section.
>
> This patchset makes SoftRoCE support new RDMA FLUSH on RC and RD service.
>
> You can verify the patchset by building and running the rdma_flush example[2].
> server:
> $ ./rdma_flush_server -s [server_address] -p [port_number]
> client:
> $ ./rdma_flush_client -s [server_address] -p [port_number]
>
> - We introduce new packet format for FLUSH request.
> - We introduce FLUSH placement type attributes to HCA
> - We introduce FLUSH access flags that users are able to register with

So where are we on this? Are all the rxe regressions fixed now? It
doesn't apply so I'm dropping it from patchworks.

Thanks,
Jason