2022-01-25 09:34:19

by Li Zhijian

Subject: [RFC PATCH v2 0/9] RDMA/rxe: Add RDMA FLUSH operation

Hi folks,

I want to thank all of you for the kind feedback on my previous RFC.
I have tried my best to update the series per your comments. Not all
comments have been addressed yet, but I still wish to post this new
version to start a new round of discussion.

Outstanding issues:
- iova_to_vaddr() flows without any kmap/kmap_local_page might not
  always work. # existing issue
- Should the responder reply with an error when the requester asks for
  a persistence placement type against DRAM?
-------

These patches implement a *NEW* RDMA opcode, RDMA FLUSH. In IB spec
1.5 [1][2], two new opcodes, ATOMIC WRITE and RDMA FLUSH, were added
in the MEMORY PLACEMENT EXTENSIONS section.

FLUSH is used by the requesting node to obtain guarantees on the
placement, within the memory subsystem of the destination node, of data
from preceding accesses to a single memory region, such as those
performed by RDMA WRITE, Atomics and ATOMIC WRITE requests.

The operation indicates the range, within the virtual address space of
the destination node, where the guarantees should apply. This range
must be contiguous in the virtual address space of the memory key, but
it is not necessarily a contiguous range of physical memory.

FLUSH packets carry a FLUSH extended transport header (FETH, see below)
to specify the placement type and the selectivity level of the
operation, and an RDMA extended transport header (RETH, see the base
document's RETH definition) to specify the R_Key, VA and Length
associated with this request, following the BTH in RC, the RDETH in RD
and the XRCETH in XRC.

RC FLUSH:
+----+------+------+
|BTH | FETH | RETH |
+----+------+------+

RD FLUSH:
+----+------+------+------+
|BTH | RDETH| FETH | RETH |
+----+------+------+------+

XRC FLUSH:
+----+-------+------+------+
|BTH | XRCETH| FETH | RETH |
+----+-------+------+------+

Currently, we introduce the RC and RD services only, since XRC has not
been implemented by rxe yet.
NOTE: only the RC service is tested so far; since other HCAs have not
implemented FLUSH yet, the FLUSH operation can only be tested between
SoftRoCE/rxe devices.

The corresponding rdma-core changes and a FLUSH example are available at:
https://github.com/zhijianli88/rdma-core/tree/rfc
The kernel source is available at:
https://github.com/zhijianli88/linux/tree/rdma-flush

- We introduce an is_pmem attribute for the MR (memory region)
- We introduce FLUSH placement type attributes for the HCA
- We introduce FLUSH access flags that users are able to register with
The figure below shows the valid access flags users can register:
+------------------------+------------------+--------------+
| HCA attributes | register access flags |
| and +------------------+--------------+
| MR attribute(is_pmem) |global visibility | persistence |
|------------------------+------------------+--------------+
| global visibility(DRAM)| O | X |
|------------------------+------------------+--------------+
| global visibility(PMEM)| O | X |
|------------------------+------------------+--------------+
| persistence(DRAM) | X | X |
|------------------------+------------------+--------------+
| persistence(PMEM) | X | O |
+------------------------+------------------+--------------+
PMEM: is_pmem is true
DRAM: is_pmem is false
O: allow to register such access flag
X: otherwise

In order to make placement guarantees, we currently reject a persistent
flush request to a non-pmem MR.
The responder checks the remote requested placement types against the
registered access flags:
+------------------------+------------------+--------------+
| | registered flags |
| remote requested types +------------------+--------------+
| |global visibility | persistence |
|------------------------+------------------+--------------+
| global visibility | O | X |
+------------------------+------------------+--------------+
| persistence | X | O |
+------------------------+------------------+--------------+
O: allow to request such placement type
X: otherwise

Below are some details about the FLUSH transport packet:

A FLUSH message consists of a single FLUSH request packet and is
acknowledged by an RDMA READ response of zero size.

oA19-2: FLUSH shall be single packet message and shall have no payload.
oA19-5: FLUSH BTH shall hold the Opcode = 0x1C

FLUSH Extended Transport Header(FETH)
+-----+-----------+------------------------+----------------------+
|Bits | 31-6 | 5-4 | 3-0 |
+-----+-----------+------------------------+----------------------+
| | Reserved | Selectivity Level(SEL) | Placement Type(PLT) |
+-----+-----------+------------------------+----------------------+

Selectivity Level (SEL) – defines the memory region scope the FLUSH
should apply on. Values are as follows:
• b’00 - Memory Region Range: FLUSH applies for all preceding memory
updates to the RETH range on this QP. All RETH fields shall be
valid in this selectivity mode. RETH:DMALen field shall be
between zero and (2^31 - 1) bytes (inclusive).
• b’01 - Memory Region: FLUSH applies for all preceding memory up-
dates to RETH.R_key on this QP. RETH:DMALen and RETH:VA
shall be ignored in this mode.
• b'10 - Reserved.
• b'11 - Reserved.

Placement Type (PLT) – Defines the memory placement guarantee of
this FLUSH. Multiple bits may be set in this field. Values are as follows:
• Bit 0, if set to '1', indicates that the FLUSH should guarantee Global
Visibility.
• Bit 1, if set to '1', indicates that the FLUSH should guarantee
Persistence.
• Bits 3:2 are reserved

[1]: https://www.infinibandta.org/ibta-specification/ # login required
[2]: https://www.infinibandta.org/wp-content/uploads/2021/08/IBTA-Overview-of-IBTA-Volume-1-Release-1.5-and-MPE-2021-08-17-Secure.pptx

CC: [email protected]
CC: [email protected]
CC: Jason Gunthorpe <[email protected]>
CC: Zhu Yanjun <[email protected]>
CC: Leon Romanovsky <[email protected]>
CC: Bob Pearson <[email protected]>
CC: Mark Bloch <[email protected]>
CC: Wenpeng Liang <[email protected]>
CC: Aharon Landau <[email protected]>
CC: Tom Talpey <[email protected]>
CC: "Gromadzki, Tomasz" <[email protected]>
CC: Dan Williams <[email protected]>
CC: [email protected]
CC: [email protected]

V1:
https://lore.kernel.org/lkml/[email protected]/T/
or https://github.com/zhijianli88/linux/tree/rdma-flush-rfcv1

Changes log
V2:
https://github.com/zhijianli88/linux/tree/rdma-flush
RDMA: mr: Introduce is_pmem
check 1st byte to avoid crossing page boundary
new scheme to check is_pmem # Dan

RDMA: Allow registering MR with flush access flags
combine with [03/10] RDMA/rxe: Allow registering FLUSH flags for supported device only to this patch # Jason
split RDMA_FLUSH to 2 capabilities

RDMA/rxe: Allow registering persistent flag for pmem MR only
update commit message, get rid of confusing ib_check_flush_access_flags() # Tom

RDMA/rxe: Implement RC RDMA FLUSH service in requester side
extend flush to include length field. # Tom and Tomasz

RDMA/rxe: Implement flush execution in responder side
adjust start for WHOLE MR level # Tom
don't support DMA mr for flush # Tom
check flush return value

RDMA/rxe: Enable RDMA FLUSH capability for rxe device
adjust patch's order. move it here from [04/10]

Li Zhijian (9):
RDMA: mr: Introduce is_pmem
RDMA: Allow registering MR with flush access flags
RDMA/rxe: Allow registering persistent flag for pmem MR only
RDMA/rxe: Implement RC RDMA FLUSH service in requester side
RDMA/rxe: Set BTH's SE to zero for FLUSH packet
RDMA/rxe: Implement flush execution in responder side
RDMA/rxe: Implement flush completion
RDMA/rxe: Enable RDMA FLUSH capability for rxe device
RDMA/rxe: Add RD FLUSH service support

drivers/infiniband/core/uverbs_cmd.c | 17 +++
drivers/infiniband/sw/rxe/rxe_comp.c | 4 +-
drivers/infiniband/sw/rxe/rxe_hdr.h | 52 +++++++++
drivers/infiniband/sw/rxe/rxe_loc.h | 2 +
drivers/infiniband/sw/rxe/rxe_mr.c | 37 ++++++-
drivers/infiniband/sw/rxe/rxe_opcode.c | 35 +++++++
drivers/infiniband/sw/rxe/rxe_opcode.h | 3 +
drivers/infiniband/sw/rxe/rxe_param.h | 4 +-
drivers/infiniband/sw/rxe/rxe_req.c | 19 +++-
drivers/infiniband/sw/rxe/rxe_resp.c | 133 +++++++++++++++++++++++-
include/rdma/ib_pack.h | 3 +
include/rdma/ib_verbs.h | 30 +++++-
include/uapi/rdma/ib_user_ioctl_verbs.h | 2 +
include/uapi/rdma/ib_user_verbs.h | 19 ++++
include/uapi/rdma/rdma_user_rxe.h | 7 ++
15 files changed, 355 insertions(+), 12 deletions(-)

--
2.31.1




2022-01-25 09:34:37

by Li Zhijian

Subject: [RFC PATCH v2 3/9] RDMA/rxe: Allow registering persistent flag for pmem MR only

A memory region should support two placement types,
IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL_VISIBILITY, but
only pmem/nvdimm has the ability to persist data
(IB_ACCESS_FLUSH_PERSISTENT).

This patch prevents a local user from registering the persistent access
flag on a non-pmem MR.

+------------------------+------------------+--------------+
| HCA attributes | register access flags |
| and +------------------+--------------+
| MR attribute(is_pmem) |global visibility | persistence |
|------------------------+------------------+--------------+
| global visibility(DRAM)| O | X |
|------------------------+------------------+--------------+
| global visibility(PMEM)| O | X |
|------------------------+------------------+--------------+
| persistence(DRAM) | X | X |
|------------------------+------------------+--------------+
| persistence(PMEM) | X | O |
+------------------------+------------------+--------------+
PMEM: is_pmem is true
DRAM: is_pmem is false
O: allow to register such access flag
X: otherwise

Signed-off-by: Li Zhijian <[email protected]>
---
V2: update commit message, get rid of confusing ib_check_flush_access_flags() # Tom
---
drivers/infiniband/sw/rxe/rxe_mr.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 0427baea8c06..89a3bb4e8b71 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -258,7 +258,15 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
set->offset = ib_umem_offset(umem);

// iova_in_pmem() must be called after set is updated
- mr->ibmr.is_pmem = iova_in_pmem(mr, iova, length);
+ if (iova_in_pmem(mr, iova, length))
+ mr->ibmr.is_pmem = true;
+ else if (access & IB_ACCESS_FLUSH_PERSISTENT) {
+ pr_warn("Cannot register IB_ACCESS_FLUSH_PERSISTENT for non-pmem memory\n");
+ mr->state = RXE_MR_STATE_INVALID;
+ mr->umem = NULL;
+ err = -EINVAL;
+ goto err_release_umem;
+ }

return 0;

--
2.31.1



2022-01-25 09:34:40

by Li Zhijian

Subject: [RFC PATCH v2 1/9] RDMA: mr: Introduce is_pmem

The new field indicates whether the MR being registered is associated
with pmem/nvdimm or not.

Currently, only the rxe driver updates it; other devices/drivers should
implement it if needed.

CC: Dan Williams <[email protected]>
Signed-off-by: Li Zhijian <[email protected]>
---
V2: check 1st byte to avoid crossing page boundary
new scheme to check is_pmem # Dan
---
drivers/infiniband/sw/rxe/rxe_mr.c | 25 +++++++++++++++++++++++++
include/rdma/ib_verbs.h | 1 +
2 files changed, 26 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 453ef3c9d535..0427baea8c06 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -161,6 +161,28 @@ void rxe_mr_init_dma(struct rxe_pd *pd, int access, struct rxe_mr *mr)
mr->type = IB_MR_TYPE_DMA;
}

+static bool iova_in_pmem(struct rxe_mr *mr, u64 iova, int length)
+{
+ char *vaddr;
+ int is_pmem;
+
+ /* XXX: Shall we allow length == 0? */
+ if (length == 0) {
+ return false;
+ }
+ /* check the 1st byte only to avoid crossing page boundary */
+ vaddr = iova_to_vaddr(mr, iova, 1);
+ if (!vaddr) {
+ pr_warn("not a valid iova 0x%llx\n", iova);
+ return false;
+ }
+
+ is_pmem = region_intersects(virt_to_phys(vaddr), 1, IORESOURCE_MEM,
+ IORES_DESC_PERSISTENT_MEMORY);
+
+ return is_pmem == REGION_INTERSECTS;
+}
+
int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
int access, struct rxe_mr *mr)
{
@@ -235,6 +257,9 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
set->va = start;
set->offset = ib_umem_offset(umem);

+ // iova_in_pmem() must be called after set is updated
+ mr->ibmr.is_pmem = iova_in_pmem(mr, iova, length);
+
return 0;

err_release_umem:
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 69d883f7fb41..4fa07b123c8d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1807,6 +1807,7 @@ struct ib_mr {
unsigned int page_size;
enum ib_mr_type type;
bool need_inval;
+ bool is_pmem;
union {
struct ib_uobject *uobject; /* user */
struct list_head qp_entry; /* FR */
--
2.31.1



2022-01-25 09:35:32

by Li Zhijian

Subject: [RFC PATCH v2 7/9] RDMA/rxe: Implement flush completion

Introduce a new IB_UVERBS_WC_FLUSH code to tell userspace a FLUSH
completion.

Signed-off-by: Li Zhijian <[email protected]>
---
drivers/infiniband/sw/rxe/rxe_comp.c | 4 +++-
include/rdma/ib_verbs.h | 1 +
include/uapi/rdma/ib_user_verbs.h | 1 +
3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index f363fe3fa414..e5b9d07eba93 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -104,6 +104,7 @@ static enum ib_wc_opcode wr_to_wc_opcode(enum ib_wr_opcode opcode)
case IB_WR_LOCAL_INV: return IB_WC_LOCAL_INV;
case IB_WR_REG_MR: return IB_WC_REG_MR;
case IB_WR_BIND_MW: return IB_WC_BIND_MW;
+ case IB_WR_RDMA_FLUSH: return IB_WC_RDMA_FLUSH;

default:
return 0xff;
@@ -261,7 +262,8 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
*/
case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:
if (wqe->wr.opcode != IB_WR_RDMA_READ &&
- wqe->wr.opcode != IB_WR_RDMA_READ_WITH_INV) {
+ wqe->wr.opcode != IB_WR_RDMA_READ_WITH_INV &&
+ wqe->wr.opcode != IB_WR_RDMA_FLUSH) {
wqe->status = IB_WC_FATAL_ERR;
return COMPST_ERROR;
}
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index d8555b6e4eba..5242acb73004 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -965,6 +965,7 @@ const char *__attribute_const__ ib_wc_status_msg(enum ib_wc_status status);
enum ib_wc_opcode {
IB_WC_SEND = IB_UVERBS_WC_SEND,
IB_WC_RDMA_WRITE = IB_UVERBS_WC_RDMA_WRITE,
+ IB_WC_RDMA_FLUSH = IB_UVERBS_WC_FLUSH,
IB_WC_RDMA_READ = IB_UVERBS_WC_RDMA_READ,
IB_WC_COMP_SWAP = IB_UVERBS_WC_COMP_SWAP,
IB_WC_FETCH_ADD = IB_UVERBS_WC_FETCH_ADD,
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index be1f9dca08a8..d43671fef93e 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -476,6 +476,7 @@ enum ib_uverbs_wc_opcode {
IB_UVERBS_WC_BIND_MW = 5,
IB_UVERBS_WC_LOCAL_INV = 6,
IB_UVERBS_WC_TSO = 7,
+ IB_UVERBS_WC_FLUSH = 8,
};

struct ib_uverbs_wc {
--
2.31.1



2022-01-25 09:36:20

by Li Zhijian

Subject: [RFC PATCH v2 5/9] RDMA/rxe: Set BTH's SE to zero for FLUSH packet

The SPEC says:
oA19-6: FLUSH BTH header field solicited event (SE) indication shall be
set to zero.

Signed-off-by: Li Zhijian <[email protected]>
---
V2: said -> says
---
drivers/infiniband/sw/rxe/rxe_req.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 708138117136..363a33b905bf 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -401,7 +401,9 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
(pkt->mask & RXE_END_MASK) &&
((pkt->mask & (RXE_SEND_MASK)) ||
(pkt->mask & (RXE_WRITE_MASK | RXE_IMMDT_MASK)) ==
- (RXE_WRITE_MASK | RXE_IMMDT_MASK));
+ (RXE_WRITE_MASK | RXE_IMMDT_MASK)) &&
+ /* oA19-6: always set SE to zero */
+ !(pkt->mask & RXE_FETH_MASK);

qp_num = (pkt->mask & RXE_DETH_MASK) ? ibwr->wr.ud.remote_qpn :
qp->attr.dest_qp_num;
--
2.31.1



2022-01-25 09:36:24

by Li Zhijian

Subject: [RFC PATCH v2 6/9] RDMA/rxe: Implement flush execution in responder side

In contrast to other opcodes, after a series of sanity checks, the
FLUSH opcode performs a placement type check before it actually
executes the FLUSH operation. The responder replies with a NAK "Remote
Access Error" if it finds a placement type violation.

The responder will check the remote requested placement types by checking
the registered access flags.
+------------------------+------------------+--------------+
| | registered flags |
| remote requested types +------------------+--------------+
| |global visibility | persistence |
|------------------------+------------------+--------------+
| global visibility | O | X |
+------------------------+------------------+--------------+
| persistence | X | O |
+------------------------+------------------+--------------+
O: allow to request such placement type
X: otherwise

We persist data via arch_wb_cache_pmem(), which may be architecture
specific.

After execution, the responder replies with an RDMA READ response of
zero size to indicate success.

Signed-off-by: Li Zhijian <[email protected]>
---
V2:
# from Tom
- adjust start for WHOLE MR level
- don't support DMA mr for flush
- check flush return value
---
drivers/infiniband/sw/rxe/rxe_hdr.h | 28 ++++++
drivers/infiniband/sw/rxe/rxe_loc.h | 2 +
drivers/infiniband/sw/rxe/rxe_mr.c | 4 +-
drivers/infiniband/sw/rxe/rxe_resp.c | 136 +++++++++++++++++++++++++--
include/uapi/rdma/ib_user_verbs.h | 10 ++
5 files changed, 172 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_hdr.h b/drivers/infiniband/sw/rxe/rxe_hdr.h
index e37aa1944b18..cdfd393b8bd8 100644
--- a/drivers/infiniband/sw/rxe/rxe_hdr.h
+++ b/drivers/infiniband/sw/rxe/rxe_hdr.h
@@ -630,6 +630,34 @@ static inline void feth_init(struct rxe_pkt_info *pkt, u32 type, u32 level)
*p = cpu_to_be32(feth);
}

+static inline u32 __feth_plt(void *arg)
+{
+ u32 *fethp = arg;
+ u32 feth = be32_to_cpu(*fethp);
+
+ return (feth & FETH_PLT_MASK) >> FETH_PLT_SHIFT;
+}
+
+static inline u32 __feth_sel(void *arg)
+{
+ u32 *fethp = arg;
+ u32 feth = be32_to_cpu(*fethp);
+
+ return (feth & FETH_SEL_MASK) >> FETH_SEL_SHIFT;
+}
+
+static inline u32 feth_plt(struct rxe_pkt_info *pkt)
+{
+ return __feth_plt(pkt->hdr +
+ rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+}
+
+static inline u32 feth_sel(struct rxe_pkt_info *pkt)
+{
+ return __feth_sel(pkt->hdr +
+ rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+}
+
/******************************************************************************
* Atomic Extended Transport Header
******************************************************************************/
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index b1e174afb1d4..73c39ff11e28 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -80,6 +80,8 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
enum rxe_mr_copy_dir dir);
int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
void *addr, int length, enum rxe_mr_copy_dir dir);
+void lookup_iova(struct rxe_mr *mr, u64 iova, int *m_out, int *n_out,
+ size_t *offset_out);
void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length);
struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key,
enum rxe_mr_lookup_type type);
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 89a3bb4e8b71..cd55fcc00e65 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -298,8 +298,8 @@ int rxe_mr_init_fast(struct rxe_pd *pd, int max_pages, struct rxe_mr *mr)
return err;
}

-static void lookup_iova(struct rxe_mr *mr, u64 iova, int *m_out, int *n_out,
- size_t *offset_out)
+void lookup_iova(struct rxe_mr *mr, u64 iova, int *m_out, int *n_out,
+ size_t *offset_out)
{
struct rxe_map_set *set = mr->cur_map_set;
size_t offset = iova - set->iova + set->offset;
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index e0093fad4e0f..3277a36f506f 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -5,6 +5,7 @@
*/

#include <linux/skbuff.h>
+#include <linux/libnvdimm.h>

#include "rxe.h"
#include "rxe_loc.h"
@@ -19,6 +20,7 @@ enum resp_states {
RESPST_CHK_RESOURCE,
RESPST_CHK_LENGTH,
RESPST_CHK_RKEY,
+ RESPST_CHK_PLT,
RESPST_EXECUTE,
RESPST_READ_REPLY,
RESPST_COMPLETE,
@@ -35,6 +37,7 @@ enum resp_states {
RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
RESPST_ERR_RNR,
RESPST_ERR_RKEY_VIOLATION,
+ RESPST_ERR_PLT_VIOLATION,
RESPST_ERR_INVALIDATE_RKEY,
RESPST_ERR_LENGTH,
RESPST_ERR_CQ_OVERFLOW,
@@ -53,6 +56,7 @@ static char *resp_state_name[] = {
[RESPST_CHK_RESOURCE] = "CHK_RESOURCE",
[RESPST_CHK_LENGTH] = "CHK_LENGTH",
[RESPST_CHK_RKEY] = "CHK_RKEY",
+ [RESPST_CHK_PLT] = "CHK_PLACEMENT_TYPE",
[RESPST_EXECUTE] = "EXECUTE",
[RESPST_READ_REPLY] = "READ_REPLY",
[RESPST_COMPLETE] = "COMPLETE",
@@ -69,6 +73,7 @@ static char *resp_state_name[] = {
[RESPST_ERR_TOO_MANY_RDMA_ATM_REQ] = "ERR_TOO_MANY_RDMA_ATM_REQ",
[RESPST_ERR_RNR] = "ERR_RNR",
[RESPST_ERR_RKEY_VIOLATION] = "ERR_RKEY_VIOLATION",
+ [RESPST_ERR_PLT_VIOLATION] = "ERR_PLACEMENT_TYPE_VIOLATION",
[RESPST_ERR_INVALIDATE_RKEY] = "ERR_INVALIDATE_RKEY_VIOLATION",
[RESPST_ERR_LENGTH] = "ERR_LENGTH",
[RESPST_ERR_CQ_OVERFLOW] = "ERR_CQ_OVERFLOW",
@@ -400,6 +405,25 @@ static enum resp_states check_length(struct rxe_qp *qp,
}
}

+static enum resp_states check_placement_type(struct rxe_qp *qp,
+ struct rxe_pkt_info *pkt)
+{
+ struct rxe_mr *mr = qp->resp.mr;
+ u32 plt = feth_plt(pkt);
+
+ if ((plt & IB_EXT_PLT_GLB_VIS &&
+ !(mr->access & IB_ACCESS_FLUSH_GLOBAL_VISIBILITY)) ||
+ (plt & IB_EXT_PLT_PERSIST &&
+ !(mr->access & IB_ACCESS_FLUSH_PERSISTENT))) {
+ pr_info("Target MR didn't support this placement type, is_pmem: %s, registered flag: %x, requested flag: %x\n",
+ mr->ibmr.is_pmem ? "true" : "false",
+ (mr->access & IB_ACCESS_FLUSHABLE) >> 8, plt);
+ return RESPST_ERR_PLT_VIOLATION;
+ }
+
+ return RESPST_EXECUTE;
+}
+
static enum resp_states check_rkey(struct rxe_qp *qp,
struct rxe_pkt_info *pkt)
{
@@ -413,7 +437,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
enum resp_states state;
int access;

- if (pkt->mask & RXE_READ_OR_WRITE_MASK) {
+ if (pkt->mask & (RXE_READ_OR_WRITE_MASK | RXE_FLUSH_MASK)) {
if (pkt->mask & RXE_RETH_MASK) {
qp->resp.va = reth_va(pkt);
qp->resp.offset = 0;
@@ -421,8 +445,12 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
qp->resp.resid = reth_len(pkt);
qp->resp.length = reth_len(pkt);
}
- access = (pkt->mask & RXE_READ_MASK) ? IB_ACCESS_REMOTE_READ
- : IB_ACCESS_REMOTE_WRITE;
+ if (pkt->mask & RXE_FLUSH_MASK)
+ access = IB_ACCESS_FLUSHABLE;
+ else if (pkt->mask & RXE_READ_MASK)
+ access = IB_ACCESS_REMOTE_READ;
+ else
+ access = IB_ACCESS_REMOTE_WRITE;
} else if (pkt->mask & RXE_ATOMIC_MASK) {
qp->resp.va = atmeth_va(pkt);
qp->resp.offset = 0;
@@ -434,8 +462,10 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
}

/* A zero-byte op is not required to set an addr or rkey. */
+ /* RXE_FETH_MASK carries a zero-byte payload */
if ((pkt->mask & RXE_READ_OR_WRITE_MASK) &&
(pkt->mask & RXE_RETH_MASK) &&
+ !(pkt->mask & RXE_FETH_MASK) &&
reth_len(pkt) == 0) {
return RESPST_EXECUTE;
}
@@ -503,7 +533,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
WARN_ON_ONCE(qp->resp.mr);

qp->resp.mr = mr;
- return RESPST_EXECUTE;
+ return pkt->mask & RXE_FETH_MASK ? RESPST_CHK_PLT : RESPST_EXECUTE;

err:
if (mr)
@@ -549,6 +579,93 @@ static enum resp_states write_data_in(struct rxe_qp *qp,
return rc;
}

+static int nvdimm_flush_iova(struct rxe_mr *mr, u64 iova, int length)
+{
+ int err;
+ int bytes;
+ u8 *va;
+ struct rxe_map **map;
+ struct rxe_phys_buf *buf;
+ int m;
+ int i;
+ size_t offset;
+
+ if (length == 0)
+ return 0;
+
+ if (mr->type == IB_MR_TYPE_DMA) {
+ err = -EFAULT;
+ goto err1;
+ }
+
+ err = mr_check_range(mr, iova, length);
+ if (err) {
+ err = -EFAULT;
+ goto err1;
+ }
+
+ lookup_iova(mr, iova, &m, &i, &offset);
+
+ map = mr->cur_map_set->map + m;
+ buf = map[0]->buf + i;
+
+ while (length > 0) {
+ va = (u8 *)(uintptr_t)buf->addr + offset;
+ bytes = buf->size - offset;
+
+ if (bytes > length)
+ bytes = length;
+
+ arch_wb_cache_pmem(va, bytes);
+
+ length -= bytes;
+
+ offset = 0;
+ buf++;
+ i++;
+
+ if (i == RXE_BUF_PER_MAP) {
+ i = 0;
+ map++;
+ buf = map[0]->buf;
+ }
+ }
+
+ return 0;
+
+err1:
+ return err;
+}
+
+static enum resp_states process_flush(struct rxe_qp *qp,
+ struct rxe_pkt_info *pkt)
+{
+ u64 length, start;
+ u32 sel = feth_sel(pkt);
+ u32 plt = feth_plt(pkt);
+ struct rxe_mr *mr = qp->resp.mr;
+
+ if (sel == IB_EXT_SEL_MR_RANGE) {
+ start = qp->resp.va;
+ length = qp->resp.length;
+ } else { /* sel == IB_EXT_SEL_MR_WHOLE */
+ start = mr->cur_map_set->iova;
+ length = mr->cur_map_set->length;
+ }
+
+ if (plt & IB_EXT_PLT_PERSIST) {
+ if (nvdimm_flush_iova(mr, start, length))
+ return RESPST_ERR_RKEY_VIOLATION;
+ wmb();
+ } else if (plt & IB_EXT_PLT_GLB_VIS)
+ wmb();
+
+ /* Prepare RDMA READ response of zero */
+ qp->resp.resid = 0;
+
+ return RESPST_READ_REPLY;
+}
+
/* Guarantee atomicity of atomic operations at the machine level. */
static DEFINE_SPINLOCK(atomic_ops_lock);

@@ -801,6 +918,8 @@ static enum resp_states execute(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
err = process_atomic(qp, pkt);
if (err)
return err;
+ } else if (pkt->mask & RXE_FLUSH_MASK) {
+ return process_flush(qp, pkt);
} else {
/* Unreachable */
WARN_ON_ONCE(1);
@@ -1061,7 +1180,7 @@ static enum resp_states duplicate_request(struct rxe_qp *qp,
/* SEND. Ack again and cleanup. C9-105. */
send_ack(qp, pkt, AETH_ACK_UNLIMITED, prev_psn);
return RESPST_CLEANUP;
- } else if (pkt->mask & RXE_READ_MASK) {
+ } else if (pkt->mask & RXE_READ_MASK || pkt->mask & RXE_FLUSH_MASK) {
struct resp_res *res;

res = find_resource(qp, pkt->psn);
@@ -1100,7 +1219,7 @@ static enum resp_states duplicate_request(struct rxe_qp *qp,
/* Reset the resource, except length. */
res->read.va_org = iova;
res->read.va = iova;
- res->read.resid = resid;
+ res->read.resid = pkt->mask & RXE_FLUSH_MASK ? 0 : resid;

/* Replay the RDMA read reply. */
qp->resp.res = res;
@@ -1247,6 +1366,9 @@ int rxe_responder(void *arg)
case RESPST_CHK_RKEY:
state = check_rkey(qp, pkt);
break;
+ case RESPST_CHK_PLT:
+ state = check_placement_type(qp, pkt);
+ break;
case RESPST_EXECUTE:
state = execute(qp, pkt);
break;
@@ -1301,6 +1423,8 @@ int rxe_responder(void *arg)
break;

case RESPST_ERR_RKEY_VIOLATION:
+ /* oA19-13 8 */
+ case RESPST_ERR_PLT_VIOLATION:
if (qp_type(qp) == IB_QPT_RC) {
/* Class C */
do_class_ac_error(qp, AETH_NAK_REM_ACC_ERR,
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index c4131913ef6a..be1f9dca08a8 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -105,6 +105,16 @@ enum {
IB_USER_VERBS_EX_CMD_MODIFY_CQ
};

+enum ib_ext_placement_type {
+ IB_EXT_PLT_GLB_VIS = 1 << 0,
+ IB_EXT_PLT_PERSIST = 1 << 1,
+};
+
+enum ib_ext_selectivity_level {
+ IB_EXT_SEL_MR_RANGE = 0,
+ IB_EXT_SEL_MR_WHOLE,
+};
+
/*
* Make sure that all structs defined in this file remain laid out so
* that they pack the same way on 32-bit and 64-bit architectures (to
--
2.31.1



2022-01-25 09:37:01

by Li Zhijian

Subject: [RFC PATCH v2 8/9] RDMA/rxe: Enable RDMA FLUSH capability for rxe device

A19.4.3.1 HCA RESOURCES
This Annex introduces the following new HCA attributes:
• Ability to support Memory Placement Extensions
a) Ability to support FLUSH
i) Ability to support FLUSH with PLT Global Visibility
ii) Ability to support FLUSH with PLT Persistence

Now we are ready to enable RDMA FLUSH capability for RXE.

Signed-off-by: Li Zhijian <[email protected]>
---
V2: adjust patch's order. move it here from [04/10]
update comments, add referring to SPEC
---
drivers/infiniband/sw/rxe/rxe_param.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index 918270e34a35..281e1977b147 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -53,7 +53,9 @@ enum rxe_device_param {
| IB_DEVICE_ALLOW_USER_UNREG
| IB_DEVICE_MEM_WINDOW
| IB_DEVICE_MEM_WINDOW_TYPE_2A
- | IB_DEVICE_MEM_WINDOW_TYPE_2B,
+ | IB_DEVICE_MEM_WINDOW_TYPE_2B
+ | IB_DEVICE_PLT_GLOBAL_VISIBILITY
+ | IB_DEVICE_PLT_PERSISTENT,
RXE_MAX_SGE = 32,
RXE_MAX_WQE_SIZE = sizeof(struct rxe_send_wqe) +
sizeof(struct ib_sge) * RXE_MAX_SGE,
--
2.31.1



2022-01-25 09:37:23

by Li Zhijian

Subject: [RFC PATCH v2 4/9] RDMA/rxe: Implement RC RDMA FLUSH service in requester side

An RC FLUSH packet consists of:
+----+------+------+
|BTH | FETH | RETH |
+----+------+------+

oA19-2: FLUSH shall be single packet message and shall have no payload.
oA19-5: FLUSH BTH shall hold the Opcode = 0x1C

FLUSH Extended Transport Header(FETH)
+-----+-----------+------------------------+----------------------+
|Bits | 31-6 | 5-4 | 3-0 |
+-----+-----------+------------------------+----------------------+
| | Reserved | Selectivity Level(SEL) | Placement Type(PLT) |
+-----+-----------+------------------------+----------------------+

Selectivity Level (SEL) – defines the memory region scope the FLUSH
should apply on. Values are as follows:
• b’00 - Memory Region Range: FLUSH applies for all preceding memory
updates to the RETH range on this QP. All RETH fields shall be
valid in this selectivity mode. RETH:DMALen field shall be between
zero and (2^31 - 1) bytes (inclusive).
• b’01 - Memory Region: FLUSH applies for all preceding memory updates
to RETH.R_key on this QP. RETH:DMALen and RETH:VA shall be
ignored in this mode.
• b'10 - Reserved.
• b'11 - Reserved.

Placement Type (PLT) – Defines the memory placement guarantee of
this FLUSH. Multiple bits may be set in this field. Values are as follows:
• Bit 0, if set to '1', indicates that the FLUSH should guarantee Global
Visibility.
• Bit 1, if set to '1', indicates that the FLUSH should guarantee
Persistence.
• Bits 3:2 are reserved

Signed-off-by: Li Zhijian <[email protected]>
---
V2: extend flush to include length field.
---
drivers/infiniband/core/uverbs_cmd.c | 17 +++++++++++++++++
drivers/infiniband/sw/rxe/rxe_hdr.h | 24 ++++++++++++++++++++++++
drivers/infiniband/sw/rxe/rxe_opcode.c | 15 +++++++++++++++
drivers/infiniband/sw/rxe/rxe_opcode.h | 3 +++
drivers/infiniband/sw/rxe/rxe_req.c | 15 ++++++++++++++-
include/rdma/ib_pack.h | 2 ++
include/rdma/ib_verbs.h | 10 ++++++++++
include/uapi/rdma/ib_user_verbs.h | 8 ++++++++
include/uapi/rdma/rdma_user_rxe.h | 7 +++++++
9 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 6b6393176b3c..632e1747fb60 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2080,6 +2080,23 @@ static int ib_uverbs_post_send(struct uverbs_attr_bundle *attrs)
rdma->rkey = user_wr->wr.rdma.rkey;

next = &rdma->wr;
+ } else if (user_wr->opcode == IB_WR_RDMA_FLUSH) {
+ struct ib_flush_wr *flush;
+
+ next_size = sizeof(*flush);
+ flush = alloc_wr(next_size, user_wr->num_sge);
+ if (!flush) {
+ ret = -ENOMEM;
+ goto out_put;
+ }
+
+ flush->remote_addr = user_wr->wr.flush.remote_addr;
+ flush->length = user_wr->wr.flush.length;
+ flush->rkey = user_wr->wr.flush.rkey;
+ flush->type = user_wr->wr.flush.type;
+ flush->level = user_wr->wr.flush.level;
+
+ next = &flush->wr;
} else if (user_wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
user_wr->opcode == IB_WR_ATOMIC_FETCH_AND_ADD) {
struct ib_atomic_wr *atomic;
diff --git a/drivers/infiniband/sw/rxe/rxe_hdr.h b/drivers/infiniband/sw/rxe/rxe_hdr.h
index e432f9e37795..e37aa1944b18 100644
--- a/drivers/infiniband/sw/rxe/rxe_hdr.h
+++ b/drivers/infiniband/sw/rxe/rxe_hdr.h
@@ -607,6 +607,29 @@ static inline void reth_set_len(struct rxe_pkt_info *pkt, u32 len)
rxe_opcode[pkt->opcode].offset[RXE_RETH], len);
}

+/*
+ * FLUSH Extended Transport Header(FETH)
+ * +-----+-----------+------------------------+----------------------+
+ * |Bits | 31-6 | 5-4 | 3-0 |
+ * +-----+-----------+------------------------+----------------------+
+ * | Reserved | Selectivity Level(SEL) | Placement Type(PLT) |
+ * +-----+-----------+------------------------+----------------------+
+ */
+#define FETH_PLT_SHIFT 0UL
+#define FETH_SEL_SHIFT 4UL
+#define FETH_RESERVED_SHIFT 6UL
+#define FETH_PLT_MASK ((1UL << FETH_SEL_SHIFT) - 1UL)
+#define FETH_SEL_MASK (~FETH_PLT_MASK & ((1UL << FETH_RESERVED_SHIFT) - 1UL))
+
+static inline void feth_init(struct rxe_pkt_info *pkt, u32 type, u32 level)
+{
+ __be32 *p = (__be32 *)(pkt->hdr + rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+ u32 feth = ((level << FETH_SEL_SHIFT) & FETH_SEL_MASK) |
+ ((type << FETH_PLT_SHIFT) & FETH_PLT_MASK);
+
+ *p = cpu_to_be32(feth);
+}
+
/******************************************************************************
* Atomic Extended Transport Header
******************************************************************************/
@@ -910,6 +933,7 @@ enum rxe_hdr_length {
RXE_ATMETH_BYTES = sizeof(struct rxe_atmeth),
RXE_IETH_BYTES = sizeof(struct rxe_ieth),
RXE_RDETH_BYTES = sizeof(struct rxe_rdeth),
+ RXE_FETH_BYTES = sizeof(u32),
};

static inline size_t header_size(struct rxe_pkt_info *pkt)
diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c b/drivers/infiniband/sw/rxe/rxe_opcode.c
index df596ba7527d..adea6c16dfb5 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.c
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.c
@@ -316,6 +316,21 @@ struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
RXE_AETH_BYTES,
}
},
+ [IB_OPCODE_RC_RDMA_FLUSH] = {
+ .name = "IB_OPCODE_RC_RDMA_FLUSH",
+ .mask = RXE_FETH_MASK | RXE_RETH_MASK | RXE_FLUSH_MASK |
+ RXE_START_MASK | RXE_END_MASK | RXE_REQ_MASK,
+ .length = RXE_BTH_BYTES + RXE_FETH_BYTES + RXE_RETH_BYTES,
+ .offset = {
+ [RXE_BTH] = 0,
+ [RXE_FETH] = RXE_BTH_BYTES,
+ [RXE_RETH] = RXE_BTH_BYTES +
+ RXE_FETH_BYTES,
+ [RXE_PAYLOAD] = RXE_BTH_BYTES +
+ RXE_FETH_BYTES +
+ RXE_RETH_BYTES,
+ }
+ },
[IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE] = {
.name = "IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE",
.mask = RXE_AETH_MASK | RXE_ATMACK_MASK | RXE_ACK_MASK |
diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.h b/drivers/infiniband/sw/rxe/rxe_opcode.h
index 8f9aaaf260f2..dbc2eca8a92c 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.h
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.h
@@ -48,6 +48,7 @@ enum rxe_hdr_type {
RXE_DETH,
RXE_IMMDT,
RXE_PAYLOAD,
+ RXE_FETH,
NUM_HDR_TYPES
};

@@ -63,6 +64,7 @@ enum rxe_hdr_mask {
RXE_IETH_MASK = BIT(RXE_IETH),
RXE_RDETH_MASK = BIT(RXE_RDETH),
RXE_DETH_MASK = BIT(RXE_DETH),
+ RXE_FETH_MASK = BIT(RXE_FETH),
RXE_PAYLOAD_MASK = BIT(RXE_PAYLOAD),

RXE_REQ_MASK = BIT(NUM_HDR_TYPES + 0),
@@ -80,6 +82,7 @@ enum rxe_hdr_mask {
RXE_END_MASK = BIT(NUM_HDR_TYPES + 10),

RXE_LOOPBACK_MASK = BIT(NUM_HDR_TYPES + 12),
+ RXE_FLUSH_MASK = BIT(NUM_HDR_TYPES + 13),

RXE_READ_OR_ATOMIC_MASK = (RXE_READ_MASK | RXE_ATOMIC_MASK),
RXE_WRITE_OR_SEND_MASK = (RXE_WRITE_MASK | RXE_SEND_MASK),
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 5eb89052dd66..708138117136 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -220,6 +220,9 @@ static int next_opcode_rc(struct rxe_qp *qp, u32 opcode, int fits)
IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE :
IB_OPCODE_RC_SEND_FIRST;

+ case IB_WR_RDMA_FLUSH:
+ return IB_OPCODE_RC_RDMA_FLUSH;
+
case IB_WR_RDMA_READ:
return IB_OPCODE_RC_RDMA_READ_REQUEST;

@@ -413,11 +416,18 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,

/* init optional headers */
if (pkt->mask & RXE_RETH_MASK) {
- reth_set_rkey(pkt, ibwr->wr.rdma.rkey);
+ if (pkt->mask & RXE_FETH_MASK)
+ reth_set_rkey(pkt, ibwr->wr.flush.rkey);
+ else
+ reth_set_rkey(pkt, ibwr->wr.rdma.rkey);
reth_set_va(pkt, wqe->iova);
reth_set_len(pkt, wqe->dma.resid);
}

+ /* Fill FLUSH Extended Transport Header */
+ if (pkt->mask & RXE_FETH_MASK)
+ feth_init(pkt, ibwr->wr.flush.type, ibwr->wr.flush.level);
+
if (pkt->mask & RXE_IMMDT_MASK)
immdt_set_imm(pkt, ibwr->ex.imm_data);

@@ -477,6 +487,9 @@ static int finish_packet(struct rxe_qp *qp, struct rxe_send_wqe *wqe,

memset(pad, 0, bth_pad(pkt));
}
+ } else if (pkt->mask & RXE_FLUSH_MASK) {
+ /* oA19-2: a FLUSH message shall have no payload */
+ wqe->dma.resid = 0;
}

return 0;
diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index a9162f25beaf..d19edb502de6 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -84,6 +84,7 @@ enum {
/* opcode 0x15 is reserved */
IB_OPCODE_SEND_LAST_WITH_INVALIDATE = 0x16,
IB_OPCODE_SEND_ONLY_WITH_INVALIDATE = 0x17,
+ IB_OPCODE_RDMA_FLUSH = 0x1C,

/* real constants follow -- see comment about above IB_OPCODE()
macro for more details */
@@ -112,6 +113,7 @@ enum {
IB_OPCODE(RC, FETCH_ADD),
IB_OPCODE(RC, SEND_LAST_WITH_INVALIDATE),
IB_OPCODE(RC, SEND_ONLY_WITH_INVALIDATE),
+ IB_OPCODE(RC, RDMA_FLUSH),

/* UC */
IB_OPCODE(UC, SEND_FIRST),
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7f5905180636..d8555b6e4eba 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1299,6 +1299,7 @@ struct ib_qp_attr {
enum ib_wr_opcode {
/* These are shared with userspace */
IB_WR_RDMA_WRITE = IB_UVERBS_WR_RDMA_WRITE,
+ IB_WR_RDMA_FLUSH = IB_UVERBS_WR_RDMA_FLUSH,
IB_WR_RDMA_WRITE_WITH_IMM = IB_UVERBS_WR_RDMA_WRITE_WITH_IMM,
IB_WR_SEND = IB_UVERBS_WR_SEND,
IB_WR_SEND_WITH_IMM = IB_UVERBS_WR_SEND_WITH_IMM,
@@ -1393,6 +1394,15 @@ struct ib_atomic_wr {
u32 rkey;
};

+struct ib_flush_wr {
+ struct ib_send_wr wr;
+ u64 remote_addr;
+ u32 length;
+ u32 rkey;
+ u8 type;
+ u8 level;
+};
+
static inline const struct ib_atomic_wr *atomic_wr(const struct ib_send_wr *wr)
{
return container_of(wr, struct ib_atomic_wr, wr);
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 7ee73a0652f1..c4131913ef6a 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -784,6 +784,7 @@ enum ib_uverbs_wr_opcode {
IB_UVERBS_WR_RDMA_READ_WITH_INV = 11,
IB_UVERBS_WR_MASKED_ATOMIC_CMP_AND_SWP = 12,
IB_UVERBS_WR_MASKED_ATOMIC_FETCH_AND_ADD = 13,
+ IB_UVERBS_WR_RDMA_FLUSH = 14,
/* Review enum ib_wr_opcode before modifying this */
};

@@ -797,6 +798,13 @@ struct ib_uverbs_send_wr {
__u32 invalidate_rkey;
} ex;
union {
+ struct {
+ __aligned_u64 remote_addr;
+ __u32 length;
+ __u32 rkey;
+ __u8 type;
+ __u8 level;
+ } flush;
struct {
__aligned_u64 remote_addr;
__u32 rkey;
diff --git a/include/uapi/rdma/rdma_user_rxe.h b/include/uapi/rdma/rdma_user_rxe.h
index f09c5c9e3dd5..3de56ed5c24f 100644
--- a/include/uapi/rdma/rdma_user_rxe.h
+++ b/include/uapi/rdma/rdma_user_rxe.h
@@ -82,6 +82,13 @@ struct rxe_send_wr {
__u32 invalidate_rkey;
} ex;
union {
+ struct {
+ __aligned_u64 remote_addr;
+ __u32 length;
+ __u32 rkey;
+ __u8 type;
+ __u8 level;
+ } flush;
struct {
__aligned_u64 remote_addr;
__u32 rkey;
--
2.31.1



2022-01-25 12:59:40

by Li Zhijian

[permalink] [raw]
Subject: [RFC PATCH v2 9/9] RDMA/rxe: Add RD FLUSH service support

Although the SPEC says FLUSH is supported by the RC/RD/XRC services, XRC
has not been implemented by rxe yet.

So XRC FLUSH will not be supported until rxe implements the XRC service.

Signed-off-by: Li Zhijian <[email protected]>
---
I have not set up an RD environment to test this protocol
---
drivers/infiniband/sw/rxe/rxe_opcode.c | 20 ++++++++++++++++++++
include/rdma/ib_pack.h | 1 +
2 files changed, 21 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c b/drivers/infiniband/sw/rxe/rxe_opcode.c
index adea6c16dfb5..3d86129558f7 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.c
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.c
@@ -922,6 +922,26 @@ struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
RXE_RDETH_BYTES,
}
},
+ [IB_OPCODE_RD_RDMA_FLUSH] = {
+ .name = "IB_OPCODE_RD_RDMA_FLUSH",
+ .mask = RXE_RDETH_MASK | RXE_FETH_MASK | RXE_RETH_MASK |
+ RXE_FLUSH_MASK | RXE_START_MASK |
+ RXE_END_MASK | RXE_REQ_MASK,
+ .length = RXE_BTH_BYTES + RXE_RDETH_BYTES + RXE_FETH_BYTES + RXE_RETH_BYTES,
+ .offset = {
+ [RXE_BTH] = 0,
+ [RXE_RDETH] = RXE_BTH_BYTES,
+ [RXE_FETH] = RXE_BTH_BYTES +
+ RXE_RDETH_BYTES,
+ [RXE_RETH] = RXE_BTH_BYTES +
+ RXE_RDETH_BYTES +
+ RXE_FETH_BYTES,
+ [RXE_PAYLOAD] = RXE_BTH_BYTES +
+ RXE_RDETH_BYTES +
+ RXE_FETH_BYTES +
+ RXE_RETH_BYTES,
+ }
+ },

/* UD */
[IB_OPCODE_UD_SEND_ONLY] = {
diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index d19edb502de6..40568a33ead8 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -151,6 +151,7 @@ enum {
IB_OPCODE(RD, ATOMIC_ACKNOWLEDGE),
IB_OPCODE(RD, COMPARE_SWAP),
IB_OPCODE(RD, FETCH_ADD),
+ IB_OPCODE(RD, RDMA_FLUSH),

/* UD */
IB_OPCODE(UD, SEND_ONLY),
--
2.31.1



2022-01-25 14:20:39

by Zhu Yanjun

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/9] RDMA/rxe: Add RDMA FLUSH operation

On Tue, Jan 25, 2022 at 4:45 PM Li Zhijian <[email protected]> wrote:
>
> Hey folks,
>
> I wanna thank all of you for the kind feedback in my previous RFC.
> Recently, i have tried my best to do some updates as per your comments.
> Indeed, not all comments have been addressed for some reasons, i still
> wish to post this new one to start a new discussion.
>
> Outstanding issues:
> - iova_to_addr() without any kmap/kmap_local_page flows might not always
> work. # existing issue.
> - responder should reply error to requested side when it requests a
> persistence placement type to DRAM ?
> -------
>
> These patches are going to implement a *NEW* RDMA opcode "RDMA FLUSH".
> In IB SPEC 1.5[1][2], 2 new opcodes, ATOMIC WRITE and RDMA FLUSH were
> added in the MEMORY PLACEMENT EXTENSIONS section.
>
> FLUSH is used by the requesting node to achieve guarantees on the data
> placement within the memory subsystem of preceding accesses to a
> single memory region, such as those performed by RDMA WRITE, Atomics
> and ATOMIC WRITE requests.
>
> The operation indicates the virtual address space of a destination node
> and where the guarantees should apply. This range must be contiguous
> in the virtual space of the memory key but it is not necessarily a
> contiguous range of physical memory.
>
> FLUSH packets carry FLUSH extended transport header (see below) to
> specify the placement type and the selectivity level of the operation
> and RDMA extended header (RETH, see base document RETH definition) to
> specify the R_Key VA and Length associated with this request following
> the BTH in RC, RDETH in RD and XRCETH in XRC.

Thanks. Would you like to add some test cases in the latest rdma-core
about this RDMA FLUSH operation?

Thanks a lot.
Zhu Yanjun

>
> RC FLUSH:
> +----+------+------+
> |BTH | FETH | RETH |
> +----+------+------+
>
> RD FLUSH:
> +----+------+------+------+
> |BTH | RDETH| FETH | RETH |
> +----+------+------+------+
>
> XRC FLUSH:
> +----+-------+------+------+
> |BTH | XRCETH| FETH | RETH |
> +----+-------+------+------+
>
> Currently, we introduce RC and RD services only, since XRC has not been
> implemented by rxe yet.
> NOTE: only RC service is tested now, and since other HCAs have not
> added/implemented FLUSH yet, we can only test FLUSH operation in both
> SoftRoCE/rxe devices.
>
> The corresponding rdma-core and FLUSH example are available on:
> https://github.com/zhijianli88/rdma-core/tree/rfc
> Can access the kernel source in:
> https://github.com/zhijianli88/linux/tree/rdma-flush
>
> - We introduce is_pmem attribute to MR(memory region)
> - We introduce FLUSH placement type attributes to HCA
> - We introduce FLUSH access flags that users are able to register with
> Below figure shows the valid access flags uses can register with:
> +------------------------+------------------+--------------+
> | HCA attributes | register access flags |
> | and +-----------------+---------------+
> | MR attribute(is_pmem) |global visibility | persistence |
> |------------------------+------------------+--------------+
> | global visibility(DRAM)| O | X |
> |------------------------+------------------+--------------+
> | global visibility(PMEM)| O | X |
> |------------------------+------------------+--------------+
> | persistence(DRAM) | X | X |
> |------------------------+------------------+--------------+
> | persistence(PMEM) | X | O |
> +------------------------+------------------+--------------+
> O: allow to register such access flag
>
> In order to make placement guarantees, we currently reject requesting a
> persistent flush to a non-pmem.
> The responder will check the remote requested placement types by checking
> the registered access flags.
> +------------------------+------------------+--------------+
> | | registered flags |
> | remote requested types +------------------+--------------+
> | |global visibility | persistence |
> |------------------------+------------------+--------------+
> | global visibility | O | X |
> +------------------------+------------------+--------------+
> | persistence | X | O |
> +------------------------+------------------+--------------+
> O: allow to request such placement type
>
> Below list some details about FLUSH transport packet:
>
> A FLUSH message is built upon FLUSH request packet and is responded
> successfully by RDMA READ response of zero size.
>
> oA19-2: FLUSH shall be single packet message and shall have no payload.
> oA19-5: FLUSH BTH shall hold the Opcode = 0x1C
>
> FLUSH Extended Transport Header(FETH)
> +-----+-----------+------------------------+----------------------+
> |Bits | 31-6 | 5-4 | 3-0 |
> +-----+-----------+------------------------+----------------------+
> | | Reserved | Selectivity Level(SEL) | Placement Type(PLT) |
> +-----+-----------+------------------------+----------------------+
>
> Selectivity Level (SEL) – defines the memory region scope the FLUSH
> should apply on. Values are as follows:
> • b’00 - Memory Region Range: FLUSH applies for all preceding memory
> updates to the RETH range on this QP. All RETH fields shall be
> valid in this selectivity mode. RETH:DMALen field shall be
> between zero and (2^31 - 1) bytes (inclusive).
> • b’01 - Memory Region: FLUSH applies for all preceding memory
> updates to RETH.R_key on this QP. RETH:DMALen and RETH:VA
> shall be ignored in this mode.
> • b'10 - Reserved.
> • b'11 - Reserved.
>
> Placement Type (PLT) – Defines the memory placement guarantee of
> this FLUSH. Multiple bits may be set in this field. Values are as follows:
> • Bit 0 if set to '1' indicates that the FLUSH should guarantee Global
> Visibility.
> • Bit 1 if set to '1' indicates that the FLUSH should guarantee
> Persistence.
> • Bits 3:2 are reserved.
>
> [1]: https://www.infinibandta.org/ibta-specification/ # login required
> [2]: https://www.infinibandta.org/wp-content/uploads/2021/08/IBTA-Overview-of-IBTA-Volume-1-Release-1.5-and-MPE-2021-08-17-Secure.pptx
>
> CC: [email protected]
> CC: [email protected]
> CC: Jason Gunthorpe <[email protected]>
> CC: Zhu Yanjun <[email protected]>
> CC: Leon Romanovsky <[email protected]>
> CC: Bob Pearson <[email protected]>
> CC: Mark Bloch <[email protected]>
> CC: Wenpeng Liang <[email protected]>
> CC: Aharon Landau <[email protected]>
> CC: Tom Talpey <[email protected]>
> CC: "Gromadzki, Tomasz" <[email protected]>
> CC: Dan Williams <[email protected]>
> CC: [email protected]
> CC: [email protected]
>
> V1:
> https://lore.kernel.org/lkml/[email protected]/T/
> or https://github.com/zhijianli88/linux/tree/rdma-flush-rfcv1
>
> Changes log
> V2:
> https://github.com/zhijianli88/linux/tree/rdma-flush
> RDMA: mr: Introduce is_pmem
> check 1st byte to avoid crossing page boundary
> new scheme to check is_pmem # Dan
>
> RDMA: Allow registering MR with flush access flags
> combine with [03/10] RDMA/rxe: Allow registering FLUSH flags for supported device only to this patch # Jason
> split RDMA_FLUSH to 2 capabilities
>
> RDMA/rxe: Allow registering persistent flag for pmem MR only
> update commit message, get rid of confusing ib_check_flush_access_flags() # Tom
>
> RDMA/rxe: Implement RC RDMA FLUSH service in requester side
> extend flush to include length field. # Tom and Tomasz
>
> RDMA/rxe: Implement flush execution in responder side
> adjust start for WHOLE MR level # Tom
> don't support DMA mr for flush # Tom
> check flush return value
>
> RDMA/rxe: Enable RDMA FLUSH capability for rxe device
> adjust patch's order. move it here from [04/10]
>
> Li Zhijian (9):
> RDMA: mr: Introduce is_pmem
> RDMA: Allow registering MR with flush access flags
> RDMA/rxe: Allow registering persistent flag for pmem MR only
> RDMA/rxe: Implement RC RDMA FLUSH service in requester side
> RDMA/rxe: Set BTH's SE to zero for FLUSH packet
> RDMA/rxe: Implement flush execution in responder side
> RDMA/rxe: Implement flush completion
> RDMA/rxe: Enable RDMA FLUSH capability for rxe device
> RDMA/rxe: Add RD FLUSH service support
>
> drivers/infiniband/core/uverbs_cmd.c | 17 +++
> drivers/infiniband/sw/rxe/rxe_comp.c | 4 +-
> drivers/infiniband/sw/rxe/rxe_hdr.h | 52 +++++++++
> drivers/infiniband/sw/rxe/rxe_loc.h | 2 +
> drivers/infiniband/sw/rxe/rxe_mr.c | 37 ++++++-
> drivers/infiniband/sw/rxe/rxe_opcode.c | 35 +++++++
> drivers/infiniband/sw/rxe/rxe_opcode.h | 3 +
> drivers/infiniband/sw/rxe/rxe_param.h | 4 +-
> drivers/infiniband/sw/rxe/rxe_req.c | 19 +++-
> drivers/infiniband/sw/rxe/rxe_resp.c | 133 +++++++++++++++++++++++-
> include/rdma/ib_pack.h | 3 +
> include/rdma/ib_verbs.h | 30 +++++-
> include/uapi/rdma/ib_user_ioctl_verbs.h | 2 +
> include/uapi/rdma/ib_user_verbs.h | 19 ++++
> include/uapi/rdma/rdma_user_rxe.h | 7 ++
> 15 files changed, 355 insertions(+), 12 deletions(-)
>
> --
> 2.31.1
>
>
>

2022-01-25 14:56:00

by Zhijian Li (Fujitsu)

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/9] RDMA/rxe: Add RDMA FLUSH operation



On 25/01/2022 16:57, Zhu Yanjun wrote:
> On Tue, Jan 25, 2022 at 4:45 PM Li Zhijian <[email protected]> wrote:
>> Hey folks,
>>
>> I wanna thank all of you for the kind feedback in my previous RFC.
>> Recently, i have tried my best to do some updates as per your comments.
>> Indeed, not all comments have been addressed for some reasons, i still
>> wish to post this new one to start a new discussion.
>>
>> Outstanding issues:
>> - iova_to_addr() without any kmap/kmap_local_page flows might not always
>> work. # existing issue.
>> - responder should reply error to requested side when it requests a
>> persistence placement type to DRAM ?
>> -------
>>
>> These patches are going to implement a *NEW* RDMA opcode "RDMA FLUSH".
>> In IB SPEC 1.5[1][2], 2 new opcodes, ATOMIC WRITE and RDMA FLUSH were
>> added in the MEMORY PLACEMENT EXTENSIONS section.
>>
>> FLUSH is used by the requesting node to achieve guarantees on the data
>> placement within the memory subsystem of preceding accesses to a
>> single memory region, such as those performed by RDMA WRITE, Atomics
>> and ATOMIC WRITE requests.
>>
>> The operation indicates the virtual address space of a destination node
>> and where the guarantees should apply. This range must be contiguous
>> in the virtual space of the memory key but it is not necessarily a
>> contiguous range of physical memory.
>>
>> FLUSH packets carry FLUSH extended transport header (see below) to
>> specify the placement type and the selectivity level of the operation
>> and RDMA extended header (RETH, see base document RETH definition) to
>> specify the R_Key VA and Length associated with this request following
>> the BTH in RC, RDETH in RD and XRCETH in XRC.
> Thanks. Would you like to add some test cases in the latest rdma-core
> about this RDMA FLUSH operation?

Of course, they are on the way. Actually I have a WIP PR to do that:
https://github.com/linux-rdma/rdma-core/pull/1119

But some of that work cannot start until we have a more stable proposal and APIs.

Thanks
Zhijian

> [...]