2022-09-27 06:19:01

by Li Zhijian

Subject: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation

Hey folks,

First, I want to thank all of you, especially Bob, who over the past
month or so gave me lots of ideas and inspiration.

With your help, v5 includes the following changes:
- new names and a new patch-split scheme, suggested by Bob
- bugfix: set is_pmem to true only if the whole MR is pmem; a single MR
  may contain both PMEM and DRAM.
- introduce a feth structure instead of a bare u32
- new bugfix to rxe_lookup_mw() and lookup_mr(), see (RDMA/rxe: make sure requested access is a subset of {mr,mw}->access);
  with this fix, check_placement_type() is removed, since lookup_mr() already performs that check.
- enable the QP flushable access attribute
These change logs also appear in the individual patches they belong to.
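The is_pmem bugfix changes the rule to "all pages or nothing". Below is a minimal sketch of that rule. It assumes a hypothetical per-page descriptor (`struct page_desc`) and a hypothetical helper `mr_is_all_pmem()`. The real rxe MR tracks its pages differently, so this is not the patch's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page descriptor; not the rxe data structure. */
struct page_desc {
	bool is_pmem; /* true if this page is backed by persistent memory */
};

/*
 * Return true only if every page of the MR is pmem. A single DRAM page
 * makes the whole MR non-pmem, because a mixed MR cannot guarantee
 * persistence for its entire range.
 */
static bool mr_is_all_pmem(const struct page_desc *pages, size_t npages)
{
	size_t i;

	if (npages == 0)
		return false; /* an empty MR has nothing persistent to flush */

	for (i = 0; i < npages; i++) {
		if (!pages[i].is_pmem)
			return false;
	}
	return true;
}
```

The loop returns at the first DRAM page. A mixed MR is therefore never marked pmem, which is exactly the case the bugfix targets.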

These patches implement a *NEW* RDMA opcode, "RDMA FLUSH".
IB Spec 1.5[1] added two new opcodes, ATOMIC WRITE and RDMA FLUSH, in the
MEMORY PLACEMENT EXTENSIONS section.

This patchset adds support for the new RDMA FLUSH operation to SoftRoCE
(rxe) on the RC service.
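A FLUSH request carries a FLUSH extended transport header (FETH). The header tells the responder two things: which placement type to honour, and whether to flush a byte range or the whole MR. Below is a minimal sketch of packing and unpacking that header word.

It rests on assumptions, not on the patch text:
- a 4-bit placement-type field in bits 3..0;
- a 2-bit selectivity field in bits 5..4;
- placement types as bit flags (global = 1, persistent = 2);
- selectivity levels range = 0 and whole MR = 1.

All macro and function names here are hypothetical. The sketch also works in host byte order. On the wire the word would be big-endian, and rxe would convert it with cpu_to_be32()/be32_to_cpu().

```c
#include <stdint.h>

/* Assumed FETH layout: bits 3..0 = placement type, bits 5..4 = selectivity. */
#define FETH_PLT_MASK  0x0000000fu
#define FETH_SEL_MASK  0x00000030u
#define FETH_SEL_SHIFT 4u

/* Assumed placement-type flags and selectivity levels (illustrative). */
enum { PLT_GLOBAL = 1u << 0, PLT_PERSISTENT = 1u << 1 };
enum { SEL_RANGE = 0, SEL_MR = 1 };

/* Pack the two FETH fields into one 32-bit word (host byte order). */
static uint32_t feth_pack(uint32_t plt, uint32_t sel)
{
	return (plt & FETH_PLT_MASK) |
	       ((sel << FETH_SEL_SHIFT) & FETH_SEL_MASK);
}

/* Extract the placement type from a FETH word. */
static uint32_t feth_plt(uint32_t bits)
{
	return bits & FETH_PLT_MASK;
}

/* Extract the selectivity level from a FETH word. */
static uint32_t feth_sel(uint32_t bits)
{
	return (bits & FETH_SEL_MASK) >> FETH_SEL_SHIFT;
}
```

Example: a persistent, whole-MR flush packs to 0x12 (placement 0x2 in the low nibble, selectivity 1 shifted into bit 4).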

You can verify the patchset by building and running the rdma_flush example[2].
server:
$ ./rdma_flush_server -s [server_address] -p [port_number]
client:
$ ./rdma_flush_client -s [server_address] -p [port_number]

Corresponding pyverbs support and tests (tests.test_qpex.QpExTestCase.test_qp_ex_rc_rdma_flush)
have also been added to rdma-core.

[1]: https://www.infinibandta.org/wp-content/uploads/2021/08/IBTA-Overview-of-IBTA-Volume-1-Release-1.5-and-MPE-2021-08-17-Secure.pptx
[2]: https://github.com/zhijianli88/rdma-core/tree/rdma-flush-v5

CC: Xiao Yang <[email protected]>
CC: "Gotou, Yasunori" <[email protected]>
CC: Jason Gunthorpe <[email protected]>
CC: Zhu Yanjun <[email protected]>
CC: Leon Romanovsky <[email protected]>
CC: Bob Pearson <[email protected]>
CC: Mark Bloch <[email protected]>
CC: Wenpeng Liang <[email protected]>
CC: Tom Talpey <[email protected]>
CC: "Gromadzki, Tomasz" <[email protected]>
CC: Dan Williams <[email protected]>
CC: [email protected]
CC: [email protected]

The kernel source is also available at:
https://github.com/zhijianli88/linux/tree/rdma-flush-v5

Change log
V4:
- rework the responder process
- rebase onto v5.19+
- remove [7/7] "RDMA/rxe: Add RD FLUSH service support", since RD is not really supported

V3:
- rebase only, plus commit log and comment updates
- delete patch-1 "RDMA: mr: Introduce is_pmem", which is folded into "Allow registering persistent flag for pmem MR only"
- delete patch-7

V2:
RDMA: mr: Introduce is_pmem
check the first byte to avoid crossing a page boundary
new scheme to check is_pmem # Dan

RDMA: Allow registering MR with flush access flags
fold [03/10] "RDMA/rxe: Allow registering FLUSH flags for supported device only" into this patch # Jason
split RDMA_FLUSH into 2 capabilities

RDMA/rxe: Allow registering persistent flag for pmem MR only
update commit message, get rid of confusing ib_check_flush_access_flags() # Tom

RDMA/rxe: Implement RC RDMA FLUSH service in requester side
extend flush to include a length field # Tom and Tomasz

RDMA/rxe: Implement flush execution in responder side
adjust start for WHOLE MR level # Tom
don't support DMA mr for flush # Tom
check flush return value

RDMA/rxe: Enable RDMA FLUSH capability for rxe device
adjust patch order; moved here from [04/10]
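The V2 note "adjust start for WHOLE MR level" concerns how the responder picks the bytes to flush. Below is a minimal sketch of that choice. It rests on these assumptions:
- at whole-MR selectivity, the requester's address and length are ignored and the MR's own window is used;
- at range selectivity, the requested window must lie inside the MR.

`struct mr_window` and `flush_resolve_range()` are hypothetical names, not rxe functions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical MR descriptor: the registered IOVA window. */
struct mr_window {
	uint64_t iova;   /* start of the registered region */
	uint64_t length; /* size of the registered region in bytes */
};

/*
 * Resolve the byte range to flush. When whole_mr is nonzero, the start is
 * adjusted to the MR's iova and the full MR length is used. Otherwise
 * [va, va + len) must fit inside the MR. The bounds test is written so it
 * cannot overflow. Returns 0 on success or -EINVAL if out of bounds.
 */
static int flush_resolve_range(const struct mr_window *mr, int whole_mr,
			       uint64_t va, uint64_t len,
			       uint64_t *start, uint64_t *nbytes)
{
	if (whole_mr) {
		*start = mr->iova;
		*nbytes = mr->length;
		return 0;
	}
	if (va < mr->iova || len > mr->length ||
	    va - mr->iova > mr->length - len)
		return -EINVAL;
	*start = va;
	*nbytes = len;
	return 0;
}
```

The check compares `va - iova` against `length - len` instead of computing `va + len`. That avoids wrap-around for requests near the top of the address space.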

Li Zhijian (11):
RDMA/rxe: make sure requested access is a subset of {mr,mw}->access
RDMA: Extend RDMA user ABI to support flush
RDMA: Extend RDMA kernel verbs ABI to support flush
RDMA/rxe: Extend rxe user ABI to support flush
RDMA/rxe: Allow registering persistent flag for pmem MR only
RDMA/rxe: Extend rxe packet format to support flush
RDMA/rxe: Implement RC RDMA FLUSH service in requester side
RDMA/rxe: Implement flush execution in responder side
RDMA/rxe: Implement flush completion
RDMA/cm: Make QP FLUSHABLE
RDMA/rxe: Enable RDMA FLUSH capability for rxe device

drivers/infiniband/core/cm.c | 3 +-
drivers/infiniband/sw/rxe/rxe_comp.c | 4 +-
drivers/infiniband/sw/rxe/rxe_hdr.h | 47 +++++++
drivers/infiniband/sw/rxe/rxe_loc.h | 1 +
drivers/infiniband/sw/rxe/rxe_mr.c | 81 ++++++++++-
drivers/infiniband/sw/rxe/rxe_mw.c | 3 +-
drivers/infiniband/sw/rxe/rxe_opcode.c | 17 +++
drivers/infiniband/sw/rxe/rxe_opcode.h | 16 ++-
drivers/infiniband/sw/rxe/rxe_param.h | 4 +-
drivers/infiniband/sw/rxe/rxe_req.c | 15 +-
drivers/infiniband/sw/rxe/rxe_resp.c | 180 +++++++++++++++++++++---
drivers/infiniband/sw/rxe/rxe_verbs.h | 6 +
include/rdma/ib_pack.h | 3 +
include/rdma/ib_verbs.h | 20 ++-
include/uapi/rdma/ib_user_ioctl_verbs.h | 2 +
include/uapi/rdma/ib_user_verbs.h | 16 +++
include/uapi/rdma/rdma_user_rxe.h | 7 +
17 files changed, 389 insertions(+), 36 deletions(-)
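Patch 1 ("make sure requested access is a subset of {mr,mw}->access") reduces to a single bit test. Below is a minimal sketch of it. The flag values are illustrative only, not the kernel's, and `access_is_subset()` is a hypothetical helper.

Why subset and not overlap: an "any bit in common" test (`requested & granted`) would accept a READ|FLUSH request against a READ-only MR. The subset test rejects it.

```c
#include <stdbool.h>

/* Illustrative access bits for this sketch only, not the kernel's values. */
#define ACC_REMOTE_WRITE     (1u << 1)
#define ACC_REMOTE_READ      (1u << 2)
#define ACC_FLUSH_PERSISTENT (1u << 9)

/*
 * Allow a lookup only if every requested bit is also present in the
 * MR/MW access mask: clearing the granted bits from the request must
 * leave nothing behind.
 */
static bool access_is_subset(unsigned int requested, unsigned int granted)
{
	return (requested & ~granted) == 0;
}
```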

--
2.31.1


2022-09-27 06:22:59

by Li Zhijian

Subject: [for-next PATCH v5 10/11] RDMA/cm: Make QP FLUSHABLE

Enable the flushable access flag for the QP.

Signed-off-by: Li Zhijian <[email protected]>
---
V5: new patch, inspired by Bob
---
drivers/infiniband/core/cm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 1f9938a2c475..58837aac980b 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -4096,7 +4096,8 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
if (cm_id_priv->responder_resources)
qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
- IB_ACCESS_REMOTE_ATOMIC;
+ IB_ACCESS_REMOTE_ATOMIC |
+ IB_ACCESS_FLUSHABLE;
qp_attr->pkey_index = cm_id_priv->av.pkey_index;
if (cm_id_priv->av.port)
qp_attr->port_num = cm_id_priv->av.port->port_num;
--
2.31.1
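The diff above adds IB_ACCESS_FLUSHABLE under the same `responder_resources` condition as remote READ and ATOMIC. Below is a standalone sketch of the resulting flag computation. The bit values are illustrative stand-ins for the definitions in include/rdma/ib_verbs.h. Treating "flushable" as both flush placement types together is an assumption of this sketch.

```c
#include <stdbool.h>

/* Illustrative bit values; the authoritative ones are in include/rdma/ib_verbs.h. */
#define ACC_REMOTE_WRITE     (1u << 1)
#define ACC_REMOTE_READ      (1u << 2)
#define ACC_REMOTE_ATOMIC    (1u << 3)
#define ACC_FLUSH_GLOBAL     (1u << 8)
#define ACC_FLUSH_PERSISTENT (1u << 9)
/* Assumed: "flushable" means both flush placement types are permitted. */
#define ACC_FLUSHABLE        (ACC_FLUSH_GLOBAL | ACC_FLUSH_PERSISTENT)

/*
 * Mirror of the patched logic: remote write is always allowed; read,
 * atomic and flush are added only when responder resources were negotiated.
 */
static unsigned int cm_qp_access_flags(bool responder_resources)
{
	unsigned int flags = ACC_REMOTE_WRITE;

	if (responder_resources)
		flags |= ACC_REMOTE_READ | ACC_REMOTE_ATOMIC | ACC_FLUSHABLE;
	return flags;
}
```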

2022-10-28 18:00:28

by Jason Gunthorpe

Subject: Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation

On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
> Hey folks,
>
> Firstly i want to say thank you to all you guys, especially Bob, who in the
> past 1+ month, gave me a lots of idea and inspiration.
>
> With the your help, some changes are make in 5th version, such as:
> - new names and new patch split schemem, suggested by Bob
> - bugfix: set is_pmem true only if the whole MR is pmem. it's possible the
> one MR container both PMEM and DRAM.
> - introduce feth structure, instead of u32
> - new bugfix to rxe_lookup_mw() and lookup_mr(), see (RDMA/rxe: make sure requested access is a subset of {mr,mw}->access),
> with this fix, we remove check_placement_type(), lookup_mr() has done the such check.
> - Enable QP attr flushable
> These change logs also appear in the patch it belongs to.
>
> These patches are going to implement a *NEW* RDMA opcode "RDMA FLUSH".
> In IB SPEC 1.5[1], 2 new opcodes, ATOMIC WRITE and RDMA FLUSH were
> added in the MEMORY PLACEMENT EXTENSIONS section.

This doesn't apply anymore. I did try to fix it, but it ended up not
compiling, so it is better if you handle it and repost.

Thanks,
Jason

2022-10-28 18:04:57

by Jason Gunthorpe

Subject: Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation

On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
> Hey folks,
>
> Firstly i want to say thank you to all you guys, especially Bob, who in the
> past 1+ month, gave me a lots of idea and inspiration.

I would like someone familiar with rxe to review the protocol parts
and give a Reviewed-by.

Jason

2022-11-11 03:05:17

by Zhu Yanjun

Subject: Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation

On 2022/10/29 1:57, Jason Gunthorpe wrote:
> On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
>> Hey folks,
>>
>> Firstly i want to say thank you to all you guys, especially Bob, who in the
>> past 1+ month, gave me a lots of idea and inspiration.
>
> I would like it if someone familiar with rxe could reviewed-by the
> protocol parts.

Hi, Jason

I have reviewed these patches and I am fine with them.

Hi, Zhijian

I noticed the following:
"
$ ./rdma_flush_server -s [server_address] -p [port_number]
client:
$ ./rdma_flush_client -s [server_address] -p [port_number]
"
Can you merge the server and the client into rdma-core?

Thanks,
Zhu Yanjun



2022-11-11 05:29:59

by Li Zhijian

Subject: Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation



On 11/11/2022 10:49, Yanjun Zhu wrote:
> On 2022/10/29 1:57, Jason Gunthorpe wrote:
>> On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
>>> Hey folks,
>>>
>>> Firstly i want to say thank you to all you guys, especially Bob, who
>>> in the
>>> past 1+ month, gave me a lots of idea and inspiration.
>>
>> I would like it if someone familiar with rxe could reviewed-by the
>> protocol parts.
>
> Hi, Jason
>
> I reviewed these patches. I am fine with these patches.
>
> Hi, Zhijian
>
> I noticed the followings:
> "
> $ ./rdma_flush_server -s [server_address] -p [port_number]
> client:
> $ ./rdma_flush_client -s [server_address] -p [port_number]
> "
> Can you merge the server and the client to rdma-core?

Yanjun,

Yes, there is already a draft PR at
https://github.com/linux-rdma/rdma-core/pull/1181, but it cannot go
ahead until the kernel patches are merged.

I will post a new version in the next few days. Would you mind if I add
your "Reviewed-by" to the next version?




2022-11-11 06:08:15

by Zhu Yanjun

Subject: Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation

On 2022/11/11 13:10, [email protected] wrote:
>
>
> On 11/11/2022 10:49, Yanjun Zhu wrote:
>> On 2022/10/29 1:57, Jason Gunthorpe wrote:
>>> On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
>>>> Hey folks,
>>>>
>>>> Firstly i want to say thank you to all you guys, especially Bob, who
>>>> in the
>>>> past 1+ month, gave me a lots of idea and inspiration.
>>>
>>> I would like it if someone familiar with rxe could reviewed-by the
>>> protocol parts.
>>
>> Hi, Jason
>>
>> I reviewed these patches. I am fine with these patches.
>>
>> Hi, Zhijian
>>
>> I noticed the followings:
>> "
>> $ ./rdma_flush_server -s [server_address] -p [port_number]
>> client:
>> $ ./rdma_flush_client -s [server_address] -p [port_number]
>> "
>> Can you merge the server and the client to rdma-core?
>
> Yanjun,
>
> Yes, there was already a draft PR here
> https://github.com/linux-rdma/rdma-core/pull/1181, but it cannot go
> ahead until the kernel's patches are merged.
>
> and i will post a new version these days, would you mind if i add your
> "Reviewed-by" in next version ?

Reviewed-by: Zhu Yanjun <[email protected]>
Thanks.

Another issue: normally rxe should be able to connect to physical IB
devices, such as an mlx IB device. That is, one host uses rxe, the other
uses an mlx IB device, and an RDMA connection is created between the two hosts.

Have you tested this RDMA FLUSH operation against an mlx IB device?
What were the results?

Thanks a lot.
Zhu Yanjun



2022-11-11 06:19:49

by Li Zhijian

Subject: Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation



On 11/11/2022 13:52, Yanjun Zhu wrote:
> On 2022/11/11 13:10, [email protected] wrote:
>>
>>
>> On 11/11/2022 10:49, Yanjun Zhu wrote:
>>> On 2022/10/29 1:57, Jason Gunthorpe wrote:
>>>> On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
>>>>> Hey folks,
>>>>>
>>>>> Firstly i want to say thank you to all you guys, especially Bob, who
>>>>> in the
>>>>> past 1+ month, gave me a lots of idea and inspiration.
>>>>
>>>> I would like it if someone familiar with rxe could reviewed-by the
>>>> protocol parts.
>>>
>>> Hi, Jason
>>>
>>> I reviewed these patches. I am fine with these patches.
>>>
>>> Hi, Zhijian
>>>
>>> I noticed the followings:
>>> "
>>> $ ./rdma_flush_server -s [server_address] -p [port_number]
>>> client:
>>> $ ./rdma_flush_client -s [server_address] -p [port_number]
>>> "
>>> Can you merge the server and the client to rdma-core?
>>
>> Yanjun,
>>
>> Yes, there was already a draft PR here
>> https://github.com/linux-rdma/rdma-core/pull/1181, but it cannot go
>> ahead until the kernel's patches are merged.
>>
>> and i will post a new version these days, would you mind if i add your
>> "Reviewed-by" in next version ?
>
> Reviewed-by: Zhu Yanjun <[email protected]>
> Thanks.
>
> Another problem, normally rxe should connect to physical ib devices,
> such as mlx ib device. That is, one host is rxe, the other host is mlx
> ib device. The rdma connection should be created between the 2 hosts.

It is fully compatible with the existing operations.


>
> Do you connect to mlx ib device with this RDMA FLUSH operation?
> And what is the test result?

Yes, I tested it.

After these patches, only an RXE device can successfully register
*FLUSHABLE* MRs. If mlx tries that, EOPNOTSUPP will be returned.

Similarly, since other hardware (MLX, for example) does not support the
FLUSH operation yet, EOPNOTSUPP will be returned if users try to issue it.

In short, with an RXE requester, an MLX responder will return an error for
the request, and an MLX requester is not able to issue a FLUSH operation.
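The interoperability answer above depends on a registration-time capability check. Below is a minimal sketch of such a check. The bit values, `struct dev_caps` and `check_flush_access()` are all hypothetical, not the ib_core code. A device that advertises no flush capability fails any flushable registration with -EOPNOTSUPP, while ordinary registrations are unaffected.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative flush access bits for this sketch only. */
#define ACC_FLUSH_GLOBAL     (1u << 8)
#define ACC_FLUSH_PERSISTENT (1u << 9)

/* Hypothetical per-device capability flags. */
struct dev_caps {
	bool flush_global;
	bool flush_persistent;
};

/*
 * Reject a registration that requests a flush type the device does not
 * advertise; return 0 if every requested flush type is supported.
 */
static int check_flush_access(const struct dev_caps *caps, unsigned int access)
{
	if ((access & ACC_FLUSH_GLOBAL) && !caps->flush_global)
		return -EOPNOTSUPP;
	if ((access & ACC_FLUSH_PERSISTENT) && !caps->flush_persistent)
		return -EOPNOTSUPP;
	return 0;
}
```

Failing at registration time means an application learns early that flush is unavailable on that device, rather than at the first FLUSH request.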

Thanks
Zhijian



2022-11-11 06:41:04

by Zhu Yanjun

Subject: Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation


On 2022/11/11 14:10, [email protected] wrote:
>
> On 11/11/2022 13:52, Yanjun Zhu wrote:
>> On 2022/11/11 13:10, [email protected] wrote:
>>>
>>> On 11/11/2022 10:49, Yanjun Zhu wrote:
>>>> On 2022/10/29 1:57, Jason Gunthorpe wrote:
>>>>> On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
>>>>>> Hey folks,
>>>>>>
>>>>>> Firstly i want to say thank you to all you guys, especially Bob, who
>>>>>> in the
>>>>>> past 1+ month, gave me a lots of idea and inspiration.
>>>>> I would like it if someone familiar with rxe could reviewed-by the
>>>>> protocol parts.
>>>> Hi, Jason
>>>>
>>>> I reviewed these patches. I am fine with these patches.
>>>>
>>>> Hi, Zhijian
>>>>
>>>> I noticed the followings:
>>>> "
>>>> $ ./rdma_flush_server -s [server_address] -p [port_number]
>>>> client:
>>>> $ ./rdma_flush_client -s [server_address] -p [port_number]
>>>> "
>>>> Can you merge the server and the client to rdma-core?
>>> Yanjun,
>>>
>>> Yes, there was already a draft PR here
>>> https://github.com/linux-rdma/rdma-core/pull/1181, but it cannot go
>>> ahead until the kernel's patches are merged.
>>>
>>> and i will post a new version these days, would you mind if i add your
>>> "Reviewed-by" in next version ?
>> Reviewed-by: Zhu Yanjun <[email protected]>
>> Thanks.
>>
>> Another problem, normally rxe should connect to physical ib devices,
>> such as mlx ib device. That is, one host is rxe, the other host is mlx
>> ib device. The rdma connection should be created between the 2 hosts.
> it's fully compatible with old operation.
>
>
>> Do you connect to mlx ib device with this RDMA FLUSH operation?
>> And what is the test result?
> Yes, i tested it.
>
> After these patches, only RXE device can register *FLUSHABLE* MRs
> successfully. If mlx try that, EOPNOSUPP will be returned.
>
> Similarly, Since other hardwares(MLX for example) have not supported
> FLUSH operation, EOPNOSUPP will be returned if users try to to that.
>
> In short, for RXE requester, MLX responder will return error for the
> request. MLX requester is not able to request a FLUSH operation.

Thanks. Do you mean that the FLUSH operation is only supported in RXE? ^_^

And that MLX does not currently support the FLUSH operation?

Zhu Yanjun


2022-11-11 07:25:59

by Zhu Yanjun

Subject: Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation


On 2022/11/11 14:38, [email protected] wrote:
>
> On 11/11/2022 14:30, Yanjun Zhu wrote:
>>> After these patches, only RXE device can register *FLUSHABLE* MRs
>>> successfully. If mlx try that, EOPNOSUPP will be returned.
>>>
>>> Similarly, Since other hardwares(MLX for example) have not supported
>>> FLUSH operation, EOPNOSUPP will be returned if users try to to that.
>>>
>>> In short, for RXE requester, MLX responder will return error for the
>>> request. MLX requester is not able to request a FLUSH operation.
>> Thanks. Do you mean that FLUSH operation is only supported in RXE? ^_^
>>
>> And MLX does not support FLUSH operation currently?
> IMO, FLUSH and Atomic Write are newly introduced by IBA spec 1.5
> published in 2021. So hardware/drivers(MLX) should do something to
> support it.

Thanks.

If I understand you correctly, FLUSH and Atomic Write are new features,
and according to the test results they are not currently supported by the
MLX driver.

Let's wait for the MLX engineers to provide updates on FLUSH and Atomic Write.

IMO, it would be better to make rxe connect successfully to a physical IB
device, such as MLX or others, with FLUSH and Atomic Write.

Zhu Yanjun


2022-11-11 07:28:11

by Li Zhijian

Subject: Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation



On 11/11/2022 14:30, Yanjun Zhu wrote:
>>
>> After these patches, only RXE device can register *FLUSHABLE* MRs
>> successfully. If mlx try that, EOPNOSUPP will be returned.
>>
>> Similarly, Since other hardwares(MLX for example) have not supported
>> FLUSH operation, EOPNOSUPP will be returned if users try to to that.
>>
>> In short, for RXE requester, MLX responder will return error for the
>> request. MLX requester is not able to request a FLUSH operation.
>
> Thanks. Do you mean that FLUSH operation is only supported in RXE? ^_^
>
> And MLX does not support FLUSH operation currently?

IMO, FLUSH and Atomic Write were newly introduced in IBA spec 1.5,
published in 2021, so hardware and drivers (MLX) need additional work to
support them.