2023-10-27 05:44:03

by Zhijian Li (Fujitsu)

[permalink] [raw]
Subject: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr

mr->page_list only encodes *page without page offset, when
page_size != PAGE_SIZE, we cannot restore the address with a wrong
page_offset.

Note that this patch will break some ULPs that try to register 4K
MR when PAGE_SIZE is not 4K.
SRP and nvme over RXE is known to be impacted.

Signed-off-by: Li Zhijian <[email protected]>
---
drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index f54042e9aeb2..61a136ea1d91 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
struct rxe_mr *mr = to_rmr(ibmr);
unsigned int page_size = mr_page_size(mr);

+ if (page_size != PAGE_SIZE) {
+ rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
+ page_size, PAGE_SIZE);
+ return -EOPNOTSUPP;
+ }
+
mr->nbuf = 0;
mr->page_shift = ilog2(page_size);
mr->page_mask = ~((u64)page_size - 1);
--
2.41.0


2023-10-27 08:27:25

by Zhu Yanjun

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr

在 2023/10/27 13:41, Li Zhijian 写道:
> mr->page_list only encodes *page without page offset, when
> page_size != PAGE_SIZE, we cannot restore the address with a wrong
> page_offset.
>
> Note that this patch will break some ULPs that try to register 4K
> MR when PAGE_SIZE is not 4K.
> SRP and nvme over RXE is known to be impacted.

When ULP uses folio or compound page, ULP can not work well with RXE
after this commit is applied.

Perhaps removing page_size set in RXE is a good solution because
page_size is set twice, firstly page_size is set in infiniband/core,
secondly it is set in RXE.

When folio or compound page is used in ULP, it is very possible that
page_size in infiniband/core is different from the page_size in RXE.

Not sure what problem this difference will cause.

Zhu Yanjun

>
> Signed-off-by: Li Zhijian <[email protected]>
> ---
> drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index f54042e9aeb2..61a136ea1d91 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
> struct rxe_mr *mr = to_rmr(ibmr);
> unsigned int page_size = mr_page_size(mr);
>
> + if (page_size != PAGE_SIZE) {
> + rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
> + page_size, PAGE_SIZE);
> + return -EOPNOTSUPP;
> + }
> +
> mr->nbuf = 0;
> mr->page_shift = ilog2(page_size);
> mr->page_mask = ~((u64)page_size - 1);

2023-10-27 21:46:53

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr

On 10/27/23 01:17, Zhu Yanjun wrote:
> When ULP uses folio or compound page, ULP can not work well with RXE
> after this commit is applied.

Isn't it the responsibility of the DMA mapping code to build an sg-list
for folios? Drivers like ib_srp see an sg-list, whether that comes from
a folio or from something else.

Bart.

2023-10-28 02:48:29

by Zhu Yanjun

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr

在 2023/10/28 5:46, Bart Van Assche 写道:
> On 10/27/23 01:17, Zhu Yanjun wrote:
>> When ULP uses folio or compound page, ULP can not work well with RXE
>> after this commit is applied.
>
> Isn't it the responsibility of the DMA mapping code to build an sg-list
> for folios? Drivers like ib_srp see an sg-list, whether that comes from

A folio is a way of representing a set of physically contiguous base
pages. In current implementations of folio, it seems that sg-list is not
used.

In Folio, some huge pages whose size is not PAGE_SIZE is dma-mapped into
hardware.

So the page size of folio is not equal to PAGE_SIZE. If this commit is
applied, it causes potential risks to the future folio.

I have developed some folio work for some NIC and RDMA drivers. In
Folio, the page size of Folio is possibly not equal to PAGE_SIZE, it is
multiple PAGE_SIZE. And when folio is dma-mapped to HW, the page size is
equal to multiple PAGE_SIZE.

In this case, ULP with folio will not work well with current RXE after
this commit is applied.

Removing page_size in RXE seems a plan for this problem.

Zhu Yanjun


> a folio or from something else.
>
> Bart.

2023-10-28 23:13:50

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr

On 10/27/23 19:48, Zhu Yanjun wrote:
> In this case, ULP with folio will not work well with current RXE after
> this commit is applied.

Why not? RDMA ULPs like the SRP initiator driver use ib_map_mr_sg(). The
latter function calls rxe_map_mr_sg() if the RXE driver is used.
rxe_map_mr_sg() calls ib_sg_to_pages(). ib_sg_to_pages() translates
SG-entries that are larger than a virtual memory page into multiple
entries with size mr->page_size.

Thanks,

Bart.

2023-10-29 03:23:21

by Zhu Yanjun

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr

在 2023/10/29 7:07, Bart Van Assche 写道:
> On 10/27/23 19:48, Zhu Yanjun wrote:
>> In this case, ULP with folio will not work well with current RXE after
>> this commit is applied.
>
> Why not? RDMA ULPs like the SRP initiator driver use ib_map_mr_sg(). The
> latter function calls rxe_map_mr_sg() if the RXE driver is used.
> rxe_map_mr_sg() calls ib_sg_to_pages(). ib_sg_to_pages() translates
> SG-entries that are larger than a virtual memory page into multiple
> entries with size mr->page_size.

It seems that we are not on the same page. I am thinking that this fix
should also work for the future folio. And your idea is based on the
current implementation.

Perhaps it is not a good time to discuss folio currently. Let me focus
on the current implementation.

In this commit, if the page size is not equal to PAGE_SIZE, it will pop
out warning then exit. So "rxe_info_mr" should be changed to rxe_warning_mr?

In fact, I like to remove the page size assignment in rxe, that is,
partly reverting the commit 325a7eb85199 ("RDMA/rxe: Cleanup page
variables in rxe_mr.c"). But it seems that Jason did not like this.
I no longer insist on this.

I am not sure whether Jason, Leon, Bob and you have better solution to
this problem or not.

If not, this commit can fix this problem if no better solution.

Reviewed-by: Zhu Yanjun <[email protected]>

Thanks,
Zhu Yanjun

>
> Thanks,
>
> Bart.
>

2023-10-30 07:52:08

by Zhijian Li (Fujitsu)

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr



On 27/10/2023 13:41, Li Zhijian wrote:
> mr->page_list only encodes *page without page offset, when
> page_size != PAGE_SIZE, we cannot restore the address with a wrong
> page_offset.
>
> Note that this patch will break some ULPs that try to register 4K
> MR when PAGE_SIZE is not 4K.
> SRP and nvme over RXE is known to be impacted.
>
> Signed-off-by: Li Zhijian <[email protected]>
> ---
> drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index f54042e9aeb2..61a136ea1d91 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
> struct rxe_mr *mr = to_rmr(ibmr);
> unsigned int page_size = mr_page_size(mr);
>
> + if (page_size != PAGE_SIZE) {

It seems this condition is too strict, it should be:
if (!IS_ALIGNED(page_size, PAGE_SIZE))

So that, page_size with (N * PAGE_SIZE) can work as previously.
Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)

Thanks
Zhijian



> + rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
> + page_size, PAGE_SIZE);
> + return -EOPNOTSUPP;
> + }
> +
> mr->nbuf = 0;
> mr->page_shift = ilog2(page_size);
> mr->page_mask = ~((u64)page_size - 1);

2023-10-30 08:13:54

by Zhijian Li (Fujitsu)

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr



On 27/10/2023 16:17, Zhu Yanjun wrote:
> 在 2023/10/27 13:41, Li Zhijian 写道:
>> mr->page_list only encodes *page without page offset, when
>> page_size != PAGE_SIZE, we cannot restore the address with a wrong
>> page_offset.
>>
>> Note that this patch will break some ULPs that try to register 4K
>> MR when PAGE_SIZE is not 4K.
>> SRP and nvme over RXE is known to be impacted.
>
> When ULP uses folio or compound page, ULP can not work well with RXE after this commit is applied.
>
> Perhaps removing page_size set in RXE is a good solution because page_size is set twice, firstly page_size is set in infiniband/core, secondly it is set in RXE.

Does The RXE one mean rxe_mr_init(), I think rxe_reg_user_mr() requires this.

48 static void rxe_mr_init(int access, struct rxe_mr *mr)
49 {
50 u32 key = mr->elem.index << 8 | rxe_get_next_key(-1);
51
52 /* set ibmr->l/rkey and also copy into private l/rkey
53 * for user MRs these will always be the same
54 * for cases where caller 'owns' the key portion
55 * they may be different until REG_MR WQE is executed.
56 */
57 mr->lkey = mr->ibmr.lkey = key;
58 mr->rkey = mr->ibmr.rkey = key;
59
60 mr->access = access;
61 mr->ibmr.page_size = PAGE_SIZE;
62 mr->page_mask = PAGE_MASK;
63 mr->page_shift = PAGE_SHIFT;
64 mr->state = RXE_MR_STATE_INVALID;
65 }


Thanks
Zhijian

>
> When folio or compound page is used in ULP, it is very possible that page_size in infiniband/core is different from the page_size in RXE
>
> Not sure what problem this difference will cause.
>
> Zhu Yanjun
>
>>
>> Signed-off-by: Li Zhijian <[email protected]>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>> index f54042e9aeb2..61a136ea1d91 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>>       struct rxe_mr *mr = to_rmr(ibmr);
>>       unsigned int page_size = mr_page_size(mr);
>> +    if (page_size != PAGE_SIZE) {
>> +        rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
>> +               page_size, PAGE_SIZE);
>> +        return -EOPNOTSUPP;
>> +    }
>> +
>>       mr->nbuf = 0;
>>       mr->page_shift = ilog2(page_size);
>>       mr->page_mask = ~((u64)page_size - 1);
>

2023-10-30 09:44:01

by Zhu Yanjun

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr

在 2023/10/30 16:13, Zhijian Li (Fujitsu) 写道:
>
>
> On 27/10/2023 16:17, Zhu Yanjun wrote:
>> 在 2023/10/27 13:41, Li Zhijian 写道:
>>> mr->page_list only encodes *page without page offset, when
>>> page_size != PAGE_SIZE, we cannot restore the address with a wrong
>>> page_offset.
>>>
>>> Note that this patch will break some ULPs that try to register 4K
>>> MR when PAGE_SIZE is not 4K.
>>> SRP and nvme over RXE is known to be impacted.
>>
>> When ULP uses folio or compound page, ULP can not work well with RXE after this commit is applied.
>>
>> Perhaps removing page_size set in RXE is a good solution because page_size is set twice, firstly page_size is set in infiniband/core, secondly it is set in RXE.
>
> Does The RXE one mean rxe_mr_init(), I think rxe_reg_user_mr() requires this.

Please read the discussions carefully. This problem has been discussed.

Best Regards,
Zhu Yanjun

>
> 48 static void rxe_mr_init(int access, struct rxe_mr *mr)
> 49 {
> 50 u32 key = mr->elem.index << 8 | rxe_get_next_key(-1);
> 51
> 52 /* set ibmr->l/rkey and also copy into private l/rkey
> 53 * for user MRs these will always be the same
> 54 * for cases where caller 'owns' the key portion
> 55 * they may be different until REG_MR WQE is executed.
> 56 */
> 57 mr->lkey = mr->ibmr.lkey = key;
> 58 mr->rkey = mr->ibmr.rkey = key;
> 59
> 60 mr->access = access;
> 61 mr->ibmr.page_size = PAGE_SIZE;
> 62 mr->page_mask = PAGE_MASK;
> 63 mr->page_shift = PAGE_SHIFT;
> 64 mr->state = RXE_MR_STATE_INVALID;
> 65 }
>
>
> Thanks
> Zhijian
>
>>
>> When folio or compound page is used in ULP, it is very possible that page_size in infiniband/core is different from the page_size in RXE
>>
>> Not sure what problem this difference will cause.
>>
>> Zhu Yanjun
>>
>>>
>>> Signed-off-by: Li Zhijian <[email protected]>
>>> ---
>>>   drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> index f54042e9aeb2..61a136ea1d91 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>>>       struct rxe_mr *mr = to_rmr(ibmr);
>>>       unsigned int page_size = mr_page_size(mr);
>>> +    if (page_size != PAGE_SIZE) {
>>> +        rxe_info_mr(mr, "Unsupported MR with page_size %u, expect %lu\n",
>>> +               page_size, PAGE_SIZE);
>>> +        return -EOPNOTSUPP;
>>> +    }
>>> +
>>>       mr->nbuf = 0;
>>>       mr->page_shift = ilog2(page_size);
>>>       mr->page_mask = ~((u64)page_size - 1);

2023-10-30 12:41:31

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr

On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
>
>
> On 27/10/2023 13:41, Li Zhijian wrote:
> > mr->page_list only encodes *page without page offset, when
> > page_size != PAGE_SIZE, we cannot restore the address with a wrong
> > page_offset.
> >
> > Note that this patch will break some ULPs that try to register 4K
> > MR when PAGE_SIZE is not 4K.
> > SRP and nvme over RXE is known to be impacted.
> >
> > Signed-off-by: Li Zhijian <[email protected]>
> > ---
> > drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> > index f54042e9aeb2..61a136ea1d91 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> > @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
> > struct rxe_mr *mr = to_rmr(ibmr);
> > unsigned int page_size = mr_page_size(mr);
> >
> > + if (page_size != PAGE_SIZE) {
>
> It seems this condition is too strict, it should be:
> if (!IS_ALIGNED(page_size, PAGE_SIZE))
>
> So that, page_size with (N * PAGE_SIZE) can work as previously.
> Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)

That makes sense

Jason

2023-10-31 08:53:07

by Zhu Yanjun

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr

在 2023/10/30 20:40, Jason Gunthorpe 写道:
> On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 27/10/2023 13:41, Li Zhijian wrote:
>>> mr->page_list only encodes *page without page offset, when
>>> page_size != PAGE_SIZE, we cannot restore the address with a wrong
>>> page_offset.
>>>
>>> Note that this patch will break some ULPs that try to register 4K
>>> MR when PAGE_SIZE is not 4K.
>>> SRP and nvme over RXE is known to be impacted.
>>>
>>> Signed-off-by: Li Zhijian <[email protected]>
>>> ---
>>> drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>>> 1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> index f54042e9aeb2..61a136ea1d91 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>>> struct rxe_mr *mr = to_rmr(ibmr);
>>> unsigned int page_size = mr_page_size(mr);
>>>
>>> + if (page_size != PAGE_SIZE) {
>>
>> It seems this condition is too strict, it should be:
>> if (!IS_ALIGNED(page_size, PAGE_SIZE))
>>
>> So that, page_size with (N * PAGE_SIZE) can work as previously.
>> Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
>
> That makes sense

I read all the discussions very carefully.

Thanks, Greg.

Because RXE only supports PAGE_SIZE, when CONFIG_ARM64_64K_PAGES is
enabled, the PAGE_SIZE is 64K, when CONFIG_ARM64_64K_PAGES is disabled,
PAGE_SIZE is 4K.

But NVMe calls ib_map_mr_sg with a fixed size SZ_4K. When
CONFIG_ARM64_64K_PAGES is enabled, it is still 4K. This is not a problem
in RXE. This problem is in NVMe.

If this problem in NVMe is not fixed, NVMe still can not use RXE to make
tests with blktests when CONFIG_ARM64_64K_PAGES is enabled.

This is not a problem in RXE, it should not be fixed in RXE.

Zhu Yanjun

>
> Jason

2023-10-31 10:00:09

by Zhijian Li (Fujitsu)

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr



On 30/10/2023 20:40, Jason Gunthorpe wrote:
> On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 27/10/2023 13:41, Li Zhijian wrote:
>>> mr->page_list only encodes *page without page offset, when
>>> page_size != PAGE_SIZE, we cannot restore the address with a wrong
>>> page_offset.
>>>
>>> Note that this patch will break some ULPs that try to register 4K
>>> MR when PAGE_SIZE is not 4K.
>>> SRP and nvme over RXE is known to be impacted.
>>>
>>> Signed-off-by: Li Zhijian <[email protected]>
>>> ---
>>> drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
>>> 1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> index f54042e9aeb2..61a136ea1d91 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>>> struct rxe_mr *mr = to_rmr(ibmr);
>>> unsigned int page_size = mr_page_size(mr);
>>>
>>> + if (page_size != PAGE_SIZE) {
>>
>> It seems this condition is too strict, it should be:
>> if (!IS_ALIGNED(page_size, PAGE_SIZE))
>>

I have to say I retract this conclusion. It still misses something.

To support PAGE_SIZE aligned MR, we have to refactor rxe_map_mr_sg() or rxe_set_page()

Currently, rxe_set_page() will be called in the step of page_size, this doesn't split N*PAGE_SIZE memory into
N *page. So when we restore an iova from xarray, the array index is wrong as well.

So i'm going to refactor rxe_map_mr_sg() to iterate the sgl by myself in rxe_map_mr_sg() like SIW does.
Hope this refactor can help RXE to support SZ_4K when PAGE_SIZE!=4K as well.


Thanks
Zhijian


>> So that, page_size with (N * PAGE_SIZE) can work as previously.
>> Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
>
> That makes sense
>
> Jason

2023-10-31 13:19:46

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr

On Tue, Oct 31, 2023 at 04:52:23PM +0800, Zhu Yanjun wrote:
> 在 2023/10/30 20:40, Jason Gunthorpe 写道:
> > On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
> > >
> > >
> > > On 27/10/2023 13:41, Li Zhijian wrote:
> > > > mr->page_list only encodes *page without page offset, when
> > > > page_size != PAGE_SIZE, we cannot restore the address with a wrong
> > > > page_offset.
> > > >
> > > > Note that this patch will break some ULPs that try to register 4K
> > > > MR when PAGE_SIZE is not 4K.
> > > > SRP and nvme over RXE is known to be impacted.
> > > >
> > > > Signed-off-by: Li Zhijian <[email protected]>
> > > > ---
> > > > drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
> > > > 1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > index f54042e9aeb2..61a136ea1d91 100644
> > > > --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
> > > > struct rxe_mr *mr = to_rmr(ibmr);
> > > > unsigned int page_size = mr_page_size(mr);
> > > > + if (page_size != PAGE_SIZE) {
> > >
> > > It seems this condition is too strict, it should be:
> > > if (!IS_ALIGNED(page_size, PAGE_SIZE))
> > >
> > > So that, page_size with (N * PAGE_SIZE) can work as previously.
> > > Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
> >
> > That makes sense
>
> I read all the discussions very carefully.
>
> Thanks, Greg.
>
> Because RXE only supports PAGE_SIZE, when CONFIG_ARM64_64K_PAGES is enabled,
> the PAGE_SIZE is 64K, when CONFIG_ARM64_64K_PAGES is disabled, PAGE_SIZE is
> 4K.
>
> But NVMe calls ib_map_mr_sg with a fixed size SZ_4K. When
> CONFIG_ARM64_64K_PAGES is enabled, it is still 4K. This is not a problem in
> RXE. This problem is in NVMe.

Maybe, but no real RDMA devices don't support 4K.

The xarray conversion may need revision to use physical addresses
instead of storing struct pages so it can handle this kind of
segmentation.

Certainly in the mean time it should be rejected.

Jason

2023-11-01 00:59:06

by Greg Sword

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] RDMA/rxe: don't allow registering !PAGE_SIZE mr

On Tue, Oct 31, 2023 at 9:19 PM Jason Gunthorpe <[email protected]> wrote:
>
> On Tue, Oct 31, 2023 at 04:52:23PM +0800, Zhu Yanjun wrote:
> > 在 2023/10/30 20:40, Jason Gunthorpe 写道:
> > > On Mon, Oct 30, 2023 at 07:51:41AM +0000, Zhijian Li (Fujitsu) wrote:
> > > >
> > > >
> > > > On 27/10/2023 13:41, Li Zhijian wrote:
> > > > > mr->page_list only encodes *page without page offset, when
> > > > > page_size != PAGE_SIZE, we cannot restore the address with a wrong
> > > > > page_offset.
> > > > >
> > > > > Note that this patch will break some ULPs that try to register 4K
> > > > > MR when PAGE_SIZE is not 4K.
> > > > > SRP and nvme over RXE is known to be impacted.
> > > > >
> > > > > Signed-off-by: Li Zhijian <[email protected]>
> > > > > ---
> > > > > drivers/infiniband/sw/rxe/rxe_mr.c | 6 ++++++
> > > > > 1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > > index f54042e9aeb2..61a136ea1d91 100644
> > > > > --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > > > @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
> > > > > struct rxe_mr *mr = to_rmr(ibmr);
> > > > > unsigned int page_size = mr_page_size(mr);
> > > > > + if (page_size != PAGE_SIZE) {
> > > >
> > > > It seems this condition is too strict, it should be:
> > > > if (!IS_ALIGNED(page_size, PAGE_SIZE))
> > > >
> > > > So that, page_size with (N * PAGE_SIZE) can work as previously.
> > > > Because the offset(mr.iova & page_mask) will get lost only when !IS_ALIGNED(page_size, PAGE_SIZE)
> > >
> > > That makes sense
> >
> > I read all the discussions very carefully.
> >
> > Thanks, Greg.
> >
> > Because RXE only supports PAGE_SIZE, when CONFIG_ARM64_64K_PAGES is enabled,
> > the PAGE_SIZE is 64K, when CONFIG_ARM64_64K_PAGES is disabled, PAGE_SIZE is
> > 4K.
> >
> > But NVMe calls ib_map_mr_sg with a fixed size SZ_4K. When
> > CONFIG_ARM64_64K_PAGES is enabled, it is still 4K. This is not a problem in
> > RXE. This problem is in NVMe.
>
> Maybe, but no real RDMA devices don't support 4K.
>
> The xarray conversion may need revision to use physical addresses
> instead of storing struct pages so it can handle this kind of
> segmentation.

This problem can not be fixed until rxe supports multiple page sizes
including 4K page size.
Now it is not fixed. It is an intermediate way.

>
> Certainly in the mean time it should be rejected.
>
> Jason