2023-11-03 09:56:51

by Zhijian Li (Fujitsu)

Subject: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor

I haven't carried over the Reviewed-by tags for patches 1-2 this time, since I
think they can be improved further.

Patch1-2: Fix the kernel panic[1] and make SRP work again.
Almost unchanged from V1.
Patch3-5: cleanups # newly added
Patch6: make RXE support PAGE_SIZE-aligned MRs # newly added, but not fully tested

My unreliable arm64 machine often hangs when running blktests, even when I use
the default siw driver.

- NVMe and the ULPs (rtrs, iser) that always register 4K MRs are still not supported.

[1] https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/

Li Zhijian (6):
RDMA/rxe: don't allow registering !PAGE_SIZE mr
RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE
RDMA/rxe: remove unused rxe_mr.page_offset
RDMA/rxe: Use PAGE_SIZE and PAGE_SHIFT to extract address from
page_list
RDMA/rxe: cleanup rxe_mr.{page_mask,page_shift}
RDMA/rxe: Support PAGE_SIZE aligned MR

drivers/infiniband/sw/rxe/rxe_mr.c | 80 ++++++++++++++++-----------
drivers/infiniband/sw/rxe/rxe_param.h | 2 +-
drivers/infiniband/sw/rxe/rxe_verbs.h | 9 ---
3 files changed, 48 insertions(+), 43 deletions(-)

--
2.41.0


2023-11-03 09:57:00

by Zhijian Li (Fujitsu)

Subject: [PATCH RFC V2 2/6] RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE

RXE_PAGE_SIZE_CAP advertises the MR page sizes supported by RXE. However,
in the current RXE implementation only PAGE_SIZE MRs work correctly,
so restrict it to PAGE_SIZE.

ULPs such as SRP, which derive their MR page size from this attribute,
work again with this change.
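To illustrate why the capability value matters, here is a small user-space sketch. It assumes, based on my reading of drivers/infiniband/ulp/srp/ib_srp.c, that SRP derives its MR page shift as `max(12, ffs(page_size_cap) - 1)`, i.e. the smallest advertised page size with a 4 KiB floor. `ffs32()` and `ulp_mr_page_size()` are hypothetical helpers, not kernel functions. With the old mask 0xfffff000 the lowest set bit is bit 12, so SRP would choose 4 KiB MRs even on a kernel built with 64 KiB pages; with the cap set to PAGE_SIZE it chooses PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ffs(): 1-based index of the lowest set bit, 0 if none. */
static int ffs32(uint32_t x)
{
	int i;

	for (i = 0; i < 32; i++)
		if (x & (UINT32_C(1) << i))
			return i + 1;
	return 0;
}

/*
 * Assumed SRP-style choice of MR page size: the smallest page size the
 * device advertises in page_size_cap, but never below 4 KiB.
 */
static uint32_t ulp_mr_page_size(uint32_t page_size_cap)
{
	int shift;

	assert(page_size_cap != 0); /* this sketch expects a non-empty mask */
	shift = ffs32(page_size_cap) - 1;
	if (shift < 12)
		shift = 12;
	return UINT32_C(1) << shift;
}
```

For example, `ulp_mr_page_size(0xfffff000)` returns 4096, while `ulp_mr_page_size(0x10000)` (PAGE_SIZE on a 64 KiB-page arm64 kernel) returns 65536.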

Signed-off-by: Li Zhijian <[email protected]>
---
drivers/infiniband/sw/rxe/rxe_param.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index d2f57ead78ad..b1cf1e1c0ce1 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -38,7 +38,7 @@ static inline enum ib_mtu eth_mtu_int_to_enum(int mtu)
/* default/initial rxe device parameter settings */
enum rxe_device_param {
RXE_MAX_MR_SIZE = -1ull,
- RXE_PAGE_SIZE_CAP = 0xfffff000,
+ RXE_PAGE_SIZE_CAP = PAGE_SIZE,
RXE_MAX_QP_WR = DEFAULT_MAX_VALUE,
RXE_DEVICE_CAP_FLAGS = IB_DEVICE_BAD_PKEY_CNTR
| IB_DEVICE_BAD_QKEY_CNTR
--
2.41.0

2023-11-03 09:57:37

by Zhijian Li (Fujitsu)

Subject: [PATCH RFC V2 5/6] RDMA/rxe: cleanup rxe_mr.{page_mask,page_shift}

These two fields were presumably intended for extracting addresses
from page_list. Now that PAGE_SIZE and PAGE_SHIFT are used
directly, they can be dropped.

Signed-off-by: Li Zhijian <[email protected]>
---
drivers/infiniband/sw/rxe/rxe_mr.c | 4 ----
drivers/infiniband/sw/rxe/rxe_verbs.h | 3 ---
2 files changed, 7 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index d39c02f0c51e..a038133e1322 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -59,8 +59,6 @@ static void rxe_mr_init(int access, struct rxe_mr *mr)

mr->access = access;
mr->ibmr.page_size = PAGE_SIZE;
- mr->page_mask = PAGE_MASK;
- mr->page_shift = PAGE_SHIFT;
mr->state = RXE_MR_STATE_INVALID;
}

@@ -230,8 +228,6 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
}

mr->nbuf = 0;
- mr->page_shift = PAGE_SHIFT;
- mr->page_mask = PAGE_MASK;

return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page);
}
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index ccc75f8c0985..ef813560b0ab 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -309,9 +309,6 @@ struct rxe_mr {
int access;
atomic_t num_mw;

- unsigned int page_shift;
- u64 page_mask;
-
u32 num_buf;
u32 nbuf;

--
2.41.0

2023-11-03 09:57:46

by Zhijian Li (Fujitsu)

Subject: [PATCH RFC V2 3/6] RDMA/rxe: remove unused rxe_mr.page_offset

rxe_mr.page_offset is assigned but never used.

Signed-off-by: Li Zhijian <[email protected]>
---
drivers/infiniband/sw/rxe/rxe_mr.c | 1 -
drivers/infiniband/sw/rxe/rxe_verbs.h | 1 -
2 files changed, 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 3755e530e6dc..bbfedcd8d2cb 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -243,7 +243,6 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
mr->nbuf = 0;
mr->page_shift = ilog2(page_size);
mr->page_mask = ~((u64)page_size - 1);
- mr->page_offset = mr->ibmr.iova & (page_size - 1);

return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page);
}
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index ccb9d19ffe8a..11647e976282 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -309,7 +309,6 @@ struct rxe_mr {
int access;
atomic_t num_mw;

- unsigned int page_offset;
unsigned int page_shift;
u64 page_mask;

--
2.41.0

2023-11-03 09:58:12

by Zhijian Li (Fujitsu)

Subject: [PATCH RFC V2 4/6] RDMA/rxe: Use PAGE_SIZE and PAGE_SHIFT to extract address from page_list

As stated in the previous commit, page_list only stores PAGE_SIZE pages,
so when extracting an address from page_list we should use PAGE_SIZE
and PAGE_SHIFT instead of ibmr.page_size.
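The replacement index expression in this patch is not equivalent to the removed rxe_mr_iova_to_index(). Assuming, as with ib_sg_to_pages(), that entry 0 of page_list holds the page containing mr->ibmr.iova rounded down to a page boundary, the old form `(iova >> shift) - (mr->ibmr.iova >> shift)` counts page boundaries crossed, while `(iova - mr->ibmr.iova) >> PAGE_SHIFT` counts whole pages of distance. The two differ whenever mr->ibmr.iova is not page aligned. A user-space sketch comparing them, with hypothetical helper names and an assumed 4 KiB page:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u                       /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Mirrors the removed rxe_mr_iova_to_index(): page boundaries crossed. */
static uint64_t index_old(uint64_t mr_iova, uint64_t iova)
{
	assert(iova >= mr_iova);
	return (iova >> SKETCH_PAGE_SHIFT) - (mr_iova >> SKETCH_PAGE_SHIFT);
}

/* Mirrors the replacement expression: whole pages of distance. */
static uint64_t index_new(uint64_t mr_iova, uint64_t iova)
{
	assert(iova >= mr_iova);
	return (iova - mr_iova) >> SKETCH_PAGE_SHIFT;
}

/* Offset of iova inside its page; identical in both versions. */
static uint64_t page_offset_of(uint64_t iova)
{
	return iova & (SKETCH_PAGE_SIZE - 1);
}
```

With an MR starting at 0x1800, iova 0x2100 lies in the second page covered by the MR (0x2000-0x2fff): `index_old()` returns 1, but `index_new()` returns 0. For a page-aligned MR start both return the same index.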

Signed-off-by: Li Zhijian <[email protected]>
---
drivers/infiniband/sw/rxe/rxe_mr.c | 42 +++++++++------------------
drivers/infiniband/sw/rxe/rxe_verbs.h | 5 ----
2 files changed, 14 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index bbfedcd8d2cb..d39c02f0c51e 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -72,16 +72,6 @@ void rxe_mr_init_dma(int access, struct rxe_mr *mr)
mr->ibmr.type = IB_MR_TYPE_DMA;
}

-static unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
-{
- return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift);
-}
-
-static unsigned long rxe_mr_iova_to_page_offset(struct rxe_mr *mr, u64 iova)
-{
- return iova & (mr_page_size(mr) - 1);
-}
-
static bool is_pmem_page(struct page *pg)
{
unsigned long paddr = page_to_phys(pg);
@@ -232,17 +222,16 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
int sg_nents, unsigned int *sg_offset)
{
struct rxe_mr *mr = to_rmr(ibmr);
- unsigned int page_size = mr_page_size(mr);

- if (page_size != PAGE_SIZE) {
+ if (ibmr->page_size != PAGE_SIZE) {
rxe_err_mr(mr, "Unsupport mr page size %x, expect PAGE_SIZE(%lx)\n",
- page_size, PAGE_SIZE);
+ ibmr->page_size, PAGE_SIZE);
return -EINVAL;
}

mr->nbuf = 0;
- mr->page_shift = ilog2(page_size);
- mr->page_mask = ~((u64)page_size - 1);
+ mr->page_shift = PAGE_SHIFT;
+ mr->page_mask = PAGE_MASK;

return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page);
}
@@ -250,8 +239,8 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
unsigned int length, enum rxe_mr_copy_dir dir)
{
- unsigned int page_offset = rxe_mr_iova_to_page_offset(mr, iova);
- unsigned long index = rxe_mr_iova_to_index(mr, iova);
+ unsigned int page_offset = iova & (PAGE_SIZE - 1);
+ unsigned long index = (iova - mr->ibmr.iova) >> PAGE_SHIFT;
unsigned int bytes;
struct page *page;
void *va;
@@ -261,8 +250,7 @@ static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
if (!page)
return -EFAULT;

- bytes = min_t(unsigned int, length,
- mr_page_size(mr) - page_offset);
+ bytes = min_t(unsigned int, length, PAGE_SIZE - page_offset);
va = kmap_local_page(page);
if (dir == RXE_FROM_MR_OBJ)
memcpy(addr, va + page_offset, bytes);
@@ -450,14 +438,12 @@ int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, unsigned int length)
return err;

while (length > 0) {
- index = rxe_mr_iova_to_index(mr, iova);
+ index = (iova - mr->ibmr.iova) >> PAGE_SHIFT;
page = xa_load(&mr->page_list, index);
- page_offset = rxe_mr_iova_to_page_offset(mr, iova);
+ page_offset = iova & (PAGE_SIZE - 1);
if (!page)
return -EFAULT;
- bytes = min_t(unsigned int, length,
- mr_page_size(mr) - page_offset);
-
+ bytes = min_t(unsigned int, length, PAGE_SIZE - page_offset);
va = kmap_local_page(page);
arch_wb_cache_pmem(va + page_offset, bytes);
kunmap_local(va);
@@ -498,8 +484,8 @@ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
rxe_dbg_mr(mr, "iova out of range");
return RESPST_ERR_RKEY_VIOLATION;
}
- page_offset = rxe_mr_iova_to_page_offset(mr, iova);
- index = rxe_mr_iova_to_index(mr, iova);
+ page_offset = iova & (PAGE_SIZE - 1);
+ index = (iova - mr->ibmr.iova) >> PAGE_SHIFT;
page = xa_load(&mr->page_list, index);
if (!page)
return RESPST_ERR_RKEY_VIOLATION;
@@ -556,8 +542,8 @@ int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value)
rxe_dbg_mr(mr, "iova out of range");
return RESPST_ERR_RKEY_VIOLATION;
}
- page_offset = rxe_mr_iova_to_page_offset(mr, iova);
- index = rxe_mr_iova_to_index(mr, iova);
+ page_offset = iova & (PAGE_SIZE - 1);
+ index = (iova - mr->ibmr.iova) >> PAGE_SHIFT;
page = xa_load(&mr->page_list, index);
if (!page)
return RESPST_ERR_RKEY_VIOLATION;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 11647e976282..ccc75f8c0985 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -318,11 +318,6 @@ struct rxe_mr {
struct xarray page_list;
};

-static inline unsigned int mr_page_size(struct rxe_mr *mr)
-{
- return mr ? mr->ibmr.page_size : PAGE_SIZE;
-}
-
enum rxe_mw_state {
RXE_MW_STATE_INVALID = RXE_MR_STATE_INVALID,
RXE_MW_STATE_FREE = RXE_MR_STATE_FREE,
--
2.41.0

2023-11-03 09:58:40

by Zhijian Li (Fujitsu)

Subject: [PATCH RFC V2 6/6] RDMA/rxe: Support PAGE_SIZE aligned MR

To support PAGE_SIZE-aligned MRs, rxe_map_mr_sg() must be able to
split a large buffer into N PAGE_SIZE entries in the xarray page_list.
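The per-segment splitting performed by the do/while loop in this patch's rxe_map_mr_sg() can be modelled in user space as follows. This is a sketch, assuming a 4 KiB PAGE_SIZE; `split_segment()` is a hypothetical name, and the output array stands in for the xarray stores done by rxe_store_page().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)            /* assumed PAGE_SIZE */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Split one DMA segment [dma_addr, dma_addr + dma_len) into the
 * PAGE_SIZE-aligned page addresses that cover it. Returns the number of
 * pages written to out[], or 0 if out[] is too small.
 */
static size_t split_segment(uint64_t dma_addr, uint64_t dma_len,
			    uint64_t *out, size_t max)
{
	uint64_t end = dma_addr + dma_len;
	uint64_t page = dma_addr & SKETCH_PAGE_MASK;
	size_t n = 0;

	assert(dma_len > 0); /* empty SGEs are rejected before the loop */
	do {
		if (n == max)
			return 0;
		out[n++] = page;
		page += SKETCH_PAGE_SIZE;
	} while (page < end);
	return n;
}
```

A 4 KiB segment starting at 0x1800 straddles a page boundary, so it produces two entries, 0x1000 and 0x2000.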

Signed-off-by: Li Zhijian <[email protected]>
---
drivers/infiniband/sw/rxe/rxe_mr.c | 39 +++++++++++++++++++++++++-----
1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index a038133e1322..3761740af986 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -193,9 +193,8 @@ int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr)
return err;
}

-static int rxe_set_page(struct ib_mr *ibmr, u64 dma_addr)
+static int rxe_store_page(struct rxe_mr *mr, u64 dma_addr)
{
- struct rxe_mr *mr = to_rmr(ibmr);
struct page *page = ib_virt_dma_to_page(dma_addr);
bool persistent = !!(mr->access & IB_ACCESS_FLUSH_PERSISTENT);
int err;
@@ -216,20 +215,48 @@ static int rxe_set_page(struct ib_mr *ibmr, u64 dma_addr)
return 0;
}

+static int rxe_set_page(struct ib_mr *base_mr, u64 buf_addr)
+{
+ return 0;
+}
+
int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
- int sg_nents, unsigned int *sg_offset)
+ int sg_nents, unsigned int *sg_offset_p)
{
struct rxe_mr *mr = to_rmr(ibmr);
+ struct scatterlist *sg;
+ unsigned int sg_offset = sg_offset_p ? *sg_offset_p : 0;
+ int i;

- if (ibmr->page_size != PAGE_SIZE) {
- rxe_err_mr(mr, "Unsupport mr page size %x, expect PAGE_SIZE(%lx)\n",
+ if (!IS_ALIGNED(ibmr->page_size, PAGE_SIZE)) {
+ rxe_err_mr(mr, "Misaligned page size %x, expect PAGE_SIZE(%lx) aligned\n",
ibmr->page_size, PAGE_SIZE);
return -EINVAL;
}

mr->nbuf = 0;

- return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page);
+ for_each_sg(sgl, sg, sg_nents, i) {
+ u64 dma_addr = sg_dma_address(sg) + sg_offset;
+ unsigned int dma_len = sg_dma_len(sg) - sg_offset;
+ u64 end_dma_addr = dma_addr + dma_len;
+ u64 page_addr = dma_addr & PAGE_MASK;
+
+ if (sg_dma_len(sg) == 0) {
+ rxe_dbg_mr(mr, "empty SGE\n");
+ return -EINVAL;
+ }
+ do {
+ int ret = rxe_store_page(mr, page_addr);
+ if (ret)
+ return ret;
+
+ page_addr += PAGE_SIZE;
+ } while (page_addr < end_dma_addr);
+ sg_offset = 0;
+ }
+
+ return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset_p, rxe_set_page);
}

static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
--
2.41.0

2023-11-03 10:18:02

by Greg Sword

Subject: Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor

On Fri, Nov 3, 2023 at 5:58 PM Li Zhijian <[email protected]> wrote:
>
> I don't collect the Reviewed-by to the patch1-2 this time, since i
> think we can make it better.
>
> Patch1-2: Fix kernel panic[1] and benifit to make srp work again.
> Almost nothing change from V1.
> Patch3-5: cleanups # newly add
> Patch6: make RXE support PAGE_SIZE aligned mr # newly add, but not fully tested

Do some work. Do not waste our time with these rubbish patches.


2023-11-03 13:02:33

by Zhu Yanjun

Subject: Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor

On 2023/11/3 17:55, Li Zhijian wrote:
> I don't collect the Reviewed-by to the patch1-2 this time, since i
> think we can make it better.
>
> Patch1-2: Fix kernel panic[1] and benifit to make srp work again.
> Almost nothing change from V1.
> Patch3-5: cleanups # newly add
> Patch6: make RXE support PAGE_SIZE aligned mr # newly add, but not fully tested
>
> My bad arm64 mechine offten hangs when doing blktests even though i use the
> default siw driver.
>
> - nvme and ULPs(rtrs, iser) always registers 4K mr still don't supported yet.

Zhijian

Please carefully read the whole discussion about this problem. You will
find a lot of valuable suggestions, especially from Jason.

From the whole discussion, the root cause seems very clear.
We need to fix this problem. Please do not send this kind of commit again.

Zhu Yanjun


2023-11-03 15:05:06

by Bart Van Assche

Subject: Re: [PATCH RFC V2 6/6] RDMA/rxe: Support PAGE_SIZE aligned MR


On 11/3/23 02:55, Li Zhijian wrote:
> - return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page);
> + for_each_sg(sgl, sg, sg_nents, i) {
> + u64 dma_addr = sg_dma_address(sg) + sg_offset;
> + unsigned int dma_len = sg_dma_len(sg) - sg_offset;
> + u64 end_dma_addr = dma_addr + dma_len;
> + u64 page_addr = dma_addr & PAGE_MASK;
> +
> + if (sg_dma_len(sg) == 0) {
> + rxe_dbg_mr(mr, "empty SGE\n");
> + return -EINVAL;
> + }
> + do {
> + int ret = rxe_store_page(mr, page_addr);
> + if (ret)
> + return ret;
> +
> + page_addr += PAGE_SIZE;
> + } while (page_addr < end_dma_addr);
> + sg_offset = 0;
> + }
> +
> + return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset_p, rxe_set_page);
> }

Is this change necessary? There is already a loop in ib_sg_to_pages()
that splits SG entries that are larger than mr->page_size into entries
with size mr->page_size.

Bart.
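Bart's point can be made concrete with a simplified model of that splitting: each SG range is cut into mr->page_size-aligned chunks, one page_list entry per chunk. This sketch rests on assumptions: it ignores the merging of contiguous SG entries and the alignment checks the real ib_sg_to_pages() performs, and `count_entries()` is a hypothetical name. With mr->page_size = 64 KiB on a kernel with 4 KiB PAGE_SIZE, each entry stands for 16 PAGE_SIZE pages, which any code reading through an entry has to account for.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of page_size-aligned chunks needed to cover
 * [dma_addr, dma_addr + len); page_size must be a power of two.
 */
static uint64_t count_entries(uint64_t dma_addr, uint64_t len,
			      uint64_t page_size)
{
	uint64_t first, last;

	assert(len > 0);
	assert(page_size && (page_size & (page_size - 1)) == 0);
	first = dma_addr & ~(page_size - 1);
	last = (dma_addr + len - 1) & ~(page_size - 1);
	return (last - first) / page_size + 1;
}
```

For a 128 KiB buffer at address 0, `count_entries(0, 0x20000, 0x10000)` gives 2 entries, while splitting at 4 KiB gives 32.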

2023-11-03 18:01:13

by Jason Gunthorpe

Subject: Re: [PATCH RFC V2 4/6] RDMA/rxe: Use PAGE_SIZE and PAGE_SHIFT to extract address from page_list

On Fri, Nov 03, 2023 at 05:55:47PM +0800, Li Zhijian wrote:
> As we said in previous commit, page_list only stores PAGE_SIZE page, so
> when we extract an address from the page_list, we should use PAGE_SIZE
> and PAGE_SHIFT instead of the ibmr.page_size.

The concept was that the xarray could store anything larger than
PAGE_SIZE and the entry would point at the first struct page of the
contiguous chunk.

That looks right, or at least close to right, so let's try
to keep it.

Jason
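Under the model Jason describes, a lookup needs three values: the xarray index of the chunk, the struct page within that chunk, and the byte offset within that page. The sketch below assumes a 4 KiB PAGE_SIZE and that entry 0 holds the chunk containing the MR start rounded down. `locate()` and `struct chunk_pos` are hypothetical names. In the kernel, the struct page for `subpage` would presumably be reached from the stored first page with something like nth_page().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u                       /* assumed PAGE_SHIFT */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

struct chunk_pos {
	uint64_t xa_index;   /* which xarray entry (chunk) */
	uint64_t subpage;    /* which struct page inside that chunk */
	uint64_t offset;     /* byte offset inside that struct page */
};

/* mr_shift is log2 of the MR page size and must be >= the page shift. */
static struct chunk_pos locate(uint64_t mr_iova, unsigned int mr_shift,
			       uint64_t iova)
{
	uint64_t in_chunk = iova & ((UINT64_C(1) << mr_shift) - 1);
	struct chunk_pos pos;

	assert(mr_shift >= SKETCH_PAGE_SHIFT && iova >= mr_iova);
	pos.xa_index = (iova >> mr_shift) - (mr_iova >> mr_shift);
	pos.subpage = in_chunk >> SKETCH_PAGE_SHIFT;
	pos.offset = in_chunk & (SKETCH_PAGE_SIZE - 1);
	return pos;
}
```

With 64 KiB chunks (mr_shift = 16) and an MR starting at 0x100000, iova 0x123456 maps to entry 2, sub-page 3, offset 0x456.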

2023-11-06 03:07:26

by Zhijian Li (Fujitsu)

Subject: Re: [PATCH RFC V2 6/6] RDMA/rxe: Support PAGE_SIZE aligned MR



On 03/11/2023 23:04, Bart Van Assche wrote:
>
> On 11/3/23 02:55, Li Zhijian wrote:
>> -    return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page);
>> +    for_each_sg(sgl, sg, sg_nents, i) {
>> +        u64 dma_addr = sg_dma_address(sg) + sg_offset;
>> +        unsigned int dma_len = sg_dma_len(sg) - sg_offset;
>> +        u64 end_dma_addr = dma_addr + dma_len;
>> +        u64 page_addr = dma_addr & PAGE_MASK;
>> +
>> +        if (sg_dma_len(sg) == 0) {
>> +            rxe_dbg_mr(mr, "empty SGE\n");
>> +            return -EINVAL;
>> +        }
>> +        do {
>> +            int ret = rxe_store_page(mr, page_addr);
>> +            if (ret)
>> +                return ret;
>> +
>> +            page_addr += PAGE_SIZE;
>> +        } while (page_addr < end_dma_addr);
>> +        sg_offset = 0;
>> +    }
>> +
>> +    return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset_p, rxe_set_page);
>>   }
>
> Is this change necessary? There is already a loop in ib_sg_to_pages()
> that splits SG entries that are larger than mr->page_size into entries
> with size mr->page_size.

I see.

My concern was that only the PAGE_SIZE range [page_va, page_va + PAGE_SIZE)
returned by kmap_local_page(page) is safe to access.
When mr->page_size is larger than PAGE_SIZE, we may access the following pages without mapping them.

Thanks
Zhijian

2023-11-06 03:47:23

by Zhijian Li (Fujitsu)

Subject: Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor



On 03/11/2023 18:17, Greg Sword wrote:
> On Fri, Nov 3, 2023 at 5:58 PM Li Zhijian <[email protected]> wrote:
>>
>> I don't collect the Reviewed-by to the patch1-2 this time, since i
>> think we can make it better.
>>
>> Patch1-2: Fix kernel panic[1] and benifit to make srp work again.
>> Almost nothing change from V1.
>> Patch3-5: cleanups # newly add
>> Patch6: make RXE support PAGE_SIZE aligned mr # newly add, but not fully tested
>
> Do some work. Do not use these rubbish patch to waste our time.

Sorry about this. Of course, other proposals are welcome.





2023-11-06 04:07:47

by Zhijian Li (Fujitsu)

Subject: Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor



On 03/11/2023 21:00, Zhu Yanjun wrote:
> On 2023/11/3 17:55, Li Zhijian wrote:
>> I don't collect the Reviewed-by to the patch1-2 this time, since i
>> think we can make it better.
>>
>> Patch1-2: Fix kernel panic[1] and benifit to make srp work again.
>>            Almost nothing change from V1.
>> Patch3-5: cleanups # newly add
>> Patch6: make RXE support PAGE_SIZE aligned mr # newly add, but not fully tested
>>
>> My bad arm64 mechine offten hangs when doing blktests even though i use the
>> default siw driver.
>>
>> - nvme and ULPs(rtrs, iser) always registers 4K mr still don't supported yet.
>
> Zhijian
>
> Please read carefully the whole discussion about this problem. You will find a lot of valuable suggestions, especially suggestions from Jason.

Okay, I will read it again. It would help if you could tell me which thread.


>
> From the whole discussion, it seems that the root cause is very clear.
> We need to fix this prolem. Please do not send this kind of commits again.
>

Let's first agree on what our goal is:

- 1) Fix the panic[1] and only support PAGE_SIZE MR
- 2) support PAGE_SIZE aligned MR
- 3) support any page_size MR.

I'm sorry, I'm not familiar with the Linux MM subsystem. It seems to be safe to access
memory across page boundaries starting from the address returned by kmap_local_page(page).
In other words, 2) is already natively supported, right?

I'm totally confused now.



> Zhu Yanjun

2023-11-06 08:01:07

by Zhijian Li (Fujitsu)

Subject: Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor



Many thanks for all your feedback.

On 03/11/2023 17:55, Li Zhijian wrote:
> I don't collect the Reviewed-by to the patch1-2 this time, since i
> think we can make it better.
>
> Patch1-2: Fix kernel panic[1] and benifit to make srp work again.
> Almost nothing change from V1.

Quote from Jason:
"
> The concept was that the xarray could store anything larger than
> PAGE_SIZE and the entry would point at the first struct page of the
> contiguous chunk
>
> That looks like it is right, or at least close to right, so lets try
> to keep it
"


It seems it's okay to access memory across pages on RXE even though
we only map the first page.

That also means PAGE_SIZE-aligned MRs are already supported, so checking only
`IS_ALIGNED(page_size, PAGE_SIZE)` is sufficient, right?

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index f54042e9aeb2..3755e530e6dc 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
struct rxe_mr *mr = to_rmr(ibmr);
unsigned int page_size = mr_page_size(mr);

+ if (!IS_ALIGNED(page_size, PAGE_SIZE)) {
+ rxe_err_mr(mr, "FIXME...\n");
+ return -EINVAL;
+ }
+
mr->nbuf = 0;
mr->page_shift = ilog2(page_size);
mr->page_mask = ~((u64)page_size - 1);
diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index d2f57ead78ad..b1cf1e1c0ce1 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -38,7 +38,7 @@ static inline enum ib_mtu eth_mtu_int_to_enum(int mtu)
/* default/initial rxe device parameter settings */
enum rxe_device_param {
RXE_MAX_MR_SIZE = -1ull,
- RXE_PAGE_SIZE_CAP = 0xfffff000,
+ RXE_PAGE_SIZE_CAP = 0xffffffff - (PAGE_SIZE - 1),
RXE_MAX_QP_WR = DEFAULT_MAX_VALUE,
RXE_DEVICE_CAP_FLAGS = IB_DEVICE_BAD_PKEY_CNTR
| IB_DEVICE_BAD_QKEY_CNTR


* Minor cleanups will follow.
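The arithmetic in this proposal can be checked in user space. The sketch below writes IS_ALIGNED() as a function (equivalent to the kernel macro for power-of-two alignments) and evaluates the proposed RXE_PAGE_SIZE_CAP expression for two assumed PAGE_SIZE values; `is_aligned()` and `page_size_cap_for()` are hypothetical names. The mask sets every bit from PAGE_SHIFT up to bit 31, so its lowest set bit is PAGE_SIZE, and on a 4 KiB-page kernel it equals the old value 0xfffff000.

```c
#include <assert.h>
#include <stdint.h>

/* Same test as the kernel's IS_ALIGNED(x, a) when a is a power of two. */
static int is_aligned(uint64_t x, uint64_t a)
{
	assert(a && (a & (a - 1)) == 0);
	return (x & (a - 1)) == 0;
}

/* Proposed cap: every power-of-two size from page_size up to 2^31. */
static uint32_t page_size_cap_for(uint32_t page_size)
{
	assert(page_size && (page_size & (page_size - 1)) == 0);
	return UINT32_C(0xffffffff) - (page_size - 1);
}
```

For 64 KiB pages the expression gives 0xffff0000; a 64 KiB MR page size passes `is_aligned(65536, 4096)`, while 6 KiB does not.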

Thanks
Zhijian


2023-11-06 09:35:35

by Greg Sword

Subject: Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor

On Mon, Nov 6, 2023 at 4:01 PM Zhijian Li (Fujitsu)
<[email protected]> wrote:
>
>
>
> Very thanks for all your feedback.
>
> On 03/11/2023 17:55, Li Zhijian wrote:
> > I don't collect the Reviewed-by to the patch1-2 this time, since i
> > think we can make it better.
> >
> > Patch1-2: Fix kernel panic[1] and benifit to make srp work again.
> > Almost nothing change from V1.
>
> Quote from Jason:
> "
> > The concept was that the xarray could store anything larger than
> > PAGE_SIZE and the entry would point at the first struct page of the
> > contiguous chunk
> >
> > That looks like it is right, or at least close to right, so lets try
> > to keep it
> "
>
>
> It seems it's okay to access address/memory across pages on RXE even though
> we only map the first page.

Did you really test this in your test environment? Do you have a test environment?
Did you really reproduce this problem in your test environment?
Your patches do not actually work. Please do not send out these rubbish patches.


2023-11-06 09:56:57

by Zhijian Li (Fujitsu)

Subject: Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor



On 06/11/2023 17:35, Greg Sword wrote:
> On Mon, Nov 6, 2023 at 4:01 PM Zhijian Li (Fujitsu)
> <[email protected]> wrote:
>>
>>
>>
>> Very thanks for all your feedback.
>>
>> On 03/11/2023 17:55, Li Zhijian wrote:
>>> I don't collect the Reviewed-by to the patch1-2 this time, since i
>>> think we can make it better.
>>>
>>> Patch1-2: Fix kernel panic[1] and benifit to make srp work again.
>>> Almost nothing change from V1.
>>
>> Quote from Jason:
>> "
>>> The concept was that the xarray could store anything larger than
>>> PAGE_SIZE and the entry would point at the first struct page of the
>>> contiguous chunk
>>>
>>> That looks like it is right, or at least close to right, so lets try
>>> to keep it
>> "
>>
>>
>> It seems it's okay to access address/memory across pages on RXE even though
>> we only map the first page.
>
> Do you really make tests in your test environment? Do you have test environment?



> Do you really reproduce this problem in your test environment?
I did test it: the kernel panic[1] is gone after patches 1-2.


Thanks
Zhijian



2023-11-06 13:58:26

by Zhu Yanjun

Subject: Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor

On 2023/11/6 12:07, Zhijian Li (Fujitsu) wrote:
>
>
> On 03/11/2023 21:00, Zhu Yanjun wrote:
>> On 2023/11/3 17:55, Li Zhijian wrote:
>>> I don't collect the Reviewed-by to the patch1-2 this time, since i
>>> think we can make it better.
>>>
>>> Patch1-2: Fix kernel panic[1] and benifit to make srp work again.
>>>            Almost nothing change from V1.
>>> Patch3-5: cleanups # newly add
>>> Patch6: make RXE support PAGE_SIZE aligned mr # newly add, but not fully tested
>>>
>>> My bad arm64 mechine offten hangs when doing blktests even though i use the
>>> default siw driver.
>>>
>>> - nvme and ULPs(rtrs, iser) always registers 4K mr still don't supported yet.
>>
>> Zhijian
>>
>> Please read carefully the whole discussion about this problem. You will find a lot of valuable suggestions, especially suggestions from Jason.
>
> Okay, i will read it again. If you can tell me which thread, that would be better.
>
>
>>
>> From the whole discussion, it seems that the root cause is very clear.
>> We need to fix this prolem. Please do not send this kind of commits again.
>>
>
> Let's think about what's our goal first.
>
> - 1) Fix the panic[1] and only support PAGE_SIZE MR
> - 2) support PAGE_SIZE aligned MR
> - 3) support any page_size MR.
>
> I'm sorry i'm not familiar with the linux MM subsystem. It seem it's safe/correct to access
> address/memory across pages start from the return of kmap_loca_page(page).
> In other words, 2) is already native supported, right?

Yes. Please read the comments from Jason, Leon and Bart. They shared a
lot of good advice. From them, we can learn the root cause and how to fix
this problem.

Good luck.

Zhu Yanjun

>
> I get totally confused now.

2023-11-06 14:14:27

by Jason Gunthorpe

Subject: Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor

On Mon, Nov 06, 2023 at 04:07:19AM +0000, Zhijian Li (Fujitsu) wrote:

> I'm sorry i'm not familiar with the linux MM subsystem. It seem it's safe/correct to access
> address/memory across pages start from the return of
> kmap_loca_page(page).

kmap_local_page() gives you a PAGE_SIZE window only

Jason
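Jason's answer implies that, even when one xarray entry covers a larger chunk, every access must be split so that no single kmap_local_page() mapping is used past PAGE_SIZE bytes. Below is a user-space sketch of that splitting, assuming a 4 KiB PAGE_SIZE; `split_copy()` is a hypothetical helper. It mirrors the `min_t(unsigned int, length, PAGE_SIZE - page_offset)` clamp in patch 4. In the kernel, each piece would map its own struct page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096) /* assumed PAGE_SIZE */

/*
 * Record in pieces[] the length of each part of a copy of `length`
 * bytes starting at `iova`, such that no part crosses a page boundary.
 * Returns the number of parts, i.e. page mappings needed.
 */
static size_t split_copy(uint64_t iova, uint64_t length,
			 uint64_t *pieces, size_t max)
{
	size_t n = 0;

	while (length > 0) {
		uint64_t off = iova & (SKETCH_PAGE_SIZE - 1);
		uint64_t bytes = SKETCH_PAGE_SIZE - off;

		if (bytes > length)
			bytes = length;
		assert(n < max);
		pieces[n++] = bytes;
		iova += bytes;
		length -= bytes;
	}
	return n;
}
```

A 12 KiB copy starting at offset 0x800 touches four pages, in pieces of 0x800, 0x1000, 0x1000 and 0x800 bytes.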