2022-05-18 05:14:18

by Zhijian Li (Fujitsu)

Subject: [PATCH] RDMA/rxe: Use kzalloc() to alloc map_set

The call chain below allocates a map_set without fully initializing it:
rxe_mr_init_fast()
-> rxe_mr_alloc()
-> rxe_mr_alloc_map_set()

Uninitialized values inside struct rxe_map_set can cause a kernel panic.

It was noticed that the crashes were caused by rnbd use cases; they can be
easily reproduced by:
$ while true; do echo "sessname=bla path=ip:<ip> device_path=<device>" > /sys/devices/virtual/rnbd-client/ctl/map_device; done

The backtraces are not always identical.
[1st]----------
[ 80.158930] CPU: 0 PID: 11 Comm: ksoftirqd/0 Not tainted 5.18.0-rc1-roce-flush+ #60 [0/9090]
[ 80.160736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
[ 80.163579] RIP: 0010:lookup_iova+0x66/0xa0 [rdma_rxe]
[ 80.164825] Code: 00 00 00 48 d3 ee 89 32 c3 4c 8b 18 49 8b 3b 48 8b 47 08 48 39 c6 72 38 48 29 c6 45 31 d2 b8 01 00 00 00 48 63 c8 48 c1 e1 04 <48> 8b 4c 0f 08 48 39 f1 77 21 83 c0 01 48 29 ce 3d 00 01 00 00 75
[ 80.168935] RSP: 0018:ffffb7ff80063bf0 EFLAGS: 00010246
[ 80.170333] RAX: 0000000000000000 RBX: ffff9b9949d86800 RCX: 0000000000000000
[ 80.171976] RDX: ffffb7ff80063c00 RSI: 0000000049f6b378 RDI: 002818da00000004
[ 80.173606] RBP: 0000000000000120 R08: ffffb7ff80063c08 R09: ffffb7ff80063c04
[ 80.176933] R10: 0000000000000002 R11: ffff9b9916f7eef8 R12: ffff9b99488a0038
[ 80.178526] R13: ffff9b99488a0038 R14: ffff9b9914fb346a R15: ffff9b990ab27000
[ 80.180378] FS: 0000000000000000(0000) GS:ffff9b997dc00000(0000) knlGS:0000000000000000
[ 80.182257] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 80.183577] CR2: 00007efc33a98ed0 CR3: 0000000014f32004 CR4: 00000000001706f0
[ 80.185210] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 80.186890] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 80.188517] Call Trace:
[ 80.189269] <TASK>
[ 80.189949] rxe_mr_copy.part.0+0x6f/0x140 [rdma_rxe]
[ 80.191173] rxe_responder+0x12ee/0x1b60 [rdma_rxe]
[ 80.192409] ? rxe_icrc_check+0x7e/0x100 [rdma_rxe]
[ 80.193576] ? rxe_rcv+0x1d0/0x780 [rdma_rxe]
[ 80.194668] ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe]
[ 80.195952] rxe_do_task+0x67/0xb0 [rdma_rxe]
[ 80.197081] rxe_xmit_packet+0xc7/0x210 [rdma_rxe]
[ 80.198253] rxe_requester+0x680/0xee0 [rdma_rxe]
[ 80.199439] ? update_load_avg+0x5f/0x690
[ 80.200530] ? update_load_avg+0x5f/0x690
[ 80.213968] ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client]

[2nd]----------
[ 5213.049494] RIP: 0010:rxe_mr_copy.part.0+0xa8/0x140 [rdma_rxe]
[ 5213.050978] Code: 00 00 49 c1 e7 04 48 8b 00 4c 8d 2c d0 48 8b 44 24 10 4d 03 7d 00 85 ed 7f 10 eb 6c 89 54 24 0c 49 83 c7 10 31 c0 85 ed 7e 5e <49> 8b 3f 8b 14 24 4c 89 f6 48 01 c7 85 d2 74 06 48 89 fe 4c 89 f7
[ 5213.056463] RSP: 0018:ffffae3580063bf8 EFLAGS: 00010202
[ 5213.057986] RAX: 0000000000018978 RBX: ffff9d7ef7a03600 RCX: 0000000000000008
[ 5213.059797] RDX: 000000000000007c RSI: 000000000000007c RDI: ffff9d7ef7a03600
[ 5213.061720] RBP: 0000000000000120 R08: ffffae3580063c08 R09: ffffae3580063c04
[ 5213.063532] R10: ffff9d7efece0038 R11: ffff9d7ec4b1db00 R12: ffff9d7efece0038
[ 5213.065445] R13: ffff9d7ef4098260 R14: ffff9d7f11e23c6a R15: 4c79500065708144
[ 5213.067264] FS: 0000000000000000(0000) GS:ffff9d7f3dc00000(0000) knlGS:0000000000000000
[ 5213.069442] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5213.071004] CR2: 00007fce47276c60 CR3: 0000000003f66004 CR4: 00000000001706f0
[ 5213.072827] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5213.074484] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5213.076292] Call Trace:
[ 5213.077027] <TASK>
[ 5213.077718] rxe_responder+0x12ee/0x1b60 [rdma_rxe]
[ 5213.079019] ? rxe_icrc_check+0x7e/0x100 [rdma_rxe]
[ 5213.080380] ? rxe_rcv+0x1d0/0x780 [rdma_rxe]
[ 5213.081708] ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe]
[ 5213.082990] rxe_do_task+0x67/0xb0 [rdma_rxe]
[ 5213.084030] rxe_xmit_packet+0xc7/0x210 [rdma_rxe]
[ 5213.085156] rxe_requester+0x680/0xee0 [rdma_rxe]
[ 5213.088258] ? update_load_avg+0x5f/0x690
[ 5213.089381] ? update_load_avg+0x5f/0x690
[ 5213.090446] ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client]
[ 5213.092087] rxe_do_task+0x67/0xb0 [rdma_rxe]
[ 5213.093125] tasklet_action_common.constprop.0+0x92/0xc0
[ 5213.094366] __do_softirq+0xe1/0x2d8
[ 5213.095287] run_ksoftirqd+0x21/0x30
[ 5213.096456] smpboot_thread_fn+0x183/0x220
[ 5213.097519] ? sort_range+0x20/0x20
[ 5213.098761] kthread+0xe2/0x110
[ 5213.099638] ? kthread_complete_and_exit+0x20/0x20
[ 5213.100948] ret_from_fork+0x22/0x30

Signed-off-by: Li Zhijian <[email protected]>
---
drivers/infiniband/sw/rxe/rxe_mr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 60a31b718774..bfd2d9db3deb 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -81,7 +81,7 @@ static int rxe_mr_alloc_map_set(int num_map, struct rxe_map_set **setp)
int i;
struct rxe_map_set *set;

- set = kmalloc(sizeof(*set), GFP_KERNEL);
+ set = kzalloc(sizeof(*set), GFP_KERNEL);
if (!set)
goto err_out;

@@ -90,7 +90,7 @@ static int rxe_mr_alloc_map_set(int num_map, struct rxe_map_set **setp)
goto err_free_set;

for (i = 0; i < num_map; i++) {
- set->map[i] = kmalloc(sizeof(struct rxe_map), GFP_KERNEL);
+ set->map[i] = kzalloc(sizeof(struct rxe_map), GFP_KERNEL);
if (!set->map[i])
goto err_free_map;
}
--
2.31.1





2022-05-20 15:46:39

by Jason Gunthorpe

Subject: Re: [PATCH] RDMA/rxe: Use kzalloc() to alloc map_set

On Wed, May 18, 2022 at 12:37:25PM +0800, Li Zhijian wrote:
> Below call chains will alloc map_set without fully initializing map_set.
> rxe_mr_init_fast()
> -> rxe_mr_alloc()
> -> rxe_mr_alloc_map_set()
>
> Uninitialized values inside struct rxe_map_set are possible to cause
> kernel panic.

If the value is uninitialized then why is 0 an OK value?

Would be happier to know the exact value that is not initialized

Jason

2022-05-23 14:03:17

by Zhijian Li (Fujitsu)

Subject: Re: [PATCH] RDMA/rxe: Use kzalloc() to alloc map_set


on 2022/5/20 22:45, Jason Gunthorpe wrote:
> On Wed, May 18, 2022 at 12:37:25PM +0800, Li Zhijian wrote:
>> Below call chains will alloc map_set without fully initializing map_set.
>> rxe_mr_init_fast()
>> -> rxe_mr_alloc()
>> -> rxe_mr_alloc_map_set()
>>
>> Uninitialized values inside struct rxe_map_set are possible to cause
>> kernel panic.
> If the value is uninitialized then why is 0 an OK value?
>
> Would be happier to know the exact value that is not initialized

Well, good question. After rethinking this issue, it seems this patch did not
address the root cause, though it made the crash disappear to some extent.

I'm still working on the root cause :)

Thanks

Zhijian


>
> Jason



2022-05-24 06:17:43

by Zhijian Li (Fujitsu)

Subject: Re: [PATCH] RDMA/rxe: Use kzalloc() to alloc map_set

Hi Jason & Bob
CC Guoqing

@Guoqing, It may correlate with your previous bug report: https://lore.kernel.org/all/[email protected]/T/


It's observed that reusing the same MR on the rnbd server triggers the code
path below:
-> rxe_mr_init_fast()
 |-> alloc map_set()        # map_set is uninitialized
 |...-> rxe_map_mr_sg()     # build the map_set
 |   -> rxe_mr_set_page()
 |...-> rxe_reg_fast_mr()   # mr->state changes from FREE to VALID, which means
                            # host memory can be accessed (e.g. rxe_mr_copy())
 |...-> rxe_invalidate_mr() # mr->state changes from VALID to FREE
 |...-> rxe_reg_fast_mr()   # mr->state changes from FREE to VALID again,
                            # but the map_set was not rebuilt
 |...-> rxe_mr_copy()       # kernel crash due to accessing wild addresses
                            # looked up from the stale map_set

I drafted a patch for it (below), but I wonder whether it's rxe's
responsibility to do such checking. Any comments are very welcome.


From e9d0bd821f07f5e049027f07b3ce9dc283624201 Mon Sep 17 00:00:00 2001
From: Li Zhijian <[email protected]>
Date: Tue, 24 May 2022 10:56:19 +0800
Subject: [PATCH] RDMA/rxe: check map_set valid when handle IB_WR_REG_MR

It's observed that reusing the same MR on the rnbd server triggers the code
path below:
-> rxe_mr_init_fast()
 |-> alloc map_set()        # map_set is uninitialized
 |...-> rxe_map_mr_sg()     # build the map_set
 |   -> rxe_mr_set_page()
 |...-> rxe_reg_fast_mr()   # mr->state changes from FREE to VALID, which means
                            # host memory can be accessed (e.g. rxe_mr_copy())
 |...-> rxe_invalidate_mr() # mr->state changes from VALID to FREE
 |...-> rxe_reg_fast_mr()   # mr->state changes from FREE to VALID again,
                            # but the map_set was not rebuilt
 |...-> rxe_mr_copy()       # kernel crash due to accessing wild addresses
                            # looked up from the stale map_set

Signed-off-by: Li Zhijian <[email protected]>
---
drivers/infiniband/sw/rxe/rxe_mr.c | 9 +++++++++
drivers/infiniband/sw/rxe/rxe_verbs.c | 1 +
drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
3 files changed, 11 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 787c7dadc14f..09673d559c06 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -90,6 +90,7 @@ static int rxe_mr_alloc_map_set(int num_map, struct rxe_map_set **setp)
if (!set->map)
goto err_free_set;

+ set->valid = false;
for (i = 0; i < num_map; i++) {
set->map[i] = kmalloc(sizeof(struct rxe_map), GFP_KERNEL);
if (!set->map[i])
@@ -216,6 +217,7 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
}

set = mr->cur_map_set;
+ set->valid = true;
set->page_shift = PAGE_SHIFT;
set->page_mask = PAGE_SIZE - 1;

@@ -643,6 +645,7 @@ int rxe_invalidate_mr(struct rxe_qp *qp, u32 rkey)
}

mr->state = RXE_MR_STATE_FREE;
+ mr->cur_map_set->valid = mr->next_map_set->valid = false;
ret = 0;

err_drop_ref:
@@ -679,12 +682,18 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
return -EINVAL;
}

+ if (!mr->next_map_set->valid) {
+ pr_warn("%s: map set is not valid\n", __func__);
+ return -EINVAL;
+ }
+
mr->access = access;
mr->lkey = (mr->lkey & ~0xff) | key;
mr->rkey = (access & IB_ACCESS_REMOTE) ? mr->lkey : 0;
mr->state = RXE_MR_STATE_VALID;

set = mr->cur_map_set;
+ set->valid = false;
mr->cur_map_set = mr->next_map_set;
mr->cur_map_set->iova = wqe->wr.wr.reg.mr->iova;
mr->next_map_set = set;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 58e4412b1d16..4b7ae2d1d921 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -992,6 +992,7 @@ static int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
set->page_shift = ilog2(ibmr->page_size);
set->page_mask = ibmr->page_size - 1;
set->offset = set->iova & set->page_mask;
+ set->valid = true;

return n;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 86068d70cd95..2edf31aab7e1 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -289,6 +289,7 @@ struct rxe_map {

struct rxe_map_set {
struct rxe_map **map;
+ bool valid;
u64 va;
u64 iova;
size_t length;
--
2.31.1


On 23/05/2022 22:02, Li, Zhijian wrote:
>
> on 2022/5/20 22:45, Jason Gunthorpe wrote:
>> On Wed, May 18, 2022 at 12:37:25PM +0800, Li Zhijian wrote:
>>> Below call chains will alloc map_set without fully initializing map_set.
>>> rxe_mr_init_fast()
>>>   -> rxe_mr_alloc()
>>>      -> rxe_mr_alloc_map_set()
>>>
>>> Uninitialized values inside struct rxe_map_set are possible to cause
>>> kernel panic.
>> If the value is uninitialized then why is 0 an OK value?
>>
>> Would be happier to know the exact value that is not initialized
>
> Well, good question. After re-think of this issue, it seems this patch wasn't the root cause though it made the crash disappear in some extent.
>
> I'm still working on the root cause :)
>
> Thanks
>
> Zhijian
>
>
>>
>> Jason
>
>

2022-05-24 14:49:55

by Haris Iqbal

Subject: Re: [PATCH] RDMA/rxe: Use kzalloc() to alloc map_set

On Tue, May 24, 2022 at 6:00 AM [email protected]
<[email protected]> wrote:
>
> Hi Jason & Bob
> CC Guoqing
>
> @Guoqing, It may correlate with your previous bug report: https://lore.kernel.org/all/[email protected]/T/
>
>
> It's observed that a same MR in rnbd server will trigger below code
> path:
> -> rxe_mr_init_fast()
> |-> alloc map_set() # map_set is uninitialized
> |...-> rxe_map_mr_sg() # build the map_set
> |-> rxe_mr_set_page()
> |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means
> # we can access host memory(such rxe_mr_copy)
> |...-> rxe_invalidate_mr() # mr->state change to FREE from VALID
> |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE,
> # but map_set was not built again
> |...-> rxe_mr_copy() # kernel crash due to access wild addresses
> # that lookup from the map_set
>
> I draft a patch like below for it, but i wonder if it's rxe's responsibility to do such checking.
> Any comments are very welcome.
>
>
> From e9d0bd821f07f5e049027f07b3ce9dc283624201 Mon Sep 17 00:00:00 2001
> From: Li Zhijian <[email protected]>
> Date: Tue, 24 May 2022 10:56:19 +0800
> Subject: [PATCH] RDMA/rxe: check map_set valid when handle IB_WR_REG_MR
>
> It's observed that a same MR in rnbd server will trigger below code
> path:
> -> rxe_mr_init_fast()
> |-> alloc map_set() # map_set is uninitialized
> |...-> rxe_map_mr_sg() # build the map_set
> |-> rxe_mr_set_page()
> |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means
> # we can access host memory(such rxe_mr_copy)
> |...-> rxe_invalidate_mr() # mr->state change to FREE from VALID
> |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE,
> # but map_set was not built again
> |...-> rxe_mr_copy() # kernel crash due to access wild addresses
> # that lookup from the map_set
>
> Signed-off-by: Li Zhijian <[email protected]>
> ---
> drivers/infiniband/sw/rxe/rxe_mr.c | 9 +++++++++
> drivers/infiniband/sw/rxe/rxe_verbs.c | 1 +
> drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
> 3 files changed, 11 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index 787c7dadc14f..09673d559c06 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -90,6 +90,7 @@ static int rxe_mr_alloc_map_set(int num_map, struct rxe_map_set **setp)
> if (!set->map)
> goto err_free_set;
>
> + set->valid = false;
> for (i = 0; i < num_map; i++) {
> set->map[i] = kmalloc(sizeof(struct rxe_map), GFP_KERNEL);
> if (!set->map[i])
> @@ -216,6 +217,7 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
> }
>
> set = mr->cur_map_set;
> + set->valid = true;
> set->page_shift = PAGE_SHIFT;
> set->page_mask = PAGE_SIZE - 1;
>
> @@ -643,6 +645,7 @@ int rxe_invalidate_mr(struct rxe_qp *qp, u32 rkey)
> }
>
> mr->state = RXE_MR_STATE_FREE;
> + mr->cur_map_set->valid = mr->next_map_set->valid = false;
> ret = 0;
>
> err_drop_ref:
> @@ -679,12 +682,18 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
> return -EINVAL;
> }
>
> + if (!mr->next_map_set->valid) {
> + pr_warn("%s: map set is not valid\n", __func__);
> + return -EINVAL;
> + }
> +
> mr->access = access;
> mr->lkey = (mr->lkey & ~0xff) | key;
> mr->rkey = (access & IB_ACCESS_REMOTE) ? mr->lkey : 0;
> mr->state = RXE_MR_STATE_VALID;
>
> set = mr->cur_map_set;
> + set->valid = false;
> mr->cur_map_set = mr->next_map_set;
> mr->cur_map_set->iova = wqe->wr.wr.reg.mr->iova;
> mr->next_map_set = set;
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index 58e4412b1d16..4b7ae2d1d921 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -992,6 +992,7 @@ static int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
> set->page_shift = ilog2(ibmr->page_size);
> set->page_mask = ibmr->page_size - 1;
> set->offset = set->iova & set->page_mask;
> + set->valid = true;
>
> return n;
> }
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index 86068d70cd95..2edf31aab7e1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -289,6 +289,7 @@ struct rxe_map {
>
> struct rxe_map_set {
> struct rxe_map **map;
> + bool valid;
> u64 va;
> u64 iova;
> size_t length;
> --
> 2.31.1

Thanks for posting the description and the patch.

We have been facing the exact same issue (only with rxe), and we also
realized that to get around it we would have to call ib_map_mr_sg()
before every IB_WR_REG_MR wr, even when we are reusing the MR and simply
invalidating and re-registering it.

In reference to this, we have 2 questions.

1) This change was made in the following commit.

commit 647bf13ce944f20f7402f281578423a952274e4a
Author: Bob Pearson <[email protected]>
Date: Tue Sep 14 11:42:06 2021 -0500

RDMA/rxe: Create duplicate mapping tables for FMRs

For fast memory regions create duplicate mapping tables so ib_map_mr_sg()
can build a new mapping table which is then swapped into place
synchronously with the execution of an IB_WR_REG_MR work request.

Currently the rxe driver uses the same table for receiving RDMA operations
and for building new tables in preparation for reusing the MR. This
exposes users to potentially incorrect results.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

We tried to understand what potentially incorrect results that commit
message is talking about, but were not able to. If someone could throw
light on the scenario where a single mapping table can result in issues,
it would be great.

2)
We wanted to confirm that, with the above patch, the use case where we
reuse the MR by simply invalidating and re-registering it (IB_WR_REG_MR
wr) is a correct one, and that there is no requirement that
ib_map_mr_sg() be called every time regardless.

(We were planning to send this in the coming days, but wanted other
discussions to conclude. Since the patch got posted and the discussion
started, we thought it better to put this out now.)

Regards

>
>
> On 23/05/2022 22:02, Li, Zhijian wrote:
> >
> > on 2022/5/20 22:45, Jason Gunthorpe wrote:
> >> On Wed, May 18, 2022 at 12:37:25PM +0800, Li Zhijian wrote:
> >>> Below call chains will alloc map_set without fully initializing map_set.
> >>> rxe_mr_init_fast()
> >>> -> rxe_mr_alloc()
> >>> -> rxe_mr_alloc_map_set()
> >>>
> >>> Uninitialized values inside struct rxe_map_set are possible to cause
> >>> kernel panic.
> >> If the value is uninitialized then why is 0 an OK value?
> >>
> >> Would be happier to know the exact value that is not initialized
> >
> > Well, good question. After re-think of this issue, it seems this patch wasn't the root cause though it made the crash disappear in some extent.
> >
> > I'm still working on the root cause :)
> >
> > Thanks
> >
> > Zhijian
> >
> >
> >>
> >> Jason
> >
> >

2022-05-25 04:41:14

by Guoqing Jiang

Subject: Re: [PATCH] RDMA/rxe: Use kzalloc() to alloc map_set



On 5/25/22 9:31 AM, [email protected] wrote:
>
> On 24/05/2022 18:56, Haris Iqbal wrote:
>> On Tue, May 24, 2022 at 6:00 AM [email protected]
>> <[email protected]> wrote:
>>> Hi Jason & Bob
>>> CC Guoqing
>>>
>>> @Guoqing, It may correlate with your previous bug report: https://lore.kernel.org/all/[email protected]/T/
>>>
>>>
>>> It's observed that a same MR in rnbd server will trigger below code
>>> path:
>>> -> rxe_mr_init_fast()
>>> |-> alloc map_set() # map_set is uninitialized
>>> |...-> rxe_map_mr_sg() # build the map_set
>>> |-> rxe_mr_set_page()
>>> |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means
>>> # we can access host memory(such rxe_mr_copy)
>>> |...-> rxe_invalidate_mr() # mr->state change to FREE from VALID
>>> |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE,
>>> # but map_set was not built again
>>> |...-> rxe_mr_copy() # kernel crash due to access wild addresses
>>> # that lookup from the map_set

Yes, it could be a similar issue, though I didn't get a kernel crash; it
was FMR-relevant.

https://lore.kernel.org/all/[email protected]/T/#m5dc6898375cedf17fea13ccebf595aac0454c841

> Yes, this workaround should work but expensive.
> It seems Bob has started a new thread to discuss the FMRs in https://www.spinics.net/lists/linux-rdma/msg110836.html

Will give it a try, thanks for the link.

Thanks,
Guoqing

2022-05-25 05:28:20

by Zhijian Li (Fujitsu)

Subject: Re: [PATCH] RDMA/rxe: Use kzalloc() to alloc map_set



On 24/05/2022 18:56, Haris Iqbal wrote:
> On Tue, May 24, 2022 at 6:00 AM [email protected]
> <[email protected]> wrote:
>> Hi Jason & Bob
>> CC Guoqing
>>
>> @Guoqing, It may correlate with your previous bug report: https://lore.kernel.org/all/[email protected]/T/
>>
>>
>> It's observed that a same MR in rnbd server will trigger below code
>> path:
>> -> rxe_mr_init_fast()
>> |-> alloc map_set() # map_set is uninitialized
>> |...-> rxe_map_mr_sg() # build the map_set
>> |-> rxe_mr_set_page()
>> |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means
>> # we can access host memory(such rxe_mr_copy)
>> |...-> rxe_invalidate_mr() # mr->state change to FREE from VALID
>> |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE,
>> # but map_set was not built again
>> |...-> rxe_mr_copy() # kernel crash due to access wild addresses
>> # that lookup from the map_set
>>
>> I draft a patch like below for it, but i wonder if it's rxe's responsibility to do such checking.
>> Any comments are very welcome.
>>
> Thanks for posting the description and the patch.
>
> We have been facing the exact same issue (only with rxe),

> and we also
> realized that to get around this we will have to call ib_map_mr_sg()
> before every IB_WR_REG_MR wr; even if we are reusing the MR and simply
> invalidating and re-validating them.
Yes, this workaround should work, but it is expensive.
It seems Bob has started a new thread to discuss the FMRs: https://www.spinics.net/lists/linux-rdma/msg110836.html

Thanks
Zhijian

>
> In reference to this, we have 2 questions.
>
> 1) This change was made in the following commit.
>
> commit 647bf13ce944f20f7402f281578423a952274e4a
> Author: Bob Pearson <[email protected]>
> Date: Tue Sep 14 11:42:06 2021 -0500
>
> RDMA/rxe: Create duplicate mapping tables for FMRs
>
> For fast memory regions create duplicate mapping tables so ib_map_mr_sg()
> can build a new mapping table which is then swapped into place
> synchronously with the execution of an IB_WR_REG_MR work request.
>
> Currently the rxe driver uses the same table for receiving RDMA operations
> and for building new tables in preparation for reusing the MR. This
> exposes users to potentially incorrect results.
>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Bob Pearson <[email protected]>
> Signed-off-by: Jason Gunthorpe <[email protected]>
>
> We tried to understand what potential incorrect result that commit
> message talks about, but were not able to. If someone can through
> light into this scenario where single mapping table can result into
> issues, it would be great.
>
> 2)
> We wanted to confirm that, with the above patch, it clearly means that
> the use-case where we reuse the MR, by simply invalidating and
> re-validating (IB_WR_REG_MR wr) is a correct one.
> And there is no requirement saying that ib_map_mr_sg() needs to be
> done everytime regardless.
>
> (We were planning to send this in the coming days, but wanted other
> discussions to get over. Since the patch got posted and discussion
> started, we thought better to put this out.)
>
> Regards
>
>>
>> On 23/05/2022 22:02, Li, Zhijian wrote:
>>> on 2022/5/20 22:45, Jason Gunthorpe wrote:
>>>> On Wed, May 18, 2022 at 12:37:25PM +0800, Li Zhijian wrote:
>>>>> Below call chains will alloc map_set without fully initializing map_set.
>>>>> rxe_mr_init_fast()
>>>>> -> rxe_mr_alloc()
>>>>> -> rxe_mr_alloc_map_set()
>>>>>
>>>>> Uninitialized values inside struct rxe_map_set are possible to cause
>>>>> kernel panic.
>>>> If the value is uninitialized then why is 0 an OK value?
>>>>
>>>> Would be happier to know the exact value that is not initialized
>>> Well, good question. After re-think of this issue, it seems this patch wasn't the root cause though it made the crash disappear in some extent.
>>>
>>> I'm still working on the root cause :)
>>>
>>> Thanks
>>>
>>> Zhijian
>>>
>>>
>>>> Jason
>>>