2022-10-21 08:17:19

by Li Zhijian

[permalink] [raw]
Subject: [for-next PATCH v2 1/2] RDMA/rxe: Remove unnecessary mr testing

Before the test, mr has already been passed to rxe_mr_copy(), where it could
be dereferenced, so this check is not exactly correct.

I tried to figure out how and when mr could be NULL, but did not succeed.
Add a WARN_ON(!mr) to that path to tell us more if it happens.

Signed-off-by: Li Zhijian <[email protected]>
---
drivers/infiniband/sw/rxe/rxe_resp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index ed5a09e86417..218c14fb07c6 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -778,6 +778,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	if (res->state == rdatm_res_state_new) {
 		if (!res->replay) {
 			mr = qp->resp.mr;
+			WARN_ON(!mr);
 			qp->resp.mr = NULL;
 		} else {
 			mr = rxe_recheck_mr(qp, res->read.rkey);
@@ -811,8 +812,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 
 	rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
 		    payload, RXE_FROM_MR_OBJ);
-	if (mr)
-		rxe_put(mr);
+	rxe_put(mr);
 
 	if (bth_pad(&ack_pkt)) {
 		u8 *pad = payload_addr(&ack_pkt) + payload;
--
2.31.1


2022-10-21 15:11:13

by Zhu Yanjun

[permalink] [raw]
Subject: Re: [for-next PATCH v2 1/2] RDMA/rxe: Remove unnecessary mr testing

On Fri, Oct 21, 2022 at 3:53 PM Li Zhijian <[email protected]> wrote:
>
> Before the testing, we already passed it to rxe_mr_copy() where mr could
> be dereferenced. so this checking is not exactly correct.
>
> I tried to figure out the details how/when mr could be NULL, but failed
> at last. Add a WARN_ON(!mr) to that path to tell us more when it
> happends.

If I understand you correctly, you ran into a problem but could not figure it out,
so you sent it upstream as a patch?

I am not sure if it is a good idea.

Zhu Yanjun

>
> Signed-off-by: Li Zhijian <[email protected]>
> ---
> drivers/infiniband/sw/rxe/rxe_resp.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index ed5a09e86417..218c14fb07c6 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -778,6 +778,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> if (res->state == rdatm_res_state_new) {
> if (!res->replay) {
> mr = qp->resp.mr;
> + WARN_ON(!mr);
> qp->resp.mr = NULL;
> } else {
> mr = rxe_recheck_mr(qp, res->read.rkey);
> @@ -811,8 +812,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>
> rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> payload, RXE_FROM_MR_OBJ);
> - if (mr)
> - rxe_put(mr);
> + rxe_put(mr);
>
> if (bth_pad(&ack_pkt)) {
> u8 *pad = payload_addr(&ack_pkt) + payload;
> --
> 2.31.1
>

2022-10-22 01:25:00

by Li Zhijian

[permalink] [raw]
Subject: Re: [for-next PATCH v2 1/2] RDMA/rxe: Remove unnecessary mr testing



On 21/10/2022 22:39, Zhu Yanjun wrote:
> On Fri, Oct 21, 2022 at 3:53 PM Li Zhijian <[email protected]> wrote:
>> Before the testing, we already passed it to rxe_mr_copy() where mr could
>> be dereferenced. so this checking is not exactly correct.
>>
>> I tried to figure out the details how/when mr could be NULL, but failed
>> at last. Add a WARN_ON(!mr) to that path to tell us more when it
>> happends.
> If I get you correctly, you confronted a problem,
Not exactly. I removed the mr check because I think the check is not correct.
The newly added WARN_ON(!mr) marks the only place where mr can be NULL but is not handled correctly.
With or without this patch, once WARN_ON(!mr) triggers, the kernel will go wrong.

So I want to place this WARN_ON(!mr) there to point to the problem.

Thanks
Zhijian

> but you can not figure it out.
> So you send it upstream as a patch?
>
> I am not sure if it is a good idea.
>
> Zhu Yanjun
>
>> Signed-off-by: Li Zhijian <[email protected]>
>> ---
>> drivers/infiniband/sw/rxe/rxe_resp.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>> index ed5a09e86417..218c14fb07c6 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>> @@ -778,6 +778,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>> if (res->state == rdatm_res_state_new) {
>> if (!res->replay) {
>> mr = qp->resp.mr;
>> + WARN_ON(!mr);
>> qp->resp.mr = NULL;
>> } else {
>> mr = rxe_recheck_mr(qp, res->read.rkey);
>> @@ -811,8 +812,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>
>> rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>> payload, RXE_FROM_MR_OBJ);
>> - if (mr)
>> - rxe_put(mr);
>> + rxe_put(mr);
>>
>> if (bth_pad(&ack_pkt)) {
>> u8 *pad = payload_addr(&ack_pkt) + payload;
>> --
>> 2.31.1
>>

2022-10-23 18:10:18

by Bob Pearson

[permalink] [raw]
Subject: Re: [for-next PATCH v2 1/2] RDMA/rxe: Remove unnecessary mr testing

On 10/21/22 20:09, Li Zhijian wrote:
>
>
> On 21/10/2022 22:39, Zhu Yanjun wrote:
>> On Fri, Oct 21, 2022 at 3:53 PM Li Zhijian <[email protected]> wrote:
>>> Before the testing, we already passed it to rxe_mr_copy() where mr could
>>> be dereferenced. so this checking is not exactly correct.
>>>
>>> I tried to figure out the details how/when mr could be NULL, but failed
>>> at last. Add a WARN_ON(!mr) to that path to tell us more when it
>>> happends.
>> If I get you correctly, you confronted a problem,
> Not exactly,  I removed the mr checking since i think this checking is not correct.
> the newly added WARN_ON(!mr) is the only once place where the mr can be NULL but not handled correctly.
> At least with/without this patch, once WARN_ON(!mr) is triggered, kernel will go something wrong.
>
> so i want to place this  WARN_ON(!mr) to point to the problem.
>
> Thanks
> Zhijian
>
>>   but you can not figure it out.
>> So you send it upstream as a patch?
>>
>> I am not sure if it is a good idea.
>>
>> Zhu Yanjun
>>
>>> Signed-off-by: Li Zhijian <[email protected]>
>>> ---
>>>   drivers/infiniband/sw/rxe/rxe_resp.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>>> index ed5a09e86417..218c14fb07c6 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>>> @@ -778,6 +778,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>          if (res->state == rdatm_res_state_new) {
>>>                  if (!res->replay) {
>>>                          mr = qp->resp.mr;
>>> +                       WARN_ON(!mr);
>>>                          qp->resp.mr = NULL;
>>>                  } else {
>>>                          mr = rxe_recheck_mr(qp, res->read.rkey);
>>> @@ -811,8 +812,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>
>>>          rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>>>                      payload, RXE_FROM_MR_OBJ);
>>> -       if (mr)
>>> -               rxe_put(mr);
>>> +       rxe_put(mr);
>>>
>>>          if (bth_pad(&ack_pkt)) {
>>>                  u8 *pad = payload_addr(&ack_pkt) + payload;
>>> --
>>> 2.31.1
>>>
>

Li is correct that the only way mr could be NULL is if qp->resp.mr == NULL, so the
'if (mr)' would only matter in that case. The read_reply subroutine is reached in one
of three ways: from a new RDMA read operation after going through check_rkey, from a
previous RDMA read operation via get_req if qp->resp.res != NULL, or from a duplicate
request where the previous responder resource is found. In all these cases the mr is
set. Initially, check_rkey causes an RKEY_VIOLATION if it can't find the mr.
Thereafter the rkey is stored in the responder resources and looked up for each
packet to get an mr or cause an RKEY_VIOLATION. So the mr can't be NULL. I think
you can leave out the WARN and just drop the if (mr).
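
A minimal sketch of that invariant, using toy stand-ins rather than the real
rxe types. The names toy_mr, toy_lookup_mr, toy_copy, toy_put_mr and
toy_handle_read are all hypothetical and only model the control flow described
above: a failed lookup becomes an RKEY_VIOLATION before the reply path runs,
so the put that follows the copy needs no NULL test.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory region; NOT the real struct rxe_mr. */
struct toy_mr {
	unsigned int rkey;
	int refcnt;
};

enum toy_state { TOY_OK, TOY_RKEY_VIOLATION };

/* Look up an mr by rkey and take a reference; returns NULL if none matches. */
static struct toy_mr *toy_lookup_mr(struct toy_mr *table, size_t n,
				    unsigned int rkey)
{
	for (size_t i = 0; i < n; i++) {
		if (table[i].rkey == rkey) {
			table[i].refcnt++;
			return &table[i];
		}
	}
	return NULL;
}

/* Drop a reference; the caller must pass a valid, referenced mr. */
static void toy_put_mr(struct toy_mr *mr)
{
	assert(mr->refcnt > 0);
	mr->refcnt--;
}

/* Models the copy step: it dereferences mr unconditionally. */
static unsigned int toy_copy(const struct toy_mr *mr)
{
	return mr->rkey;
}

/*
 * Models the responder: a failed lookup ends in a violation state,
 * so the copy and put below never see a NULL mr.
 */
static enum toy_state toy_handle_read(struct toy_mr *table, size_t n,
				      unsigned int rkey)
{
	struct toy_mr *mr = toy_lookup_mr(table, n, rkey);

	if (!mr)
		return TOY_RKEY_VIOLATION;

	(void)toy_copy(mr);
	toy_put_mr(mr);	/* no "if (mr)" needed here */
	return TOY_OK;
}
```

In this model a NULL test placed after toy_copy() could never help, because
the dereference has already happened; the only useful place to reject a
missing mr is before the reply path, which is where the lookup does it.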

Bob

2022-10-24 02:38:02

by Zhu Yanjun

[permalink] [raw]
Subject: Re: [for-next PATCH v2 1/2] RDMA/rxe: Remove unnecessary mr testing

On Mon, Oct 24, 2022 at 2:05 AM Bob Pearson <[email protected]> wrote:
>
> On 10/21/22 20:09, Li Zhijian wrote:
> >
> >
> > On 21/10/2022 22:39, Zhu Yanjun wrote:
> >> On Fri, Oct 21, 2022 at 3:53 PM Li Zhijian <[email protected]> wrote:
> >>> Before the testing, we already passed it to rxe_mr_copy() where mr could
> >>> be dereferenced. so this checking is not exactly correct.
> >>>
> >>> I tried to figure out the details how/when mr could be NULL, but failed
> >>> at last. Add a WARN_ON(!mr) to that path to tell us more when it
> >>> happends.
> >> If I get you correctly, you confronted a problem,
> > Not exactly, I removed the mr checking since i think this checking is not correct.
> > the newly added WARN_ON(!mr) is the only once place where the mr can be NULL but not handled correctly.
> > At least with/without this patch, once WARN_ON(!mr) is triggered, kernel will go something wrong.
> >
> > so i want to place this WARN_ON(!mr) to point to the problem.
> >
> > Thanks
> > Zhijian
> >
> >> but you can not figure it out.
> >> So you send it upstream as a patch?
> >>
> >> I am not sure if it is a good idea.
> >>
> >> Zhu Yanjun
> >>
> >>> Signed-off-by: Li Zhijian <[email protected]>
> >>> ---
> >>> drivers/infiniband/sw/rxe/rxe_resp.c | 4 ++--
> >>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> >>> index ed5a09e86417..218c14fb07c6 100644
> >>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> >>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> >>> @@ -778,6 +778,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> >>> if (res->state == rdatm_res_state_new) {
> >>> if (!res->replay) {
> >>> mr = qp->resp.mr;
> >>> + WARN_ON(!mr);
> >>> qp->resp.mr = NULL;
> >>> } else {
> >>> mr = rxe_recheck_mr(qp, res->read.rkey);
> >>> @@ -811,8 +812,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> >>>
> >>> rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> >>> payload, RXE_FROM_MR_OBJ);
> >>> - if (mr)
> >>> - rxe_put(mr);
> >>> + rxe_put(mr);
> >>>
> >>> if (bth_pad(&ack_pkt)) {
> >>> u8 *pad = payload_addr(&ack_pkt) + payload;
> >>> --
> >>> 2.31.1
> >>>
> >
>
> Li is correct that the only way mr could be NULL is if qp->resp.mr == NULL. So the

My concern is whether "WARN_ON(!mr);" should be added at all.
IMO, if the root cause remains unclear, that is a problem.
Currently this problem is not fixed. It is not useful to send a debug
statement to the mailing list.

Zhu Yanjun

> 'if (mr)' is not needed if that is the case. The read_reply subroutine is reached
> from a new rdma read operation after going through check_rkey or from a previous
> rdma read operations from get_req if qp->resp.res != NULL or from a duplicate request
> where the previous responder resource is found. In all these cases the mr is set.
> Initially in check_rkey where if it can't find the mr it causes an RKEY_VIOLATION.
> Thereafter the rkey is stored in the responder resources and looked up for each
> packet to get an mr or cause an RKEY_VIOLATION. So the mr can't be NULL. I think
> you can leave out the WARN and just drop the if (mr).
>
> Bob
>

2022-10-24 03:30:33

by Li Zhijian

[permalink] [raw]
Subject: Re: [for-next PATCH v2 1/2] RDMA/rxe: Remove unnecessary mr testing



On 24/10/2022 10:25, Zhu Yanjun wrote:
>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>>>>> @@ -778,6 +778,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>>> if (res->state == rdatm_res_state_new) {
>>>>> if (!res->replay) {
>>>>> mr = qp->resp.mr;
>>>>> + WARN_ON(!mr);
>>>>> qp->resp.mr = NULL;
>>>>> } else {
>>>>> mr = rxe_recheck_mr(qp, res->read.rkey);
>>>>> @@ -811,8 +812,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>>>
>>>>> rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>>>>> payload, RXE_FROM_MR_OBJ);
>>>>> - if (mr)
>>>>> - rxe_put(mr);
>>>>> + rxe_put(mr);
>>>>>
>>>>> if (bth_pad(&ack_pkt)) {
>>>>> u8 *pad = payload_addr(&ack_pkt) + payload;
>>>>> --
>>>>> 2.31.1
>>>>>
>> Li is correct that the only way mr could be NULL is if qp->resp.mr == NULL. So the
> What I am concerned about is if "WARN_ON(!mr);" should be added or not.
> IMO, if the root cause remains unclear, this should be a problem.
> Currently this problem is not fixed. It is useless to send a debug
> statement to the maillist.

As per Bob's explanation, no 'WARN_ON(!mr)' is needed.
I will update the patch soon.

> Zhu Yanjun
>
>> 'if (mr)' is not needed if that is the case. The read_reply subroutine is reached
>> from a new rdma read operation after going through check_rkey or from a previous
>> rdma read operations from get_req if qp->resp.res != NULL or from a duplicate request
>> where the previous responder resource is found. In all these cases the mr is set.
>> Initially in check_rkey where if it can't find the mr it causes an RKEY_VIOLATION.
>> Thereafter the rkey is stored in the responder resources and looked up for each
>> packet to get an mr or cause an RKEY_VIOLATION. So the mr can't be NULL. I think
>> you can leave out the WARN and just drop the if (mr).
Thanks very much for your explanation.

Thanks
Zhijian

>>
>> Bob
>>

2022-10-24 03:36:06

by Li Zhijian

[permalink] [raw]
Subject: Re: [for-next PATCH v2 1/2] RDMA/rxe: Remove unnecessary mr testing

I just noticed that I didn't post [2/2] successfully; I will send it in the next version.



On 21/10/2022 15:52, Li Zhijian wrote:
> Before the testing, we already passed it to rxe_mr_copy() where mr could
> be dereferenced. so this checking is not exactly correct.
>
> I tried to figure out the details how/when mr could be NULL, but failed
> at last. Add a WARN_ON(!mr) to that path to tell us more when it
> happends.
>
> Signed-off-by: Li Zhijian <[email protected]>
> ---
> drivers/infiniband/sw/rxe/rxe_resp.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index ed5a09e86417..218c14fb07c6 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -778,6 +778,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> if (res->state == rdatm_res_state_new) {
> if (!res->replay) {
> mr = qp->resp.mr;
> + WARN_ON(!mr);
> qp->resp.mr = NULL;
> } else {
> mr = rxe_recheck_mr(qp, res->read.rkey);
> @@ -811,8 +812,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>
> rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> payload, RXE_FROM_MR_OBJ);
> - if (mr)
> - rxe_put(mr);
> + rxe_put(mr);
>
> if (bth_pad(&ack_pkt)) {
> u8 *pad = payload_addr(&ack_pkt) + payload;

2022-10-24 20:54:59

by Bob Pearson

[permalink] [raw]
Subject: Re: [for-next PATCH v2 1/2] RDMA/rxe: Remove unnecessary mr testing

On 10/23/22 21:25, Zhu Yanjun wrote:
> On Mon, Oct 24, 2022 at 2:05 AM Bob Pearson <[email protected]> wrote:
>>
>> On 10/21/22 20:09, Li Zhijian wrote:
>>>
>>>
>>> On 21/10/2022 22:39, Zhu Yanjun wrote:
>>>> On Fri, Oct 21, 2022 at 3:53 PM Li Zhijian <[email protected]> wrote:
>>>>> Before the testing, we already passed it to rxe_mr_copy() where mr could
>>>>> be dereferenced. so this checking is not exactly correct.
>>>>>
>>>>> I tried to figure out the details how/when mr could be NULL, but failed
>>>>> at last. Add a WARN_ON(!mr) to that path to tell us more when it
>>>>> happends.
>>>> If I get you correctly, you confronted a problem,
>>> Not exactly, I removed the mr checking since i think this checking is not correct.
>>> the newly added WARN_ON(!mr) is the only once place where the mr can be NULL but not handled correctly.
>>> At least with/without this patch, once WARN_ON(!mr) is triggered, kernel will go something wrong.
>>>
>>> so i want to place this WARN_ON(!mr) to point to the problem.
>>>
>>> Thanks
>>> Zhijian
>>>
>>>> but you can not figure it out.
>>>> So you send it upstream as a patch?
>>>>
>>>> I am not sure if it is a good idea.
>>>>
>>>> Zhu Yanjun
>>>>
>>>>> Signed-off-by: Li Zhijian <[email protected]>
>>>>> ---
>>>>> drivers/infiniband/sw/rxe/rxe_resp.c | 4 ++--
>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>>>>> index ed5a09e86417..218c14fb07c6 100644
>>>>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>>>>> @@ -778,6 +778,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>>> if (res->state == rdatm_res_state_new) {
>>>>> if (!res->replay) {
>>>>> mr = qp->resp.mr;
>>>>> + WARN_ON(!mr);
>>>>> qp->resp.mr = NULL;
>>>>> } else {
>>>>> mr = rxe_recheck_mr(qp, res->read.rkey);
>>>>> @@ -811,8 +812,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>>>
>>>>> rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>>>>> payload, RXE_FROM_MR_OBJ);
>>>>> - if (mr)
>>>>> - rxe_put(mr);
>>>>> + rxe_put(mr);
>>>>>
>>>>> if (bth_pad(&ack_pkt)) {
>>>>> u8 *pad = payload_addr(&ack_pkt) + payload;
>>>>> --
>>>>> 2.31.1
>>>>>
>>>
>>
>> Li is correct that the only way mr could be NULL is if qp->resp.mr == NULL. So the
>
> What I am concerned about is if "WARN_ON(!mr);" should be added or not.
> IMO, if the root cause remains unclear, this should be a problem.
> Currently this problem is not fixed. It is useless to send a debug
> statement to the maillist.

Li was fixing a bug that no one ever saw. mr is not NULL in this case.

Bob
>
> Zhu Yanjun
>
>> 'if (mr)' is not needed if that is the case. The read_reply subroutine is reached
>> from a new rdma read operation after going through check_rkey or from a previous
>> rdma read operations from get_req if qp->resp.res != NULL or from a duplicate request
>> where the previous responder resource is found. In all these cases the mr is set.
>> Initially in check_rkey where if it can't find the mr it causes an RKEY_VIOLATION.
>> Thereafter the rkey is stored in the responder resources and looked up for each
>> packet to get an mr or cause an RKEY_VIOLATION. So the mr can't be NULL. I think
>> you can leave out the WARN and just drop the if (mr).
>>
>> Bob
>>