2023-07-14 15:37:07

by Chuck Lever

[permalink] [raw]
Subject: [PATCH] xprtrdma: Remap Receive buffers after a reconnect

From: Chuck Lever <[email protected]>

On server-initiated disconnect, rpcrdma_xprt_disconnect() DMA-unmaps
the transport's Receive buffers, but rpcrdma_post_recvs() neglected
to remap them after a new connection had been established. The
result is immediate failure of the new connection with the Receives
flushing with LOCAL_PROT_ERR.

Fixes: 671c450b6fe0 ("xprtrdma: Fix oops in Receive handler after device removal")
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

Hi Anna-

Can you apply this for 6.5-rc ?


diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b098fde373ab..28c0771c4e8c 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -935,9 +935,6 @@ struct rpcrdma_rep *rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt,
if (!rep->rr_rdmabuf)
goto out_free;

- if (!rpcrdma_regbuf_dma_map(r_xprt, rep->rr_rdmabuf))
- goto out_free_regbuf;
-
rep->rr_cid.ci_completion_id =
atomic_inc_return(&r_xprt->rx_ep->re_completion_ids);

@@ -956,8 +953,6 @@ struct rpcrdma_rep *rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt,
spin_unlock(&buf->rb_lock);
return rep;

-out_free_regbuf:
- rpcrdma_regbuf_free(rep->rr_rdmabuf);
out_free:
kfree(rep);
out:
@@ -1363,6 +1358,10 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed, bool temp)
rep = rpcrdma_rep_create(r_xprt, temp);
if (!rep)
break;
+ if (!rpcrdma_regbuf_dma_map(r_xprt, rep->rr_rdmabuf)) {
+ rpcrdma_rep_put(buf, rep);
+ break;
+ }

rep->rr_cid.ci_queue_id = ep->re_attr.recv_cq->res.id;
trace_xprtrdma_post_recv(rep);




2023-07-15 17:25:54

by Tom Talpey

[permalink] [raw]
Subject: Re: [PATCH] xprtrdma: Remap Receive buffers after a reconnect

On 7/14/2023 11:13 AM, Chuck Lever wrote:
> From: Chuck Lever <[email protected]>
>
> On server-initiated disconnect, rpcrdma_xprt_disconnect() DMA-unmaps
> the transport's Receive buffers, but rpcrdma_post_recvs() neglected
> to remap them after a new connection had been established. The
> result is immediate failure of the new connection with the Receives
> flushing with LOCAL_PROT_ERR.

The fix is correct, my only comment is that the failure occurs when the
first message is received, and only the first CQE is a LOCAL_PROT_ERR.
The remainder of the posted receives are simply flushed.

Same result of course! The summary is ok as-is. Important fix.

Reviewed-by: Tom Talpey <[email protected]>

> Fixes: 671c450b6fe0 ("xprtrdma: Fix oops in Receive handler after device removal")
> Signed-off-by: Chuck Lever <[email protected]>
> ---
> net/sunrpc/xprtrdma/verbs.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> Hi Anna-
>
> Can you apply this for 6.5-rc ?
>
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index b098fde373ab..28c0771c4e8c 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -935,9 +935,6 @@ struct rpcrdma_rep *rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt,
> if (!rep->rr_rdmabuf)
> goto out_free;
>
> - if (!rpcrdma_regbuf_dma_map(r_xprt, rep->rr_rdmabuf))
> - goto out_free_regbuf;
> -
> rep->rr_cid.ci_completion_id =
> atomic_inc_return(&r_xprt->rx_ep->re_completion_ids);
>
> @@ -956,8 +953,6 @@ struct rpcrdma_rep *rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt,
> spin_unlock(&buf->rb_lock);
> return rep;
>
> -out_free_regbuf:
> - rpcrdma_regbuf_free(rep->rr_rdmabuf);
> out_free:
> kfree(rep);
> out:
> @@ -1363,6 +1358,10 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed, bool temp)
> rep = rpcrdma_rep_create(r_xprt, temp);
> if (!rep)
> break;
> + if (!rpcrdma_regbuf_dma_map(r_xprt, rep->rr_rdmabuf)) {
> + rpcrdma_rep_put(buf, rep);
> + break;
> + }
>
> rep->rr_cid.ci_queue_id = ep->re_attr.recv_cq->res.id;
> trace_xprtrdma_post_recv(rep);
>
>
>

2023-07-15 20:24:37

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH] xprtrdma: Remap Receive buffers after a reconnect



> On Jul 15, 2023, at 1:01 PM, Tom Talpey <[email protected]> wrote:
>
> On 7/14/2023 11:13 AM, Chuck Lever wrote:
>> From: Chuck Lever <[email protected]>
>> On server-initiated disconnect, rpcrdma_xprt_disconnect() DMA-unmaps
>> the transport's Receive buffers, but rpcrdma_post_recvs() neglected
>> to remap them after a new connection had been established. The
>> result is immediate failure of the new connection with the Receives
>> flushing with LOCAL_PROT_ERR.
>
> The fix is correct, my only comment is that the failure occurs when the
> first message is received, and only the first CQE is a LOCAL_PROT_ERR.
> The remainder of the posted receives are simply flushed.
>
> Same result of course! The summary is ok as-is. Important fix.
>
> Reviewed-by: Tom Talpey <[email protected]>

Thanks for the look!

Anna, when applying this patch, can you replace:

"The result is immediate failure of the new connection with the Receives
flushing with LOCAL_PROT_ERR."

with

"The result is immediate failure of the new connection with the first
unmapped Receive completing with LOCAL_PROT_ERR."


>> Fixes: 671c450b6fe0 ("xprtrdma: Fix oops in Receive handler after device removal")
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>> net/sunrpc/xprtrdma/verbs.c | 9 ++++-----
>> 1 file changed, 4 insertions(+), 5 deletions(-)
>> Hi Anna-
>> Can you apply this for 6.5-rc ?
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index b098fde373ab..28c0771c4e8c 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -935,9 +935,6 @@ struct rpcrdma_rep *rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt,
>> if (!rep->rr_rdmabuf)
>> goto out_free;
>> - if (!rpcrdma_regbuf_dma_map(r_xprt, rep->rr_rdmabuf))
>> - goto out_free_regbuf;
>> -
>> rep->rr_cid.ci_completion_id =
>> atomic_inc_return(&r_xprt->rx_ep->re_completion_ids);
>> @@ -956,8 +953,6 @@ struct rpcrdma_rep *rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt,
>> spin_unlock(&buf->rb_lock);
>> return rep;
>> -out_free_regbuf:
>> - rpcrdma_regbuf_free(rep->rr_rdmabuf);
>> out_free:
>> kfree(rep);
>> out:
>> @@ -1363,6 +1358,10 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed, bool temp)
>> rep = rpcrdma_rep_create(r_xprt, temp);
>> if (!rep)
>> break;
>> + if (!rpcrdma_regbuf_dma_map(r_xprt, rep->rr_rdmabuf)) {
>> + rpcrdma_rep_put(buf, rep);
>> + break;
>> + }
>> rep->rr_cid.ci_queue_id = ep->re_attr.recv_cq->res.id;
>> trace_xprtrdma_post_recv(rep);

--
Chuck Lever



2023-07-16 17:40:07

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH] xprtrdma: Remap Receive buffers after a reconnect



> On Jul 15, 2023, at 2:56 PM, Chuck Lever III <[email protected]> wrote:
>
>
>
>> On Jul 15, 2023, at 1:01 PM, Tom Talpey <[email protected]> wrote:
>>
>> On 7/14/2023 11:13 AM, Chuck Lever wrote:
>>> From: Chuck Lever <[email protected]>
>>> On server-initiated disconnect, rpcrdma_xprt_disconnect() DMA-unmaps
>>> the transport's Receive buffers, but rpcrdma_post_recvs() neglected
>>> to remap them after a new connection had been established. The
>>> result is immediate failure of the new connection with the Receives
>>> flushing with LOCAL_PROT_ERR.
>>
>> The fix is correct, my only comment is that the failure occurs when the
>> first message is received, and only the first CQE is a LOCAL_PROT_ERR.
>> The remainder of the posted receives are simply flushed.
>>
>> Same result of course! The summary is ok as-is. Important fix.
>>
>> Reviewed-by: Tom Talpey <[email protected]>
>
> Thanks for the look!
>
> Anna, when applying this patch, can you replace:
>
> "The result is immediate failure of the new connection with the Receives
> flushing with LOCAL_PROT_ERR."
>
> with
>
> "The result is immediate failure of the new connection with the first
> unmapped Receive completing with LOCAL_PROT_ERR."

Also, please add:

Reported-by: Olga Kornievskaia <[email protected]>


>>> Fixes: 671c450b6fe0 ("xprtrdma: Fix oops in Receive handler after device removal")
>>> Signed-off-by: Chuck Lever <[email protected]>
>>> ---
>>> net/sunrpc/xprtrdma/verbs.c | 9 ++++-----
>>> 1 file changed, 4 insertions(+), 5 deletions(-)
>>> Hi Anna-
>>> Can you apply this for 6.5-rc ?
>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>> index b098fde373ab..28c0771c4e8c 100644
>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>> @@ -935,9 +935,6 @@ struct rpcrdma_rep *rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt,
>>> if (!rep->rr_rdmabuf)
>>> goto out_free;
>>> - if (!rpcrdma_regbuf_dma_map(r_xprt, rep->rr_rdmabuf))
>>> - goto out_free_regbuf;
>>> -
>>> rep->rr_cid.ci_completion_id =
>>> atomic_inc_return(&r_xprt->rx_ep->re_completion_ids);
>>> @@ -956,8 +953,6 @@ struct rpcrdma_rep *rpcrdma_rep_create(struct rpcrdma_xprt *r_xprt,
>>> spin_unlock(&buf->rb_lock);
>>> return rep;
>>> -out_free_regbuf:
>>> - rpcrdma_regbuf_free(rep->rr_rdmabuf);
>>> out_free:
>>> kfree(rep);
>>> out:
>>> @@ -1363,6 +1358,10 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed, bool temp)
>>> rep = rpcrdma_rep_create(r_xprt, temp);
>>> if (!rep)
>>> break;
>>> + if (!rpcrdma_regbuf_dma_map(r_xprt, rep->rr_rdmabuf)) {
>>> + rpcrdma_rep_put(buf, rep);
>>> + break;
>>> + }
>>> rep->rr_cid.ci_queue_id = ep->re_attr.recv_cq->res.id;
>>> trace_xprtrdma_post_recv(rep);
>
> --
> Chuck Lever


--
Chuck Lever