2021-08-19 12:31:07

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v1] SUNRPC: Ensure backchannel transports are marked connected

With NFSv4.1+ on RDMA, backchannel recovery appears not to work.

xprt_setup_xxx_bc() is invoked by the client's first CREATE_SESSION
operation, and it always marks the rpc_clnt's transport as
connected.

On a subsequent CREATE_SESSION, if rpc_create() is called and
xpt_bc_xprt is populated, it might not be connected (for instance,
if a backchannel fault has occurred). Ensure that code path returns
a connected xprt also.

Reported-by: Timo Rothenpieler <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/clnt.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 8b4de70e8ead..570480a649a3 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
xprt = args->bc_xprt->xpt_bc_xprt;
if (xprt) {
xprt_get(xprt);
+ xprt_set_connected(xprt);
return rpc_create_xprt(args, xprt);
}
}



2021-08-19 13:02:22

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v1] SUNRPC: Ensure backchannel transports are marked connected

On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote:
> With NFSv4.1+ on RDMA, backchannel recovery appears not to work.
>
> xprt_setup_xxx_bc() is invoked by the client's first CREATE_SESSION
> operation, and it always marks the rpc_clnt's transport as
> connected.
>
> On a subsequent CREATE_SESSION, if rpc_create() is called and
> xpt_bc_xprt is populated, it might not be connected (for instance,
> if a backchannel fault has occurred). Ensure that code path returns
> a connected xprt also.
>
> Reported-by: Timo Rothenpieler <[email protected]>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
>  net/sunrpc/clnt.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 8b4de70e8ead..570480a649a3 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct
> rpc_create_args *args)
>                 xprt = args->bc_xprt->xpt_bc_xprt;
>                 if (xprt) {
>                         xprt_get(xprt);
> +                       xprt_set_connected(xprt);
>                         return rpc_create_xprt(args, xprt);
>                 }
>         }
>
>

No. This is wrong. If the connection got disconnected, then the client
needs to reconnect and build a new connection altogether. We can't just
make pretend that the old connection still exists.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-08-19 13:20:09

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v1] SUNRPC: Ensure backchannel transports are marked connected



> On Aug 19, 2021, at 9:01 AM, Trond Myklebust <[email protected]> wrote:
>
> On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote:
>> With NFSv4.1+ on RDMA, backchannel recovery appears not to work.
>>
>> xprt_setup_xxx_bc() is invoked by the client's first CREATE_SESSION
>> operation, and it always marks the rpc_clnt's transport as
>> connected.
>>
>> On a subsequent CREATE_SESSION, if rpc_create() is called and
>> xpt_bc_xprt is populated, it might not be connected (for instance,
>> if a backchannel fault has occurred). Ensure that code path returns
>> a connected xprt also.
>>
>> Reported-by: Timo Rothenpieler <[email protected]>
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>> net/sunrpc/clnt.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>> index 8b4de70e8ead..570480a649a3 100644
>> --- a/net/sunrpc/clnt.c
>> +++ b/net/sunrpc/clnt.c
>> @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct
>> rpc_create_args *args)
>> xprt = args->bc_xprt->xpt_bc_xprt;
>> if (xprt) {
>> xprt_get(xprt);
>> + xprt_set_connected(xprt);
>> return rpc_create_xprt(args, xprt);
>> }
>> }
>>
>>
>
> No. This is wrong. If the connection got disconnected, then the client
> needs to reconnect and build a new connection altogether. We can't just
> make pretend that the old connection still exists.

The patch description is not clear: the client has not disconnected.
The forward channel is functioning properly, and the server has set
SEQ4_STATUS_BACKCHANNEL_FAULT.

To recover, the client sends a DESTROY_SESSION / CREATE_SESSION pair
on the existing connection. On the server, setup_callback_client()
invokes rpc_create() again -- it's this step that is failing during
the second CREATE_SESSION on a connection because the old xprt
is returned but it's still marked disconnected.

An alternative would be to ensure that setup_callback_client()
always puts xpt_bc_xprt before it invokes rpc_create(). But it
looked to me like rpc_create() already has a bunch of logic to
deal with an existing xpt_bc_xprt.


--
Chuck Lever



2021-08-19 14:15:44

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v1] SUNRPC: Ensure backchannel transports are marked connected

On Thu, 2021-08-19 at 13:16 +0000, Chuck Lever III wrote:
>
>
> > On Aug 19, 2021, at 9:01 AM, Trond Myklebust
> > <[email protected]> wrote:
> >
> > On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote:
> > > With NFSv4.1+ on RDMA, backchannel recovery appears not to work.
> > >
> > > xprt_setup_xxx_bc() is invoked by the client's first
> > > CREATE_SESSION
> > > operation, and it always marks the rpc_clnt's transport as
> > > connected.
> > >
> > > On a subsequent CREATE_SESSION, if rpc_create() is called and
> > > xpt_bc_xprt is populated, it might not be connected (for
> > > instance,
> > > if a backchannel fault has occurred). Ensure that code path
> > > returns
> > > a connected xprt also.
> > >
> > > Reported-by: Timo Rothenpieler <[email protected]>
> > > Signed-off-by: Chuck Lever <[email protected]>
> > > ---
> > >  net/sunrpc/clnt.c |    1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > > index 8b4de70e8ead..570480a649a3 100644
> > > --- a/net/sunrpc/clnt.c
> > > +++ b/net/sunrpc/clnt.c
> > > @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct
> > > rpc_create_args *args)
> > >                 xprt = args->bc_xprt->xpt_bc_xprt;
> > >                 if (xprt) {
> > >                         xprt_get(xprt);
> > > +                       xprt_set_connected(xprt);
> > >                         return rpc_create_xprt(args, xprt);
> > >                 }
> > >         }
> > >
> > >
> >
> > No. This is wrong. If the connection got disconnected, then the
> > client
> > needs to reconnect and build a new connection altogether. We can't
> > just
> > make pretend that the old connection still exists.
>
> The patch description is not clear: the client has not disconnected.
> The forward channel is functioning properly, and the server has set
> SEQ4_STATUS_BACKCHANNEL_FAULT.
>
> To recover, the client sends a DESTROY_SESSION / CREATE_SESSION pair
> on the existing connection. On the server, setup_callback_client()
> invokes rpc_create() again -- it's this step that is failing during
> the second CREATE_SESSION on a connection because the old xprt
> is returned but it's still marked disconnected.
>

How is that happening? As far as I can tell, the only thing that can
cause the backchannel to be marked as closed is a call to
svc_delete_xprt(), which explicitly calls xprt->xpt_bc_xprt->ops-
>close(xprt->xpt_bc_xprt).

The server shouldn't be reusing anything from that svc_xprt after that.

> An alternative would be to ensure that setup_callback_client()
> always puts xpt_bc_xprt before it invokes rpc_create(). But it
> looked to me like rpc_create() already has a bunch of logic to
> deal with an existing xpt_bc_xprt.
>
>

The connection status is managed by the transport layer, not the RPC
client layer.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-08-19 14:35:33

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v1] SUNRPC: Ensure backchannel transports are marked connected



> On Aug 19, 2021, at 10:14 AM, Trond Myklebust <[email protected]> wrote:
>
> On Thu, 2021-08-19 at 13:16 +0000, Chuck Lever III wrote:
>>
>>
>>> On Aug 19, 2021, at 9:01 AM, Trond Myklebust
>>> <[email protected]> wrote:
>>>
>>> On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote:
>>>> With NFSv4.1+ on RDMA, backchannel recovery appears not to work.
>>>>
>>>> xprt_setup_xxx_bc() is invoked by the client's first
>>>> CREATE_SESSION
>>>> operation, and it always marks the rpc_clnt's transport as
>>>> connected.
>>>>
>>>> On a subsequent CREATE_SESSION, if rpc_create() is called and
>>>> xpt_bc_xprt is populated, it might not be connected (for
>>>> instance,
>>>> if a backchannel fault has occurred). Ensure that code path
>>>> returns
>>>> a connected xprt also.
>>>>
>>>> Reported-by: Timo Rothenpieler <[email protected]>
>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>> ---
>>>> net/sunrpc/clnt.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>>> index 8b4de70e8ead..570480a649a3 100644
>>>> --- a/net/sunrpc/clnt.c
>>>> +++ b/net/sunrpc/clnt.c
>>>> @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct
>>>> rpc_create_args *args)
>>>> xprt = args->bc_xprt->xpt_bc_xprt;
>>>> if (xprt) {
>>>> xprt_get(xprt);
>>>> + xprt_set_connected(xprt);
>>>> return rpc_create_xprt(args, xprt);
>>>> }
>>>> }
>>>>
>>>>
>>>
>>> No. This is wrong. If the connection got disconnected, then the
>>> client
>>> needs to reconnect and build a new connection altogether. We can't
>>> just
>>> make pretend that the old connection still exists.
>>
>> The patch description is not clear: the client has not disconnected.
>> The forward channel is functioning properly, and the server has set
>> SEQ4_STATUS_BACKCHANNEL_FAULT.
>>
>> To recover, the client sends a DESTROY_SESSION / CREATE_SESSION pair
>> on the existing connection. On the server, setup_callback_client()
>> invokes rpc_create() again -- it's this step that is failing during
>> the second CREATE_SESSION on a connection because the old xprt
>> is returned but it's still marked disconnected.
>
> How is that happening?

The RPC client's autodisconnect is firing. This was fixed in 6820bf77864d
("svcrdma: disable timeouts on rdma backchannel").

However we're still seeing cases where backchannel recovery is not
working. See:

https://lore.kernel.org/linux-nfs/[email protected]/T/#t

As an experiment, we've reverted 6820bf77864d to try to provoke the bug.


> As far as I can tell, the only thing that can
> cause the backchannel to be marked as closed is a call to
> svc_delete_xprt(), which explicitly calls xprt->xpt_bc_xprt->ops->close(xprt->xpt_bc_xprt).
>
> The server shouldn't be reusing anything from that svc_xprt after that.

>> An alternative would be to ensure that setup_callback_client()
>> always puts xpt_bc_xprt before it invokes rpc_create(). But it
>> looked to me like rpc_create() already has a bunch of logic to
>> deal with an existing xpt_bc_xprt.
>
> The connection status is managed by the transport layer, not the RPC
> client layer.

If it's a bug to mark a backchannel transport disconnected, then
asserts should be added to capture the problem.

What is the purpose of this code in rpc_create() ?

533 if (args->bc_xprt) {
534 WARN_ON_ONCE(!(args->protocol & XPRT_TRANSPORT_BC));
535 xprt = args->bc_xprt->xpt_bc_xprt;
536 if (xprt) {
537 xprt_get(xprt);
538 return rpc_create_xprt(args, xprt);
539 }
540 }


--
Chuck Lever



2021-08-20 00:15:45

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v1] SUNRPC: Ensure backchannel transports are marked connected

On Thu, 2021-08-19 at 14:34 +0000, Chuck Lever III wrote:
>
>
> > On Aug 19, 2021, at 10:14 AM, Trond Myklebust
> > <[email protected]> wrote:
> >
> > On Thu, 2021-08-19 at 13:16 +0000, Chuck Lever III wrote:
> > >
> > >
> > > > On Aug 19, 2021, at 9:01 AM, Trond Myklebust
> > > > <[email protected]> wrote:
> > > >
> > > > On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote:
> > > > > With NFSv4.1+ on RDMA, backchannel recovery appears not to
> > > > > work.
> > > > >
> > > > > xprt_setup_xxx_bc() is invoked by the client's first
> > > > > CREATE_SESSION
> > > > > operation, and it always marks the rpc_clnt's transport as
> > > > > connected.
> > > > >
> > > > > On a subsequent CREATE_SESSION, if rpc_create() is called and
> > > > > xpt_bc_xprt is populated, it might not be connected (for
> > > > > instance,
> > > > > if a backchannel fault has occurred). Ensure that code path
> > > > > returns
> > > > > a connected xprt also.
> > > > >
> > > > > Reported-by: Timo Rothenpieler <[email protected]>
> > > > > Signed-off-by: Chuck Lever <[email protected]>
> > > > > ---
> > > > >  net/sunrpc/clnt.c |    1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > >
> > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > > > > index 8b4de70e8ead..570480a649a3 100644
> > > > > --- a/net/sunrpc/clnt.c
> > > > > +++ b/net/sunrpc/clnt.c
> > > > > @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct
> > > > > rpc_create_args *args)
> > > > >                 xprt = args->bc_xprt->xpt_bc_xprt;
> > > > >                 if (xprt) {
> > > > >                         xprt_get(xprt);
> > > > > +                       xprt_set_connected(xprt);
> > > > >                         return rpc_create_xprt(args, xprt);
> > > > >                 }
> > > > >         }
> > > > >
> > > > >
> > > >
> > > > No. This is wrong. If the connection got disconnected, then the
> > > > client
> > > > needs to reconnect and build a new connection altogether. We
> > > > can't
> > > > just
> > > > make pretend that the old connection still exists.
> > >
> > > The patch description is not clear: the client has not
> > > disconnected.
> > > The forward channel is functioning properly, and the server has
> > > set
> > > SEQ4_STATUS_BACKCHANNEL_FAULT.
> > >
> > > To recover, the client sends a DESTROY_SESSION / CREATE_SESSION
> > > pair
> > > on the existing connection. On the server,
> > > setup_callback_client()
> > > invokes rpc_create() again -- it's this step that is failing
> > > during
> > > the second CREATE_SESSION on a connection because the old xprt
> > > is returned but it's still marked disconnected.
> >
> > How is that happening?
>
> The RPC client's autodisconnect is firing. This was fixed in
> 6820bf77864d
> ("svcrdma: disable timeouts on rdma backchannel").
>
> However we're still seeing cases where backchannel recovery is not
> working. See:
>
> https://lore.kernel.org/linux-nfs/[email protected]/T/#t
>
> As an experiment, we've reverted 6820bf77864d to try to provoke the
> bug.
>
>
> > As far as I can tell, the only thing that can
> > cause the backchannel to be marked as closed is a call to
> > svc_delete_xprt(), which explicitly calls xprt->xpt_bc_xprt->ops-
> > >close(xprt->xpt_bc_xprt).
> >
> > The server shouldn't be reusing anything from that svc_xprt after
> > that.
>
> > > An alternative would be to ensure that setup_callback_client()
> > > always puts xpt_bc_xprt before it invokes rpc_create(). But it
> > > looked to me like rpc_create() already has a bunch of logic to
> > > deal with an existing xpt_bc_xprt.
> >
> > The connection status is managed by the transport layer, not the
> > RPC
> > client layer.
>
> If it's a bug to mark a backchannel transport disconnected, then
> asserts should be added to capture the problem.

It isn't a bug to mark the transport as disconnected. It is a bug to
mark it as disconnected and then to continue using it as if it were
connected.

>
> What is the purpose of this code in rpc_create() ?
>
>  533         if (args->bc_xprt) {
>  534                 WARN_ON_ONCE(!(args->protocol &
> XPRT_TRANSPORT_BC));
>  535                 xprt = args->bc_xprt->xpt_bc_xprt;
>  536                 if (xprt) {
>  537                         xprt_get(xprt);
>  538                         return rpc_create_xprt(args, xprt);
>  539                 }
>  540         }
>

Are you asking me or Bruce?

I'm not much of a fan of this code snippet either, because it doesn't
really appear to have much to do with the normal function of
rpc_create().
However as I understand it, the purpose is to create an instance of
struct rpc_clnt when presented with an existing instance of svc_xprt.
It does not currently modify that svc_xprt instance in any way that
breaks layering. All it does is take a reference to the xpt_bc_xprt
field.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-08-20 13:58:43

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v1] SUNRPC: Ensure backchannel transports are marked connected



> On Aug 19, 2021, at 8:14 PM, Trond Myklebust <[email protected]> wrote:
>
> On Thu, 2021-08-19 at 14:34 +0000, Chuck Lever III wrote:
>>
>>
>>> On Aug 19, 2021, at 10:14 AM, Trond Myklebust
>>> <[email protected]> wrote:
>>>
>>> On Thu, 2021-08-19 at 13:16 +0000, Chuck Lever III wrote:
>>>>
>>>>
>>>>> On Aug 19, 2021, at 9:01 AM, Trond Myklebust
>>>>> <[email protected]> wrote:
>>>>>
>>>>> On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote:
>>>>>> With NFSv4.1+ on RDMA, backchannel recovery appears not to
>>>>>> work.
>>>>>>
>>>>>> xprt_setup_xxx_bc() is invoked by the client's first
>>>>>> CREATE_SESSION
>>>>>> operation, and it always marks the rpc_clnt's transport as
>>>>>> connected.
>>>>>>
>>>>>> On a subsequent CREATE_SESSION, if rpc_create() is called and
>>>>>> xpt_bc_xprt is populated, it might not be connected (for
>>>>>> instance,
>>>>>> if a backchannel fault has occurred). Ensure that code path
>>>>>> returns
>>>>>> a connected xprt also.
>>>>>>
>>>>>> Reported-by: Timo Rothenpieler <[email protected]>
>>>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>>>> ---
>>>>>> net/sunrpc/clnt.c | 1 +
>>>>>> 1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>>>>> index 8b4de70e8ead..570480a649a3 100644
>>>>>> --- a/net/sunrpc/clnt.c
>>>>>> +++ b/net/sunrpc/clnt.c
>>>>>> @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct
>>>>>> rpc_create_args *args)
>>>>>> xprt = args->bc_xprt->xpt_bc_xprt;
>>>>>> if (xprt) {
>>>>>> xprt_get(xprt);
>>>>>> + xprt_set_connected(xprt);
>>>>>> return rpc_create_xprt(args, xprt);
>>>>>> }
>>>>>> }
>>>>>>
>>>>>>
>>>>>
>>>>> No. This is wrong. If the connection got disconnected, then the
>>>>> client
>>>>> needs to reconnect and build a new connection altogether. We
>>>>> can't
>>>>> just
>>>>> make pretend that the old connection still exists.
>>>>
>>>> The patch description is not clear: the client has not
>>>> disconnected.
>>>> The forward channel is functioning properly, and the server has
>>>> set
>>>> SEQ4_STATUS_BACKCHANNEL_FAULT.
>>>>
>>>> To recover, the client sends a DESTROY_SESSION / CREATE_SESSION
>>>> pair
>>>> on the existing connection. On the server,
>>>> setup_callback_client()
>>>> invokes rpc_create() again -- it's this step that is failing
>>>> during
>>>> the second CREATE_SESSION on a connection because the old xprt
>>>> is returned but it's still marked disconnected.
>>>
>>> How is that happening?
>>
>> The RPC client's autodisconnect is firing. This was fixed in
>> 6820bf77864d
>> ("svcrdma: disable timeouts on rdma backchannel").
>>
>> However we're still seeing cases where backchannel recovery is not
>> working. See:
>>
>> https://lore.kernel.org/linux-nfs/[email protected]/T/#t
>>
>> As an experiment, we've reverted 6820bf77864d to try to provoke the
>> bug.
>>
>>
>>> As far as I can tell, the only thing that can
>>> cause the backchannel to be marked as closed is a call to
>>> svc_delete_xprt(), which explicitly calls xprt->xpt_bc_xprt->ops-
>>>> close(xprt->xpt_bc_xprt).
>>>
>>> The server shouldn't be reusing anything from that svc_xprt after
>>> that.
>>
>>>> An alternative would be to ensure that setup_callback_client()
>>>> always puts xpt_bc_xprt before it invokes rpc_create(). But it
>>>> looked to me like rpc_create() already has a bunch of logic to
>>>> deal with an existing xpt_bc_xprt.
>>>
>>> The connection status is managed by the transport layer, not the
>>> RPC
>>> client layer.
>>
>> If it's a bug to mark a backchannel transport disconnected, then
>> asserts should be added to capture the problem.
>
> It isn't a bug to mark the transport as disconnected. It is a bug to
> mark it as disconnected and then to continue using it as if it were
> connected.

Are you suggesting the server's backchannel logic should
look for -107 status after posting an RPC, and then recover
somehow? Perhaps it could put xpt_bc_xprt then set it to
NULL, and then invoke setup_callback_client() again ?


>> What is the purpose of this code in rpc_create() ?
>>
>> 533 if (args->bc_xprt) {
>> 534 WARN_ON_ONCE(!(args->protocol &
>> XPRT_TRANSPORT_BC));
>> 535 xprt = args->bc_xprt->xpt_bc_xprt;
>> 536 if (xprt) {
>> 537 xprt_get(xprt);
>> 538 return rpc_create_xprt(args, xprt);
>> 539 }
>> 540 }
>>
>
> Are you asking me or Bruce?

Fair enough, adding Bruce.


> I'm not much of a fan of this code snippet either, because it doesn't
> really appear to have much to do with the normal function of
> rpc_create().
> However as I understand it, the purpose is to create an instance of
> struct rpc_clnt when presented with an existing instance of svc_xprt.
> It does not currently modify that svc_xprt instance in any way that
> breaks layering. All it does is take a reference to the xpt_bc_xprt
> field.

I see that the initial addition of this logic was done by
d531c008d7d9 ("NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp").
It was later moved into rpc_create() by d50039ea5ee6 ("nfsd4/rpc:
move backchannel create logic into rpc code").

Probably the essential purpose of this code is explained by a comment
that was deleted by d531c008d7d9:

- if (args->bc_xprt->xpt_bc_xprt) {
- /*
- * This server connection already has a backchannel
- * transport; we can't create a new one, as we wouldn't
- * be able to match replies based on xid any more. So,
- * reuse the already-existing one:
- */
- return args->bc_xprt->xpt_bc_xprt;
- }

Is this still a sensible expectation?

--
Chuck Lever



2021-08-20 14:33:01

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v1] SUNRPC: Ensure backchannel transports are marked connected

On Fri, 2021-08-20 at 13:58 +0000, Chuck Lever III wrote:
>
>
> > On Aug 19, 2021, at 8:14 PM, Trond Myklebust
> > <[email protected]> wrote:
> >
> > On Thu, 2021-08-19 at 14:34 +0000, Chuck Lever III wrote:
> > >
> > >
> > > > On Aug 19, 2021, at 10:14 AM, Trond Myklebust
> > > > <[email protected]> wrote:
> > > >
> > > > On Thu, 2021-08-19 at 13:16 +0000, Chuck Lever III wrote:
> > > > >
> > > > >
> > > > > > On Aug 19, 2021, at 9:01 AM, Trond Myklebust
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote:
> > > > > > > With NFSv4.1+ on RDMA, backchannel recovery appears not to
> > > > > > > work.
> > > > > > >
> > > > > > > xprt_setup_xxx_bc() is invoked by the client's first
> > > > > > > CREATE_SESSION
> > > > > > > operation, and it always marks the rpc_clnt's transport as
> > > > > > > connected.
> > > > > > >
> > > > > > > On a subsequent CREATE_SESSION, if rpc_create() is called
> > > > > > > and
> > > > > > > xpt_bc_xprt is populated, it might not be connected (for
> > > > > > > instance,
> > > > > > > if a backchannel fault has occurred). Ensure that code path
> > > > > > > returns
> > > > > > > a connected xprt also.
> > > > > > >
> > > > > > > Reported-by: Timo Rothenpieler <[email protected]>
> > > > > > > Signed-off-by: Chuck Lever <[email protected]>
> > > > > > > ---
> > > > > > >  net/sunrpc/clnt.c |    1 +
> > > > > > >  1 file changed, 1 insertion(+)
> > > > > > >
> > > > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > > > > > > index 8b4de70e8ead..570480a649a3 100644
> > > > > > > --- a/net/sunrpc/clnt.c
> > > > > > > +++ b/net/sunrpc/clnt.c
> > > > > > > @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct
> > > > > > > rpc_create_args *args)
> > > > > > >                 xprt = args->bc_xprt->xpt_bc_xprt;
> > > > > > >                 if (xprt) {
> > > > > > >                         xprt_get(xprt);
> > > > > > > +                       xprt_set_connected(xprt);
> > > > > > >                         return rpc_create_xprt(args, xprt);
> > > > > > >                 }
> > > > > > >         }
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > No. This is wrong. If the connection got disconnected, then
> > > > > > the
> > > > > > client
> > > > > > needs to reconnect and build a new connection altogether. We
> > > > > > can't
> > > > > > just
> > > > > > make pretend that the old connection still exists.
> > > > >
> > > > > The patch description is not clear: the client has not
> > > > > disconnected.
> > > > > The forward channel is functioning properly, and the server has
> > > > > set
> > > > > SEQ4_STATUS_BACKCHANNEL_FAULT.
> > > > >
> > > > > To recover, the client sends a DESTROY_SESSION / CREATE_SESSION
> > > > > pair
> > > > > on the existing connection. On the server,
> > > > > setup_callback_client()
> > > > > invokes rpc_create() again -- it's this step that is failing
> > > > > during
> > > > > the second CREATE_SESSION on a connection because the old xprt
> > > > > is returned but it's still marked disconnected.
> > > >
> > > > How is that happening?
> > >
> > > The RPC client's autodisconnect is firing. This was fixed in
> > > 6820bf77864d
> > > ("svcrdma: disable timeouts on rdma backchannel").
> > >
> > > However we're still seeing cases where backchannel recovery is not
> > > working. See:
> > >
> > > https://lore.kernel.org/linux-nfs/[email protected]/T/#t
> > >
> > > As an experiment, we've reverted 6820bf77864d to try to provoke the
> > > bug.
> > >
> > >
> > > > As far as I can tell, the only thing that can
> > > > cause the backchannel to be marked as closed is a call to
> > > > svc_delete_xprt(), which explicitly calls xprt->xpt_bc_xprt->ops-
> > > > > close(xprt->xpt_bc_xprt).
> > > >
> > > > The server shouldn't be reusing anything from that svc_xprt after
> > > > that.
> > >
> > > > > An alternative would be to ensure that setup_callback_client()
> > > > > always puts xpt_bc_xprt before it invokes rpc_create(). But it
> > > > > looked to me like rpc_create() already has a bunch of logic to
> > > > > deal with an existing xpt_bc_xprt.
> > > >
> > > > The connection status is managed by the transport layer, not the
> > > > RPC
> > > > client layer.
> > >
> > > If it's a bug to mark a backchannel transport disconnected, then
> > > asserts should be added to capture the problem.
> >
> > It isn't a bug to mark the transport as disconnected. It is a bug to
> > mark it as disconnected and then to continue using it as if it were
> > connected.
>
> Are you suggesting the server's backchannel logic should
> look for -107 status after posting an RPC, and then recover
> somehow? Perhaps it could put xpt_bc_xprt then set it to
> NULL, and then invoke setup_callback_client() again ?
>

What I'm saying is that nothing should be calling ->close() on the
struct rpc_xprt held by a struct svc_xprt without doing so through a
call to svc_delete_xprt(), and that nothing should be calling
svc_delete_xprt() without it being part of an effort to close the
svc_xprt.

>
> > > What is the purpose of this code in rpc_create() ?
> > >
> > >  533         if (args->bc_xprt) {
> > >  534                 WARN_ON_ONCE(!(args->protocol &
> > > XPRT_TRANSPORT_BC));
> > >  535                 xprt = args->bc_xprt->xpt_bc_xprt;
> > >  536                 if (xprt) {
> > >  537                         xprt_get(xprt);
> > >  538                         return rpc_create_xprt(args, xprt);
> > >  539                 }
> > >  540         }
> > >
> >
> > Are you asking me or Bruce?
>
> Fair enough, adding Bruce.
>
>
> > I'm not much of a fan of this code snippet either, because it doesn't
> > really appear to have much to do with the normal function of
> > rpc_create().
> > However as I understand it, the purpose is to create an instance of
> > struct rpc_clnt when presented with an existing instance of svc_xprt.
> > It does not currently modify that svc_xprt instance in any way that
> > breaks layering. All it does is take a reference to the xpt_bc_xprt
> > field.
>
> I see that the initial addition of this logic was done by
> d531c008d7d9 ("NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp").
> It was later moved into rpc_create() by d50039ea5ee6 ("nfsd4/rpc:
> move backchannel create logic into rpc code").
>
> Probably the essential purpose of this code is explained by a comment
> that was deleted by d531c008d7d9:
>
> -       if (args->bc_xprt->xpt_bc_xprt) {
> -               /*
> -                * This server connection already has a backchannel
> -                * transport; we can't create a new one, as we wouldn't
> -                * be able to match replies based on xid any more.  So,
> -                * reuse the already-existing one:
> -                */
> -                return args->bc_xprt->xpt_bc_xprt;
> -       }
>
> Is this still a sensible expectation?
>

OK, let me clarify where my statements above are coming from and
perhaps clarify my language.

IMO, we need to distinguish between a closed _transport_ as understood
by the RPC layer, and a closed _channel_ as understood by the NFS
layer.

When the transport is closed at the RPC layer, then it means that the
actual transport connection needs to be broken. Both the forward and
backward channels on that transport connection are affected, and the
client needs to re-establish a transport connection in order to resume
communication with the server.

Sessions and their channels are managed by the NFS layer, and so if
only the back channel is 'closed', then that is a matter for the NFS
layer to deal with. It may decide that it needs to shut down the
transport, or not. However if it does keep the transport open, then it
needs to manage the consequences should something come in over the
'closed' back channel.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]