2014-07-16 19:38:35

by Chuck Lever III

Subject: [PATCH] svcrdma: Select NFSv4.1 backchannel transport based on forward channel

The current code always selects XPRT_TRANSPORT_BC_TCP for the back
channel, even when the forward channel was not TCP (eg, RDMA). When
a 4.1 mount is attempted with RDMA, the server panics in the TCP BC
code when trying to send CB_NULL.

Instead, construct the transport protocol number from the forward
channel transport or'd with XPRT_TRANSPORT_BC. Transports that do
not support bi-directional RPC will not have registered a "BC"
transport, causing create_backchannel_client() to fail immediately.

Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265
Signed-off-by: Chuck Lever <[email protected]>
---
Hi Bruce-

What do you think of this approach?
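
To illustrate what the new xcl_ident field buys us, here's a throwaway
userspace sketch of the ident composition (constant values copied from
include/linux/sunrpc/xprt.h as I read it today; the program is for
illustration only, not part of the patch):

#include <stdio.h>

/* Assumed to match include/linux/sunrpc/xprt.h: */
#define XPRT_TRANSPORT_BC	(1u << 31)	/* (1 << 31) in the header */
#define XPRT_TRANSPORT_UDP	17u		/* IPPROTO_UDP */
#define XPRT_TRANSPORT_TCP	6u		/* IPPROTO_TCP */
#define XPRT_TRANSPORT_RDMA	256u

int main(void)
{
	/* Forward channel idents as each svc_xprt_class would record
	 * them in the new xcl_ident field: */
	unsigned int idents[] = { XPRT_TRANSPORT_TCP, XPRT_TRANSPORT_RDMA };
	unsigned int i;

	for (i = 0; i < 2; i++) {
		/* The composition done in setup_callback_client(): */
		unsigned int bc = idents[i] | XPRT_TRANSPORT_BC;

		/* TCP composes to XPRT_TRANSPORT_BC_TCP; RDMA composes
		 * to an ident with no registered "BC" transport, so
		 * create_backchannel_client() fails cleanly instead of
		 * the server panicking. */
		printf("forward 0x%x -> backchannel 0x%x\n", idents[i], bc);
	}
	return 0;
}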


fs/nfsd/nfs4callback.c | 3 ++-
include/linux/sunrpc/svc_xprt.h | 1 +
net/sunrpc/svcsock.c | 2 ++
net/sunrpc/xprt.c | 2 +-
net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 +
5 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 2c73cae..0f23ad0 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -689,7 +689,8 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c
clp->cl_cb_session = ses;
args.bc_xprt = conn->cb_xprt;
args.prognumber = clp->cl_cb_session->se_cb_prog;
- args.protocol = XPRT_TRANSPORT_BC_TCP;
+ args.protocol = conn->cb_xprt->xpt_class->xcl_ident |
+ XPRT_TRANSPORT_BC;
args.authflavor = ses->se_cb_sec.flavor;
}
/* Create RPC client */
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 7235040..5d9d6f8 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -33,6 +33,7 @@ struct svc_xprt_class {
struct svc_xprt_ops *xcl_ops;
struct list_head xcl_list;
u32 xcl_max_payload;
+ int xcl_ident;
};

/*
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index b507cd3..b2437ee 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -692,6 +692,7 @@ static struct svc_xprt_class svc_udp_class = {
.xcl_owner = THIS_MODULE,
.xcl_ops = &svc_udp_ops,
.xcl_max_payload = RPCSVC_MAXPAYLOAD_UDP,
+ .xcl_ident = XPRT_TRANSPORT_UDP,
};

static void svc_udp_init(struct svc_sock *svsk, struct svc_serv *serv)
@@ -1292,6 +1293,7 @@ static struct svc_xprt_class svc_tcp_class = {
.xcl_owner = THIS_MODULE,
.xcl_ops = &svc_tcp_ops,
.xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
+ .xcl_ident = XPRT_TRANSPORT_TCP,
};

void svc_init_xprt_sock(void)
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index c3b2b33..51c6316 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1306,7 +1306,7 @@ struct rpc_xprt *xprt_create_transport(struct xprt_create *args)
}
}
spin_unlock(&xprt_list_lock);
- printk(KERN_ERR "RPC: transport (%d) not supported\n", args->ident);
+ dprintk("RPC: transport (%d) not supported\n", args->ident);
return ERR_PTR(-EIO);

found:
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index e7323fb..06a5d92 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -92,6 +92,7 @@ struct svc_xprt_class svc_rdma_class = {
.xcl_owner = THIS_MODULE,
.xcl_ops = &svc_rdma_ops,
.xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
+ .xcl_ident = XPRT_TRANSPORT_RDMA,
};

struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *xprt)



2014-07-18 19:27:19

by Chuck Lever III

Subject: Re: [PATCH] svcrdma: Select NFSv4.1 backchannel transport based on forward channel


On Jul 18, 2014, at 3:12 PM, J. Bruce Fields <[email protected]> wrote:

> On Fri, Jul 18, 2014 at 02:47:41PM -0400, Chuck Lever wrote:
>>
>> On Jul 17, 2014, at 2:59 PM, Chuck Lever <[email protected]> wrote:
>>
>>>
>>> On Jul 17, 2014, at 2:36 PM, J. Bruce Fields <[email protected]> wrote:
>>>
>>>> On Wed, Jul 16, 2014 at 03:38:32PM -0400, Chuck Lever wrote:
>>>>> The current code always selects XPRT_TRANSPORT_BC_TCP for the back
>>>>> channel, even when the forward channel was not TCP (eg, RDMA). When
>>>>> a 4.1 mount is attempted with RDMA, the server panics in the TCP BC
>>>>> code when trying to send CB_NULL.
>>>>>
>>>>> Instead, construct the transport protocol number from the forward
>>>>> channel transport or'd with XPRT_TRANSPORT_BC. Transports that do
>>>>> not support bi-directional RPC will not have registered a "BC"
>>>>> transport, causing create_backchannel_client() to fail immediately.
>>>>>
>>>>> Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265
>>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>>> ---
>>>>> Hi Bruce-
>>>>>
>>>>> What do you think of this approach?
>>>>
>>>> OK by me. (So clients use a separate tcp connection for the
>>>> backchannel?)
>>>
>>> Right.
>>>
>>> The current plan is that, for NFSv4.1 over RDMA, the Linux client will
>>> create an additional TCP connection and bind it to the same session as
>>> the RDMA transport.
>>>
>>> The TCP connection will be used solely for callback operations. The
>>> forward channel on that connection will not be used, except for
>>> BIND_CONN_TO_SESSION operations.
>>
>> Hi Bruce, pursuant to this goal, I'm trying to understand commit
>> 14a24e99, especially the interior comment:
>>
>> /* Should we give out recallable state?: */
>> static bool nfsd4_cb_channel_good(struct nfs4_client *clp)
>> {
>> 	if (clp->cl_cb_state == NFSD4_CB_UP)
>> 		return true;
>> 	/*
>> 	 * In the sessions case, since we don't have to establish a
>> 	 * separate connection for callbacks, we assume it's OK
>> 	 * until we hear otherwise:
>> 	 */
>> 	return clp->cl_minorversion && clp->cl_cb_state == NFSD4_CB_UNKNOWN;
>> }
>>
>> I wonder if that's a valid assumption?
>>
>> A separate virtual connection _does_ have to be established. If the
>> client sent a CREATE_SESSION operation with the SESSION4_BACK_CHAN flag
>> set, the server must verify that the client has an active bi-directional
>> RPC service on the transport.
>
> If the client sets the BACK_CHAN flag, that means it wants that
> connection to be used as a back channel.
>
> So in the RDMA case it sounds like the client needs to clear that flag
> on the create session.

Right.

> Either that or send the original CREATE_SESSION over the connection you
> want used for the backchannel, then send a BIND_CONNECTION_TO_SESSION
> over the RDMA connection.
>
> That said, if the server knows there's no connection at all that's
> available for the backchannel, then yes I agree that it should be
> setting the PATH_DOWN flag:

I'm trying a fix that sets cl_cb_state = NFSD4_CB_DOWN in
init_session() if SESSION4_BACK_CHAN is clear.
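
Roughly like this (a sketch only, against my reading of
fs/nfsd/nfs4state.c; the exact hunk may end up looking different):

static void init_session(struct svc_rqst *rqstp, struct nfsd4_session *new,
			 struct nfs4_client *clp,
			 struct nfsd4_create_session *cses)
{
	/* ... existing initialization ... */

	/* The client offered no backchannel on this connection and
	 * has not bound one yet, so the callback path starts out down: */
	if (!(cses->flags & SESSION4_BACK_CHAN))
		clp->cl_cb_state = NFSD4_CB_DOWN;
}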

From the looks of it, the server is architected to support one and
only one session per client ID. Is that correct? Thus it expects
just one CREATE_SESSION operation per EXCHANGE_ID. Does the server
enforce this?

Seems like cl_cb_state really should be a per-session thing for
NFSv4.1.

And, while we're at it, the server should assert
SEQ4_STATUS_CB_PATH_DOWN_SESSION as well as SEQ4_STATUS_CB_PATH_DOWN.
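
Something along these lines, perhaps (hypothetical; the fields are my
reading of the nfsd4_sequence() reply path):

	/* In nfsd4_sequence(): when the callback path is known to be
	 * down, report it against both the client and the session: */
	if (clp->cl_cb_state == NFSD4_CB_DOWN)
		seq->status_flags = SEQ4_STATUS_CB_PATH_DOWN |
				    SEQ4_STATUS_CB_PATH_DOWN_SESSION;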

>
>> Otherwise it can hand out recallable state
>> to a client with no callback service. Is that always safe?
>>
>> Or, if the client sent a CREATE_SESSION operation with the
>> SESSION4_BACK_CHAN flag cleared, then either the client intends not to
>> set up a backchannel, or it intends to establish a separate transport
>> for the backchannel. There is no backchannel in that case until the
>> client establishes it.
>>
>> Right now, if the client does not set SESSION4_BACK_CHAN, the server
>> never asserts SEQ4_STATUS_CB_PATH_DOWN, even though there is no
>> backchannel. That seems like a (minor, perhaps) bug.
>
> Yes, agreed, I'll take a look--how are you reproducing?

Prototype Linux client that doesn't set SESSION4_BACK_CHAN when it does
CREATE_SESSION. Preparing it for RDMA support.

You could code a pyNFS test.

> The
> nfsd4_probe_callback() should result in NFSD4_CB_DOWN getting set as
> soon as the workqueue processes it.

Clearing SESSION4_BACK_CHAN squelches nfsd4_probe_callback(), so
the server never checks in this case.

> In theory there's a race there (I
> should fix that); in practice I'd normally expect that to run before we
> process a SEQUENCE request on the new session.

>> Do you remember what this commit was trying to fix?
>
> It's not a bugfix, just a minor optimization. We could revert it if
> need be.

NP, I just don't see how it helps to make that assumption, plus it
doesn't seem entirely safe.

My initial question was "why does alloc_client() initialize cl_cb_state
to NFSD4_CB_UNKNOWN instead of NFSD4_CB_DOWN?"

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2014-07-17 18:37:42

by J. Bruce Fields

Subject: Re: [PATCH] svcrdma: Select NFSv4.1 backchannel transport based on forward channel

On Thu, Jul 17, 2014 at 02:36:21PM -0400, J. Bruce Fields wrote:
> On Wed, Jul 16, 2014 at 03:38:32PM -0400, Chuck Lever wrote:
> > The current code always selects XPRT_TRANSPORT_BC_TCP for the back
> > channel, even when the forward channel was not TCP (eg, RDMA). When
> > a 4.1 mount is attempted with RDMA, the server panics in the TCP BC
> > code when trying to send CB_NULL.
> >
> > Instead, construct the transport protocol number from the forward
> > channel transport or'd with XPRT_TRANSPORT_BC. Transports that do
> > not support bi-directional RPC will not have registered a "BC"
> > transport, causing create_backchannel_client() to fail immediately.
> >
> > Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265
> > Signed-off-by: Chuck Lever <[email protected]>
> > ---
> > Hi Bruce-
> >
> > What do you think of this approach?
>
> OK by me.

(Applying for 3.17 unless you tell me otherwise.)

--b.

> (So clients use a separate tcp connection for the
> backchannel?)
>
> --b.
>
> >
> >
> > fs/nfsd/nfs4callback.c | 3 ++-
> > include/linux/sunrpc/svc_xprt.h | 1 +
> > net/sunrpc/svcsock.c | 2 ++
> > net/sunrpc/xprt.c | 2 +-
> > net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 +
> > 5 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 2c73cae..0f23ad0 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -689,7 +689,8 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c
> > clp->cl_cb_session = ses;
> > args.bc_xprt = conn->cb_xprt;
> > args.prognumber = clp->cl_cb_session->se_cb_prog;
> > - args.protocol = XPRT_TRANSPORT_BC_TCP;
> > + args.protocol = conn->cb_xprt->xpt_class->xcl_ident |
> > + XPRT_TRANSPORT_BC;
> > args.authflavor = ses->se_cb_sec.flavor;
> > }
> > /* Create RPC client */
> > diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> > index 7235040..5d9d6f8 100644
> > --- a/include/linux/sunrpc/svc_xprt.h
> > +++ b/include/linux/sunrpc/svc_xprt.h
> > @@ -33,6 +33,7 @@ struct svc_xprt_class {
> > struct svc_xprt_ops *xcl_ops;
> > struct list_head xcl_list;
> > u32 xcl_max_payload;
> > + int xcl_ident;
> > };
> >
> > /*
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index b507cd3..b2437ee 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -692,6 +692,7 @@ static struct svc_xprt_class svc_udp_class = {
> > .xcl_owner = THIS_MODULE,
> > .xcl_ops = &svc_udp_ops,
> > .xcl_max_payload = RPCSVC_MAXPAYLOAD_UDP,
> > + .xcl_ident = XPRT_TRANSPORT_UDP,
> > };
> >
> > static void svc_udp_init(struct svc_sock *svsk, struct svc_serv *serv)
> > @@ -1292,6 +1293,7 @@ static struct svc_xprt_class svc_tcp_class = {
> > .xcl_owner = THIS_MODULE,
> > .xcl_ops = &svc_tcp_ops,
> > .xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
> > + .xcl_ident = XPRT_TRANSPORT_TCP,
> > };
> >
> > void svc_init_xprt_sock(void)
> > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > index c3b2b33..51c6316 100644
> > --- a/net/sunrpc/xprt.c
> > +++ b/net/sunrpc/xprt.c
> > @@ -1306,7 +1306,7 @@ struct rpc_xprt *xprt_create_transport(struct xprt_create *args)
> > }
> > }
> > spin_unlock(&xprt_list_lock);
> > - printk(KERN_ERR "RPC: transport (%d) not supported\n", args->ident);
> > + dprintk("RPC: transport (%d) not supported\n", args->ident);
> > return ERR_PTR(-EIO);
> >
> > found:
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > index e7323fb..06a5d92 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > @@ -92,6 +92,7 @@ struct svc_xprt_class svc_rdma_class = {
> > .xcl_owner = THIS_MODULE,
> > .xcl_ops = &svc_rdma_ops,
> > .xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
> > + .xcl_ident = XPRT_TRANSPORT_RDMA,
> > };
> >
> > struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *xprt)
> >

2014-07-21 15:23:03

by J. Bruce Fields

Subject: Re: [PATCH] svcrdma: Select NFSv4.1 backchannel transport based on forward channel

On Fri, Jul 18, 2014 at 03:27:10PM -0400, Chuck Lever wrote:
> On Jul 18, 2014, at 3:12 PM, J. Bruce Fields <[email protected]> wrote:
> > Either that or send the original CREATE_SESSION over the connection you
> > want used for the backchannel, then send a BIND_CONNECTION_TO_SESSION
> > over the RDMA connection.
> >
> > That said, if the server knows there's no connection at all that's
> > available for the backchannel, then yes I agree that it should be
> > setting the PATH_DOWN flag:
>
> I’m trying a fix that sets cl_cb_state = NFSD4_CB_DOWN in
> init_session() if SESSION4_BACK_CHAN is clear.
>
> From the looks of it, the server is architected to support one and
> only one session per client ID. Is that correct? Thus it expects
> just one CREATE_SESSION operation per EXCHANGE_ID. Does the server
> enforce this?

No, that would definitely be a bug; if you see it failing to support
multiple sessions per client id, let me know.

BUT we definitely only use one backchannel connection at a time, even if
the client offers us multiple connections over multiple sessions.

> Seems like cl_cb_state really should be a per-session thing for
> NFSv4.1.

All we care about is whether some session has a working backchannel.

The only callback we currently use is CB_RECALL, and it doesn't matter
which session that's sent over. If we needed to support e.g.
CB_RECALL_SLOT, I think this should change.

> And, while we’re at it, the server should assert
> SEQ4_STATUS_CB_PATH_DOWN_SESSION as well as SEQ4_STATUS_CB_PATH_DOWN.

Looking back at the spec, especially the language about retrying
callbacks, I think you might be right. (And then maybe we do need to
track callback status per-session.) I'm not sure if this has much
practical effect for now.

> > Yes, agreed, I'll take a look--how are you reproducing?
>
> Prototype Linux client that doesn’t set SESSION4_BACK_CHAN when it does
> CREATE_SESSION. Preparing it for RDMA support.
>
> You could code a pyNFS test.

OK. I may not get to this very soon.

> > The
> > nfsd4_probe_callback() should result in NFSD4_CB_DOWN getting set as
> > soon as the workqueue processes it.
>
> Clearing SESSION4_BACK_CHAN squelches nfsd4_probe_callback(), so
> the server never checks in this case.

Oops, you're right. May as well make that unconditional.
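
Something like the following, maybe (hypothetical hunk; the guard shown
is shorthand for wherever the SESSION4_BACK_CHAN check lives, not a
quote of the current code):

-	if (cr_ses->flags & SESSION4_BACK_CHAN)
-		nfsd4_probe_callback(clp);
+	nfsd4_probe_callback(clp);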

> > In theory there's a race there (I
> > should fix that); in practice I'd normally expect that to run before we
> > process a SEQUENCE request on the new session.
>
> >> Do you remember what this commit was trying to fix?
> >
> > It's not a bugfix, just a minor optimization. We could revert it if
> > need be.
>
> NP, I just don’t see how it helps to make that assumption, plus it
> doesn’t seem entirely safe.
>
> My initial question was "why does alloc_client() initialize cl_cb_state
> to NFSD4_CB_UNKNOWN instead of NFSD4_CB_DOWN?”

I don't remember; it could be that we really only need two states.

--b.

2014-07-17 18:36:22

by J. Bruce Fields

Subject: Re: [PATCH] svcrdma: Select NFSv4.1 backchannel transport based on forward channel

On Wed, Jul 16, 2014 at 03:38:32PM -0400, Chuck Lever wrote:
> The current code always selects XPRT_TRANSPORT_BC_TCP for the back
> channel, even when the forward channel was not TCP (eg, RDMA). When
> a 4.1 mount is attempted with RDMA, the server panics in the TCP BC
> code when trying to send CB_NULL.
>
> Instead, construct the transport protocol number from the forward
> channel transport or'd with XPRT_TRANSPORT_BC. Transports that do
> not support bi-directional RPC will not have registered a "BC"
> transport, causing create_backchannel_client() to fail immediately.
>
> Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265
> Signed-off-by: Chuck Lever <[email protected]>
> ---
> Hi Bruce-
>
> What do you think of this approach?

OK by me. (So clients use a separate tcp connection for the
backchannel?)

--b.

>
>
> fs/nfsd/nfs4callback.c | 3 ++-
> include/linux/sunrpc/svc_xprt.h | 1 +
> net/sunrpc/svcsock.c | 2 ++
> net/sunrpc/xprt.c | 2 +-
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 +
> 5 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 2c73cae..0f23ad0 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -689,7 +689,8 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c
> clp->cl_cb_session = ses;
> args.bc_xprt = conn->cb_xprt;
> args.prognumber = clp->cl_cb_session->se_cb_prog;
> - args.protocol = XPRT_TRANSPORT_BC_TCP;
> + args.protocol = conn->cb_xprt->xpt_class->xcl_ident |
> + XPRT_TRANSPORT_BC;
> args.authflavor = ses->se_cb_sec.flavor;
> }
> /* Create RPC client */
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> index 7235040..5d9d6f8 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -33,6 +33,7 @@ struct svc_xprt_class {
> struct svc_xprt_ops *xcl_ops;
> struct list_head xcl_list;
> u32 xcl_max_payload;
> + int xcl_ident;
> };
>
> /*
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index b507cd3..b2437ee 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -692,6 +692,7 @@ static struct svc_xprt_class svc_udp_class = {
> .xcl_owner = THIS_MODULE,
> .xcl_ops = &svc_udp_ops,
> .xcl_max_payload = RPCSVC_MAXPAYLOAD_UDP,
> + .xcl_ident = XPRT_TRANSPORT_UDP,
> };
>
> static void svc_udp_init(struct svc_sock *svsk, struct svc_serv *serv)
> @@ -1292,6 +1293,7 @@ static struct svc_xprt_class svc_tcp_class = {
> .xcl_owner = THIS_MODULE,
> .xcl_ops = &svc_tcp_ops,
> .xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
> + .xcl_ident = XPRT_TRANSPORT_TCP,
> };
>
> void svc_init_xprt_sock(void)
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index c3b2b33..51c6316 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1306,7 +1306,7 @@ struct rpc_xprt *xprt_create_transport(struct xprt_create *args)
> }
> }
> spin_unlock(&xprt_list_lock);
> - printk(KERN_ERR "RPC: transport (%d) not supported\n", args->ident);
> + dprintk("RPC: transport (%d) not supported\n", args->ident);
> return ERR_PTR(-EIO);
>
> found:
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index e7323fb..06a5d92 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -92,6 +92,7 @@ struct svc_xprt_class svc_rdma_class = {
> .xcl_owner = THIS_MODULE,
> .xcl_ops = &svc_rdma_ops,
> .xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
> + .xcl_ident = XPRT_TRANSPORT_RDMA,
> };
>
> struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *xprt)
>

2014-07-18 18:47:51

by Chuck Lever III

Subject: Re: [PATCH] svcrdma: Select NFSv4.1 backchannel transport based on forward channel


On Jul 17, 2014, at 2:59 PM, Chuck Lever <[email protected]> wrote:

>
> On Jul 17, 2014, at 2:36 PM, J. Bruce Fields <[email protected]> wrote:
>
>> On Wed, Jul 16, 2014 at 03:38:32PM -0400, Chuck Lever wrote:
>>> The current code always selects XPRT_TRANSPORT_BC_TCP for the back
>>> channel, even when the forward channel was not TCP (eg, RDMA). When
>>> a 4.1 mount is attempted with RDMA, the server panics in the TCP BC
>>> code when trying to send CB_NULL.
>>>
>>> Instead, construct the transport protocol number from the forward
>>> channel transport or'd with XPRT_TRANSPORT_BC. Transports that do
>>> not support bi-directional RPC will not have registered a "BC"
>>> transport, causing create_backchannel_client() to fail immediately.
>>>
>>> Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265
>>> Signed-off-by: Chuck Lever <[email protected]>
>>> ---
>>> Hi Bruce-
>>>
>>> What do you think of this approach?
>>
>> OK by me. (So clients use a separate tcp connection for the
>> backchannel?)
>
> Right.
>
> The current plan is that, for NFSv4.1 over RDMA, the Linux client will
> create an additional TCP connection and bind it to the same session as
> the RDMA transport.
>
> The TCP connection will be used solely for callback operations. The
> forward channel on that connection will not be used, except for
> BIND_CONN_TO_SESSION operations.

Hi Bruce, pursuant to this goal, I'm trying to understand commit
14a24e99, especially the interior comment:

/* Should we give out recallable state?: */
static bool nfsd4_cb_channel_good(struct nfs4_client *clp)
{
	if (clp->cl_cb_state == NFSD4_CB_UP)
		return true;
	/*
	 * In the sessions case, since we don't have to establish a
	 * separate connection for callbacks, we assume it's OK
	 * until we hear otherwise:
	 */
	return clp->cl_minorversion && clp->cl_cb_state == NFSD4_CB_UNKNOWN;
}

I wonder if that's a valid assumption?

A separate virtual connection _does_ have to be established. If the
client sent a CREATE_SESSION operation with the SESSION4_BACK_CHAN flag
set, the server must verify that the client has an active bi-directional
RPC service on the transport. Otherwise it can hand out recallable state
to a client with no callback service. Is that always safe?

Or, if the client sent a CREATE_SESSION operation with the
SESSION4_BACK_CHAN flag cleared, then either the client intends not to
set up a backchannel, or it intends to establish a separate transport
for the backchannel. There is no backchannel in that case until the
client establishes it.

Right now, if the client does not set SESSION4_BACK_CHAN, the server
never asserts SEQ4_STATUS_CB_PATH_DOWN, even though there is no
backchannel. That seems like a (minor, perhaps) bug.

Do you remember what this commit was trying to fix?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2014-07-17 18:59:38

by Chuck Lever III

Subject: Re: [PATCH] svcrdma: Select NFSv4.1 backchannel transport based on forward channel


On Jul 17, 2014, at 2:36 PM, J. Bruce Fields <[email protected]> wrote:

> On Wed, Jul 16, 2014 at 03:38:32PM -0400, Chuck Lever wrote:
>> The current code always selects XPRT_TRANSPORT_BC_TCP for the back
>> channel, even when the forward channel was not TCP (eg, RDMA). When
>> a 4.1 mount is attempted with RDMA, the server panics in the TCP BC
>> code when trying to send CB_NULL.
>>
>> Instead, construct the transport protocol number from the forward
>> channel transport or'd with XPRT_TRANSPORT_BC. Transports that do
>> not support bi-directional RPC will not have registered a "BC"
>> transport, causing create_backchannel_client() to fail immediately.
>>
>> Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>> Hi Bruce-
>>
>> What do you think of this approach?
>
> OK by me. (So clients use a separate tcp connection for the
> backchannel?)

Right.

The current plan is that, for NFSv4.1 over RDMA, the Linux client will
create an additional TCP connection and bind it to the same session as
the RDMA transport.

The TCP connection will be used solely for callback operations. The
forward channel on that connection will not be used, except for
BIND_CONN_TO_SESSION operations.


> --b.
>
>>
>>
>> fs/nfsd/nfs4callback.c | 3 ++-
>> include/linux/sunrpc/svc_xprt.h | 1 +
>> net/sunrpc/svcsock.c | 2 ++
>> net/sunrpc/xprt.c | 2 +-
>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 +
>> 5 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>> index 2c73cae..0f23ad0 100644
>> --- a/fs/nfsd/nfs4callback.c
>> +++ b/fs/nfsd/nfs4callback.c
>> @@ -689,7 +689,8 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c
>> clp->cl_cb_session = ses;
>> args.bc_xprt = conn->cb_xprt;
>> args.prognumber = clp->cl_cb_session->se_cb_prog;
>> - args.protocol = XPRT_TRANSPORT_BC_TCP;
>> + args.protocol = conn->cb_xprt->xpt_class->xcl_ident |
>> + XPRT_TRANSPORT_BC;
>> args.authflavor = ses->se_cb_sec.flavor;
>> }
>> /* Create RPC client */
>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
>> index 7235040..5d9d6f8 100644
>> --- a/include/linux/sunrpc/svc_xprt.h
>> +++ b/include/linux/sunrpc/svc_xprt.h
>> @@ -33,6 +33,7 @@ struct svc_xprt_class {
>> struct svc_xprt_ops *xcl_ops;
>> struct list_head xcl_list;
>> u32 xcl_max_payload;
>> + int xcl_ident;
>> };
>>
>> /*
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index b507cd3..b2437ee 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -692,6 +692,7 @@ static struct svc_xprt_class svc_udp_class = {
>> .xcl_owner = THIS_MODULE,
>> .xcl_ops = &svc_udp_ops,
>> .xcl_max_payload = RPCSVC_MAXPAYLOAD_UDP,
>> + .xcl_ident = XPRT_TRANSPORT_UDP,
>> };
>>
>> static void svc_udp_init(struct svc_sock *svsk, struct svc_serv *serv)
>> @@ -1292,6 +1293,7 @@ static struct svc_xprt_class svc_tcp_class = {
>> .xcl_owner = THIS_MODULE,
>> .xcl_ops = &svc_tcp_ops,
>> .xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
>> + .xcl_ident = XPRT_TRANSPORT_TCP,
>> };
>>
>> void svc_init_xprt_sock(void)
>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>> index c3b2b33..51c6316 100644
>> --- a/net/sunrpc/xprt.c
>> +++ b/net/sunrpc/xprt.c
>> @@ -1306,7 +1306,7 @@ struct rpc_xprt *xprt_create_transport(struct xprt_create *args)
>> }
>> }
>> spin_unlock(&xprt_list_lock);
>> - printk(KERN_ERR "RPC: transport (%d) not supported\n", args->ident);
>> + dprintk("RPC: transport (%d) not supported\n", args->ident);
>> return ERR_PTR(-EIO);
>>
>> found:
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index e7323fb..06a5d92 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -92,6 +92,7 @@ struct svc_xprt_class svc_rdma_class = {
>> .xcl_owner = THIS_MODULE,
>> .xcl_ops = &svc_rdma_ops,
>> .xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
>> + .xcl_ident = XPRT_TRANSPORT_RDMA,
>> };
>>
>> struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *xprt)
>>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2014-07-18 19:12:57

by J. Bruce Fields

Subject: Re: [PATCH] svcrdma: Select NFSv4.1 backchannel transport based on forward channel

On Fri, Jul 18, 2014 at 02:47:41PM -0400, Chuck Lever wrote:
>
> On Jul 17, 2014, at 2:59 PM, Chuck Lever <[email protected]> wrote:
>
> >
> > On Jul 17, 2014, at 2:36 PM, J. Bruce Fields <[email protected]> wrote:
> >
> >> On Wed, Jul 16, 2014 at 03:38:32PM -0400, Chuck Lever wrote:
> >>> The current code always selects XPRT_TRANSPORT_BC_TCP for the back
> >>> channel, even when the forward channel was not TCP (eg, RDMA). When
> >>> a 4.1 mount is attempted with RDMA, the server panics in the TCP BC
> >>> code when trying to send CB_NULL.
> >>>
> >>> Instead, construct the transport protocol number from the forward
> >>> channel transport or'd with XPRT_TRANSPORT_BC. Transports that do
> >>> not support bi-directional RPC will not have registered a "BC"
> >>> transport, causing create_backchannel_client() to fail immediately.
> >>>
> >>> Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265
> >>> Signed-off-by: Chuck Lever <[email protected]>
> >>> ---
> >>> Hi Bruce-
> >>>
> >>> What do you think of this approach?
> >>
> >> OK by me. (So clients use a separate tcp connection for the
> >> backchannel?)
> >
> > Right.
> >
> > The current plan is that, for NFSv4.1 over RDMA, the Linux client will
> > create an additional TCP connection and bind it to the same session as
> > the RDMA transport.
> >
> > The TCP connection will be used solely for callback operations. The
> > forward channel on that connection will not be used, except for
> > BIND_CONN_TO_SESSION operations.
>
> Hi Bruce, pursuant to this goal, I’m trying to understand commit
> 14a24e99, especially the interior comment:
>
> /* Should we give out recallable state?: */
> static bool nfsd4_cb_channel_good(struct nfs4_client *clp)
> {
> 	if (clp->cl_cb_state == NFSD4_CB_UP)
> 		return true;
> 	/*
> 	 * In the sessions case, since we don't have to establish a
> 	 * separate connection for callbacks, we assume it's OK
> 	 * until we hear otherwise:
> 	 */
> 	return clp->cl_minorversion && clp->cl_cb_state == NFSD4_CB_UNKNOWN;
> }
>
> I wonder if that’s a valid assumption?
>
> A separate virtual connection _does_ have to be established. If the
> client sent a CREATE_SESSION operation with the SESSION4_BACK_CHAN flag
> set, the server must verify that the client has an active bi-directional
> RPC service on the transport.

If the client sets the BACK_CHAN flag, that means it wants that
connection to be used as a back channel.

So in the RDMA case it sounds like the client needs to clear that flag
on the create session.

Either that or send the original CREATE_SESSION over the connection you
want used for the backchannel, then send a BIND_CONNECTION_TO_SESSION
over the RDMA connection.

That said, if the server knows there's no connection at all that's
available for the backchannel, then yes I agree that it should be
setting the PATH_DOWN flag:

> Otherwise it can hand out recallable state
> to a client with no callback service. Is that always safe?
>
> Or, if the client sent a CREATE_SESSION operation with the
> SESSION4_BACK_CHAN flag cleared, then either the client intends not to
> set up a backchannel, or it intends to establish a separate transport
> for the backchannel. There is no backchannel in that case until the
> client establishes it.
>
> Right now, if the client does not set SESSION4_BACK_CHAN, the server
> never asserts SEQ4_STATUS_CB_PATH_DOWN, even though there is no
> backchannel. That seems like a (minor, perhaps) bug.

Yes, agreed, I'll take a look--how are you reproducing? The
nfsd4_probe_callback() should result in NFSD4_CB_DOWN getting set as
soon as the workqueue processes it. In theory there's a race there (I
should fix that); in practice I'd normally expect that to run before we
process a SEQUENCE request on the new session.

> Do you remember what this commit was trying to fix?

It's not a bugfix, just a minor optimization. We could revert it if
need be.

--b.