2017-02-23 17:03:39

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4

RFC5661 says:

Where an NFSv4.1 implementation supports operation over the IP
network protocol, any transport used between NFS and IP MUST be among
the IETF-approved congestion control transport protocols.

...and RFC7530 has similar verbiage. The NFS server has never enforced
this requirement, however, so a user could issue NFSv4 calls against
the server via UDP.

This patchset adds a small bit of infrastructure to the sunrpc layer
to enforce this requirement, and has the nfs and nfsd layers set the
appropriate flags for it. It also has knfsd skip registering a UDP
port for NFSv4, using the same flags.

Lightly tested by hand, but it's fairly straightforward.

Jeff Layton (4):
sunrpc: flag transports as using IETF approved congestion control
protocols
sunrpc: turn bitfield flags in svc_version into bools
nfs/nfsd/sunrpc: enforce congestion control protocol requirement for
NFSv4
sunrpc: don't register UDP port with rpcbind when version needs
congestion control

fs/nfs/callback_xdr.c | 6 ++++--
fs/nfsd/nfs2acl.c | 1 -
fs/nfsd/nfs3acl.c | 1 -
fs/nfsd/nfs4proc.c | 13 +++++++------
include/linux/sunrpc/svc.h | 12 ++++++++----
include/linux/sunrpc/svc_xprt.h | 1 +
net/sunrpc/svc.c | 22 +++++++++++++++++++++-
net/sunrpc/svcsock.c | 1 +
net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++
9 files changed, 44 insertions(+), 15 deletions(-)

--
2.9.3



2017-02-23 17:03:45

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 2/4] sunrpc: turn bitfield flags in svc_version into bools

It's just simpler to read this way, IMO. Also, no need to explicitly
set vs_hidden to false in the nfsacl ones.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/callback_xdr.c | 4 ++--
fs/nfsd/nfs2acl.c | 1 -
fs/nfsd/nfs3acl.c | 1 -
fs/nfsd/nfs4proc.c | 2 +-
include/linux/sunrpc/svc.h | 9 +++++----
net/sunrpc/svc.c | 2 +-
6 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index eb094c6011d8..e9836f611d9c 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -1083,7 +1083,7 @@ struct svc_version nfs4_callback_version1 = {
.vs_proc = nfs4_callback_procedures1,
.vs_xdrsize = NFS4_CALLBACK_XDRSIZE,
.vs_dispatch = NULL,
- .vs_hidden = 1,
+ .vs_hidden = true,
};

struct svc_version nfs4_callback_version4 = {
@@ -1092,5 +1092,5 @@ struct svc_version nfs4_callback_version4 = {
.vs_proc = nfs4_callback_procedures1,
.vs_xdrsize = NFS4_CALLBACK_XDRSIZE,
.vs_dispatch = NULL,
- .vs_hidden = 1,
+ .vs_hidden = true,
};
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c
index d08cd88155c7..838f90f3f890 100644
--- a/fs/nfsd/nfs2acl.c
+++ b/fs/nfsd/nfs2acl.c
@@ -376,5 +376,4 @@ struct svc_version nfsd_acl_version2 = {
.vs_proc = nfsd_acl_procedures2,
.vs_dispatch = nfsd_dispatch,
.vs_xdrsize = NFS3_SVC_XDRSIZE,
- .vs_hidden = 0,
};
diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c
index 0c890347cde3..dcb5f79076c0 100644
--- a/fs/nfsd/nfs3acl.c
+++ b/fs/nfsd/nfs3acl.c
@@ -266,6 +266,5 @@ struct svc_version nfsd_acl_version3 = {
.vs_proc = nfsd_acl_procedures3,
.vs_dispatch = nfsd_dispatch,
.vs_xdrsize = NFS3_SVC_XDRSIZE,
- .vs_hidden = 0,
};

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 74a6e573e061..2b73b37c2b36 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2481,7 +2481,7 @@ struct svc_version nfsd_version4 = {
.vs_proc = nfsd_procedures4,
.vs_dispatch = nfsd_dispatch,
.vs_xdrsize = NFS4_SVC_XDRSIZE,
- .vs_rpcb_optnl = 1,
+ .vs_rpcb_optnl = true,
};

/*
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 7321ae933867..96467c95f02e 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -400,10 +400,11 @@ struct svc_version {
struct svc_procedure * vs_proc; /* per-procedure info */
u32 vs_xdrsize; /* xdrsize needed for this version */

- unsigned int vs_hidden : 1, /* Don't register with portmapper.
- * Only used for nfsacl so far. */
- vs_rpcb_optnl:1;/* Don't care the result of register.
- * Only used for nfsv4. */
+ /* Don't register with rpcbind */
+ bool vs_hidden;
+
+ /* Don't care if the rpcbind registration fails */
+ bool vs_rpcb_optnl;

/* Override dispatch function (e.g. when caching replies).
* A return value of 0 means drop the request.
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 75f290bddca1..85bcdea67a3f 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -385,7 +385,7 @@ static int svc_uses_rpcbind(struct svc_serv *serv)
for (i = 0; i < progp->pg_nvers; i++) {
if (progp->pg_vers[i] == NULL)
continue;
- if (progp->pg_vers[i]->vs_hidden == 0)
+ if (!progp->pg_vers[i]->vs_hidden)
return 1;
}
}
--
2.9.3


2017-02-23 17:03:41

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 4/4] sunrpc: don't register UDP port with rpcbind when version needs congestion control

Signed-off-by: Jeff Layton <[email protected]>
---
net/sunrpc/svc.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 028c91a07950..984f1e73722f 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -976,6 +976,13 @@ int svc_register(const struct svc_serv *serv, struct net *net,
if (vers->vs_hidden)
continue;

+ /*
+ * Don't register a UDP port if we need congestion
+ * control.
+ */
+ if (vers->vs_need_cong_ctrl && proto == IPPROTO_UDP)
+ continue;
+
error = __svc_register(net, progp->pg_name, progp->pg_prog,
i, family, proto, port);

--
2.9.3


2017-02-23 17:03:40

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 3/4] nfs/nfsd/sunrpc: enforce congestion control protocol requirement for NFSv4

NFSv4 requires an "IETF approved congestion control transport". In
practical terms, that means that you should not run NFSv4 over UDP. The
server has never enforced that requirement though.

This patchset fixes that by adding a new flag to the svc_version that
states that it requires a congestion-controlled transport, and a new
flag to the svc_xprt that declares that the transport fulfills that
requirement. We then use those flags to enforce the requirement in
svc_process_common.

In practical terms, that means that we set the new xprt flag in TCP and
RDMA transports, and the the new svc_version flag in NFSv4 svc_version
structs.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/callback_xdr.c | 2 ++
fs/nfsd/nfs4proc.c | 13 +++++++------
include/linux/sunrpc/svc.h | 3 +++
net/sunrpc/svc.c | 13 +++++++++++++
4 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index e9836f611d9c..fd0284c1dc32 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -1084,6 +1084,7 @@ struct svc_version nfs4_callback_version1 = {
.vs_xdrsize = NFS4_CALLBACK_XDRSIZE,
.vs_dispatch = NULL,
.vs_hidden = true,
+ .vs_need_cong_ctrl = true,
};

struct svc_version nfs4_callback_version4 = {
@@ -1093,4 +1094,5 @@ struct svc_version nfs4_callback_version4 = {
.vs_xdrsize = NFS4_CALLBACK_XDRSIZE,
.vs_dispatch = NULL,
.vs_hidden = true,
+ .vs_need_cong_ctrl = true,
};
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 2b73b37c2b36..f82fcaa2e1d9 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2476,12 +2476,13 @@ static struct svc_procedure nfsd_procedures4[2] = {
};

struct svc_version nfsd_version4 = {
- .vs_vers = 4,
- .vs_nproc = 2,
- .vs_proc = nfsd_procedures4,
- .vs_dispatch = nfsd_dispatch,
- .vs_xdrsize = NFS4_SVC_XDRSIZE,
- .vs_rpcb_optnl = true,
+ .vs_vers = 4,
+ .vs_nproc = 2,
+ .vs_proc = nfsd_procedures4,
+ .vs_dispatch = nfsd_dispatch,
+ .vs_xdrsize = NFS4_SVC_XDRSIZE,
+ .vs_rpcb_optnl = true,
+ .vs_need_cong_ctrl = true,
};

/*
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 96467c95f02e..65b1c707c40b 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -406,6 +406,9 @@ struct svc_version {
/* Don't care if the rpcbind registration fails */
bool vs_rpcb_optnl;

+ /* Only allowed on IETF-approved congestion controlled xprts */
+ bool vs_need_cong_ctrl;
+
/* Override dispatch function (e.g. when caching replies).
* A return value of 0 means drop the request.
* vs_dispatch == NULL means use default dispatcher.
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 85bcdea67a3f..028c91a07950 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1169,6 +1169,19 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
!(versp = progp->pg_vers[vers]))
goto err_bad_vers;

+ /*
+ * Some protocol versions (namely NFSv4) require "IETF approved
+ * congestion control protocols". IOW, UDP is not allowed. We mark
+ * those when setting up the svc_xprt, and verify that for them here.
+ *
+ * The spec is not very clear about what error should be returned when
+ * someone tries to access a server that is listening on UDP for lower
+ * versions though. RPC_PROG_MISMATCH seems to be the closest fit.
+ */
+ if (versp->vs_need_cong_ctrl &&
+ !test_bit(XPT_CONG_CTRL, &rqstp->rq_xprt->xpt_flags))
+ goto err_bad_vers;
+
procp = versp->vs_proc + proc;
if (proc >= versp->vs_nproc || !procp->pc_func)
goto err_bad_proc;
--
2.9.3


2017-02-23 17:03:39

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols

Signed-off-by: Jeff Layton <[email protected]>
---
include/linux/sunrpc/svc_xprt.h | 1 +
net/sunrpc/svcsock.c | 1 +
net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++
3 files changed, 4 insertions(+)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 7440290f64ac..f8aa9452b63c 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -67,6 +67,7 @@ struct svc_xprt {
#define XPT_CACHE_AUTH 11 /* cache auth info */
#define XPT_LOCAL 12 /* connection from loopback interface */
#define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */
+#define XPT_CONG_CTRL 14 /* IETF approved congestion control protocol */

struct svc_serv *xpt_server; /* service for transport */
atomic_t xpt_reserved; /* space on outq that is rsvd */
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index de066acdb34e..1956b8b96b2d 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class,
&svsk->sk_xprt, serv);
set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags);
+ set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags);
if (sk->sk_state == TCP_LISTEN) {
dprintk("setting up TCP socket for listening\n");
set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 39652d390a9c..96b4797c2c54 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -571,6 +571,8 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
spin_lock_init(&cma_xprt->sc_ctxt_lock);
spin_lock_init(&cma_xprt->sc_map_lock);

+ set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags);
+
if (listener)
set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);

--
2.9.3


2017-02-23 17:27:32

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4

On Thu, 2017-02-23 at 12:03 -0500, Jeff Layton wrote:
> RFC5661 says:
>
> Where an NFSv4.1 implementation supports operation over the IP
> network protocol, any transport used between NFS and IP MUST be among
> the IETF-approved congestion control transport protocols.
>
> ...and RFC7530 has similar verbiage. The NFS server has never enforced
> this requirement, however, so a user could issue NFSv4 calls against
> the server via UDP.
>
> This patchset adds a small bit of infrastructure to the sunrpc layer
> to enforce this requirement, and has the nfs and nfsd layers set the
> appropriate flags for it. It also has knfsd skip registering a UDP
> port for NFSv4, using the same flags.
>
> Lightly tested by hand, but it's fairly straightforward.
>
> Jeff Layton (4):
> sunrpc: flag transports as using IETF approved congestion control
> protocols
> sunrpc: turn bitfield flags in svc_version into bools
> nfs/nfsd/sunrpc: enforce congestion control protocol requirement for
> NFSv4
> sunrpc: don't register UDP port with rpcbind when version needs
> congestion control
>
> fs/nfs/callback_xdr.c | 6 ++++--
> fs/nfsd/nfs2acl.c | 1 -
> fs/nfsd/nfs3acl.c | 1 -
> fs/nfsd/nfs4proc.c | 13 +++++++------
> include/linux/sunrpc/svc.h | 12 ++++++++----
> include/linux/sunrpc/svc_xprt.h | 1 +
> net/sunrpc/svc.c | 22 +++++++++++++++++++++-
> net/sunrpc/svcsock.c | 1 +
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++
> 9 files changed, 44 insertions(+), 15 deletions(-)
>

I probably should have sent this as an RFC first. I'm not 100% clear on
whether PROG_MISMATCH is the right return code there.

Also, there is still a small wart after this patchset. The high/low
program versions reported look a little odd:

$ rpcinfo -T udp knfsdsrv nfs 4
rpcinfo: RPC: Program/version mismatch; low version = 3, high version = 4
program 100003 version 4 is not available

We could try to fix this and report different values depending on the
socket type, but I'm not sure I really care. AFAIK, this is just
informative anyway, and it's not _technically_ wrong. The server does
support version 4, just not the UDP socket where we sent the RPC ping.

Thoughts?
--
Jeff Layton <[email protected]>

2017-02-23 19:50:31

by Tom Talpey

[permalink] [raw]
Subject: Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols

On 2/23/2017 12:03 PM, Jeff Layton wrote:
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> include/linux/sunrpc/svc_xprt.h | 1 +
> net/sunrpc/svcsock.c | 1 +
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++

There's a possibly-important detail here. Not all RDMA transports have
"IETF-approved congestion control", for example, RoCEv1. However, iWARP
and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol,
RoCEv1 may not fall under this restriction.

Net-net, inspecting only the RDMA attribute of the transport may be
insufficient here.

It could be argued however that the xprtrdma layer, with its rpcrdma
crediting, provides such congestion. But that needs to be made
explicit, and perhaps, discussed in IETF. Initially, I think it would
be important to flag this as a point for the future. For now, it may
be best to flag RoCEv1 as not supporting congestion.

Tom.

> 3 files changed, 4 insertions(+)
>
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> index 7440290f64ac..f8aa9452b63c 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -67,6 +67,7 @@ struct svc_xprt {
> #define XPT_CACHE_AUTH 11 /* cache auth info */
> #define XPT_LOCAL 12 /* connection from loopback interface */
> #define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */
> +#define XPT_CONG_CTRL 14 /* IETF approved congestion control protocol */
>
> struct svc_serv *xpt_server; /* service for transport */
> atomic_t xpt_reserved; /* space on outq that is rsvd */
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index de066acdb34e..1956b8b96b2d 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
> svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class,
> &svsk->sk_xprt, serv);
> set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags);
> + set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags);
> if (sk->sk_state == TCP_LISTEN) {
> dprintk("setting up TCP socket for listening\n");
> set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 39652d390a9c..96b4797c2c54 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -571,6 +571,8 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
> spin_lock_init(&cma_xprt->sc_ctxt_lock);
> spin_lock_init(&cma_xprt->sc_map_lock);
>
> + set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags);
> +
> if (listener)
> set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);
>
>

2017-02-23 20:00:38

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols

On Thu, 2017-02-23 at 14:42 -0500, Tom Talpey wrote:
> On 2/23/2017 12:03 PM, Jeff Layton wrote:
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > include/linux/sunrpc/svc_xprt.h | 1 +
> > net/sunrpc/svcsock.c | 1 +
> > net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++
>
> There's a possibly-important detail here. Not all RDMA transports have
> "IETF-approved congestion control", for example, RoCEv1. However, iWARP
> and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol,
> RoCEv1 may not fall under this restriction.
>
> Net-net, inspecting only the RDMA attribute of the transport may be
> insufficient here.
>
> It could be argued however that the xprtrdma layer, with its rpcrdma
> crediting, provides such congestion. But that needs to be made
> explicit, and perhaps, discussed in IETF. Initially, I think it would
> be important to flag this as a point for the future. For now, it may
> be best to flag RoCEv1 as not supporting congestion.
>
> Tom.
>

(cc'ing Chuck and the linux-rdma list)

Thanks Tom, that's very interesting.

Not being well versed in the xprtrdma layer, what condition should we
use here to set the flag? git grep shows that the string "ROCEV1" only
shows up in the bxnt_en driver. Is there some way to determine this
generically for any given RDMA driver?


> > 3 files changed, 4 insertions(+)
> >
> > diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> > index 7440290f64ac..f8aa9452b63c 100644
> > --- a/include/linux/sunrpc/svc_xprt.h
> > +++ b/include/linux/sunrpc/svc_xprt.h
> > @@ -67,6 +67,7 @@ struct svc_xprt {
> > #define XPT_CACHE_AUTH 11 /* cache auth info */
> > #define XPT_LOCAL 12 /* connection from loopback interface */
> > #define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */
> > +#define XPT_CONG_CTRL 14 /* IETF approved congestion control protocol */
> >
> > struct svc_serv *xpt_server; /* service for transport */
> > atomic_t xpt_reserved; /* space on outq that is rsvd */
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index de066acdb34e..1956b8b96b2d 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
> > svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class,
> > &svsk->sk_xprt, serv);
> > set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags);
> > + set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags);
> > if (sk->sk_state == TCP_LISTEN) {
> > dprintk("setting up TCP socket for listening\n");
> > set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > index 39652d390a9c..96b4797c2c54 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > @@ -571,6 +571,8 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
> > spin_lock_init(&cma_xprt->sc_ctxt_lock);
> > spin_lock_init(&cma_xprt->sc_map_lock);
> >
> > + set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags);
> > +
> > if (listener)
> > set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);
> >
> >

--
Jeff Layton <[email protected]>

2017-02-23 20:07:07

by Tom Talpey

[permalink] [raw]
Subject: Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols

On 2/23/2017 3:00 PM, Jeff Layton wrote:
> On Thu, 2017-02-23 at 14:42 -0500, Tom Talpey wrote:
>> On 2/23/2017 12:03 PM, Jeff Layton wrote:
>>> Signed-off-by: Jeff Layton <[email protected]>
>>> ---
>>> include/linux/sunrpc/svc_xprt.h | 1 +
>>> net/sunrpc/svcsock.c | 1 +
>>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++
>>
>> There's a possibly-important detail here. Not all RDMA transports have
>> "IETF-approved congestion control", for example, RoCEv1. However, iWARP
>> and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol,
>> RoCEv1 may not fall under this restriction.
>>
>> Net-net, inspecting only the RDMA attribute of the transport may be
>> insufficient here.
>>
>> It could be argued however that the xprtrdma layer, with its rpcrdma
>> crediting, provides such congestion. But that needs to be made
>> explicit, and perhaps, discussed in IETF. Initially, I think it would
>> be important to flag this as a point for the future. For now, it may
>> be best to flag RoCEv1 as not supporting congestion.
>>
>> Tom.
>>
>
> (cc'ing Chuck and the linux-rdma list)
>
> Thanks Tom, that's very interesting.
>
> Not being well versed in the xprtrdma layer, what condition should we
> use here to set the flag? git grep shows that the string "ROCEV1" only
> shows up in the bxnt_en driver. Is there some way to determine this
> generically for any given RDMA driver?

I would not code RoCEv1 as an exception, I would code iWARP and RoCEv2
as the only eligible ones. There are any number of other possibilities,
none of which should be automatically flagged as congestion-controlled.

I'm also not sure I'm comfortable with hardcoding such a list into RPC.
But it may be the best you can do for now. Chuck, are you aware of a
verbs interface to obtain the RDMA transport type?

Tom.

>
>
>>> 3 files changed, 4 insertions(+)
>>>
>>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
>>> index 7440290f64ac..f8aa9452b63c 100644
>>> --- a/include/linux/sunrpc/svc_xprt.h
>>> +++ b/include/linux/sunrpc/svc_xprt.h
>>> @@ -67,6 +67,7 @@ struct svc_xprt {
>>> #define XPT_CACHE_AUTH 11 /* cache auth info */
>>> #define XPT_LOCAL 12 /* connection from loopback interface */
>>> #define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */
>>> +#define XPT_CONG_CTRL 14 /* IETF approved congestion control protocol */
>>>
>>> struct svc_serv *xpt_server; /* service for transport */
>>> atomic_t xpt_reserved; /* space on outq that is rsvd */
>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>> index de066acdb34e..1956b8b96b2d 100644
>>> --- a/net/sunrpc/svcsock.c
>>> +++ b/net/sunrpc/svcsock.c
>>> @@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
>>> svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class,
>>> &svsk->sk_xprt, serv);
>>> set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags);
>>> + set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags);
>>> if (sk->sk_state == TCP_LISTEN) {
>>> dprintk("setting up TCP socket for listening\n");
>>> set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>> index 39652d390a9c..96b4797c2c54 100644
>>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>> @@ -571,6 +571,8 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
>>> spin_lock_init(&cma_xprt->sc_ctxt_lock);
>>> spin_lock_init(&cma_xprt->sc_map_lock);
>>>
>>> + set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags);
>>> +
>>> if (listener)
>>> set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);
>>>
>>>
>

2017-02-23 20:12:24

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols

On Thu, Feb 23, 2017 at 03:06:25PM -0500, Tom Talpey wrote:
> On 2/23/2017 3:00 PM, Jeff Layton wrote:
> >On Thu, 2017-02-23 at 14:42 -0500, Tom Talpey wrote:
> >>On 2/23/2017 12:03 PM, Jeff Layton wrote:
> >>>Signed-off-by: Jeff Layton <[email protected]>
> >>>---
> >>> include/linux/sunrpc/svc_xprt.h | 1 +
> >>> net/sunrpc/svcsock.c | 1 +
> >>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++
> >>
> >>There's a possibly-important detail here. Not all RDMA transports have
> >>"IETF-approved congestion control", for example, RoCEv1. However, iWARP
> >>and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol,
> >>RoCEv1 may not fall under this restriction.
> >>
> >>Net-net, inspecting only the RDMA attribute of the transport may be
> >>insufficient here.
> >>
> >>It could be argued however that the xprtrdma layer, with its rpcrdma
> >>crediting, provides such congestion. But that needs to be made
> >>explicit, and perhaps, discussed in IETF. Initially, I think it would
> >>be important to flag this as a point for the future. For now, it may
> >>be best to flag RoCEv1 as not supporting congestion.
> >>
> >>Tom.
> >>
> >
> >(cc'ing Chuck and the linux-rdma list)
> >
> >Thanks Tom, that's very interesting.
> >
> >Not being well versed in the xprtrdma layer, what condition should we
> >use here to set the flag? git grep shows that the string "ROCEV1" only
> >shows up in the bxnt_en driver. Is there some way to determine this
> >generically for any given RDMA driver?
>
> I would not code RoCEv1 as an exception, I would code iWARP and RoCEv2
> as the only eligible ones. There are any number of other possibilities,
> none of which should be automatically flagged as congestion-controlled.
>
> I'm also not sure I'm comfortable with hardcoding such a list into RPC.
> But it may be the best you can do for now. Chuck, are you aware of a
> verbs interface to obtain the RDMA transport type?

If this gets too complicated--we've been allowing NFSv4/UDP for years,
letting this one (arguable?) exception through in RDMA a little longer
won't kill us.

(And if we really shouldn't be doing NFSv4 over some RDMA transports--is
it worth supporting them at all, if the only support we can get is
NFSv3-only?)

--b.

2017-02-23 20:16:48

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols



--
Chuck Lever

> On Feb 23, 2017, at 2:42 PM, Tom Talpey <[email protected]> wrote:
>
>> On 2/23/2017 12:03 PM, Jeff Layton wrote:
>> Signed-off-by: Jeff Layton <[email protected]>
>> ---
>> include/linux/sunrpc/svc_xprt.h | 1 +
>> net/sunrpc/svcsock.c | 1 +
>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++
>
> There's a possibly-important detail here. Not all RDMA transports have
> "IETF-approved congestion control", for example, RoCEv1. However, iWARP
> and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol,
> RoCEv1 may not fall under this restriction.

Probably a discussion that could be handled in rfc5667bis. But it's
not clear what IETF-approved congestion control is, precisely.

We'd have to say something about InfiniBand (a transport which
is not specified by the IETF) too.


> Net-net, inspecting only the RDMA attribute of the transport may be
> insufficient here.

The transport implementation would have to set it in the svc_xprt
rather than having it be a constant field in the xprt class.


> It could be argued however that the xprtrdma layer, with its rpcrdma
> crediting, provides such congestion. But that needs to be made
> explicit, and perhaps, discussed in IETF. Initially, I think it would
> be important to flag this as a point for the future. For now, it may
> be best to flag RoCEv1 as not supporting congestion.
>
> Tom.
>
>> 3 files changed, 4 insertions(+)
>>
>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
>> index 7440290f64ac..f8aa9452b63c 100644
>> --- a/include/linux/sunrpc/svc_xprt.h
>> +++ b/include/linux/sunrpc/svc_xprt.h
>> @@ -67,6 +67,7 @@ struct svc_xprt {
>> #define XPT_CACHE_AUTH 11 /* cache auth info */
>> #define XPT_LOCAL 12 /* connection from loopback interface */
>> #define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */
>> +#define XPT_CONG_CTRL 14 /* IETF approved congestion control protocol */
>>
>> struct svc_serv *xpt_server; /* service for transport */
>> atomic_t xpt_reserved; /* space on outq that is rsvd */
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index de066acdb34e..1956b8b96b2d 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
>> svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class,
>> &svsk->sk_xprt, serv);
>> set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags);
>> + set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags);
>> if (sk->sk_state == TCP_LISTEN) {
>> dprintk("setting up TCP socket for listening\n");
>> set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index 39652d390a9c..96b4797c2c54 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -571,6 +571,8 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
>> spin_lock_init(&cma_xprt->sc_ctxt_lock);
>> spin_lock_init(&cma_xprt->sc_map_lock);
>>
>> + set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags);
>> +
>> if (listener)
>> set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2017-02-23 20:17:26

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols



--
Chuck Lever

> On Feb 23, 2017, at 3:06 PM, Tom Talpey <[email protected]> wrote:
>
>> On 2/23/2017 3:00 PM, Jeff Layton wrote:
>>> On Thu, 2017-02-23 at 14:42 -0500, Tom Talpey wrote:
>>>> On 2/23/2017 12:03 PM, Jeff Layton wrote:
>>>> Signed-off-by: Jeff Layton <[email protected]>
>>>> ---
>>>> include/linux/sunrpc/svc_xprt.h | 1 +
>>>> net/sunrpc/svcsock.c | 1 +
>>>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++
>>>
>>> There's a possibly-important detail here. Not all RDMA transports have
>>> "IETF-approved congestion control", for example, RoCEv1. However, iWARP
>>> and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol,
>>> RoCEv1 may not fall under this restriction.
>>>
>>> Net-net, inspecting only the RDMA attribute of the transport may be
>>> insufficient here.
>>>
>>> It could be argued however that the xprtrdma layer, with its rpcrdma
>>> crediting, provides such congestion. But that needs to be made
>>> explicit, and perhaps, discussed in IETF. Initially, I think it would
>>> be important to flag this as a point for the future. For now, it may
>>> be best to flag RoCEv1 as not supporting congestion.
>>>
>>> Tom.
>>>
>>
>> (cc'ing Chuck and the linux-rdma list)
>>
>> Thanks Tom, that's very interesting.
>>
>> Not being well versed in the xprtrdma layer, what condition should we
>> use here to set the flag? git grep shows that the string "ROCEV1" only
>> shows up in the bxnt_en driver. Is there some way to determine this
>> generically for any given RDMA driver?
>
> I would not code RoCEv1 as an exception, I would code iWARP and RoCEv2
> as the only eligible ones. There are any number of other possibilities,
> none of which should be automatically flagged as congestion-controlled.
>
> I'm also not sure I'm comfortable with hardcoding such a list into RPC.
> But it may be the best you can do for now. Chuck, are you aware of a
> verbs interface to obtain the RDMA transport type?

Yes, I can have a look when I get to Connectathon.



>
> Tom.
>
>>
>>
>>>> 3 files changed, 4 insertions(+)
>>>>
>>>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
>>>> index 7440290f64ac..f8aa9452b63c 100644
>>>> --- a/include/linux/sunrpc/svc_xprt.h
>>>> +++ b/include/linux/sunrpc/svc_xprt.h
>>>> @@ -67,6 +67,7 @@ struct svc_xprt {
>>>> #define XPT_CACHE_AUTH 11 /* cache auth info */
>>>> #define XPT_LOCAL 12 /* connection from loopback interface */
>>>> #define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */
>>>> +#define XPT_CONG_CTRL 14 /* IETF approved congestion control protocol */
>>>>
>>>> struct svc_serv *xpt_server; /* service for transport */
>>>> atomic_t xpt_reserved; /* space on outq that is rsvd */
>>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>>> index de066acdb34e..1956b8b96b2d 100644
>>>> --- a/net/sunrpc/svcsock.c
>>>> +++ b/net/sunrpc/svcsock.c
>>>> @@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
>>>> svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class,
>>>> &svsk->sk_xprt, serv);
>>>> set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags);
>>>> + set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags);
>>>> if (sk->sk_state == TCP_LISTEN) {
>>>> dprintk("setting up TCP socket for listening\n");
>>>> set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>> index 39652d390a9c..96b4797c2c54 100644
>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>> @@ -571,6 +571,8 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
>>>> spin_lock_init(&cma_xprt->sc_ctxt_lock);
>>>> spin_lock_init(&cma_xprt->sc_map_lock);
>>>>
>>>> + set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags);
>>>> +
>>>> if (listener)
>>>> set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);
>>>>
>>>>
>>


2017-02-23 20:27:20

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols

On Thu, Feb 23, 2017 at 03:11:09PM -0500, J. Bruce Fields wrote:

> (And if we really shouldn't be doing NFSv4 over some RDMA transports--is
> it worth supporting them at all, if the only support we can get is
> NFSv3-only?)

This seems like a strange comment - NFSv4 should be supported on all
RDMA transports, surely?

Largely RDMA lives in its own congestion management world. If a site
is running RDMA they have done something to mitigate interactions with
TCP style congestion control on the same wire.

Jason

2017-02-23 20:32:34

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols

On Thu, 2017-02-23 at 15:11 -0500, J. Bruce Fields wrote:
> On Thu, Feb 23, 2017 at 03:06:25PM -0500, Tom Talpey wrote:
> > On 2/23/2017 3:00 PM, Jeff Layton wrote:
> > > On Thu, 2017-02-23 at 14:42 -0500, Tom Talpey wrote:
> > > > On 2/23/2017 12:03 PM, Jeff Layton wrote:
> > > > > Signed-off-by: Jeff Layton <[email protected]>
> > > > > ---
> > > > > include/linux/sunrpc/svc_xprt.h | 1 +
> > > > > net/sunrpc/svcsock.c | 1 +
> > > > > net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++
> > > >
> > > > There's a possibly-important detail here. Not all RDMA transports have
> > > > "IETF-approved congestion control", for example, RoCEv1. However, iWARP
> > > > and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol,
> > > > RoCEv1 may not fall under this restriction.
> > > >
> > > > Net-net, inspecting only the RDMA attribute of the transport may be
> > > > insufficient here.
> > > >
> > > > It could be argued however that the xprtrdma layer, with its rpcrdma
> > > > crediting, provides such congestion. But that needs to be made
> > > > explicit, and perhaps, discussed in IETF. Initially, I think it would
> > > > be important to flag this as a point for the future. For now, it may
> > > > be best to flag RoCEv1 as not supporting congestion.
> > > >
> > > > Tom.
> > > >
> > >
> > > (cc'ing Chuck and the linux-rdma list)
> > >
> > > Thanks Tom, that's very interesting.
> > >
> > > Not being well versed in the xprtrdma layer, what condition should we
> > > use here to set the flag? git grep shows that the string "ROCEV1" only
> > > shows up in the bxnt_en driver. Is there some way to determine this
> > > generically for any given RDMA driver?
> >
> > I would not code RoCEv1 as an exception, I would code iWARP and RoCEv2
> > as the only eligible ones. There are any number of other possibilities,
> > none of which should be automatically flagged as congestion-controlled.
> >
> > I'm also not sure I'm comfortable with hardcoding such a list into RPC.
> > But it may be the best you can do for now. Chuck, are you aware of a
> > verbs interface to obtain the RDMA transport type?
>
> If this gets too complicated--we've been allowing NFSv4/UDP for years,
> letting this one (arguable?) exception through in RDMA a little longer
> won't kill us.
>

That's my feeling too. This is still an improvement over the status
quo, and hopefully anyone with RDMA hardware will have a bit more clue
as to whether it can properly support v4.

We can always further restrict when rdma_create_xprt sets the flag in a
later patch if we figure out some way to determine this generically. I
will plan to add a comment that we're setting this RDMA svc_xprt
universally even though it may not always be true.

> (And if we really shouldn't be doing NFSv4 over some RDMA transports--is
> it worth supporting them at all, if the only support we can get is
> NFSv3-only?)
>

I'd be inclined to leave them working and just deny the use of v4 on
such transports.

--
Jeff Layton <[email protected]>

2017-02-23 20:34:31

by Tom Talpey

[permalink] [raw]
Subject: Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols

On 2/23/2017 3:26 PM, Jason Gunthorpe wrote:
> On Thu, Feb 23, 2017 at 03:11:09PM -0500, J. Bruce Fields wrote:
>
>> (And if we really shouldn't be doing NFSv4 over some RDMA transports--is
>> it worth supporting them at all, if the only support we can get is
>> NFSv3-only?)
>
> This seems like a strange comment - NFSv4 should be supported on all
> RDMA transports, surely?
>
> Largely RDMA lives in its own congestion management world. If a site
> is running RDMA they have done something to mitigate interactions with
> TCP style congestion control on the same wire.

The key words are "IETF-approved". Mitigation and Interaction are
operational decisions, not protocol design.

We could argue that the requirement is bogus, or that all RDMA transports
comply, or that the RPCRDMA layer provides it, but none of these arguments
would grant IETF approval. That said, I think there's a lot of
room for interpretation here.

Tom.

2017-02-23 20:55:10

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols

On Thu, Feb 23, 2017 at 03:33:53PM -0500, Tom Talpey wrote:

> The key words are "IETF-approved". Mitigation and Interaction are
> operational decisions, not protocol design.

We are talking about this bit from RFC 3530 ?

Where an NFS version 4 implementation supports operation over the IP
network protocol, the supported transports between NFS and IP MUST be
among the IETF-approved congestion control transport protocols, which
include TCP and SCTP.

This gives most of RDMA an out as it is not over the IP protocol. The
only obvious troubled one is RoCEv2..

Jason

2017-02-24 15:08:45

by Tom Talpey

[permalink] [raw]
Subject: Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols

On 2/23/2017 3:55 PM, Jason Gunthorpe wrote:
> On Thu, Feb 23, 2017 at 03:33:53PM -0500, Tom Talpey wrote:
>
>> The key words are "IETF-approved". Mitigation and Interaction are
>> operational decisions, not protocol design.
>
> We are talking about this bit from RFC 3530 ?
>
> Where an NFS version 4 implementation supports operation over the IP
> network protocol, the supported transports between NFS and IP MUST be
> among the IETF-approved congestion control transport protocols, which
> include TCP and SCTP.
>
> This gives most of RDMA an out as it is not over the IP protocol. The
> only obvious troubled one is RoCEv2..

RFC7530 has updated this text somewhat, but it's similar, yes.

https://tools.ietf.org/html/rfc7530#section-3.1

The specification language specifically calls out IP-based transports,
which is why I mentioned that RoCEv1, being non-IP-based and not even
truly routable, could obtain a bye. But the NFS layer IMO should really
not be digging down to this level. I think it would be much better if
each transport could expose a relevant attribute, which NFS could
optionally inspect.

As you mention, RoCEv2 is a bit of a pickle. It's UDP/IP-based, and it
does have end-to-end congestion control, but technically speaking it
is not "IETF approved". I'm not sure what call to make there.

Tom.

2017-02-24 17:17:32

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols

On Fri, 2017-02-24 at 10:08 -0500, Tom Talpey wrote:
> On 2/23/2017 3:55 PM, Jason Gunthorpe wrote:
> > On Thu, Feb 23, 2017 at 03:33:53PM -0500, Tom Talpey wrote:
> >
> > > The key words are "IETF-approved". Mitigation and Interaction are
> > > operational decisions, not protocol design.
> >
> > We are talking about this bit from RFC 3530 ?
> >
> > Where an NFS version 4 implementation supports operation over the IP
> > network protocol, the supported transports between NFS and IP MUST be
> > among the IETF-approved congestion control transport protocols, which
> > include TCP and SCTP.
> >
> > This gives most of RDMA an out as it is not over the IP protocol. The
> > only obvious troubled one is RoCEv2..
>
> RFC7530 has updated this text somewhat, but it's similar, yes.
>
> https://tools.ietf.org/html/rfc7530#section-3.1
>
> The specification language specifically calls out IP-based transports,
> which is why I mentioned that RoCEv1, being non-IP-based and not even
> truly routable, could obtain a bye. But the NFS layer IMO should really
> not be digging down to this level. I think it would be much better if
> each transport could expose a relevant attribute, which NFS could
> optionally inspect.
>
> As you mention, RoCEv2 is a bit of a pickle. It's UDP/IP-based, and it
> does have end-to-end congestion control, but technically speaking it
> is not "IETF approved". I'm not sure what call to make there.
>
> Tom.

I'm perfectly willing to forgo the "IETF approved" verbiage since it's
ambiguous anyway. We can certainly just use more weasel words in the
comments as well.

RFC5661 has a bit more to say on the matter:

   NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA-
   based transports with the following attributes:
   o  The transport supports reliable delivery of data, which NFSv4.1
      requires but neither NFSv4.1 nor RPC has facilities for ensuring
      [34].

   o  The transport delivers data in the order it was sent.  Ordered
      delivery simplifies detection of transmit errors, and simplifies
      the sending of arbitrary sized requests and responses via the
      record marking protocol [3].

I'd rather rely on those attributes instead of any sort of IETF
approval anyway. Do all RDMA transports (RoCEv2, in particular) have
those characteristics?

If not, then perhaps there is some way to have different RDMA transport
drivers set flags in some structure as to whether they fulfill those
requirements, and then have the xprtrdma driver check for those flags.

In any case, for now I think we should just give all RDMA transports a
pass, and clean that up later. I'm mostly interested in excluding UDP
over IP for now -- being more strict with RDMA can come later.
--
Jeff Layton <[email protected]>

2017-02-24 18:23:37

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols

> I'd rather rely on those attributes instead of any sort of IETF
> approval anyway. Do all RDMA transports (RoCEv2, in particular) have
> those characteristics?

The NFS-RDMA driver only works with RDMA RC transports which are
defined to provide all those characteristics.

> In any case, for now I think we should just give all RDMA transports a
> pass, and clean that up later. I'm mostly interested in excluding UDP
> over IP for now -- being more strict with RDMA can come later.

Makes sense to me. At the end of the day I think everything supported
by NFS-RDMA should be permitted to use NFSv4, and we should ignore
sketchy spec language that suggests otherwise.

Jason

2017-02-24 18:25:29

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 1/4] sunrpc: turn bitfield flags in svc_version into bools

It's just simpler to read this way, IMO. Also, no need to explicitly
set vs_hidden to false in the nfsacl ones.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/callback_xdr.c | 4 ++--
fs/nfsd/nfs2acl.c | 1 -
fs/nfsd/nfs3acl.c | 1 -
fs/nfsd/nfs4proc.c | 2 +-
include/linux/sunrpc/svc.h | 9 +++++----
net/sunrpc/svc.c | 2 +-
6 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index eb094c6011d8..e9836f611d9c 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -1083,7 +1083,7 @@ struct svc_version nfs4_callback_version1 = {
.vs_proc = nfs4_callback_procedures1,
.vs_xdrsize = NFS4_CALLBACK_XDRSIZE,
.vs_dispatch = NULL,
- .vs_hidden = 1,
+ .vs_hidden = true,
};

struct svc_version nfs4_callback_version4 = {
@@ -1092,5 +1092,5 @@ struct svc_version nfs4_callback_version4 = {
.vs_proc = nfs4_callback_procedures1,
.vs_xdrsize = NFS4_CALLBACK_XDRSIZE,
.vs_dispatch = NULL,
- .vs_hidden = 1,
+ .vs_hidden = true,
};
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c
index d08cd88155c7..838f90f3f890 100644
--- a/fs/nfsd/nfs2acl.c
+++ b/fs/nfsd/nfs2acl.c
@@ -376,5 +376,4 @@ struct svc_version nfsd_acl_version2 = {
.vs_proc = nfsd_acl_procedures2,
.vs_dispatch = nfsd_dispatch,
.vs_xdrsize = NFS3_SVC_XDRSIZE,
- .vs_hidden = 0,
};
diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c
index 0c890347cde3..dcb5f79076c0 100644
--- a/fs/nfsd/nfs3acl.c
+++ b/fs/nfsd/nfs3acl.c
@@ -266,6 +266,5 @@ struct svc_version nfsd_acl_version3 = {
.vs_proc = nfsd_acl_procedures3,
.vs_dispatch = nfsd_dispatch,
.vs_xdrsize = NFS3_SVC_XDRSIZE,
- .vs_hidden = 0,
};

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 74a6e573e061..2b73b37c2b36 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2481,7 +2481,7 @@ struct svc_version nfsd_version4 = {
.vs_proc = nfsd_procedures4,
.vs_dispatch = nfsd_dispatch,
.vs_xdrsize = NFS4_SVC_XDRSIZE,
- .vs_rpcb_optnl = 1,
+ .vs_rpcb_optnl = true,
};

/*
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 7321ae933867..96467c95f02e 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -400,10 +400,11 @@ struct svc_version {
struct svc_procedure * vs_proc; /* per-procedure info */
u32 vs_xdrsize; /* xdrsize needed for this version */

- unsigned int vs_hidden : 1, /* Don't register with portmapper.
- * Only used for nfsacl so far. */
- vs_rpcb_optnl:1;/* Don't care the result of register.
- * Only used for nfsv4. */
+ /* Don't register with rpcbind */
+ bool vs_hidden;
+
+ /* Don't care if the rpcbind registration fails */
+ bool vs_rpcb_optnl;

/* Override dispatch function (e.g. when caching replies).
* A return value of 0 means drop the request.
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 75f290bddca1..85bcdea67a3f 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -385,7 +385,7 @@ static int svc_uses_rpcbind(struct svc_serv *serv)
for (i = 0; i < progp->pg_nvers; i++) {
if (progp->pg_vers[i] == NULL)
continue;
- if (progp->pg_vers[i]->vs_hidden == 0)
+ if (!progp->pg_vers[i]->vs_hidden)
return 1;
}
}
--
2.9.3


2017-02-24 18:25:28

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements

v2: comment clarifications, and commit log cleanup. No functional changes.

RFC5661 says:

NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA-
based transports with the following attributes:


o The transport supports reliable delivery of data, which NFSv4.1
requires but neither NFSv4.1 nor RPC has facilities for ensuring
[34].

o The transport delivers data in the order it was sent. Ordered
delivery simplifies detection of transmit errors, and simplifies
the sending of arbitrary sized requests and responses via the
record marking protocol [3].

...and then some hand-wavy stuff about congestion control. RFC7530
doesn't mention needing reliable and ordered delivery, but it does need
congestion control.

In practical terms, that means we should be excluding NFSv4 from UDP
transports. The NFS server has never enforced this requirement,
however, so a user could issue NFSv4 calls against the server via UDP.

This patchset adds a small bit of infrastructure to the sunrpc layer to
enforce this requirement, and has the nfs and nfsd layers set the
appropriate flags for it on their server-side transports. It also has
the rpcbind client skip registering the protocol version on a UDP port
when that flag is set.

Lightly tested by hand, but it's fairly straightforward.

Jeff Layton (4):
sunrpc: turn bitfield flags in svc_version into bools
sunrpc: flag transports as having both reliable and ordered delivery,
and congestion control
nfs/nfsd/sunrpc: enforce transport requirements for NFSv4
sunrpc: don't register UDP port with rpcbind when version needs
congestion control

fs/nfs/callback_xdr.c | 6 ++++--
fs/nfsd/nfs2acl.c | 1 -
fs/nfsd/nfs3acl.c | 1 -
fs/nfsd/nfs4proc.c | 13 +++++++------
include/linux/sunrpc/svc.h | 12 ++++++++----
include/linux/sunrpc/svc_xprt.h | 1 +
net/sunrpc/svc.c | 23 ++++++++++++++++++++++-
net/sunrpc/svcsock.c | 1 +
net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++
9 files changed, 51 insertions(+), 15 deletions(-)

--
2.9.3


2017-02-24 18:25:30

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 2/4] sunrpc: flag transports as having both reliable and ordered delivery, and congestion control

NFSv4 requires a transport protocol with reliable and ordered delivery,
and congestion control in most cases.

On an IP network, that means that NFSv4 over UDP should be forbidden.

The situation with RDMA is a bit more nuanced, but most RDMA transports
are suitable for this. For now, we assume that all RDMA transports are
suitable, but we may need to revise that at some point.

Signed-off-by: Jeff Layton <[email protected]>
---
include/linux/sunrpc/svc_xprt.h | 1 +
net/sunrpc/svcsock.c | 1 +
net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++
3 files changed, 10 insertions(+)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 7440290f64ac..c832acf509d8 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -67,6 +67,7 @@ struct svc_xprt {
#define XPT_CACHE_AUTH 11 /* cache auth info */
#define XPT_LOCAL 12 /* connection from loopback interface */
#define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */
+#define XPT_CONG_CTRL 14 /* has reliable and ordered delivery, and congestion control */

struct svc_serv *xpt_server; /* service for transport */
atomic_t xpt_reserved; /* space on outq that is rsvd */
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index de066acdb34e..1956b8b96b2d 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class,
&svsk->sk_xprt, serv);
set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags);
+ set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags);
if (sk->sk_state == TCP_LISTEN) {
dprintk("setting up TCP socket for listening\n");
set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 39652d390a9c..0476b412c7b1 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -571,6 +571,14 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
spin_lock_init(&cma_xprt->sc_ctxt_lock);
spin_lock_init(&cma_xprt->sc_map_lock);

+ /*
+ * Note that this implies that the underlying transport support
+ * reliable and ordered delivery, and (generally) some form of
+ * congestion control. For now, we assume that all supported RDMA
+ * transports are suitable here.
+ */
+ set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags);
+
if (listener)
set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);

--
2.9.3


2017-02-24 18:25:31

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 4/4] sunrpc: don't register UDP port with rpcbind when version needs congestion control

Signed-off-by: Jeff Layton <[email protected]>
---
net/sunrpc/svc.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 668e01d8abb8..bb4a1a50305b 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -976,6 +976,13 @@ int svc_register(const struct svc_serv *serv, struct net *net,
if (vers->vs_hidden)
continue;

+ /*
+ * Don't register a UDP port if we need congestion
+ * control.
+ */
+ if (vers->vs_need_cong_ctrl && proto == IPPROTO_UDP)
+ continue;
+
error = __svc_register(net, progp->pg_name, progp->pg_prog,
i, family, proto, port);

--
2.9.3


2017-02-24 18:25:30

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 3/4] nfs/nfsd/sunrpc: enforce transport requirements for NFSv4

NFSv4 requires reliable and ordered delivery, and (in most cases)
congestion control. In practical terms, that means that you should not
run NFSv4 over UDP. The server has never enforced that requirement,
however.

This patchset fixes this by adding a new flag to the svc_version that
states that it has these transport requirements. With that, we can check
that the transport has XPT_CONG_CTRL set before processing an RPC. If it
doesn't we reject it with RPC_PROG_MISMATCH.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/callback_xdr.c | 2 ++
fs/nfsd/nfs4proc.c | 13 +++++++------
include/linux/sunrpc/svc.h | 3 +++
net/sunrpc/svc.c | 14 ++++++++++++++
4 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index e9836f611d9c..fd0284c1dc32 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -1084,6 +1084,7 @@ struct svc_version nfs4_callback_version1 = {
.vs_xdrsize = NFS4_CALLBACK_XDRSIZE,
.vs_dispatch = NULL,
.vs_hidden = true,
+ .vs_need_cong_ctrl = true,
};

struct svc_version nfs4_callback_version4 = {
@@ -1093,4 +1094,5 @@ struct svc_version nfs4_callback_version4 = {
.vs_xdrsize = NFS4_CALLBACK_XDRSIZE,
.vs_dispatch = NULL,
.vs_hidden = true,
+ .vs_need_cong_ctrl = true,
};
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 2b73b37c2b36..f82fcaa2e1d9 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2476,12 +2476,13 @@ static struct svc_procedure nfsd_procedures4[2] = {
};

struct svc_version nfsd_version4 = {
- .vs_vers = 4,
- .vs_nproc = 2,
- .vs_proc = nfsd_procedures4,
- .vs_dispatch = nfsd_dispatch,
- .vs_xdrsize = NFS4_SVC_XDRSIZE,
- .vs_rpcb_optnl = true,
+ .vs_vers = 4,
+ .vs_nproc = 2,
+ .vs_proc = nfsd_procedures4,
+ .vs_dispatch = nfsd_dispatch,
+ .vs_xdrsize = NFS4_SVC_XDRSIZE,
+ .vs_rpcb_optnl = true,
+ .vs_need_cong_ctrl = true,
};

/*
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 96467c95f02e..45a7da2b169e 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -406,6 +406,9 @@ struct svc_version {
/* Don't care if the rpcbind registration fails */
bool vs_rpcb_optnl;

+ /* Need xprt with reliable, ordered delivery and congestion control */
+ bool vs_need_cong_ctrl;
+
/* Override dispatch function (e.g. when caching replies).
* A return value of 0 means drop the request.
* vs_dispatch == NULL means use default dispatcher.
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 85bcdea67a3f..668e01d8abb8 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1169,6 +1169,20 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
!(versp = progp->pg_vers[vers]))
goto err_bad_vers;

+ /*
+ * Some protocol versions (namely NFSv4) require reliable and ordered
+ * delivery and usually some form of congestion control. IOW, UDP is
+ * not allowed. We mark those when setting up the svc_xprt, and verify
+ * that here.
+ *
+ * The spec is not very clear about what error should be returned when
+ * someone tries to access a server that is listening on UDP for lower
+ * versions though. RPC_PROG_MISMATCH seems to be the closest fit.
+ */
+ if (versp->vs_need_cong_ctrl &&
+ !test_bit(XPT_CONG_CTRL, &rqstp->rq_xprt->xpt_flags))
+ goto err_bad_vers;
+
procp = versp->vs_proc + proc;
if (proc >= versp->vs_nproc || !procp->pc_func)
goto err_bad_proc;
--
2.9.3


2017-02-24 18:39:02

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements


> On Feb 24, 2017, at 10:25 AM, Jeff Layton <[email protected]> wrote:
>
> v2: comment clarifications, and commit log cleanup. No functional changes.
>
> RFC5661 says:
>
> NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA-
> based transports with the following attributes:
>
>
> o The transport supports reliable delivery of data, which NFSv4.1
> requires but neither NFSv4.1 nor RPC has facilities for ensuring
> [34].
>
> o The transport delivers data in the order it was sent. Ordered
> delivery simplifies detection of transmit errors, and simplifies
> the sending of arbitrary sized requests and responses via the
> record marking protocol [3].
>
> ...and then some hand-wavy stuff about congestion control. RFC7530
> doesn't mention needing reliable and ordered delivery, but it does need
> congestion control.
>
> In practical terms, that means we should be excluding NFSv4 from UDP
> transports. The NFS server has never enforced this requirement,
> however, so a user could issue NFSv4 calls against the server via UDP.

RPC-over-RDMA Version One requires the use of RDMA Reliable
Connections, which is a layer above the link layer that
provides reliable, in-order delivery using connection
semantics. This meets all stated transport requirements in
RFC 5661.

The language of RFC 5661 says that UDP by itself must not be
used for NFSv4. IMO the use of Reliable Connections with
RPC-over-RDMA makes this a non-issue for NFSv4, even for RoCE
v2.

rfc5667bis-06 was submitted this morning to address this.


> This patchset adds a small bit of infrastructure to the sunrpc layer to
> enforce this requirement, and has the nfs and nfsd layers set the
> appropriate flags for it on their server-side transports. It also has
> the rpcbind client skip registering the protocol version on a UDP port
> when that flag is set.
>
> Lightly tested by hand, but it's fairly straightforward.
>
> Jeff Layton (4):
> sunrpc: turn bitfield flags in svc_version into bools
> sunrpc: flag transports as having both reliable and ordered delivery,
> and congestion control
> nfs/nfsd/sunrpc: enforce transport requirements for NFSv4
> sunrpc: don't register UDP port with rpcbind when version needs
> congestion control
>
> fs/nfs/callback_xdr.c | 6 ++++--
> fs/nfsd/nfs2acl.c | 1 -
> fs/nfsd/nfs3acl.c | 1 -
> fs/nfsd/nfs4proc.c | 13 +++++++------
> include/linux/sunrpc/svc.h | 12 ++++++++----
> include/linux/sunrpc/svc_xprt.h | 1 +
> net/sunrpc/svc.c | 23 ++++++++++++++++++++++-
> net/sunrpc/svcsock.c | 1 +
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++
> 9 files changed, 51 insertions(+), 15 deletions(-)
>
> --
> 2.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever




2017-02-24 18:57:09

by Tom Talpey

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements

On 2/24/2017 1:25 PM, Jeff Layton wrote:
> v2: comment clarifications, and commit log cleanup. No functional changes.
>
> RFC5661 says:
>
> NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA-
> based transports with the following attributes:
>
>
> o The transport supports reliable delivery of data, which NFSv4.1
> requires but neither NFSv4.1 nor RPC has facilities for ensuring
> [34].
>
> o The transport delivers data in the order it was sent. Ordered
> delivery simplifies detection of transmit errors, and simplifies
> the sending of arbitrary sized requests and responses via the
> record marking protocol [3].
>
> ...and then some hand-wavy stuff about congestion control. RFC7530
> doesn't mention needing reliable and ordered delivery, but it does need
> congestion control.

Snipping some stuff for a pedantic response :-)

There are several good reasons why RFC7530 does not specify reliable and
ordered. The most obvious being, it doesn't need them. Because it has
a session, it can handle out-of-order messages at its layer. This is in
fact critical to supporting trunking and multipathing. And with the
session comes the ability to detect replays, so reliability can be
obviated there too.

In fact, apart from congestion control, with the proper session support,
NFSv4.1 can run very nicely over an unreliable unordered transport.
Now, NFS4.0, and NFSv3 and NFSv2 before it, are another matter entirely.

Note that RDMA transports provide remote direct placement only in RC
(Reliable Connected) endpoints, which is why rpcrdma uses that mode.

Tom.

2017-02-24 19:02:55

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements

On Fri, 2017-02-24 at 10:38 -0800, Chuck Lever wrote:
> > On Feb 24, 2017, at 10:25 AM, Jeff Layton <[email protected]> wrote:
> >
> > v2: comment clarifications, and commit log cleanup. No functional changes.
> >
> > RFC5661 says:
> >
> > NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA-
> > based transports with the following attributes:
> >
> >
> > o The transport supports reliable delivery of data, which NFSv4.1
> > requires but neither NFSv4.1 nor RPC has facilities for ensuring
> > [34].
> >
> > o The transport delivers data in the order it was sent. Ordered
> > delivery simplifies detection of transmit errors, and simplifies
> > the sending of arbitrary sized requests and responses via the
> > record marking protocol [3].
> >
> > ...and then some hand-wavy stuff about congestion control. RFC7530
> > doesn't mention needing reliable and ordered delivery, but it does need
> > congestion control.
> >
> > In practical terms, that means we should be excluding NFSv4 from UDP
> > transports. The NFS server has never enforced this requirement,
> > however, so a user could issue NFSv4 calls against the server via UDP.
>
> RPC-over-RDMA Version One requires the use of RDMA Reliable
> Connections, which is a layer above the link layer that
> provides reliable, in-order delivery using connection
> semantics. This meets all stated transport requirements in
> RFC 5661.
>
> The language of RFC 5661 says that UDP by itself must not be
> used for NFSv4. IMO the use of Reliable Connections with
> RPC-over-RDMA makes this a non-issue for NFSv4, even for RoCE
> v2.
>
> rfc5667bis-06 was submitted this morning to address this.
>

Thanks, I may plagiarize you and update the comment in rdma_create_xprt
if that's ok:

+ /*
+ * RPC-over-RDMA Version One requires the use of RDMA Reliable
+ * Connections, which is a layer above the link layer that provides
+ * reliable, in-order delivery using connection semantics.
+ */

I won't bother to re-post just for that though.

> > This patchset adds a small bit of infrastructure to the sunrpc layer to
> > enforce this requirement, and has the nfs and nfsd layers set the
> > appropriate flags for it on their server-side transports. It also has
> > the rpcbind client skip registering the protocol version on a UDP port
> > when that flag is set.
> >
> > Lightly tested by hand, but it's fairly straightforward.
> >
> > Jeff Layton (4):
> > sunrpc: turn bitfield flags in svc_version into bools
> > sunrpc: flag transports as having both reliable and ordered delivery,
> > and congestion control
> > nfs/nfsd/sunrpc: enforce transport requirements for NFSv4
> > sunrpc: don't register UDP port with rpcbind when version needs
> > congestion control
> >
> > fs/nfs/callback_xdr.c | 6 ++++--
> > fs/nfsd/nfs2acl.c | 1 -
> > fs/nfsd/nfs3acl.c | 1 -
> > fs/nfsd/nfs4proc.c | 13 +++++++------
> > include/linux/sunrpc/svc.h | 12 ++++++++----
> > include/linux/sunrpc/svc_xprt.h | 1 +
> > net/sunrpc/svc.c | 23 ++++++++++++++++++++++-
> > net/sunrpc/svcsock.c | 1 +
> > net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++
> > 9 files changed, 51 insertions(+), 15 deletions(-)
> >
> > --
> > 2.9.3
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Chuck Lever
>
>
>

--
Jeff Layton <[email protected]>

2017-02-24 21:23:27

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements

On Fri, Feb 24, 2017 at 01:53:21PM -0500, Tom Talpey wrote:
> On 2/24/2017 1:25 PM, Jeff Layton wrote:
> >v2: comment clarifications, and commit log cleanup. No functional changes.
> >
> >RFC5661 says:
> >
> > NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA-
> > based transports with the following attributes:
> >
> >
> > o The transport supports reliable delivery of data, which NFSv4.1
> > requires but neither NFSv4.1 nor RPC has facilities for ensuring
> > [34].
> >
> > o The transport delivers data in the order it was sent. Ordered
> > delivery simplifies detection of transmit errors, and simplifies
> > the sending of arbitrary sized requests and responses via the
> > record marking protocol [3].
> >
> >...and then some hand-wavy stuff about congestion control. RFC7530
> >doesn't mention needing reliable and ordered delivery, but it does need
> >congestion control.
>
> Snipping some stuff for a pedantic response :-)
>
> There are several good reasons why RFC7530 does not specify reliable and
> ordered.

OK, I'm dropping "reliable and ordered" from the comments and applying.

--b.

> The most obvious being, it doesn't need them. Because it has
> a session, it can handle out-of-order messages at its layer. This is in
> fact critical to supporting trunking and multipathing. And with the
> session comes the ability to detect replays, so reliability can be
> obviated there too.
>
> In fact, apart from congestion control, with the proper session support,
> NFSv4.1 can run very nicely over an unreliable unordered transport.
> Now, NFS4.0, and NFSv3 and NFSv2 before it, are another matter entirely.
>
> Note that RDMA transports provide remote direct placement only in RC
> (Reliable Connected) endpoints, which is why rpcrdma uses that mode.
>
> Tom.

2017-02-24 21:26:39

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements

The one other minor thing we could do is skip adding the UDP listener
entirely in the v4-only case. I think that's a job for rpc.nfsd?

--b.

On Fri, Feb 24, 2017 at 01:25:21PM -0500, Jeff Layton wrote:
> v2: comment clarifications, and commit log cleanup. No functional changes.
>
> RFC5661 says:
>
> NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA-
> based transports with the following attributes:
>
>
> o The transport supports reliable delivery of data, which NFSv4.1
> requires but neither NFSv4.1 nor RPC has facilities for ensuring
> [34].
>
> o The transport delivers data in the order it was sent. Ordered
> delivery simplifies detection of transmit errors, and simplifies
> the sending of arbitrary sized requests and responses via the
> record marking protocol [3].
>
> ...and then some hand-wavy stuff about congestion control. RFC7530
> doesn't mention needing reliable and ordered delivery, but it does need
> congestion control.
>
> In practical terms, that means we should be excluding NFSv4 from UDP
> transports. The NFS server has never enforced this requirement,
> however, so a user could issue NFSv4 calls against the server via UDP.
>
> This patchset adds a small bit of infrastructure to the sunrpc layer to
> enforce this requirement, and has the nfs and nfsd layers set the
> appropriate flags for it on their server-side transports. It also has
> the rpcbind client skip registering the protocol version on a UDP port
> when that flag is set.
>
> Lightly tested by hand, but it's fairly straightforward.
>
> Jeff Layton (4):
> sunrpc: turn bitfield flags in svc_version into bools
> sunrpc: flag transports as having both reliable and ordered delivery,
> and congestion control
> nfs/nfsd/sunrpc: enforce transport requirements for NFSv4
> sunrpc: don't register UDP port with rpcbind when version needs
> congestion control
>
> fs/nfs/callback_xdr.c | 6 ++++--
> fs/nfsd/nfs2acl.c | 1 -
> fs/nfsd/nfs3acl.c | 1 -
> fs/nfsd/nfs4proc.c | 13 +++++++------
> include/linux/sunrpc/svc.h | 12 ++++++++----
> include/linux/sunrpc/svc_xprt.h | 1 +
> net/sunrpc/svc.c | 23 ++++++++++++++++++++++-
> net/sunrpc/svcsock.c | 1 +
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++
> 9 files changed, 51 insertions(+), 15 deletions(-)
>
> --
> 2.9.3

2017-02-24 21:31:42

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements

On Fri, Feb 24, 2017 at 01:53:33PM -0500, Jeff Layton wrote:
> On Fri, 2017-02-24 at 10:38 -0800, Chuck Lever wrote:
> > > On Feb 24, 2017, at 10:25 AM, Jeff Layton <[email protected]> wrote:
> > >
> > > v2: comment clarifications, and commit log cleanup. No functional changes.
> > >
> > > RFC5661 says:
> > >
> > > NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA-
> > > based transports with the following attributes:
> > >
> > >
> > > o The transport supports reliable delivery of data, which NFSv4.1
> > > requires but neither NFSv4.1 nor RPC has facilities for ensuring
> > > [34].
> > >
> > > o The transport delivers data in the order it was sent. Ordered
> > > delivery simplifies detection of transmit errors, and simplifies
> > > the sending of arbitrary sized requests and responses via the
> > > record marking protocol [3].
> > >
> > > ...and then some hand-wavy stuff about congestion control. RFC7530
> > > doesn't mention needing reliable and ordered delivery, but it does need
> > > congestion control.
> > >
> > > In practical terms, that means we should be excluding NFSv4 from UDP
> > > transports. The NFS server has never enforced this requirement,
> > > however, so a user could issue NFSv4 calls against the server via UDP.
> >
> > RPC-over-RDMA Version One requires the use of RDMA Reliable
> > Connections, which is a layer above the link layer that
> > provides reliable, in-order delivery using connection
> > semantics. This meets all stated transport requirements in
> > RFC 5661.
> >
> > The language of RFC 5661 says that UDP by itself must not be
> > used for NFSv4. IMO the use of Reliable Connections with
> > RPC-over-RDMA makes this a non-issue for NFSv4, even for RoCE
> > v2.
> >
> > rfc5667bis-06 was submitted this morning to address this.
> >
>
> Thanks, I may plagiarize you and update the comment in rdma_create_xprt
> if that's ok:
>
> + /*
> + * RPC-over-RDMA Version One requires the use of RDMA Reliable
> + * Connections, which is a layer above the link layer that provides
> + * reliable, in-order delivery using connection semantics.
> + */
>
> I won't bother to re-post just for that though.

Unless we thnk this is esepcially important I'd rather leave the comment
as is ("we assume that all supported RDMA transports are suitable
here.") instead of getting into more detail.

--b.

>
> > > This patchset adds a small bit of infrastructure to the sunrpc layer to
> > > enforce this requirement, and has the nfs and nfsd layers set the
> > > appropriate flags for it on their server-side transports. It also has
> > > the rpcbind client skip registering the protocol version on a UDP port
> > > when that flag is set.
> > >
> > > Lightly tested by hand, but it's fairly straightforward.
> > >
> > > Jeff Layton (4):
> > > sunrpc: turn bitfield flags in svc_version into bools
> > > sunrpc: flag transports as having both reliable and ordered delivery,
> > > and congestion control
> > > nfs/nfsd/sunrpc: enforce transport requirements for NFSv4
> > > sunrpc: don't register UDP port with rpcbind when version needs
> > > congestion control
> > >
> > > fs/nfs/callback_xdr.c | 6 ++++--
> > > fs/nfsd/nfs2acl.c | 1 -
> > > fs/nfsd/nfs3acl.c | 1 -
> > > fs/nfsd/nfs4proc.c | 13 +++++++------
> > > include/linux/sunrpc/svc.h | 12 ++++++++----
> > > include/linux/sunrpc/svc_xprt.h | 1 +
> > > net/sunrpc/svc.c | 23 ++++++++++++++++++++++-
> > > net/sunrpc/svcsock.c | 1 +
> > > net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++
> > > 9 files changed, 51 insertions(+), 15 deletions(-)
> > >
> > > --
> > > 2.9.3
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > > the body of a message to [email protected]
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> > --
> > Chuck Lever
> >
> >
> >
>
> --
> Jeff Layton <[email protected]>

2017-02-24 21:34:31

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements

On Fri, 2017-02-24 at 16:25 -0500, J. Bruce Fields wrote:
> The one other minor thing we could do is skip adding the UDP listener
> entirely in the v4-only case. I think that's a job for rpc.nfsd?
>
> --b.
>

Yeah I think we'd need to fix that in rpc.nfsd.

Maybe it's time to just start doing having it do TCP-only by default
anyway? Make it so you have to explicitly enable UDP listeners if you
want them? Does anyone seriously run NFS over UDP these days for
anything other than interop testing? :)


> On Fri, Feb 24, 2017 at 01:25:21PM -0500, Jeff Layton wrote:
> > v2: comment clarifications, and commit log cleanup. No functional changes.
> >
> > RFC5661 says:
> >
> > NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA-
> > based transports with the following attributes:
> >
> >
> > o The transport supports reliable delivery of data, which NFSv4.1
> > requires but neither NFSv4.1 nor RPC has facilities for ensuring
> > [34].
> >
> > o The transport delivers data in the order it was sent. Ordered
> > delivery simplifies detection of transmit errors, and simplifies
> > the sending of arbitrary sized requests and responses via the
> > record marking protocol [3].
> >
> > ...and then some hand-wavy stuff about congestion control. RFC7530
> > doesn't mention needing reliable and ordered delivery, but it does need
> > congestion control.
> >
> > In practical terms, that means we should be excluding NFSv4 from UDP
> > transports. The NFS server has never enforced this requirement,
> > however, so a user could issue NFSv4 calls against the server via UDP.
> >
> > This patchset adds a small bit of infrastructure to the sunrpc layer to
> > enforce this requirement, and has the nfs and nfsd layers set the
> > appropriate flags for it on their server-side transports. It also has
> > the rpcbind client skip registering the protocol version on a UDP port
> > when that flag is set.
> >
> > Lightly tested by hand, but it's fairly straightforward.
> >
> > Jeff Layton (4):
> > sunrpc: turn bitfield flags in svc_version into bools
> > sunrpc: flag transports as having both reliable and ordered delivery,
> > and congestion control
> > nfs/nfsd/sunrpc: enforce transport requirements for NFSv4
> > sunrpc: don't register UDP port with rpcbind when version needs
> > congestion control
> >
> > fs/nfs/callback_xdr.c | 6 ++++--
> > fs/nfsd/nfs2acl.c | 1 -
> > fs/nfsd/nfs3acl.c | 1 -
> > fs/nfsd/nfs4proc.c | 13 +++++++------
> > include/linux/sunrpc/svc.h | 12 ++++++++----
> > include/linux/sunrpc/svc_xprt.h | 1 +
> > net/sunrpc/svc.c | 23 ++++++++++++++++++++++-
> > net/sunrpc/svcsock.c | 1 +
> > net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++
> > 9 files changed, 51 insertions(+), 15 deletions(-)
> >
> > --
> > 2.9.3

--
Jeff Layton <[email protected]>

2017-02-24 21:51:42

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements

On Fri, Feb 24, 2017 at 04:34:24PM -0500, Jeff Layton wrote:
> On Fri, 2017-02-24 at 16:25 -0500, J. Bruce Fields wrote:
> > The one other minor thing we could do is skip adding the UDP listener
> > entirely in the v4-only case. I think that's a job for rpc.nfsd?
> >
> > --b.
> >
>
> Yeah I think we'd need to fix that in rpc.nfsd.
>
> Maybe it's time to just start doing having it do TCP-only by default
> anyway? Make it so you have to explicitly enable UDP listeners if you
> want them? Does anyone seriously run NFS over UDP these days for
> anything other than interop testing? :)

I thought I remembered somebody floating this on linux-nfs a couple
years ago and finding there were still a couple vocal users. Or maybe
that was NFSv2. I can't find the thread now.

I'm pretty conservative about anything that might break people's ancient
but working setups on upgrade, but maybe it's time.

Just switching the default to off in nfs-utils first would be the way to
go, I think, then if that goes well we could think about phasing out
kernel support.

--b.

>
>
> > On Fri, Feb 24, 2017 at 01:25:21PM -0500, Jeff Layton wrote:
> > > v2: comment clarifications, and commit log cleanup. No functional changes.
> > >
> > > RFC5661 says:
> > >
> > > NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA-
> > > based transports with the following attributes:
> > >
> > >
> > > o The transport supports reliable delivery of data, which NFSv4.1
> > > requires but neither NFSv4.1 nor RPC has facilities for ensuring
> > > [34].
> > >
> > > o The transport delivers data in the order it was sent. Ordered
> > > delivery simplifies detection of transmit errors, and simplifies
> > > the sending of arbitrary sized requests and responses via the
> > > record marking protocol [3].
> > >
> > > ...and then some hand-wavy stuff about congestion control. RFC7530
> > > doesn't mention needing reliable and ordered delivery, but it does need
> > > congestion control.
> > >
> > > In practical terms, that means we should be excluding NFSv4 from UDP
> > > transports. The NFS server has never enforced this requirement,
> > > however, so a user could issue NFSv4 calls against the server via UDP.
> > >
> > > This patchset adds a small bit of infrastructure to the sunrpc layer to
> > > enforce this requirement, and has the nfs and nfsd layers set the
> > > appropriate flags for it on their server-side transports. It also has
> > > the rpcbind client skip registering the protocol version on a UDP port
> > > when that flag is set.
> > >
> > > Lightly tested by hand, but it's fairly straightforward.
> > >
> > > Jeff Layton (4):
> > > sunrpc: turn bitfield flags in svc_version into bools
> > > sunrpc: flag transports as having both reliable and ordered delivery,
> > > and congestion control
> > > nfs/nfsd/sunrpc: enforce transport requirements for NFSv4
> > > sunrpc: don't register UDP port with rpcbind when version needs
> > > congestion control
> > >
> > > fs/nfs/callback_xdr.c | 6 ++++--
> > > fs/nfsd/nfs2acl.c | 1 -
> > > fs/nfsd/nfs3acl.c | 1 -
> > > fs/nfsd/nfs4proc.c | 13 +++++++------
> > > include/linux/sunrpc/svc.h | 12 ++++++++----
> > > include/linux/sunrpc/svc_xprt.h | 1 +
> > > net/sunrpc/svc.c | 23 ++++++++++++++++++++++-
> > > net/sunrpc/svcsock.c | 1 +
> > > net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++
> > > 9 files changed, 51 insertions(+), 15 deletions(-)
> > >
> > > --
> > > 2.9.3
>
> --
> Jeff Layton <[email protected]>

2017-02-27 12:17:54

by Tom Talpey

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements

On 2/27/2017 6:59 AM, Jeff Layton wrote:
> What we'd need to make that happen, I think is a [global] stanza in
> nfs.conf with a single 'nfsd_v3' boolean that defaults to off. If

Don't forget v2! And maybe even v4.0 if you're encouraging non-legacy
operation. RFC3530 was published 14 years ago, btw. RFC1813 in 1995,
and RFC1094 in 1989.

2017-02-27 13:04:09

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements

On Mon, 2017-02-27 at 07:08 -0500, Tom Talpey wrote:
> On 2/27/2017 6:59 AM, Jeff Layton wrote:
> > What we'd need to make that happen, I think is a [global] stanza in
> > nfs.conf with a single 'nfsd_v3' boolean that defaults to off. If
>
> Don't forget v2! And maybe even v4.0 if you're encouraging non-legacy
> operation. RFC3530 was published 14 years ago, btw. RFC1813 in 1995,
> and RFC1094 in 1989.

I think v2 already defaults to off these days? But yeah, I could see us
adding a similar boolean for v2. Maybe we don't need a new switch at
all, and just need to have everything look at the [nfsd] vers2= and
vers3= config file options?

I think wiring nfsd and mountd up properly for this would be fairly easy
here. statd is a little tougher since we don't want to run it or sm-
notify at all if v2/3 are disabled. I wonder if there is any way we can
make systemd look at this config file and decide whether to start statd
based on whether either of those options is set?

I'd have no issue with eventually defaulting with v4.0 disabled as well,
but there are a fair number of clients in the field that don't support
v4.1 (or don't support it well). I think we'd need to wait and see how
much grief we get about disabling v3 by default before we go there.

--
Jeff Layton <[email protected]>

2017-02-27 13:05:48

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements

On Fri, 2017-02-24 at 16:44 -0500, J. Bruce Fields wrote:
> On Fri, Feb 24, 2017 at 04:34:24PM -0500, Jeff Layton wrote:
> > On Fri, 2017-02-24 at 16:25 -0500, J. Bruce Fields wrote:
> > > The one other minor thing we could do is skip adding the UDP listener
> > > entirely in the v4-only case. I think that's a job for rpc.nfsd?
> > >
> > > --b.
> > >
> >
> > Yeah I think we'd need to fix that in rpc.nfsd.
> >
> > Maybe it's time to just start doing having it do TCP-only by default
> > anyway? Make it so you have to explicitly enable UDP listeners if you
> > want them? Does anyone seriously run NFS over UDP these days for
> > anything other than interop testing? :)
>
> I thought I remembered somebody floating this on linux-nfs a couple
> years ago and finding there were still a couple vocal users. Or maybe
> that was NFSv2. I can't find the thread now.
>
> I'm pretty conservative about anything that might break people's ancient
> but working setups on upgrade, but maybe it's time.
>
> Just switching the default to off in nfs-utils first would be the way to
> go, I think, then if that goes well we could think about phasing out
> kernel support.
>
> --b.
>

Ok, I posted a patch a couple of days ago as an RFC. It's pretty
straightforward and works. I don't see any need to turn off kernel
support just yet. If we do have users who need it, turning it back on is
pretty trivial with nfs.conf.

What I'd really like is to eventually have distros move to a default
nfsd configuration that is v4-only. Have the kernel only listen for v4
calls on TCP, turn off lockd and statd, and make mountd not open any IP
sockets.

What we'd need to make that happen, I think is a [global] stanza in
nfs.conf with a single 'nfsd_v3' boolean that defaults to off. If
someone needs to serve v3, they could turn that on and everything would
be reenabled. That would take a bit of plumbing through various daemons
though.

--
Jeff Layton <[email protected]>

2017-02-27 14:20:33

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements

On Mon, Feb 27, 2017 at 07:55:55AM -0500, Jeff Layton wrote:
> On Mon, 2017-02-27 at 07:08 -0500, Tom Talpey wrote:
> > On 2/27/2017 6:59 AM, Jeff Layton wrote:
> > > What we'd need to make that happen, I think is a [global] stanza in
> > > nfs.conf with a single 'nfsd_v3' boolean that defaults to off. If
> >
> > Don't forget v2! And maybe even v4.0 if you're encouraging non-legacy
> > operation. RFC3530 was published 14 years ago, btw. RFC1813 in 1995,
> > and RFC1094 in 1989.

Looking just at the RHEL history.... I think we enabled experimental v4
in 2005 in RHEL4, but regretted that. It wasn't a default until RHEL6
in 2010. Other OS's were different, but in general I think
implementation lagged specification by a lot. Ditto to some degree for
4.1.

> I think v2 already defaults to off these days? But yeah, I could see us
> adding a similar boolean for v2. Maybe we don't need a new switch at
> all, and just need to have everything look at the [nfsd] vers2= and
> vers3= config file options?
>
> I think wiring nfsd and mountd up properly for this would be fairly easy
> here. statd is a little tougher since we don't want to run it or sm-
> notify at all if v2/3 are disabled. I wonder if there is any way we can
> make systemd look at this config file and decide whether to start statd
> based on whether either of those options is set?

Neil might have ideas--see https://lwn.net/Articles/701549/.

--b.

> I'd have no issue with eventually defaulting with v4.0 disabled as well,
> but there are a fair number of clients in the field that don't support
> v4.1 (or don't support it well). I think we'd need to wait and see how
> much grief we get about disabling v3 by default before we go there.