2017-04-28 17:25:39

by Trond Myklebust

Subject: [RFC PATCH 0/5] Fun with the multipathing code

In the spirit of experimentation, I've put together a set of patches
that implement setting up multiple TCP connections to the server.
The connections all go to the same server IP address, so do not
provide support for multiple IP addresses (which I believe is
something Andy Adamson is working on).

The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't
feel comfortable subjecting NFSv3/v4 replay caches to this
treatment yet. It relies on the mount option "nconnect" to specify
the number of connections to set up. So you can do something like
'mount -t nfs -overs=4.1,nconnect=8 foo:/bar /mnt'
to set up 8 TCP connections to server 'foo'.

Anyhow, feel free to test and give me feedback as to whether or not
this helps performance on your system.

Trond Myklebust (5):
SUNRPC: Allow creation of RPC clients with multiple connections
NFS: Add a mount option to specify number of TCP connections to use
NFSv4: Allow multiple connections to NFSv4.x (x>0) servers
pNFS: Allow multiple connections to the DS
NFS: Display the "nconnect" mount option if it is set.

fs/nfs/client.c | 2 ++
fs/nfs/internal.h | 2 ++
fs/nfs/nfs3client.c | 3 +++
fs/nfs/nfs4client.c | 13 +++++++++++--
fs/nfs/super.c | 12 ++++++++++++
include/linux/nfs_fs_sb.h | 1 +
include/linux/sunrpc/clnt.h | 1 +
net/sunrpc/clnt.c | 17 ++++++++++++++++-
net/sunrpc/xprtmultipath.c | 3 +--
9 files changed, 49 insertions(+), 5 deletions(-)

--
2.9.3



2017-04-28 17:25:40

by Trond Myklebust

Subject: [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections

Add an argument to struct rpc_create_args that allows the specification
of how many transport connections you want to set up to the server.

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/clnt.h | 1 +
net/sunrpc/clnt.c | 17 ++++++++++++++++-
net/sunrpc/xprtmultipath.c | 3 +--
3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 6095ecba0dde..8c3cb38a385b 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -120,6 +120,7 @@ struct rpc_create_args {
u32 prognumber; /* overrides program->number */
u32 version;
rpc_authflavor_t authflavor;
+ u32 nconnect;
unsigned long flags;
char *client_name;
struct svc_xprt *bc_xprt; /* NFSv4.1 backchannel */
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 673046c64e48..0ff97288b43f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -522,6 +522,8 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
.bc_xprt = args->bc_xprt,
};
char servername[48];
+ struct rpc_clnt *clnt;
+ int i;

if (args->bc_xprt) {
WARN_ON_ONCE(!(args->protocol & XPRT_TRANSPORT_BC));
@@ -584,7 +586,15 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
if (args->flags & RPC_CLNT_CREATE_NONPRIVPORT)
xprt->resvport = 0;

- return rpc_create_xprt(args, xprt);
+ clnt = rpc_create_xprt(args, xprt);
+ if (IS_ERR(clnt) || args->nconnect <= 1)
+ return clnt;
+
+ for (i = 0; i < args->nconnect - 1; i++) {
+ if (rpc_clnt_add_xprt(clnt, &xprtargs, NULL, NULL) < 0)
+ break;
+ }
+ return clnt;
}
EXPORT_SYMBOL_GPL(rpc_create);

@@ -2605,6 +2615,10 @@ int rpc_clnt_test_and_add_xprt(struct rpc_clnt *clnt,
return -ENOMEM;
data->xps = xprt_switch_get(xps);
data->xprt = xprt_get(xprt);
+ if (rpc_xprt_switch_has_addr(data->xps, (struct sockaddr *)&xprt->addr)) {
+ rpc_cb_add_xprt_release(data);
+ goto success;
+ }

cred = authnull_ops.lookup_cred(NULL, NULL, 0);
task = rpc_call_null_helper(clnt, xprt, cred,
@@ -2614,6 +2628,7 @@ int rpc_clnt_test_and_add_xprt(struct rpc_clnt *clnt,
if (IS_ERR(task))
return PTR_ERR(task);
rpc_put_task(task);
+success:
return 1;
}
EXPORT_SYMBOL_GPL(rpc_clnt_test_and_add_xprt);
diff --git a/net/sunrpc/xprtmultipath.c b/net/sunrpc/xprtmultipath.c
index 95064d510ce6..486819d0c58b 100644
--- a/net/sunrpc/xprtmultipath.c
+++ b/net/sunrpc/xprtmultipath.c
@@ -51,8 +51,7 @@ void rpc_xprt_switch_add_xprt(struct rpc_xprt_switch *xps,
if (xprt == NULL)
return;
spin_lock(&xps->xps_lock);
- if ((xps->xps_net == xprt->xprt_net || xps->xps_net == NULL) &&
- !rpc_xprt_switch_has_addr(xps, (struct sockaddr *)&xprt->addr))
+ if (xps->xps_net == xprt->xprt_net || xps->xps_net == NULL)
xprt_switch_add_xprt_locked(xps, xprt);
spin_unlock(&xps->xps_lock);
}
--
2.9.3
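
For orientation, a caller asks for multiple connections simply by filling in the new field
before calling rpc_create(). The fragment below is illustrative only and not part of the
patch; patch 3/5 does essentially this from the NFS client, and the field values shown
here are placeholders:

	/* Illustrative fragment: request several TCP transports at creation time. */
	struct rpc_create_args args = {
		.net		= clp->cl_net,
		.protocol	= XPRT_TRANSPORT_TCP,
		.nconnect	= 8,	/* taken from the nconnect= mount option */
		.address	= (struct sockaddr *)&clp->cl_addr,
		.addrsize	= clp->cl_addrlen,
		.servername	= clp->cl_hostname,
		.program	= &nfs_program,
		.version	= clp->rpc_ops->version,
		.authflavor	= RPC_AUTH_UNIX,
	};
	struct rpc_clnt *clnt = rpc_create(&args);
	/* rpc_create() opens the first transport as before, then calls
	 * rpc_clnt_add_xprt() nconnect - 1 more times to populate the switch. */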


2017-04-28 17:25:42

by Trond Myklebust

Subject: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use

Allow the user to specify that the client should use multiple connections
to the server. For the moment, this functionality will be limited to
TCP and to NFSv4.x (x>0).

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/internal.h | 1 +
fs/nfs/super.c | 10 ++++++++++
2 files changed, 11 insertions(+)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 31b26cf1b476..31757a742e9b 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -117,6 +117,7 @@ struct nfs_parsed_mount_data {
char *export_path;
int port;
unsigned short protocol;
+ unsigned short nconnect;
} nfs_server;

struct security_mnt_opts lsm_opts;
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 54e0f9f2dd94..7eb48934dc79 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -76,6 +76,8 @@
#define NFS_DEFAULT_VERSION 2
#endif

+#define NFS_MAX_CONNECTIONS 16
+
enum {
/* Mount options that take no arguments */
Opt_soft, Opt_hard,
@@ -107,6 +109,7 @@ enum {
Opt_nfsvers,
Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost,
Opt_addr, Opt_mountaddr, Opt_clientaddr,
+ Opt_nconnect,
Opt_lookupcache,
Opt_fscache_uniq,
Opt_local_lock,
@@ -179,6 +182,8 @@ static const match_table_t nfs_mount_option_tokens = {
{ Opt_mounthost, "mounthost=%s" },
{ Opt_mountaddr, "mountaddr=%s" },

+ { Opt_nconnect, "nconnect=%s" },
+
{ Opt_lookupcache, "lookupcache=%s" },
{ Opt_fscache_uniq, "fsc=%s" },
{ Opt_local_lock, "local_lock=%s" },
@@ -1544,6 +1549,11 @@ static int nfs_parse_mount_options(char *raw,
if (mnt->mount_server.addrlen == 0)
goto out_invalid_address;
break;
+ case Opt_nconnect:
+ if (nfs_get_option_ul_bound(args, &option, 1, NFS_MAX_CONNECTIONS))
+ goto out_invalid_value;
+ mnt->nfs_server.nconnect = option;
+ break;
case Opt_lookupcache:
string = match_strdup(args);
if (string == NULL)
--
2.9.3


2017-04-28 17:25:43

by Trond Myklebust

Subject: [RFC PATCH 3/5] NFSv4: Allow multiple connections to NFSv4.x (x>0) servers

If the user specifies the -o nconnect=<number> mount option, and the transport
protocol is TCP, then set up <number> connections to the server. The
connections will all go to the same IP address.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/client.c | 2 ++
fs/nfs/internal.h | 1 +
fs/nfs/nfs4client.c | 10 ++++++++--
include/linux/nfs_fs_sb.h | 1 +
4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index e0302101e18a..c5b0f3e270a3 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -180,6 +180,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
clp->cl_rpcclient = ERR_PTR(-EINVAL);

clp->cl_proto = cl_init->proto;
+ clp->cl_nconnect = cl_init->nconnect;
clp->cl_net = get_net(cl_init->net);

cred = rpc_lookup_machine_cred("*");
@@ -488,6 +489,7 @@ int nfs_create_rpc_client(struct nfs_client *clp,
struct rpc_create_args args = {
.net = clp->cl_net,
.protocol = clp->cl_proto,
+ .nconnect = clp->cl_nconnect,
.address = (struct sockaddr *)&clp->cl_addr,
.addrsize = clp->cl_addrlen,
.timeout = cl_init->timeparms,
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 31757a742e9b..abe5d3934eaf 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -77,6 +77,7 @@ struct nfs_client_initdata {
struct nfs_subversion *nfs_mod;
int proto;
u32 minorversion;
+ unsigned int nconnect;
struct net *net;
const struct rpc_timeout *timeparms;
};
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 692a7a8bfc7a..c9b10b7829f0 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -834,7 +834,8 @@ static int nfs4_set_client(struct nfs_server *server,
const size_t addrlen,
const char *ip_addr,
int proto, const struct rpc_timeout *timeparms,
- u32 minorversion, struct net *net)
+ u32 minorversion, unsigned int nconnect,
+ struct net *net)
{
struct nfs_client_initdata cl_init = {
.hostname = hostname,
@@ -849,6 +850,8 @@ static int nfs4_set_client(struct nfs_server *server,
};
struct nfs_client *clp;

+ if (minorversion > 0 && proto == XPRT_TRANSPORT_TCP)
+ cl_init.nconnect = nconnect;
if (server->flags & NFS_MOUNT_NORESVPORT)
set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
if (server->options & NFS_OPTION_MIGRATION)
@@ -1040,6 +1043,7 @@ static int nfs4_init_server(struct nfs_server *server,
data->nfs_server.protocol,
&timeparms,
data->minorversion,
+ data->nfs_server.nconnect,
data->net);
if (error < 0)
return error;
@@ -1124,6 +1128,7 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data,
rpc_protocol(parent_server->client),
parent_server->client->cl_timeout,
parent_client->cl_mvops->minor_version,
+ parent_client->cl_nconnect,
parent_client->cl_net);
if (error < 0)
goto error;
@@ -1215,7 +1220,8 @@ int nfs4_update_server(struct nfs_server *server, const char *hostname,
nfs_server_remove_lists(server);
error = nfs4_set_client(server, hostname, sap, salen, buf,
clp->cl_proto, clnt->cl_timeout,
- clp->cl_minorversion, net);
+ clp->cl_minorversion,
+ clp->cl_nconnect, net);
nfs_put_client(clp);
if (error != 0) {
nfs_server_insert_lists(server);
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 2a70f34dffe8..b7e6b94d1246 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -55,6 +55,7 @@ struct nfs_client {
struct nfs_subversion * cl_nfs_mod; /* pointer to nfs version module */

u32 cl_minorversion;/* NFSv4 minorversion */
+ unsigned int cl_nconnect; /* Number of connections */
struct rpc_cred *cl_machine_cred;

#if IS_ENABLED(CONFIG_NFS_V4)
--
2.9.3


2017-04-28 17:25:44

by Trond Myklebust

Subject: [RFC PATCH 4/5] pNFS: Allow multiple connections to the DS

If the user specifies the -o nconnect=<number> mount option, and the transport
protocol is TCP, then set up <number> connections to the pNFS data server
as well. The connections will all go to the same IP address.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs3client.c | 3 +++
fs/nfs/nfs4client.c | 3 +++
2 files changed, 6 insertions(+)

diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index 7879f2a0fcfd..8c624c74ddbe 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -100,6 +100,9 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
return ERR_PTR(-EINVAL);
cl_init.hostname = buf;

+ if (mds_clp->cl_nconnect > 1 && ds_proto == XPRT_TRANSPORT_TCP)
+ cl_init.nconnect = mds_clp->cl_nconnect;
+
if (mds_srv->flags & NFS_MOUNT_NORESVPORT)
set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index c9b10b7829f0..bfea1b232dd2 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -912,6 +912,9 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_server *mds_srv,
return ERR_PTR(-EINVAL);
cl_init.hostname = buf;

+ if (mds_clp->cl_nconnect > 1 && ds_proto == XPRT_TRANSPORT_TCP)
+ cl_init.nconnect = mds_clp->cl_nconnect;
+
if (mds_srv->flags & NFS_MOUNT_NORESVPORT)
__set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);

--
2.9.3


2017-04-28 17:25:45

by Trond Myklebust

Subject: [RFC PATCH 5/5] NFS: Display the "nconnect" mount option if it is set.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/super.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 7eb48934dc79..0e07a6684235 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -673,6 +673,8 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
seq_printf(m, ",proto=%s",
rpc_peeraddr2str(nfss->client, RPC_DISPLAY_NETID));
rcu_read_unlock();
+ if (clp->cl_nconnect > 0)
+ seq_printf(m, ",nconnect=%u", clp->cl_nconnect);
if (version == 4) {
if (nfss->port != NFS_PORT)
seq_printf(m, ",port=%u", nfss->port);
--
2.9.3


2017-04-28 17:45:41

by Chuck Lever III

Subject: Re: [RFC PATCH 0/5] Fun with the multipathing code


> On Apr 28, 2017, at 10:25 AM, Trond Myklebust <[email protected]> wrote:
>
> In the spirit of experimentation, I've put together a set of patches
> that implement setting up multiple TCP connections to the server.
> The connections all go to the same server IP address, so do not
> provide support for multiple IP addresses (which I believe is
> something Andy Adamson is working on).
>
> The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't
> feel comfortable subjecting NFSv3/v4 replay caches to this
> treatment yet. It relies on the mount option "nconnect" to specify
> the number of connections to set up. So you can do something like
> 'mount -t nfs -overs=4.1,nconnect=8 foo:/bar /mnt'
> to set up 8 TCP connections to server 'foo'.

IMO this setting should eventually be set dynamically by the
client, or should be global (eg., a module parameter).

Since mount points to the same server share the same transport,
what happens if you specify a different "nconnect" setting on
two mount points to the same server?

What will the client do if there are not enough resources
(eg source ports) to create that many? Or is this an "up to N"
kind of setting? I can imagine a big client having to reduce
the number of connections to each server to help it scale in
number of server connections.

Other storage protocols have a mechanism for determining how
transport connections are provisioned: One connection per
CPU core (or one connection per NUMA node) on the client. This gives
a clear way to decide which connection to use for each RPC,
and guarantees the reply will arrive at the same compute
domain that sent the call.

And of course: RPC-over-RDMA really loves this kind of feature
(multiple connections between same IP tuples) to spread the
workload over multiple QPs. There isn't anything special needed
for RDMA, I hope, but I'll have a look at the SUNRPC pieces.

Thanks for posting, I'm looking forward to seeing this
capability in the Linux client.


> Anyhow, feel free to test and give me feedback as to whether or not
> this helps performance on your system.
>
> Trond Myklebust (5):
> SUNRPC: Allow creation of RPC clients with multiple connections
> NFS: Add a mount option to specify number of TCP connections to use
> NFSv4: Allow multiple connections to NFSv4.x (x>0) servers
> pNFS: Allow multiple connections to the DS
> NFS: Display the "nconnect" mount option if it is set.
>
> fs/nfs/client.c | 2 ++
> fs/nfs/internal.h | 2 ++
> fs/nfs/nfs3client.c | 3 +++
> fs/nfs/nfs4client.c | 13 +++++++++++--
> fs/nfs/super.c | 12 ++++++++++++
> include/linux/nfs_fs_sb.h | 1 +
> include/linux/sunrpc/clnt.h | 1 +
> net/sunrpc/clnt.c | 17 ++++++++++++++++-
> net/sunrpc/xprtmultipath.c | 3 +--
> 9 files changed, 49 insertions(+), 5 deletions(-)
>
> --
> 2.9.3
>

--
Chuck Lever




2017-04-28 18:08:44

by Trond Myklebust

Subject: Re: [RFC PATCH 0/5] Fun with the multipathing code

On Fri, 2017-04-28 at 10:45 -0700, Chuck Lever wrote:
> IMO this setting should eventually be set dynamically by the
> client, or should be global (eg., a module parameter).

There is an argument for making it a per-server value (which is what
this patchset does). It allows the admin a certain control to limit the
number of connections to specific servers that need to serve larger
numbers of clients. However I'm open to counter arguments. I've no
strong opinions yet.

> Since mount points to the same server share the same transport,
> what happens if you specify a different "nconnect" setting on
> two mount points to the same server?

Currently, the first one wins.

> What will the client do if there are not enough resources
> (eg source ports) to create that many? Or is this an "up to N"
> kind of setting? I can imagine a big client having to reduce
> the number of connections to each server to help it scale in
> number of server connections.

There is an arbitrary (compile time) limit of 16. The use of the
SO_REUSEPORT socket option ensures that we should almost always be able
to satisfy that number of source ports, since they can be shared with
connections to other servers.

> Other storage protocols have a mechanism for determining how
> transport connections are provisioned: One connection per
> CPU core (or one connection per NUMA node) on the client. This gives
> a clear way to decide which connection to use for each RPC,
> and guarantees the reply will arrive at the same compute
> domain that sent the call.

Can we perhaps lay out a case for which mechanisms are useful as far as
hardware is concerned? I understand the socket code is already
affinitised to CPU caches, so that one's easy. I'm less familiar with
the various features of the underlying offloaded NICs and how they tend
to react when you add/subtract TCP connections.

> And of course: RPC-over-RDMA really loves this kind of feature
> (multiple connections between same IP tuples) to spread the
> workload over multiple QPs. There isn't anything special needed
> for RDMA, I hope, but I'll have a look at the SUNRPC pieces.

I haven't yet enabled it for RPC/RDMA, but I imagine you can help out
if you find it useful (as you appear to do).

> Thanks for posting, I'm looking forward to seeing this
> capability in the Linux client.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
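
[The SO_REUSEPORT point above can be made concrete with a small userspace
sketch. It is not taken from the kernel patches; the addresses and the local
port number are placeholders and error handling is omitted. Two TCP sockets
share one local source port because their destination tuples differ:]

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

static int connect_from(uint16_t local_port, const char *server_ip)
{
	int one = 1;
	struct sockaddr_in local = {
		.sin_family = AF_INET,
		.sin_port = htons(local_port),
		.sin_addr.s_addr = htonl(INADDR_ANY),
	};
	struct sockaddr_in peer = {
		.sin_family = AF_INET,
		.sin_port = htons(2049),	/* NFS */
	};
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	/* Let several sockets bind the same local port. */
	setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
	bind(fd, (struct sockaddr *)&local, sizeof(local));

	inet_pton(AF_INET, server_ip, &peer.sin_addr);
	connect(fd, (struct sockaddr *)&peer, sizeof(peer));
	return fd;
}

int main(void)
{
	/* Both sockets use local port 50000; the connections are still
	 * distinct because the destination addresses differ. */
	int a = connect_from(50000, "192.0.2.1");
	int b = connect_from(50000, "192.0.2.2");

	close(a);
	close(b);
	return 0;
}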


2017-04-29 17:53:51

by Chuck Lever III

Subject: Re: [RFC PATCH 0/5] Fun with the multipathing code


> On Apr 28, 2017, at 2:08 PM, Trond Myklebust <[email protected]> wrote:
>
> On Fri, 2017-04-28 at 10:45 -0700, Chuck Lever wrote:
>>> On Apr 28, 2017, at 10:25 AM, Trond Myklebust <trond.myklebust@prim
>>> arydata.com> wrote:
>>>
>>> In the spirit of experimentation, I've put together a set of
>>> patches
>>> that implement setting up multiple TCP connections to the server.
>>> The connections all go to the same server IP address, so do not
>>> provide support for multiple IP addresses (which I believe is
>>> something Andy Adamson is working on).
>>>
>>> The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I
>>> don't
>>> feel comfortable subjecting NFSv3/v4 replay caches to this
>>> treatment yet. It relies on the mount option "nconnect" to specify
>>> the number of connections to set up. So you can do something like
>>>  'mount -t nfs -overs=4.1,nconnect=8 foo:/bar /mnt'
>>> to set up 8 TCP connections to server 'foo'.
>>
>> IMO this setting should eventually be set dynamically by the
>> client, or should be global (eg., a module parameter).
>
> There is an argument for making it a per-server value (which is what
> this patchset does). It allows the admin a certain control to limit the
> number of connections to specific servers that need to serve larger
> numbers of clients. However I'm open to counter arguments. I've no
> strong opinions yet.

Like direct I/O, this kind of setting could allow a single
client to DoS a server.

One additional concern might be how to deal with servers who
have no more ability to accept connections during certain
periods, but are able to support a lot of connections at
other times.


>> Since mount points to the same server share the same transport,
>> what happens if you specify a different "nconnect" setting on
>> two mount points to the same server?
>
> Currently, the first one wins.
>
>> What will the client do if there are not enough resources
>> (eg source ports) to create that many? Or is this an "up to N"
>> kind of setting? I can imagine a big client having to reduce
>> the number of connections to each server to help it scale in
>> number of server connections.
>
> There is an arbitrary (compile time) limit of 16. The use of the
> SO_REUSEPORT socket option ensures that we should almost always be able
> to satisfy that number of source ports, since they can be shared with
> connections to other servers.

FWIW, Solaris limits this setting to 8. I think past that
value, there is only incremental and diminishing gain.
That could be apples to pears, though.

I'm not aware of a mount option, but there might be a
system tunable that controls this setting on each client.


>> Other storage protocols have a mechanism for determining how
>> transport connections are provisioned: One connection per
>> CPU core (or one connection per NUMA node) on the client. This gives
>> a clear way to decide which connection to use for each RPC,
>> and guarantees the reply will arrive at the same compute
>> domain that sent the call.
>
> Can we perhaps lay out a case for which mechanisms are useful as far as
> hardware is concerned? I understand the socket code is already
> affinitised to CPU caches, so that one's easy. I'm less familiar with
> the various features of the underlying offloaded NICs and how they tend
> to react when you add/subtract TCP connections.

Well, the optimal number of connections varies depending on
the NIC hardware design. I don't think there's a hard-and-fast
rule, but typically the server-class NICs have multiple DMA
engines and multiple cores. Thus they benefit from having
multiple sockets, up to a point.

Smaller clients would have a handful of cores, a single
memory hierarchy, and one NIC. I would guess optimizing for
the NIC (or server) would be best in that case. I'd bet
two connections would be a very good default.

For large clients, a connection per NUMA node makes sense.
This keeps the amount of cross-node memory traffic to a
minimum, which improves system scalability.

The issues with "socket per CPU core" are: there can be a lot
of cores, and it might be wasteful to open that many sockets
to each NFS server; and what do you do with a socket when
a CPU core is taken offline?
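
To make that policy concrete, the per-RPC selection step can be as simple as
hashing the issuing CPU's NUMA node onto the available connections. The sketch
below is standalone and purely illustrative: the conn_pool type and
pick_connection() are invented here, and the posted patches, as I read them,
simply rotate through the transport switch instead.

#include <stddef.h>

/* Hypothetical per-mount connection pool: one connected transport per
 * slot, sized by the nconnect setting (or by the number of NUMA nodes). */
struct conn_pool {
	int *fds;
	size_t count;
};

/* Pick the transport for an RPC based on the issuing CPU's NUMA node,
 * so the reply is delivered back to the same compute domain. "node"
 * would come from something like the kernel's numa_node_id(). */
static int pick_connection(const struct conn_pool *pool, unsigned int node)
{
	return pool->fds[node % pool->count];
}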


>> And of course: RPC-over-RDMA really loves this kind of feature
>> (multiple connections between same IP tuples) to spread the
>> workload over multiple QPs. There isn't anything special needed
>> for RDMA, I hope, but I'll have a look at the SUNRPC pieces.
>
> I haven't yet enabled it for RPC/RDMA, but I imagine you can help out
> if you find it useful (as you appear to do).

I can give the patch set a try this week. I haven't seen any
thing that would exclude proto=rdma from playing in this
sandbox.


>> Thanks for posting, I'm looking forward to seeing this
>> capability in the Linux client.
>>
>>
>>> Anyhow, feel free to test and give me feedback as to whether or not
>>> this helps performance on your system.
>>>
>>> Trond Myklebust (5):
>>>  SUNRPC: Allow creation of RPC clients with multiple connections
>>>  NFS: Add a mount option to specify number of TCP connections to
>>> use
>>>  NFSv4: Allow multiple connections to NFSv4.x (x>0) servers
>>>  pNFS: Allow multiple connections to the DS
>>>  NFS: Display the "nconnect" mount option if it is set.
>>>
>>> fs/nfs/client.c             |  2 ++
>>> fs/nfs/internal.h           |  2 ++
>>> fs/nfs/nfs3client.c         |  3 +++
>>> fs/nfs/nfs4client.c         | 13 +++++++++++--
>>> fs/nfs/super.c              | 12 ++++++++++++
>>> include/linux/nfs_fs_sb.h   |  1 +
>>> include/linux/sunrpc/clnt.h |  1 +
>>> net/sunrpc/clnt.c           | 17 ++++++++++++++++-
>>> net/sunrpc/xprtmultipath.c  |  3 +--
>>> 9 files changed, 49 insertions(+), 5 deletions(-)
>>>
>>> --
>>> 2.9.3
>>>
>>
>> --
>> Chuck Lever
>>
>>
>>
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]

--
Chuck Lever




2017-05-04 13:45:22

by Chuck Lever III

Subject: Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use

Hi Trond-


> On Apr 28, 2017, at 1:25 PM, Trond Myklebust <[email protected]> wrote:
>
> Allow the user to specify that the client should use multiple connections
> to the server. For the moment, this functionality will be limited to
> TCP and to NFSv4.x (x>0).

Some initial reactions:

- 5/5 could be squashed into this patch (2/5).

- 4/5 adds support for using NFSv3 with a DS. Why can't you add NFSv3
support for multiple connections in the non-pNFS case? If there is a
good reason, it should be stated in 4/5's patch description or added
as a comment somewhere in the source code.

- Testing with a Linux server shows that the basic NFS/RDMA pieces
work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
nconnect > 1. I'm looking into it.

- Testing with a Solaris 12 server prototype that supports NFSv4.1
works fine with nconnect=[23]. Not seeing much performance improvement
there because the server is using QDR and a single SATA SSD.

Thus I don't see a strong need to keep the TCP-only limitation. However,
if you do keep it, the logic that implements the second sentence in the
patch description above is added in 3/5. Should this sentence be in that
patch description instead? Or, instead:

s/For the moment/In a subsequent patch


> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> fs/nfs/internal.h | 1 +
> fs/nfs/super.c | 10 ++++++++++
> 2 files changed, 11 insertions(+)
>
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 31b26cf1b476..31757a742e9b 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -117,6 +117,7 @@ struct nfs_parsed_mount_data {
> char *export_path;
> int port;
> unsigned short protocol;
> + unsigned short nconnect;
> } nfs_server;
>
> struct security_mnt_opts lsm_opts;
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 54e0f9f2dd94..7eb48934dc79 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -76,6 +76,8 @@
> #define NFS_DEFAULT_VERSION 2
> #endif
>
> +#define NFS_MAX_CONNECTIONS 16
> +
> enum {
> /* Mount options that take no arguments */
> Opt_soft, Opt_hard,
> @@ -107,6 +109,7 @@ enum {
> Opt_nfsvers,
> Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost,
> Opt_addr, Opt_mountaddr, Opt_clientaddr,
> + Opt_nconnect,
> Opt_lookupcache,
> Opt_fscache_uniq,
> Opt_local_lock,
> @@ -179,6 +182,8 @@ static const match_table_t nfs_mount_option_tokens = {
> { Opt_mounthost, "mounthost=%s" },
> { Opt_mountaddr, "mountaddr=%s" },
>
> + { Opt_nconnect, "nconnect=%s" },
> +
> { Opt_lookupcache, "lookupcache=%s" },
> { Opt_fscache_uniq, "fsc=%s" },
> { Opt_local_lock, "local_lock=%s" },
> @@ -1544,6 +1549,11 @@ static int nfs_parse_mount_options(char *raw,
> if (mnt->mount_server.addrlen == 0)
> goto out_invalid_address;
> break;
> + case Opt_nconnect:
> + if (nfs_get_option_ul_bound(args, &option, 1, NFS_MAX_CONNECTIONS))
> + goto out_invalid_value;
> + mnt->nfs_server.nconnect = option;
> + break;
> case Opt_lookupcache:
> string = match_strdup(args);
> if (string == NULL)
> --
> 2.9.3
>

--
Chuck Lever




2017-05-04 13:53:34

by Chuck Lever III

Subject: Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use


> On May 4, 2017, at 9:45 AM, Chuck Lever <[email protected]> wrote:
>
> Hi Trond-
>
>
>> On Apr 28, 2017, at 1:25 PM, Trond Myklebust <[email protected]> wrote:
>>
>> Allow the user to specify that the client should use multiple connections
>> to the server. For the moment, this functionality will be limited to
>> TCP and to NFSv4.x (x>0).
>
> Some initial reactions:
>
> - 5/5 could be squashed into this patch (2/5).
>
> - 4/5 adds support for using NFSv3 with a DS. Why can't you add NFSv3
> support for multiple connections in the non-pNFS case? If there is a
> good reason, it should be stated in 4/5's patch description or added
> as a comment somewhere in the source code.
>
> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> nconnect > 1. I'm looking into it.
>
> - Testing with a Solaris 12 server prototype that supports NFSv4.1
> works fine with nconnect=[23]. Not seeing much performance improvement
> there because the server is using QDR and a single SATA SSD.
>
> Thus I don't see a strong need to keep the TCP-only limitation. However,
> if you do keep it, the logic that implements the second sentence in the
> patch description above is added in 3/5. Should this sentence be in that
> patch description instead? Or, instead:
>
> s/For the moment/In a subsequent patch

Oops, I forgot to mention: mountstats data looks a little confused
when nconnect > 1. For example:

WRITE:
3075342 ops (131%)
avg bytes sent per op: 26829 avg bytes received per op: 113
backlog wait: 162.375128 RTT: 1.481101 total execute time: 163.861735 (milliseconds)

Haven't looked closely at that 131%, but it could be that either the kernel
or the script itself is assuming one connection per mount.


>> Signed-off-by: Trond Myklebust <[email protected]>
>> ---
>> fs/nfs/internal.h | 1 +
>> fs/nfs/super.c | 10 ++++++++++
>> 2 files changed, 11 insertions(+)
>>
>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>> index 31b26cf1b476..31757a742e9b 100644
>> --- a/fs/nfs/internal.h
>> +++ b/fs/nfs/internal.h
>> @@ -117,6 +117,7 @@ struct nfs_parsed_mount_data {
>> char *export_path;
>> int port;
>> unsigned short protocol;
>> + unsigned short nconnect;
>> } nfs_server;
>>
>> struct security_mnt_opts lsm_opts;
>> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
>> index 54e0f9f2dd94..7eb48934dc79 100644
>> --- a/fs/nfs/super.c
>> +++ b/fs/nfs/super.c
>> @@ -76,6 +76,8 @@
>> #define NFS_DEFAULT_VERSION 2
>> #endif
>>
>> +#define NFS_MAX_CONNECTIONS 16
>> +
>> enum {
>> /* Mount options that take no arguments */
>> Opt_soft, Opt_hard,
>> @@ -107,6 +109,7 @@ enum {
>> Opt_nfsvers,
>> Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost,
>> Opt_addr, Opt_mountaddr, Opt_clientaddr,
>> + Opt_nconnect,
>> Opt_lookupcache,
>> Opt_fscache_uniq,
>> Opt_local_lock,
>> @@ -179,6 +182,8 @@ static const match_table_t nfs_mount_option_tokens = {
>> { Opt_mounthost, "mounthost=%s" },
>> { Opt_mountaddr, "mountaddr=%s" },
>>
>> + { Opt_nconnect, "nconnect=%s" },
>> +
>> { Opt_lookupcache, "lookupcache=%s" },
>> { Opt_fscache_uniq, "fsc=%s" },
>> { Opt_local_lock, "local_lock=%s" },
>> @@ -1544,6 +1549,11 @@ static int nfs_parse_mount_options(char *raw,
>> if (mnt->mount_server.addrlen == 0)
>> goto out_invalid_address;
>> break;
>> + case Opt_nconnect:
>> + if (nfs_get_option_ul_bound(args, &option, 1, NFS_MAX_CONNECTIONS))
>> + goto out_invalid_value;
>> + mnt->nfs_server.nconnect = option;
>> + break;
>> case Opt_lookupcache:
>> string = match_strdup(args);
>> if (string == NULL)
>> --
>> 2.9.3
>>
>
> --
> Chuck Lever
>
>
>

--
Chuck Lever




2017-05-04 16:01:36

by Chuck Lever III

Subject: Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use


> On May 4, 2017, at 9:45 AM, Chuck Lever <[email protected]> wrote:
>
> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> nconnect > 1. I'm looking into it.

Reproduced with NFSv4.1, TCP, and nconnect=2.

363 /*
364 * RFC5661 18.51.3
365 * Before RECLAIM_COMPLETE done, server should deny new lock
366 */
367 if (nfsd4_has_session(cstate) &&
368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
369 &cstate->session->se_client->cl_flags) &&
370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
371 return nfserr_grace;

Server-side instrumentation confirms:

May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0

Network capture shows the RPCs are interleaved between the two
connections as the client establishes its lease, and that appears
to be confusing the server.

C1: NULL -> NFS4_OK
C1: EXCHANGE_ID -> NFS4_OK
C2: CREATE_SESSION -> NFS4_OK
C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
C2: SEQUENCE -> NFS4_OK
C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
C1: BIND_CONN_TO_SESSION -> NFS4_OK
C2: BIND_CONN_TO_SESSION -> NFS4_OK
C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED

.... mix of GETATTRs and other simple requests ....

C1: OPEN -> NFS4ERR_GRACE
C2: OPEN -> NFS4ERR_GRACE

The RECLAIM_COMPLETE operation failed, and the client does not
retry it. That leaves its lease stuck in GRACE.


--
Chuck Lever




2017-05-04 17:36:38

by J. Bruce Fields

Subject: Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use

On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
>
> > On May 4, 2017, at 9:45 AM, Chuck Lever <[email protected]> wrote:
> >
> > - Testing with a Linux server shows that the basic NFS/RDMA pieces
> > work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> > nconnect > 1. I'm looking into it.
>
> Reproduced with NFSv4.1, TCP, and nconnect=2.
>
> 363 /*
> 364 * RFC5661 18.51.3
> 365 * Before RECLAIM_COMPLETE done, server should deny new lock
> 366 */
> 367 if (nfsd4_has_session(cstate) &&
> 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
> 369 &cstate->session->se_client->cl_flags) &&
> 370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> 371 return nfserr_grace;
>
> Server-side instrumentation confirms:
>
> May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
> May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
> May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
>
> Network capture shows the RPCs are interleaved between the two
> connections as the client establishes its lease, and that appears
> to be confusing the server.
>
> C1: NULL -> NFS4_OK
> C1: EXCHANGE_ID -> NFS4_OK
> C2: CREATE_SESSION -> NFS4_OK
> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION

What security flavors are involved? I believe the correct behavior
depends on whether gss is in use or not.

--b.

> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> C2: SEQUENCE -> NFS4_OK
> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> C1: BIND_CONN_TO_SESSION -> NFS4_OK
> C2: BIND_CONN_TO_SESSION -> NFS4_OK
> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
>
> .... mix of GETATTRs and other simple requests ....
>
> C1: OPEN -> NFS4ERR_GRACE
> C2: OPEN -> NFS4ERR_GRACE
>
> The RECLAIM_COMPLETE operation failed, and the client does not
> retry it. That leaves its lease stuck in GRACE.
>
>
> --
> Chuck Lever
>
>
>

2017-05-04 17:38:42

by Chuck Lever III

Subject: Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use


> On May 4, 2017, at 1:36 PM, [email protected] wrote:
>
> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
>>
>>> On May 4, 2017, at 9:45 AM, Chuck Lever <[email protected]> wrote:
>>>
>>> - Testing with a Linux server shows that the basic NFS/RDMA pieces
>>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
>>> nconnect > 1. I'm looking into it.
>>
>> Reproduced with NFSv4.1, TCP, and nconnect=2.
>>
>> 363 /*
>> 364 * RFC5661 18.51.3
>> 365 * Before RECLAIM_COMPLETE done, server should deny new lock
>> 366 */
>> 367 if (nfsd4_has_session(cstate) &&
>> 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
>> 369 &cstate->session->se_client->cl_flags) &&
>> 370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
>> 371 return nfserr_grace;
>>
>> Server-side instrumentation confirms:
>>
>> May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
>> May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
>> May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
>>
>> Network capture shows the RPCs are interleaved between the two
>> connections as the client establishes its lease, and that appears
>> to be confusing the server.
>>
>> C1: NULL -> NFS4_OK
>> C1: EXCHANGE_ID -> NFS4_OK
>> C2: CREATE_SESSION -> NFS4_OK
>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
>
> What security flavors are involved? I believe the correct behavior
> depends on whether gss is in use or not.

The mount options are "sec=sys" but both sides have a keytab.
So the lease management operations are done with krb5i.


> --b.
>
>> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
>> C2: SEQUENCE -> NFS4_OK
>> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
>> C1: BIND_CONN_TO_SESSION -> NFS4_OK
>> C2: BIND_CONN_TO_SESSION -> NFS4_OK
>> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
>>
>> .... mix of GETATTRs and other simple requests ....
>>
>> C1: OPEN -> NFS4ERR_GRACE
>> C2: OPEN -> NFS4ERR_GRACE
>>
>> The RECLAIM_COMPLETE operation failed, and the client does not
>> retry it. That leaves its lease stuck in GRACE.
>>
>>
>> --
>> Chuck Lever
>>
>>
>>

--
Chuck Lever




2017-05-04 17:45:50

by J. Bruce Fields

Subject: Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use

On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:
>
> > On May 4, 2017, at 1:36 PM, [email protected] wrote:
> >
> > On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
> >>
> >>> On May 4, 2017, at 9:45 AM, Chuck Lever <[email protected]> wrote:
> >>>
> >>> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> >>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> >>> nconnect > 1. I'm looking into it.
> >>
> >> Reproduced with NFSv4.1, TCP, and nconnect=2.
> >>
> >> 363 /*
> >> 364 * RFC5661 18.51.3
> >> 365 * Before RECLAIM_COMPLETE done, server should deny new lock
> >> 366 */
> >> 367 if (nfsd4_has_session(cstate) &&
> >> 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
> >> 369 &cstate->session->se_client->cl_flags) &&
> >> 370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> >> 371 return nfserr_grace;
> >>
> >> Server-side instrumentation confirms:
> >>
> >> May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
> >> May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
> >> May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
> >>
> >> Network capture shows the RPCs are interleaved between the two
> >> connections as the client establishes its lease, and that appears
> >> to be confusing the server.
> >>
> >> C1: NULL -> NFS4_OK
> >> C1: EXCHANGE_ID -> NFS4_OK
> >> C2: CREATE_SESSION -> NFS4_OK
> >> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> >
> > What security flavors are involved? I believe the correct behavior
> > depends on whether gss is in use or not.
>
> The mount options are "sec=sys" but both sides have a keytab.
> So the lease management operations are done with krb5i.

OK. I'm pretty sure the client needs to send BIND_CONN_TO_SESSION
before step C1.

My memory is that over auth_sys you're allowed to treat any SEQUENCE
over a new connection as implicitly binding that connection to the
referenced session, but over krb5 the server's required to return that
NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION.

I think CREATE_SESSION is allowed as long as the principals agree, and
that's why the call at C2 succeeds. Seems a little weird, though.

--b.

>
>
> > --b.
> >
> >> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> >> C2: SEQUENCE -> NFS4_OK
> >> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> >> C1: BIND_CONN_TO_SESSION -> NFS4_OK
> >> C2: BIND_CONN_TO_SESSION -> NFS4_OK
> >> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> >>
> >> .... mix of GETATTRs and other simple requests ....
> >>
> >> C1: OPEN -> NFS4ERR_GRACE
> >> C2: OPEN -> NFS4ERR_GRACE
> >>
> >> The RECLAIM_COMPLETE operation failed, and the client does not
> >> retry it. That leaves its lease stuck in GRACE.
> >>
> >>
> >> --
> >> Chuck Lever
> >>
> >>
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >> the body of a message to [email protected]
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Chuck Lever
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2017-05-04 18:55:14

by Chuck Lever III

Subject: Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use


> On May 4, 2017, at 1:45 PM, J. Bruce Fields <[email protected]> wrote:
>
> On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:
>>
>>> On May 4, 2017, at 1:36 PM, [email protected] wrote:
>>>
>>> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
>>>>
>>>>> On May 4, 2017, at 9:45 AM, Chuck Lever <[email protected]> wrote:
>>>>>
>>>>> - Testing with a Linux server shows that the basic NFS/RDMA pieces
>>>>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
>>>>> nconnect > 1. I'm looking into it.
>>>>
>>>> Reproduced with NFSv4.1, TCP, and nconnect=2.
>>>>
>>>> 363 /*
>>>> 364 * RFC5661 18.51.3
>>>> 365 * Before RECLAIM_COMPLETE done, server should deny new lock
>>>> 366 */
>>>> 367 if (nfsd4_has_session(cstate) &&
>>>> 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
>>>> 369 &cstate->session->se_client->cl_flags) &&
>>>> 370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
>>>> 371 return nfserr_grace;
>>>>
>>>> Server-side instrumentation confirms:
>>>>
>>>> May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
>>>> May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
>>>> May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
>>>>
>>>> Network capture shows the RPCs are interleaved between the two
>>>> connections as the client establishes its lease, and that appears
>>>> to be confusing the server.
>>>>
>>>> C1: NULL -> NFS4_OK
>>>> C1: EXCHANGE_ID -> NFS4_OK
>>>> C2: CREATE_SESSION -> NFS4_OK
>>>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
>>>
>>> What security flavors are involved? I believe the correct behavior
>>> depends on whether gss is in use or not.
>>
>> The mount options are "sec=sys" but both sides have a keytab.
>> So the lease management operations are done with krb5i.
>
> OK. I'm pretty sure the client needs to send BIND_CONN_TO_SESSION
> before step C1.
>
> My memory is that over auth_sys you're allowed to treat any SEQUENCE
> over a new connection as implicitly binding that connection to the
> referenced session, but over krb5 the server's required to return that
> NOT_BOUND error if the client skips the BIND_CONN_TO_SESSION.

Ah, that would explain why nconnect=[234] is working against my
Solaris 12 server: no keytab on that server means lease management
is done using plain-old AUTH_SYS.

Multiple connections are now handled entirely by the RPC layer,
and are opened and used at rpc_clnt creation time. The NFS client
is not aware (except for allowing more than one connection to be
used) and relies on its own recovery mechanisms to deal with
exceptions that might arise. IOW it doesn't seem to know that an
extra BC2S is needed, nor does it know where in the RPC stream
to insert that operation.

Seems to me a good approach would be to handle server trunking
discovery and lease establishment using a single connection, and
then open more connections. A conservative approach might actually
hold off on opening additional connections until there are enough
RPC transactions being initiated in parallel to warrant it. Or, if
@nconnect > 1, use a single connection to perform lease management,
and open @nconnect additional connections that handle only per-
mount I/O activity.
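
A rough sketch of that ordering, in terms of the helper the RFC series already
uses. The nfs4_lease_established() test and the saved xprt_create arguments are
hypothetical, and error handling is omitted:

	/* Sketch only: widen the transport switch after lease establishment. */
	static void nfs4_add_extra_connections(struct nfs_client *clp,
					       struct xprt_create *xprtargs,
					       unsigned int nconnect)
	{
		unsigned int i;

		/* EXCHANGE_ID, CREATE_SESSION and RECLAIM_COMPLETE have already
		 * run over the first (and so far only) connection.
		 * nfs4_lease_established() is a hypothetical predicate for that. */
		if (!nfs4_lease_established(clp))
			return;

		/* Only now add the remaining nconnect - 1 transports, so none of
		 * the lease-management compounds can be interleaved across
		 * unbound connections. */
		for (i = 1; i < nconnect; i++)
			if (rpc_clnt_add_xprt(clp->cl_rpcclient, xprtargs,
					      NULL, NULL) < 0)
				break;
	}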


> I think CREATE_SESSION is allowed as long as the principals agree, and
> that's why the call at C2 succeeds. Seems a little weird, though.

Well, there's no SEQUENCE operation in that COMPOUND. No session
or connection to use there, I think the principal and client ID
are the only way to recognize the target of the operation?


> --b.
>
>>
>>
>>> --b.
>>>
>>>> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
>>>> C2: SEQUENCE -> NFS4_OK
>>>> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
>>>> C1: BIND_CONN_TO_SESSION -> NFS4_OK
>>>> C2: BIND_CONN_TO_SESSION -> NFS4_OK
>>>> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
>>>>
>>>> .... mix of GETATTRs and other simple requests ....
>>>>
>>>> C1: OPEN -> NFS4ERR_GRACE
>>>> C2: OPEN -> NFS4ERR_GRACE
>>>>
>>>> The RECLAIM_COMPLETE operation failed, and the client does not
>>>> retry it. That leaves its lease stuck in GRACE.
>>>>
>>>>
>>>> --
>>>> Chuck Lever
>>>>
>>>>
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> --
>> Chuck Lever
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever




2017-05-04 19:09:26

by Anna Schumaker

Subject: Re: [RFC PATCH 0/5] Fun with the multipathing code

Hi Trond,

I'm testing these on two VMs with a single core each, so probably not the use case you had in mind for these patches. I ran my script that runs connectathon tests on every NFS version, and I'm seeing it consistently takes about a minute longer with "nconnect=2" than it does without the option.

Thanks for working on this!
Anna

On 04/28/2017 01:25 PM, Trond Myklebust wrote:
> In the spirit of experimentation, I've put together a set of patches
> that implement setting up multiple TCP connections to the server.
> The connections all go to the same server IP address, so do not
> provide support for multiple IP addresses (which I believe is
> something Andy Adamson is working on).
>
> The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't
> feel comfortable subjecting NFSv3/v4 replay caches to this
> treatment yet. It relies on the mount option "nconnect" to specify
> the number of connections to set up. So you can do something like
> 'mount -t nfs -overs=4.1,nconnect=8 foo:/bar /mnt'
> to set up 8 TCP connections to server 'foo'.
>
> Anyhow, feel free to test and give me feedback as to whether or not
> this helps performance on your system.
>
> Trond Myklebust (5):
> SUNRPC: Allow creation of RPC clients with multiple connections
> NFS: Add a mount option to specify number of TCP connections to use
> NFSv4: Allow multiple connections to NFSv4.x (x>0) servers
> pNFS: Allow multiple connections to the DS
> NFS: Display the "nconnect" mount option if it is set.
>
> fs/nfs/client.c | 2 ++
> fs/nfs/internal.h | 2 ++
> fs/nfs/nfs3client.c | 3 +++
> fs/nfs/nfs4client.c | 13 +++++++++++--
> fs/nfs/super.c | 12 ++++++++++++
> include/linux/nfs_fs_sb.h | 1 +
> include/linux/sunrpc/clnt.h | 1 +
> net/sunrpc/clnt.c | 17 ++++++++++++++++-
> net/sunrpc/xprtmultipath.c | 3 +--
> 9 files changed, 49 insertions(+), 5 deletions(-)
>

2017-05-04 19:58:21

by J. Bruce Fields

Subject: Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use

On Thu, May 04, 2017 at 02:55:06PM -0400, Chuck Lever wrote:
>
> > On May 4, 2017, at 1:45 PM, J. Bruce Fields <[email protected]> wrote:
> >
> > On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:
> >>
> >>> On May 4, 2017, at 1:36 PM, [email protected] wrote:
> >>>
> >>> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
> >>>>
> >>>>> On May 4, 2017, at 9:45 AM, Chuck Lever <[email protected]> wrote:
> >>>>>
> >>>>> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> >>>>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> >>>>> nconnect > 1. I'm looking into it.
> >>>>
> >>>> Reproduced with NFSv4.1, TCP, and nconnect=2.
> >>>>
> >>>> 363 /*
> >>>> 364 * RFC5661 18.51.3
> >>>> 365 * Before RECLAIM_COMPLETE done, server should deny new lock
> >>>> 366 */
> >>>> 367 if (nfsd4_has_session(cstate) &&
> >>>> 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
> >>>> 369 &cstate->session->se_client->cl_flags) &&
> >>>> 370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> >>>> 371 return nfserr_grace;
> >>>>
> >>>> Server-side instrumentation confirms:
> >>>>
> >>>> May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
> >>>> May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
> >>>> May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
> >>>>
> >>>> Network capture shows the RPCs are interleaved between the two
> >>>> connections as the client establishes its lease, and that appears
> >>>> to be confusing the server.
> >>>>
> >>>> C1: NULL -> NFS4_OK
> >>>> C1: EXCHANGE_ID -> NFS4_OK
> >>>> C2: CREATE_SESSION -> NFS4_OK
> >>>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> >>>
> >>> What security flavors are involved? I believe the correct behavior
> >>> depends on whether gss is in use or not.
> >>
> >> The mount options are "sec=sys" but both sides have a keytab.
> >> So the lease management operations are done with krb5i.
> >
> > OK. I'm pretty sure the client needs to send BIND_CONN_TO_SESSION
> > before step C1.
> >
> > My memory is that over auth_sys you're allowed to treat any SEQUENCE
> > over a new connection as implicitly binding that connection to the
> > referenced session, but over krb5 the server's required to return that
> > NOT_BOUND error if the client skips the BIND_CONN_TO_SESSION.
>
> Ah, that would explain why nconnect=[234] is working against my
> Solaris 12 server: no keytab on that server means lease management
> is done using plain-old AUTH_SYS.
>
> Multiple connections are now handled entirely by the RPC layer,
> and are opened and used at rpc_clnt creation time. The NFS client
> is not aware (except for allowing more than one connection to be
> used) and relies on its own recovery mechanisms to deal with
> exceptions that might arise. IOW it doesn't seem to know that an
> extra BC2S is needed, nor does it know where in the RPC stream
> to insert that operation.
>
> Seems to me a good approach would be to handle server trunking
> discovery and lease establishment using a single connection, and
> then open more connections. A conservative approach might actually
> hold off on opening additional connections until there are enough
> RPC transactions being initiated in parallel to warrant it. Or, if
> @nconnect > 1, use a single connection to perform lease management,
> and open @nconnect additional connections that handle only per-
> mount I/O activity.
>
>
> > I think CREATE_SESSION is allowed as long as the principals agree, and
> > that's why the call at C2 succeeds. Seems a little weird, though.
>
> Well, there's no SEQUENCE operation in that COMPOUND. No session
> or connection to use there, I think the principal and client ID
> are the only way to recognize the target of the operation?
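
A minimal sketch of the staged bring-up Chuck suggests above (establish the
lease over a single connection, then add the remaining nconnect-1 connections
for I/O), using hypothetical helper names rather than the actual patchset:

/*
 * Hypothetical sketch of staged connection bring-up; none of these
 * helpers exist in the kernel under these names.
 */
struct nfs_server_ctx;

/* Assumed: EXCHANGE_ID, CREATE_SESSION, RECLAIM_COMPLETE over connection 0. */
int lease_establish(struct nfs_server_ctx *ctx);
/* Assumed: opens one additional transport for per-mount I/O. */
int open_extra_connection(struct nfs_server_ctx *ctx);

static int staged_mount_setup(struct nfs_server_ctx *ctx, unsigned int nconnect)
{
	unsigned int i;
	int err;

	/* Step 1: all lease management goes over the first connection. */
	err = lease_establish(ctx);
	if (err)
		return err;

	/* Step 2: only now open the remaining connections for I/O. */
	for (i = 1; i < nconnect; i++) {
		err = open_extra_connection(ctx);
		if (err)
			break;	/* fall back to fewer connections */
	}
	return 0;
}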

I'm just not clear why the explicit BIND_CONN_TO_SESSION is required in
the gss case.

Actually, it's not gss exactly, it's the state protection level:

If, when the client ID was created, the client opted for
SP4_NONE state protection, the client is not required to use
BIND_CONN_TO_SESSION to associate the connection with the
session, unless the client wishes to associate the connection
with the backchannel. When SP4_NONE protection is used, simply
sending a COMPOUND request with a SEQUENCE operation is
sufficient to associate the connection with the session
specified in SEQUENCE.

Anyway.

--b.
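
The rule being described can be summarised in a small sketch. The enum and
function below are invented for illustration and are not the nfs/nfsd
implementation; they simply encode the RFC 5661, section 2.10.3.1 behaviour
quoted above.

/* Illustrative only: when must a client explicitly BIND_CONN_TO_SESSION? */
enum sp4_how { SP4_NONE, SP4_MACH_CRED, SP4_SSV };

static int needs_explicit_bind(enum sp4_how state_protection, int want_backchannel)
{
	/*
	 * SP4_NONE: sending a COMPOUND with SEQUENCE on the new connection
	 * is enough to associate it with the session, unless the client
	 * also wants the connection bound to the backchannel.
	 */
	if (state_protection == SP4_NONE)
		return want_backchannel;

	/*
	 * SP4_MACH_CRED / SP4_SSV (e.g. lease management over krb5i): the
	 * server may keep returning NFS4ERR_CONN_NOT_BOUND_TO_SESSION
	 * until BIND_CONN_TO_SESSION is sent on that connection.
	 */
	return 1;
}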

2017-05-04 20:40:13

by Trond Myklebust

[permalink] [raw]
Subject: Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use

On Thu, 2017-05-04 at 13:45 -0400, J. Bruce Fields wrote:
> On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:
> > The mount options are "sec=sys" but both sides have a keytab.
> > So the lease management operations are done with krb5i.
>
> OK. I'm pretty sure the client needs to send BIND_CONN_TO_SESSION
> before step C1.
>
> My memory is that over auth_sys you're allowed to treat any SEQUENCE
> over a new connection as implicitly binding that connection to the
> referenced session, but over krb5 the server's required to return that
> NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION.
>
> I think CREATE_SESSION is allowed as long as the principals agree, and
> that's why the call at C2 succeeds. Seems a little weird, though.

See https://tools.ietf.org/html/rfc5661#section-2.10.3.1

So, we probably should send the BIND_CONN_TO_SESSION after creating the
session, but since that involves figuring out whether or not state
protection was successfully negotiated, and since we have to support
handling NFS4ERR_CONN_NOT_BOUND_TO_SESSION anyway, I'm all for just
waiting for the server to send the error.

> --b.
>
> > > > C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> > > > C2: SEQUENCE -> NFS4_OK
> > > > C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> > > > C1: BIND_CONN_TO_SESSION -> NFS4_OK
> > > > C2: BIND_CONN_TO_SESSION -> NFS4_OK
> > > > C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> > > >
> > > > .... mix of GETATTRs and other simple requests ....
> > > >
> > > > C1: OPEN -> NFS4ERR_GRACE
> > > > C2: OPEN -> NFS4ERR_GRACE
> > > >
> > > > The RECLAIM_COMPLETE operation failed, and the client does not
> > > > retry it. That leaves its lease stuck in GRACE.
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2017-05-04 20:42:58

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use

On Thu, May 04, 2017 at 08:40:07PM +0000, Trond Myklebust wrote:
> See https://tools.ietf.org/html/rfc5661#section-2.10.3.1
>
> So, we probably should send the BIND_CONN_TO_SESSION after creating the
> session, but since that involves figuring out whether or not state
> protection was successfully negotiated, and since we have to support
> handling NFS4ERR_CONN_NOT_BOUND_TO_SESSION anyway, I'm all for just
> waiting for the server to send the error.

Makes sense to me.

--b.
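
A rough sketch, with invented names, of the "wait for the server to send the
error" strategy agreed above: rather than predicting whether state protection
requires an explicit bind, react to NFS4ERR_CONN_NOT_BOUND_TO_SESSION by
binding the offending connection and retrying.

#include <errno.h>

#define NFS4ERR_CONN_NOT_BOUND_TO_SESSION 10055	/* RFC 5661 */

struct nfs_session;
struct nfs_connection;

/* Assumed helper: issues BIND_CONN_TO_SESSION for @conn on @session. */
int bind_conn_to_session(struct nfs_session *session, struct nfs_connection *conn);

static int handle_sequence_status(struct nfs_session *session,
				  struct nfs_connection *conn, int nfserr)
{
	if (nfserr != NFS4ERR_CONN_NOT_BOUND_TO_SESSION)
		return nfserr;

	/* Bind this connection, then let the caller retry the compound. */
	if (bind_conn_to_session(session, conn) == 0)
		return -EAGAIN;
	return nfserr;
}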

2019-01-09 19:39:31

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [RFC PATCH 0/5] Fun with the multipathing code

Hi Trond,

Do you have any plans for this patch set?

I applied the patches on top of 4.20-rc7 kernel I had and tested it
(linux to linux) with iozone on the hardware (40G link with Mellanox
CX-5 card).

Results seem to show read IO improvement from 1.9GB/s to 3.9GB/s. Write IO
speed seems to be the same (disk bound, I'm guessing). I also tried
mounting a tmpfs; same thing.

Seems like a useful feature to include?

Some raw numbers I got. Each nconnect=X value is just a single data point.

With nconnect=10
Command line used: /home/kolga/iozone3_482/src/current/iozone -i0 -i1 -s52m -y2k -az -I
Output is in kBytes/sec

              kB  reclen    write  rewrite     read   reread
53248 2 7820 10956 20960 20927
53248 4 14871 21305 38743 38242
53248 8 27803 35001 75568 75830
53248 16 47452 59596 132513 130921
53248 32 70572 84940 234902 233423
53248 64 94774 101237 355664 354372
53248 128 114667 119413 523245 524855
53248 256 132340 137530 682411 681260
53248 512 143172 146157 784144 356064
53248 1024 148874 154177 1013764 982943
53248 2048 144311 161233 1282095 1592057
53248 4096 164679 169837 1637788 2438329
53248 8192 159221 142882 188536 1523659
53248 16384 169236 96996 3914910 1875398

With nconnect=9
              kB  reclen    write  rewrite     read   reread
53248 2 7833 10991 20893 20910
53248 4 15254 21136 40030 37510
53248 8 28077 37834 76688 67560
53248 16 47850 60174 137175 135266
53248 32 70653 85120 240219 235160
53248 64 96742 103856 364931 363556
53248 128 115002 119222 526446 517589
53248 256 132349 137254 684606 693748
53248 512 142849 147385 838735 876868
53248 1024 149612 152187 1060375 968514
53248 2048 150830 156006 1476364 1689987
53248 4096 163228 168421 1000338 1645183
53248 8192 165049 151047 3168655 3274393
53248 16384 166007 175972 743835 3817903

With nconnect=8
              kB  reclen    write  rewrite     read   reread
53248 2 7118 10321 20281 20353
53248 4 13960 20445 39233 39160
53248 8 24688 36543 74964 75111
53248 16 44674 57346 131362 130294
53248 32 67547 82716 231881 228998
53248 64 94195 103270 345326 343389
53248 128 116830 119816 521772 511537
53248 256 133709 137917 682126 693098
53248 512 143913 148801 878939 860046
53248 1024 150329 154027 1041977 1028612
53248 2048 157680 158844 7378 1486753
53248 4096 159543 160027 2441901 2168589
53248 8192 165155 160193 2515452 3142285
53248 16384 169411 176009 2385325 3894130

With nconnect=7
              kB  reclen    write  rewrite     read   reread
53248 2 7574 10593 20459 20381
53248 4 15064 20928 39865 39760
53248 8 27696 36864 74300 65721
53248 16 46960 59010 128354 127600
53248 32 68841 83578 230226 227369
53248 64 93114 100612 342303 331331
53248 128 112599 116108 498004 508645
53248 256 130668 136554 653718 634570
53248 512 142318 146749 805566 807056
53248 1024 148693 152493 965095 974736
53248 2048 157342 161170 1794490 1697579
53248 4096 144672 161154 2371227 2089308
53248 8192 148515 172814 3098132 766539
53248 16384 152801 143075 3799398 3778023

With nconnect=6
              kB  reclen    write  rewrite     read   reread
53248 2 7832 11103 21119 21254
53248 4 15490 21607 40520 40215
53248 8 25519 37333 78626 77118
53248 16 47885 54596 139343 138482
53248 32 71914 85094 239720 237024
53248 64 93901 100491 383238 377849
53248 128 95497 119289 545658 533312
53248 256 131614 137665 726717 716209
53248 512 143397 147452 896038 869623
53248 1024 149938 153885 1057554 1062727
53248 2048 157542 159369 1750302 1691100
53248 4096 163450 162691 2524086 2622917
53248 8192 162439 153065 3320433 3286189
53248 16384 153553 166918 3873279 3855965

With nconnect=5
              kB  reclen    write  rewrite     read   reread
53248 2 7592 10794 20382 20251
53248 4 15068 21096 41136 41865
53248 8 27606 37260 74947 74655
53248 16 47387 59806 137103 135962
53248 32 70402 83767 244301 241492
53248 64 95702 103042 361709 356424
53248 128 114189 118505 564857 556585
53248 256 132799 137856 751432 726667
53248 512 143233 146747 900493 921180
53248 1024 150787 154337 1106200 1088739
53248 2048 156873 161403 1133588 1709520
53248 4096 163741 166672 2468622 2275947
53248 8192 147689 165501 2969179 2943782
53248 16384 157076 143898 3468473 3580892

With nconnect=4
              kB  reclen    write  rewrite     read   reread
53248 2 7280 10499 20140 21610
53248 4 15003 20658 39282 39084
53248 8 27440 36211 72983 74006
53248 16 46702 58114 130113 129372
53248 32 67942 81592 237173 246333
53248 64 92098 98403 351618 349844
53248 128 117327 120451 492681 480222
53248 256 134457 137616 676207 666874
53248 512 144648 148179 853880 855267
53248 1024 151171 156382 1108038 1075847
53248 2048 157698 161736 1704862 1659547
53248 4096 164955 163237 9991 2274603
53248 8192 167987 173542 3189440 1304661
53248 16384 160230 158367 616211 1008327

With nconnect=3
              kB  reclen    write  rewrite     read   reread
53248 2 7954 11188 21786 21304
53248 4 15574 21973 41739 40116
53248 8 26917 38019 77460 77323
53248 16 47879 60593 140885 139938
53248 32 69304 83709 250196 247017
53248 64 95273 102929 371638 362578
53248 128 113436 118636 504672 495772
53248 256 131659 136857 749558 739310
53248 512 142581 146588 933209 907939
53248 1024 149502 152321 1092066 1093344
53248 2048 156992 162151 1821551 1772388
53248 4096 164692 170124 2530693 2442783
53248 8192 169409 175014 2795110 2795262
53248 16384 171873 176216 3088432 3172946

With nconnect=2
              kB  reclen    write  rewrite     read   reread
53248 2 7653 10723 20632 20970
53248 4 15232 21710 43017 42909
53248 8 27894 38009 80566 80249
53248 16 47392 60132 140226 138809
53248 32 72166 84713 240219 240935
53248 64 95449 102520 392916 387097
53248 128 113915 118447 592994 579702
53248 256 132337 136397 808895 782690
53248 512 142757 147276 1023450 980987
53248 1024 149803 153748 1232539 1200873
53248 2048 117144 142496 1726862 1846521
53248 4096 129211 168913 2327366 2035403
53248 8192 168842 173977 2079450 859542
53248 16384 170514 133000 2450596 856588

With nconnect=1
              kB  reclen    write  rewrite     read   reread
53248 2 7287 10482 20808 20586
53248 4 14282 20916 41216 40532
53248 8 26230 36606 76589 79005
53248 16 45838 59445 142976 141382
53248 32 70513 84601 250468 247247
53248 64 95128 103600 373719 377915
53248 128 116702 121174 571526 558482
53248 256 133131 137286 720249 702101
53248 512 140870 145269 907632 894129
53248 1024 148632 152558 1025853 1071471
53248 2048 69684 68052 1640169 1587587
53248 4096 57389 65044 1932496 1923277
53248 8192 65201 75412 1896445 1880839
53248 16384 86395 109635 1784491 1777077


Mounting a tmpfs instead of the disk
nconnect=10
              kB  reclen    write  rewrite     read   reread
53248 2 20766 21097 21248 21096
53248 4 38718 39837 40282 40562
53248 8 70787 73029 75134 75473
53248 16 129871 135244 137464 137202
53248 32 206931 225844 246440 243423
53248 64 307101 324226 362781 363964
53248 128 423743 437825 533503 539324
53248 256 549566 600099 726419 756622
53248 512 658211 723361 890941 902508
53248 1024 771731 898627 1079691 1125845
53248 2048 904072 1047097 1746060 1814433
53248 4096 1197609 1278558 1780285 2390797
53248 8192 1022231 1523377 1463727 1304735
53248 16384 1321716 1716730 3913052 3861092

nconnect=9
              kB  reclen    write  rewrite     read   reread
53248 2 18595 19418 19935 19555
53248 4 38048 38871 39015 39058
53248 8 70431 73903 73787 73437
53248 16 115428 120146 108439 132652
53248 32 189369 208458 238736 239319
53248 64 310172 326099 351834 350228
53248 128 419917 443973 540968 538233
53248 256 542390 578625 724630 721654
53248 512 636801 692928 876813 886978
53248 1024 740769 807593 1023254 1038803
53248 2048 900703 977706 1744465 1795702
53248 4096 991434 1218405 2312809 1534298
53248 8192 172671 1556220 3210650 1240208
53248 16384 1135860 1732470 3855099 3912755

nconnect=8
              kB  reclen    write  rewrite     read   reread
53248 2 20164 20622 20499 21020
53248 4 38006 39090 40008 40093
53248 8 70803 72965 75611 75827
53248 16 125845 132516 135011 135602
53248 32 216442 232697 239348 239241
53248 64 288013 297895 356983 363912
53248 128 418932 441833 520451 513015
53248 256 560464 616810 726013 730965
53248 512 674367 722693 903227 936461
53248 1024 761283 840974 1089472 1128827
53248 2048 943060 924299 1467459 1666917
53248 4096 970724 1052788 2433414 1938400
53248 8192 1342030 1089869 464917 3304996
53248 16384 1458436 1095725 3794363 1635401

nconnect=7
              kB  reclen    write  rewrite     read   reread
53248 2 20482 21154 21481 21409
53248 4 39328 40445 41006 40581
53248 8 75042 77753 80518 79727
53248 16 131785 136573 139394 138978
53248 32 150097 209044 249709 250655
53248 64 316353 333310 380193 383393
53248 128 427594 453668 573614 573235
53248 256 568166 611842 751230 753997
53248 512 655601 718936 909862 920353
53248 1024 749337 824988 1073221 1092846
53248 2048 959526 991769 1722507 1835308
53248 4096 1114485 1273084 824029 2244745
53248 8192 1096944 1590424 3208102 1612757
53248 16384 186085 1777460 2446002 3071636

nconnect=6
              kB  reclen    write  rewrite     read   reread
53248 2 19954 20159 20472 20692
53248 4 38829 39657 40025 39943
53248 8 70936 73492 74566 74764
53248 16 119267 123319 136927 136591
53248 32 193462 227254 239441 240293
53248 64 280700 280861 348085 352502
53248 128 410708 433280 268324 480572
53248 256 549707 599025 705775 721743
53248 512 694691 777286 834676 831794
53248 1024 796161 899669 985672 1011762
53248 2048 660219 1095097 1442643 1536969
53248 4096 713024 1097287 2375110 2278199
53248 8192 961825 814827 1414807 1073586
53248 16384 1302666 188459 789169 3799328

nconnect=5
              kB  reclen    write  rewrite     read   reread
53248 2 20083 20853 21387 21790
53248 4 39346 40634 41595 41911
53248 8 72275 75203 78950 79016
53248 16 110484 128308 135731 131166
53248 32 202718 216528 239493 240653
53248 64 293191 298468 379034 382413
53248 128 457944 496666 551294 555308
53248 256 595181 641156 750500 751126
53248 512 694337 787317 895434 898956
53248 1024 761906 854799 1064769 1073980
53248 2048 946967 1116994 1735369 1746934
53248 4096 392953 1355423 2615086 2455756
53248 8192 1356030 1578369 3033668 3172360
53248 16384 1454587 1743974 3562513 3540975

nconnect=4
              kB  reclen    write  rewrite     read   reread
53248 2 20228 20092 21908 22120
53248 4 38694 39986 41089 41153
53248 8 75699 78465 80083 80017
53248 16 102728 130883 135680 141924
53248 32 220118 231684 240910 249315
53248 64 302994 321295 385046 386325
53248 128 457099 488792 564420 563577
53248 256 586191 676053 767127 776559
53248 512 715344 782611 899003 906520
53248 1024 771923 874051 1182348 1256440
53248 2048 969607 1104706 1557321 1911278
53248 4096 1179644 981022 1722069 2709534
53248 8192 1216820 1556373 3159477 3254646
53248 16384 1508198 605894 3517653 3571029

nconnect=3
              kB  reclen    write  rewrite     read   reread
53248 2 21481 21763 21988 21311
53248 4 39828 40888 41669 41768
53248 8 65010 76085 80491 80466
53248 16 123527 135609 143423 144154
53248 32 225695 236990 250957 251665
53248 64 320309 348847 396364 396967
53248 128 426707 452220 565097 565103
53248 256 558951 600620 763477 767196
53248 512 668986 726410 972622 989905
53248 1024 782668 839173 1183444 1149741
53248 2048 974740 1075588 1853002 1885892
53248 4096 1198605 1308529 1270347 1624458
53248 8192 936760 1609546 2008581 2949932
53248 16384 579957 1068755 1254678 1268465

nconnect=2
              kB  reclen    write  rewrite     read   reread
53248 2 20386 21137 21406 21519
53248 4 38273 39530 40406 41521
53248 8 73789 73972 78914 79116
53248 16 127961 133436 138270 137096
53248 32 213333 231143 238689 239144
53248 64 292544 301586 372603 374027
53248 128 449001 480655 552909 532209
53248 256 551713 611455 726627 738374
53248 512 652788 745258 845863 848531
53248 1024 822491 904270 1080454 1024272
53248 2048 829847 948519 2001870 1985974
53248 4096 1198116 1387247 2519900 2503433
53248 8192 1345305 1475502 2918073 3259019
53248 16384 634718 475630 3128884 2969906

nconnect=1
              kB  reclen    write  rewrite     read   reread
53248 2 21288 21799 21638 21763
53248 4 40599 42412 42758 42762
53248 8 75734 78713 80072 81414
53248 16 124331 133874 148128 148421
53248 32 229738 242286 261479 262589
53248 64 337174 357993 385598 391051
53248 128 428862 462394 582345 576003
53248 256 527788 530506 780829 790614
53248 512 668147 732605 1071388 1058561
53248 1024 823391 921211 1218651 1223558
53248 2048 1016144 1111789 1600626 1585513
53248 4096 1251567 1436417 1818215 1868426
53248 8192 1479547 1716916 1804469 1789697
53248 16384 1435145 1954500 1796230 1799570



On Sun, Apr 30, 2017 at 8:49 AM Trond Myklebust
<[email protected]> wrote:
>
> In the spirit of experimentation, I've put together a set of patches
> that implement setting up multiple TCP connections to the server.
> The connections all go to the same server IP address, so do not
> provide support for multiple IP addresses (which I believe is
> something Andy Adamson is working on).
>
> The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't
> feel comfortable subjecting NFSv3/v4 replay caches to this
> treatment yet. It relies on the mount option "nconnect" to specify
> the number of connections to set up. So you can do something like
> 'mount -t nfs -overs=4.1,nconnect=8 foo:/bar /mnt'
> to set up 8 TCP connections to server 'foo'.
>
> Anyhow, feel free to test and give me feedback as to whether or not
> this helps performance on your system.
>
> Trond Myklebust (5):
> SUNRPC: Allow creation of RPC clients with multiple connections
> NFS: Add a mount option to specify number of TCP connections to use
> NFSv4: Allow multiple connections to NFSv4.x (x>0) servers
> pNFS: Allow multiple connections to the DS
> NFS: Display the "nconnect" mount option if it is set.
>
> fs/nfs/client.c | 2 ++
> fs/nfs/internal.h | 2 ++
> fs/nfs/nfs3client.c | 3 +++
> fs/nfs/nfs4client.c | 13 +++++++++++--
> fs/nfs/super.c | 12 ++++++++++++
> include/linux/nfs_fs_sb.h | 1 +
> include/linux/sunrpc/clnt.h | 1 +
> net/sunrpc/clnt.c | 17 ++++++++++++++++-
> net/sunrpc/xprtmultipath.c | 3 +--
> 9 files changed, 49 insertions(+), 5 deletions(-)
>
> --
> 2.9.3
>

2019-01-09 20:38:56

by Trond Myklebust

[permalink] [raw]
Subject: Re: [RFC PATCH 0/5] Fun with the multipathing code

Hi Olga

On Wed, 2019-01-09 at 14:39 -0500, Olga Kornievskaia wrote:
> Hi Trond,
>
> Do you have any plans for this patch set?
>
> I applied the patches on top of 4.20-rc7 kernel I had and tested it
> (linux to linux) with iozone on the hardware (40G link with Mellanox
> CX-5 card).
>
> Results seem to show read IO improvement from 1.9GB to 3.9GB. Write
> IO
> speed seems to be the same (disk bound I'm guessing). I also tried
> mounting tmpfs. Same thing.
>
> Seems like a useful feature to include?

Thanks for testing this.

Was this your own port of the original patches, or have you taken my
branch from
http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/multipath_tcp
?

Either way I appreciate the data point. I haven't seen too many other
reports of performance improvements, and that's the main reason why
this patchset has languished.

3.9GB/s would be about 31Gbps, so that is not quite wire speed, but
certainly a big improvement on 1.9GB/s. I'm a little surprised, though,
that the write performance did not improve with the tmpfs. Was all this
using aio+dio on the client?

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-01-09 22:18:48

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [RFC PATCH 0/5] Fun with the multipathing code

On Wed, Jan 9, 2019 at 3:38 PM Trond Myklebust <[email protected]> wrote:
>
> Hi Olga
>
> On Wed, 2019-01-09 at 14:39 -0500, Olga Kornievskaia wrote:
> > Hi Trond,
> >
> > Do you have any plans for this patch set?
> >
> > I applied the patches on top of 4.20-rc7 kernel I had and tested it
> > (linux to linux) with iozone on the hardware (40G link with Mellanox
> > CX-5 card).
> >
> > Results seem to show read IO improvement from 1.9GB to 3.9GB. Write
> > IO
> > speed seems to be the same (disk bound I'm guessing). I also tried
> > mounting tmpfs. Same thing.
> >
> > Seems like a useful feature to include?
>
> Thanks for testing this.
>
> Was this your own port of the original patches, or have you taken my
> branch from
> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/multipath_tcp
> ?

I didn't know one existed. I just took the original patches from the
mailing list and applied them to 4.20-rc7 (they applied without issues,
as far as I recall).

> Either way I appreciate the data point. I haven't seen too many other
> reports of performance improvements, and that's the main reason why
> this patchset has languished.
>
> 3.9GB/s would be about 31Gbps, so that is not quite wire speed, but
> certainly a big improvement on 1.9GB/s.

Maybe it's the lab setup that's not tuned to achieve max performance.

> I'm a little surprised, tbough,
> that the write performance did not improve with the tmpfs. Was all this
> using aio+dio on the client?

It is whatever "iozone -i0 -i1 -s52m -y2k -az -I" translates to.

To clarify, by "didn't improve" I didn't mean that the write speed with the
disk is the same as the write speed with tmpfs (disk write speed is ~168MB/s
and tmpfs write speed is 1.47GB/s). I meant that it seems with nconnect=1 it
already achieves the "max" performance of the disk/tmpfs.

>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>