2010-02-22 21:54:38

by Ben Myers

Subject: [PATCH 0/2] knfsd: support larger numbers of clients

Several customers with large numbers of nfs clients have run afoul of the
existing connection limit ((nr_threads+3) * 20) and have been working around it
by bumping the number of nfsd threads. There are two problems with that:

1) The maximum number of connections is tied to the number of nfsd threads.

The first patch decouples the maximum number of connections from the number of
threads by making it a tuneable. Note that the default setting still uses the
existing formula. Folks who want to handle additional connections without
bumping the number of threads can do so.

2) At around 1020 threads the send buffer size wraps negative and things grind
to a halt.

The second patch decouples the size of send and receive buffers from the number
of threads by making them tuneable. Buffer sizes were getting absurdly large
as the number of nfsd threads grew. Now they remain constant.
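
For the curious, here is a minimal userspace sketch of why the old sizing wraps
(assumptions: sv_max_mesg of roughly 1MB plus one 4KB page, which is what you
get with a 1MB rsize/wsize; the exact figure depends on the configured payload):

	#include <limits.h>
	#include <stdio.h>

	int main(void)
	{
		/* assumed sv_max_mesg for a 1MB payload: 1MB + one 4KB page */
		long long max_mesg = (1 << 20) + 4096;
		int nr_threads;

		for (nr_threads = 1015; nr_threads <= 1020; nr_threads++) {
			/* old sizing: (nr_threads + 3) * sv_max_mesg, then
			 * doubled in svc_sock_setbufsize; sk_sndbuf is an int */
			long long bufsize = (nr_threads + 3) * max_mesg * 2;
			printf("%d threads: %lld (%s INT_MAX)\n", nr_threads,
			       bufsize, bufsize > INT_MAX ? "exceeds" : "below");
		}
		return 0;
	}

With those assumptions the doubled sndbuf crosses INT_MAX just before 1020
threads, which matches the behavior described above.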


Both patches were originally implemented using sysctls, but I have changed them
over to module parameters to match what Jeff Layton has done with
nlm_max_connections.

Thanks,
Ben

---

Ben Myers (2):
knfsd: nfsd_max_connections module parameter
sunrpc: socket buffer size module parameter


fs/nfsd/nfsctl.c | 7 ++++
fs/nfsd/nfssvc.c | 3 ++
net/sunrpc/sunrpc_syms.c | 39 ++++++++++++++++++++++
net/sunrpc/svcsock.c | 81 +++++++++++++++++++++++-----------------------
4 files changed, 89 insertions(+), 41 deletions(-)

--


2010-02-23 16:11:35

by Ben Myers

Subject: Re: [PATCH 2/2] sunrpc: socket buffer size module parameter

Hey Chuck,

On Mon, Feb 22, 2010 at 05:33:10PM -0800, Chuck Lever wrote:
> On 02/22/2010 01:54 PM, Ben Myers wrote:
>> +int tcp_rcvbuf_nrpc = 6;
>
> Just curious, is this '6' a typo?

Not a typo. The original setting for tcp receive buffer was hardcoded
at

3 (in svc_tcp_init and svc_tcp_recv_record)
*
sv_max_mesg
*
2 (in svc_sock_setbufsize)

That's where I came up with the 6 for the tcp recv buffer. The setting
hasn't changed.



The UDP send/recv buffer settings and TCP send buffer settings were
going to be

( 4 (default number of kernel threads on sles11)
+
3 (as in svc_udp_recvfrom, etc) )
*
sv_max_mesg
*
2 (in svc_sock_setbufsize)

but 14 wasn't a very round number, so I went with 16 which also happened
to match the slot_table_entries default.

> Perhaps it would be nice to have a
> single macro defined as the default value for all of these.
>
> Do we have a high degree of confidence that these new default settings
> will not adversely affect workloads that already perform well?

This patch has been in several releases of SGI's nfsd respin and I've
heard nothing to suggest there is an issue. I didn't spend much time
taking measurements on UDP and didn't keep my TCP measurements. If you
feel measurements are essential I'll be happy to provide a few, but
won't be able to get around to it for a little while.

Thanks,
Ben

2010-02-23 20:11:42

by Chuck Lever III

Subject: Re: [PATCH 2/2] sunrpc: socket buffer size module parameter

On 02/23/2010 08:12 AM, [email protected] wrote:
> Hey Chuck,
>
> On Mon, Feb 22, 2010 at 05:33:10PM -0800, Chuck Lever wrote:
>> On 02/22/2010 01:54 PM, Ben Myers wrote:
>>> +int tcp_rcvbuf_nrpc = 6;
>>
>> Just curious, is this '6' a typo?
>
> Not a typo. The original setting for tcp receive buffer was hardcoded
> at
>
> 3 (in svc_tcp_init and svc_tcp_recv_record)
> *
> sv_max_mesg
> *
> 2 (in svc_sock_setbufsize)
>
> That's where I came up with the 6 for the tcp recv buffer. The setting
> hasn't changed.
>
>
>
> The UDP send/recv buffer settings and TCP send buffer settings were
> going to be
>
> ( 4 (default number of kernel threads on sles11)
> +
> 3 (as in svc_udp_recvfrom, etc) )
> *
> sv_max_mesg
> *
> 2 (in svc_sock_setbufsize)
>
> but 14 wasn't a very round number, so I went with 16 which also happened
> to match the slot_table_entries default.

It triggered my "naked integer" nerve. It would be nice to provide some
level of detail, similar to your description here, in the comments
around these settings. Perhaps some guidance for admins about how to
choose these values would also be warranted.

Most importantly, though, there should be some documentation of why
these are the chosen defaults.

>> Perhaps it would be nice to have a
>> single macro defined as the default value for all of these.
>>
>> Do we have a high degree of confidence that these new default settings
>> will not adversely affect workloads that already perform well?
>
> This patch has been in several releases of SGI's nfsd respin and I've
> heard nothing to suggest there is an issue. I didn't spend much time
> taking measurements on UDP and didn't keep my TCP measurements. If you
> feel measurements are essential I'll be happy to provide a few, but
> won't be able to get around to it for a little while.

There were recent changes to the server's default buffer size settings
that caused problems for certain common workloads.

I don't think you need to go overboard with measurements and rationale,
but some guarantee that these two patches won't cause performance
regressions on typical NFS server workloads would be "nice to have."

--
Chuck Lever

2010-02-23 01:34:13

by Chuck Lever III

Subject: Re: [PATCH 2/2] sunrpc: socket buffer size module parameter

On 02/22/2010 01:54 PM, Ben Myers wrote:
> At 1020 threads the send buffer size wraps and becomes negative causing the nfs
> server to grind to a halt. Rather than setting bufsize based upon the number
> of nfsd threads I have made the buffer sizes tuneable via module parameters.
> In other versions of this patch they were sysctls.
>
> Set the buffer sizes in terms of the number of rpcs you want to fit into the
> buffer.
>
> Signed-off-by: Ben Myers <[email protected]>
> ---
> fs/nfsd/nfsctl.c | 1 +
> net/sunrpc/sunrpc_syms.c | 39 ++++++++++++++++++++++
> net/sunrpc/svcsock.c | 81 +++++++++++++++++++++++-----------------------
> 3 files changed, 80 insertions(+), 41 deletions(-)
>
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 981366d..bce8df3 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -19,6 +19,7 @@
> /* Default to use existing settings in svc_check_conn_limits */
> unsigned int nfsd_max_connections = 0;
>
> +
> /*
> * We have a single directory with 9 nodes in it.
> */
> diff --git a/net/sunrpc/sunrpc_syms.c b/net/sunrpc/sunrpc_syms.c
> index f438347..a7a80b4 100644
> --- a/net/sunrpc/sunrpc_syms.c
> +++ b/net/sunrpc/sunrpc_syms.c
> @@ -26,6 +26,45 @@ extern struct cache_detail ip_map_cache, unix_gid_cache;
>
> extern void cleanup_rpcb_clnt(void);
>
> +/*
> + * Size each socket's send and receive buffers based upon the number of rpcs
> + * we'd like to be able to fit into them. The TCP rcvbuf is enough for just a
> + * few requests. TCP sndbuf is the same as the tcp_slot_table_entries default,
> + * the maximum number of in-flight rpcs for a client. UDP sndbuf and rcvbuf
> + * defaults are the same as the udp_slot_table_entries default.
> + *
> + * Setting these too small can cause nfsd threads to block when sending.
> + */
> +int tcp_rcvbuf_nrpc = 6;

Just curious, is this '6' a typo? Perhaps it would be nice to have a
single macro defined as the default value for all of these.

Do we have a high degree of confidence that these new default settings
will not adversely affect workloads that already perform well?

> +int tcp_sndbuf_nrpc = 16;
> +
> +int udp_rcvbuf_nrpc = 16;
> +int udp_sndbuf_nrpc = 16;
> +
> +/*
> + * Minimum for bufsize settings is enough space for 6 rpcs, maximum is enough
> + * space for 128 rpcs.
> + */
> +static int param_set_bufsize_nrpc(const char *val, struct kernel_param *kp)
> +{
> + char *endp;
> + int num = simple_strtol(val, &endp, 0);
> +
> + if (endp == val || (*endp && *endp != '\n') || num < 6 || num > 128)
> + return -EINVAL;
> + *((int *)kp->arg) = num;
> + return 0;
> +}
> +
> +module_param_call(tcp_rcvbuf_nrpc, param_set_bufsize_nrpc, param_get_int,
> + &tcp_rcvbuf_nrpc, 0644);
> +module_param_call(tcp_sndbuf_nrpc, param_set_bufsize_nrpc, param_get_int,
> + &tcp_sndbuf_nrpc, 0644);
> +module_param_call(udp_rcvbuf_nrpc, param_set_bufsize_nrpc, param_get_int,
> + &udp_rcvbuf_nrpc, 0644);
> +module_param_call(udp_sndbuf_nrpc, param_set_bufsize_nrpc, param_get_int,
> + &udp_sndbuf_nrpc, 0644);
> +
> static int __init
> init_sunrpc(void)
> {
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 9e09391..f172b27 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -381,8 +381,7 @@ static int svc_recvfrom(struct svc_rqst *rqstp, struct kvec *iov, int nr,
> /*
> * Set socket snd and rcv buffer lengths
> */
> -static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
> - unsigned int rcv)
> +static void svc_sock_setbufsize(struct socket *sock, int snd, int rcv)
> {
> #if 0
> mm_segment_t oldfs;
> @@ -398,13 +397,14 @@ static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
> * DaveM said I could!
> */
> lock_sock(sock->sk);
> - sock->sk->sk_sndbuf = snd * 2;
> - sock->sk->sk_rcvbuf = rcv * 2;
> + sock->sk->sk_sndbuf = snd;
> + sock->sk->sk_rcvbuf = rcv;
> sock->sk->sk_userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK;
> sock->sk->sk_write_space(sock->sk);
> release_sock(sock->sk);
> #endif
> }
> +
> /*
> * INET callback when data has been received on the socket.
> */
> @@ -498,6 +498,7 @@ static int svc_udp_get_dest_address(struct svc_rqst *rqstp,
> return 0;
> }
>
> +extern int udp_sndbuf_nrpc, udp_rcvbuf_nrpc;
> /*
> * Receive a datagram from a UDP socket.
> */
> @@ -521,18 +522,16 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
> size_t len;
> int err;
>
> - if (test_and_clear_bit(XPT_CHNGBUF,&svsk->sk_xprt.xpt_flags))
> - /* udp sockets need large rcvbuf as all pending
> - * requests are still in that buffer. sndbuf must
> - * also be large enough that there is enough space
> - * for one reply per thread. We count all threads
> - * rather than threads in a particular pool, which
> - * provides an upper bound on the number of threads
> - * which will access the socket.
> - */
> - svc_sock_setbufsize(svsk->sk_sock,
> - (serv->sv_nrthreads+3) * serv->sv_max_mesg,
> - (serv->sv_nrthreads+3) * serv->sv_max_mesg);
> + if (test_and_clear_bit(XPT_CHNGBUF,&svsk->sk_xprt.xpt_flags)) {
> + /*
> + * UDP sockets need a large rcvbuf as all pending requests are
> + * still in that buffer. sndbuf must also be large enough to
> + * fit a reply for each request the server could be processing.
> + */
> + svc_sock_setbufsize(svsk->sk_sock,
> + udp_sndbuf_nrpc * serv->sv_max_mesg,
> + udp_rcvbuf_nrpc * serv->sv_max_mesg);
> + }
>
> clear_bit(XPT_DATA,&svsk->sk_xprt.xpt_flags);
> skb = NULL;
> @@ -697,14 +696,14 @@ static void svc_udp_init(struct svc_sock *svsk, struct svc_serv *serv)
> clear_bit(XPT_CACHE_AUTH,&svsk->sk_xprt.xpt_flags);
> svsk->sk_sk->sk_data_ready = svc_udp_data_ready;
> svsk->sk_sk->sk_write_space = svc_write_space;
> -
> - /* initialise setting must have enough space to
> - * receive and respond to one request.
> - * svc_udp_recvfrom will re-adjust if necessary
> +
> + /*
> + * Initial setting must have enough space to receive and respond to one
> + * request. svc_udp_recvfrom will readjust if necessary.
> */
> svc_sock_setbufsize(svsk->sk_sock,
> - 3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
> - 3 * svsk->sk_xprt.xpt_server->sv_max_mesg);
> + 6 * svsk->sk_xprt.xpt_server->sv_max_mesg,
> + 6 * svsk->sk_xprt.xpt_server->sv_max_mesg);
>
> /* data might have come in before data_ready set up */
> set_bit(XPT_DATA,&svsk->sk_xprt.xpt_flags);
> @@ -874,6 +873,7 @@ failed:
> return NULL;
> }
>
> +extern int tcp_sndbuf_nrpc, tcp_rcvbuf_nrpc;
> /*
> * Receive data.
> * If we haven't gotten the record length yet, get the next four bytes.
> @@ -885,22 +885,20 @@ static int svc_tcp_recv_record(struct svc_sock *svsk, struct svc_rqst *rqstp)
> struct svc_serv *serv = svsk->sk_xprt.xpt_server;
> int len;
>
> - if (test_and_clear_bit(XPT_CHNGBUF,&svsk->sk_xprt.xpt_flags))
> - /* sndbuf needs to have room for one request
> - * per thread, otherwise we can stall even when the
> - * network isn't a bottleneck.
> - *
> - * We count all threads rather than threads in a
> - * particular pool, which provides an upper bound
> - * on the number of threads which will access the socket.
> - *
> - * rcvbuf just needs to be able to hold a few requests.
> - * Normally they will be removed from the queue
> - * as soon a a complete request arrives.
> + if (test_and_clear_bit(XPT_CHNGBUF,&svsk->sk_xprt.xpt_flags)) {
> + /*
> + * sndbuf needs to have enough room for every request the
> + * server could be processing, otherwise we can block even when
> + * the network isn't a bottleneck.
> + *
> + * Normally they will be removed from the queue as soon as a
> + * complete request arrives.
> */
> svc_sock_setbufsize(svsk->sk_sock,
> - (serv->sv_nrthreads+3) * serv->sv_max_mesg,
> - 3 * serv->sv_max_mesg);
> + tcp_sndbuf_nrpc * serv->sv_max_mesg,
> + tcp_rcvbuf_nrpc * serv->sv_max_mesg);
> + }
> +
>
> clear_bit(XPT_DATA,&svsk->sk_xprt.xpt_flags);
>
> @@ -1250,13 +1248,14 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
>
> tcp_sk(sk)->nonagle |= TCP_NAGLE_OFF;
>
> - /* initialise setting must have enough space to
> - * receive and respond to one request.
> - * svc_tcp_recvfrom will re-adjust if necessary
> + /*
> + * Initial setting must have enough space to receive and
> + * respond to one request. svc_tcp_recvfrom will readjust if
> + * necessary.
> */
> svc_sock_setbufsize(svsk->sk_sock,
> - 3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
> - 3 * svsk->sk_xprt.xpt_server->sv_max_mesg);
> + 6 * svsk->sk_xprt.xpt_server->sv_max_mesg,
> + 6 * svsk->sk_xprt.xpt_server->sv_max_mesg);
>
> set_bit(XPT_CHNGBUF,&svsk->sk_xprt.xpt_flags);
> set_bit(XPT_DATA,&svsk->sk_xprt.xpt_flags);
>
> --

--
Chuck Lever

2010-02-22 21:54:43

by Ben Myers

Subject: [PATCH 1/2] knfsd: nfsd_max_connections module parameter

Decouple the maximum number of nfs client connections from the number of server
threads by adding the nfsd_max_connections module parameter. Note that the
default remains the current formula: (nr_threads+3) * 20.
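
For reference, the limit selection in svc_check_conn_limits() boils down to the
following (paraphrased into a standalone helper for illustration;
nfsd_conn_limit() is a made-up name, not a function in the kernel):

	/* how svc_check_conn_limits() chooses the connection limit */
	static unsigned int nfsd_conn_limit(unsigned int sv_maxconn,
					    unsigned int sv_nrthreads)
	{
		/* a non-zero sv_maxconn (set from the new module parameter)
		 * overrides the old thread-based formula */
		return sv_maxconn ? sv_maxconn : (sv_nrthreads + 3) * 20;
	}

With nfsd_max_connections left at 0 the formula continues to apply, so existing
setups see no change in behavior; any non-zero value takes precedence.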

Signed-off-by: Ben Myers <[email protected]>
---
fs/nfsd/nfsctl.c | 6 ++++++
fs/nfsd/nfssvc.c | 3 +++
2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 0f0e77f..981366d 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -16,6 +16,9 @@
#include "nfsd.h"
#include "cache.h"

+/* Default to use existing settings in svc_check_conn_limits */
+unsigned int nfsd_max_connections = 0;
+
/*
* We have a single directory with 9 nodes in it.
*/
@@ -1447,5 +1450,8 @@ static void __exit exit_nfsd(void)

MODULE_AUTHOR("Olaf Kirch <[email protected]>");
MODULE_LICENSE("GPL");
+
+module_param(nfsd_max_connections, uint, 0644);
+
module_init(init_nfsd)
module_exit(exit_nfsd)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 171699e..a21fd4e 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -275,11 +275,14 @@ int nfsd_create_serv(void)
return err;
}

+extern unsigned int nfsd_max_connections;
static int nfsd_init_socks(int port)
{
int error;
if (!list_empty(&nfsd_serv->sv_permsocks))
return 0;
+
+ nfsd_serv->sv_maxconn = nfsd_max_connections;

error = svc_create_xprt(nfsd_serv, "udp", PF_INET, port,
SVC_SOCK_DEFAULTS);


2010-02-22 21:54:48

by Ben Myers

Subject: [PATCH 2/2] sunrpc: socket buffer size module parameter

At 1020 threads the send buffer size wraps and becomes negative causing the nfs
server to grind to a halt. Rather than setting bufsize based upon the number
of nfsd threads I have made the buffer sizes tuneable via module parameters.
In other versions of this patch they were sysctls.

Set the buffer sizes in terms of the number of rpcs you want to fit into the
buffer.
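
For example, with the defaults and a 1MB rsize/wsize (so sv_max_mesg is roughly
1MB plus one page, about 1052672 bytes; illustrative figure only), the buffers
come out at roughly

	tcp_sndbuf = 16 * sv_max_mesg ~= 16.1 MB
	tcp_rcvbuf =  6 * sv_max_mesg ~=  6.0 MB
	udp_sndbuf = udp_rcvbuf = 16 * sv_max_mesg ~= 16.1 MB

independent of the number of nfsd threads.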

Signed-off-by: Ben Myers <[email protected]>
---
fs/nfsd/nfsctl.c | 1 +
net/sunrpc/sunrpc_syms.c | 39 ++++++++++++++++++++++
net/sunrpc/svcsock.c | 81 +++++++++++++++++++++++-----------------------
3 files changed, 80 insertions(+), 41 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 981366d..bce8df3 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -19,6 +19,7 @@
/* Default to use existing settings in svc_check_conn_limits */
unsigned int nfsd_max_connections = 0;

+
/*
* We have a single directory with 9 nodes in it.
*/
diff --git a/net/sunrpc/sunrpc_syms.c b/net/sunrpc/sunrpc_syms.c
index f438347..a7a80b4 100644
--- a/net/sunrpc/sunrpc_syms.c
+++ b/net/sunrpc/sunrpc_syms.c
@@ -26,6 +26,45 @@ extern struct cache_detail ip_map_cache, unix_gid_cache;

extern void cleanup_rpcb_clnt(void);

+/*
+ * Size each socket's send and receive buffers based upon the number of rpcs
+ * we'd like to be able to fit into them. The TCP rcvbuf is enough for just a
+ * few requests. TCP sndbuf is the same as the tcp_slot_table_entries default,
+ * the maximum number of in-flight rpcs for a client. UDP sndbuf and rcvbuf
+ * defaults are the same as the udp_slot_table_entries default.
+ *
+ * Setting these too small can cause nfsd threads to block when sending.
+ */
+int tcp_rcvbuf_nrpc = 6;
+int tcp_sndbuf_nrpc = 16;
+
+int udp_rcvbuf_nrpc = 16;
+int udp_sndbuf_nrpc = 16;
+
+/*
+ * Minimum for bufsize settings is enough space for 6 rpcs, maximum is enough
+ * space for 128 rpcs.
+ */
+static int param_set_bufsize_nrpc(const char *val, struct kernel_param *kp)
+{
+ char *endp;
+ int num = simple_strtol(val, &endp, 0);
+
+ if (endp == val || (*endp && *endp != '\n') || num < 6 || num > 128)
+ return -EINVAL;
+ *((int *)kp->arg) = num;
+ return 0;
+}
+
+module_param_call(tcp_rcvbuf_nrpc, param_set_bufsize_nrpc, param_get_int,
+ &tcp_rcvbuf_nrpc, 0644);
+module_param_call(tcp_sndbuf_nrpc, param_set_bufsize_nrpc, param_get_int,
+ &tcp_sndbuf_nrpc, 0644);
+module_param_call(udp_rcvbuf_nrpc, param_set_bufsize_nrpc, param_get_int,
+ &udp_rcvbuf_nrpc, 0644);
+module_param_call(udp_sndbuf_nrpc, param_set_bufsize_nrpc, param_get_int,
+ &udp_sndbuf_nrpc, 0644);
+
static int __init
init_sunrpc(void)
{
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 9e09391..f172b27 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -381,8 +381,7 @@ static int svc_recvfrom(struct svc_rqst *rqstp, struct kvec *iov, int nr,
/*
* Set socket snd and rcv buffer lengths
*/
-static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
- unsigned int rcv)
+static void svc_sock_setbufsize(struct socket *sock, int snd, int rcv)
{
#if 0
mm_segment_t oldfs;
@@ -398,13 +397,14 @@ static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
* DaveM said I could!
*/
lock_sock(sock->sk);
- sock->sk->sk_sndbuf = snd * 2;
- sock->sk->sk_rcvbuf = rcv * 2;
+ sock->sk->sk_sndbuf = snd;
+ sock->sk->sk_rcvbuf = rcv;
sock->sk->sk_userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK;
sock->sk->sk_write_space(sock->sk);
release_sock(sock->sk);
#endif
}
+
/*
* INET callback when data has been received on the socket.
*/
@@ -498,6 +498,7 @@ static int svc_udp_get_dest_address(struct svc_rqst *rqstp,
return 0;
}

+extern int udp_sndbuf_nrpc, udp_rcvbuf_nrpc;
/*
* Receive a datagram from a UDP socket.
*/
@@ -521,18 +522,16 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
size_t len;
int err;

- if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
- /* udp sockets need large rcvbuf as all pending
- * requests are still in that buffer. sndbuf must
- * also be large enough that there is enough space
- * for one reply per thread. We count all threads
- * rather than threads in a particular pool, which
- * provides an upper bound on the number of threads
- * which will access the socket.
- */
- svc_sock_setbufsize(svsk->sk_sock,
- (serv->sv_nrthreads+3) * serv->sv_max_mesg,
- (serv->sv_nrthreads+3) * serv->sv_max_mesg);
+ if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags)) {
+ /*
+ * UDP sockets need a large rcvbuf as all pending requests are
+ * still in that buffer. sndbuf must also be large enough to
+ * fit a reply for each request the server could be processing.
+ */
+ svc_sock_setbufsize(svsk->sk_sock,
+ udp_sndbuf_nrpc * serv->sv_max_mesg,
+ udp_rcvbuf_nrpc * serv->sv_max_mesg);
+ }

clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
skb = NULL;
@@ -697,14 +696,14 @@ static void svc_udp_init(struct svc_sock *svsk, struct svc_serv *serv)
clear_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags);
svsk->sk_sk->sk_data_ready = svc_udp_data_ready;
svsk->sk_sk->sk_write_space = svc_write_space;
-
- /* initialise setting must have enough space to
- * receive and respond to one request.
- * svc_udp_recvfrom will re-adjust if necessary
+
+ /*
+ * Initial setting must have enough space to receive and respond to one
+ * request. svc_udp_recvfrom will readjust if necessary.
*/
svc_sock_setbufsize(svsk->sk_sock,
- 3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
- 3 * svsk->sk_xprt.xpt_server->sv_max_mesg);
+ 6 * svsk->sk_xprt.xpt_server->sv_max_mesg,
+ 6 * svsk->sk_xprt.xpt_server->sv_max_mesg);

/* data might have come in before data_ready set up */
set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
@@ -874,6 +873,7 @@ failed:
return NULL;
}

+extern int tcp_sndbuf_nrpc, tcp_rcvbuf_nrpc;
/*
* Receive data.
* If we haven't gotten the record length yet, get the next four bytes.
@@ -885,22 +885,20 @@ static int svc_tcp_recv_record(struct svc_sock *svsk, struct svc_rqst *rqstp)
struct svc_serv *serv = svsk->sk_xprt.xpt_server;
int len;

- if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
- /* sndbuf needs to have room for one request
- * per thread, otherwise we can stall even when the
- * network isn't a bottleneck.
- *
- * We count all threads rather than threads in a
- * particular pool, which provides an upper bound
- * on the number of threads which will access the socket.
- *
- * rcvbuf just needs to be able to hold a few requests.
- * Normally they will be removed from the queue
- * as soon a a complete request arrives.
+ if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags)) {
+ /*
+ * sndbuf needs to have enough room for every request the
+ * server could be processing, otherwise we can block even when
+ * the network isn't a bottleneck.
+ *
+ * Normally they will be removed from the queue as soon as a
+ * complete request arrives.
*/
svc_sock_setbufsize(svsk->sk_sock,
- (serv->sv_nrthreads+3) * serv->sv_max_mesg,
- 3 * serv->sv_max_mesg);
+ tcp_sndbuf_nrpc * serv->sv_max_mesg,
+ tcp_rcvbuf_nrpc * serv->sv_max_mesg);
+ }
+

clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);

@@ -1250,13 +1248,14 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)

tcp_sk(sk)->nonagle |= TCP_NAGLE_OFF;

- /* initialise setting must have enough space to
- * receive and respond to one request.
- * svc_tcp_recvfrom will re-adjust if necessary
+ /*
+ * Initial setting must have enough space to receive and
+ * respond to one request. svc_tcp_recvfrom will readjust if
+ * necessary.
*/
svc_sock_setbufsize(svsk->sk_sock,
- 3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
- 3 * svsk->sk_xprt.xpt_server->sv_max_mesg);
+ 6 * svsk->sk_xprt.xpt_server->sv_max_mesg,
+ 6 * svsk->sk_xprt.xpt_server->sv_max_mesg);

set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);


2010-02-23 20:47:56

by Ben Myers

Subject: Re: [PATCH 2/2] sunrpc: socket buffer size module parameter

Hey Chuck,

On Tue, Feb 23, 2010 at 12:09:50PM -0800, Chuck Lever wrote:
> On 02/23/2010 08:12 AM, [email protected] wrote:
>> On Mon, Feb 22, 2010 at 05:33:10PM -0800, Chuck Lever wrote:
>>> On 02/22/2010 01:54 PM, Ben Myers wrote:
>>>> +int tcp_rcvbuf_nrpc = 6;
>>>
>>> Just curious, is this '6' a typo?
>>
>> Not a typo. The original setting for tcp receive buffer was hardcoded
>> at
>>
>> 3 (in svc_tcp_init and svc_tcp_recv_record)
>> *
>> sv_max_mesg
>> *
>> 2 (in svc_sock_setbufsize)
>>
>> That's where I came up with the 6 for the tcp recv buffer. The setting
>> hasn't changed.
>>
>>
>>
>> The UDP send/recv buffer settings and TCP send buffer settings were
>> going to be
>>
>> ( 4 (default number of kernel threads on sles11)
>> +
>> 3 (as in svc_udp_recvfrom, etc) )
>> *
>> sv_max_mesg
>> *
>> 2 (in svc_sock_setbufsize)
>>
>> but 14 wasn't a very round number, so I went with 16 which also happened
>> to match the slot_table_entries default.
>
> It triggered my "naked integer" nerve. It would be nice to provide some
> level of detail, similar to your description here, in the comments
> around these settings.

Sure... I'll add some comments.

> Perhaps some guidance for admins about how to
> choose these values would also be warranted.

Learning how best to use the settings will take some experimentation
which is something I hadn't bothered to do. Here I was simply trying to
remove a scalability limitation.

> Most importantly, though, there should be some documentation of why
> these are the chosen defaults.
>
>>> Perhaps it would be nice to have a
>>> single macro defined as the default value for all of these.
>>>
>>> Do we have a high degree of confidence that these new default settings
>>> will not adversely affect workloads that already perform well?
>>
>> This patch has been in several releases of SGI's nfsd respin and I've
>> heard nothing to suggest there is an issue. I didn't spend much time
>> taking measurements on UDP and didn't keep my TCP measurements. If you
>> feel measurements are essential I'll be happy to provide a few, but
>> won't be able to get around to it for a little while.
>
> There were recent changes to the server's default buffer size settings
> that caused problems for certain common workloads.

Yeah, I think I saw that stuff go by.

> I don't think you need to go overboard with measurements and rationale,
> but some guarantee that these two patches won't cause performance
> regressions on typical NFS server workloads would be "nice to have."

I'll do some benchmarking with tar and some streaming I/O with dd, then repost.
If you have suggestions about appropriate workloads, I'm all ears.

Thanks,
Ben