2009-01-13 10:27:00

by Greg Banks

Subject: [patch 3/3] knfsd: add file to export stats about nfsd pools

Add /proc/fs/nfsd/pool_stats to export to userspace various
statistics about the operation of rpc server thread pools.
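
For illustration (not part of the patch itself), reading the new file
gives a header line followed by one line per pool, in the order printed
by svc_pool_stats_show(); the numbers below are made up:

  $ cat /proc/fs/nfsd/pool_stats
  # pool packets-arrived sockets-enqueued threads-woken overloads-avoided threads-timedout
  0 103181 286 102895 0 12
  1 100542 219 100323 0 9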

This patch is based on a forward-ported version of
knfsd-add-pool-thread-stats which has been shipping in the SGI
"Enhanced NFS" product since 2006 and which was previously
posted:

http://article.gmane.org/gmane.linux.nfs/10375

It has also been updated thus:

* moved EXPORT_SYMBOL() to near the function it exports
* made the new struct seq_operations const
* used SEQ_START_TOKEN instead of ((void *)1)
* merged fix from SGI PV 990526 "sunrpc: use dprintk instead of
printk in svc_pool_stats_*()" by Harshula Jayasuriya.
* merged fix from SGI PV 964001 "Crash reading pool_stats before
nfsds are started".

Signed-off-by: Greg Banks <[email protected]>
Signed-off-by: Harshula Jayasuriya <[email protected]>
---

fs/nfsd/nfsctl.c | 12 ++++
fs/nfsd/nfssvc.c | 7 ++
include/linux/sunrpc/svc.h | 11 +++
net/sunrpc/svc_xprt.c | 100 +++++++++++++++++++++++++++++++++-
4 files changed, 129 insertions(+), 1 deletion(-)

Index: bfields/fs/nfsd/nfsctl.c
===================================================================
--- bfields.orig/fs/nfsd/nfsctl.c
+++ bfields/fs/nfsd/nfsctl.c
@@ -60,6 +60,7 @@ enum {
NFSD_FO_UnlockFS,
NFSD_Threads,
NFSD_Pool_Threads,
+ NFSD_Pool_Stats,
NFSD_Versions,
NFSD_Ports,
NFSD_MaxBlkSize,
@@ -172,6 +173,16 @@ static const struct file_operations expo
.owner = THIS_MODULE,
};

+extern int nfsd_pool_stats_open(struct inode *inode, struct file *file);
+
+static struct file_operations pool_stats_operations = {
+ .open = nfsd_pool_stats_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+ .owner = THIS_MODULE,
+};
+
/*----------------------------------------------------------------------------*/
/*
* payload - write methods
@@ -1246,6 +1257,7 @@ static int nfsd_fill_super(struct super_
[NFSD_Fh] = {"filehandle", &transaction_ops, S_IWUSR|S_IRUSR},
[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
+ [NFSD_Pool_Stats] = {"pool_stats", &pool_stats_operations, S_IRUGO},
[NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
[NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
Index: bfields/fs/nfsd/nfssvc.c
===================================================================
--- bfields.orig/fs/nfsd/nfssvc.c
+++ bfields/fs/nfsd/nfssvc.c
@@ -546,3 +546,10 @@ nfsd_dispatch(struct svc_rqst *rqstp, __
nfsd_cache_update(rqstp, proc->pc_cachetype, statp + 1);
return 1;
}
+
+int nfsd_pool_stats_open(struct inode *inode, struct file *file)
+{
+ if (nfsd_serv == NULL)
+ return -ENODEV;
+ return svc_pool_stats_open(nfsd_serv, file);
+}
Index: bfields/include/linux/sunrpc/svc.h
===================================================================
--- bfields.orig/include/linux/sunrpc/svc.h
+++ bfields/include/linux/sunrpc/svc.h
@@ -24,6 +24,15 @@
*/
typedef int (*svc_thread_fn)(void *);

+/* statistics for svc_pool structures */
+struct svc_pool_stats {
+ unsigned long packets;
+ unsigned long sockets_queued;
+ unsigned long threads_woken;
+ unsigned long overloads_avoided;
+ unsigned long threads_timedout;
+};
+
/*
*
* RPC service thread pool.
@@ -42,6 +51,7 @@ struct svc_pool {
unsigned int sp_nrthreads; /* # of threads in pool */
struct list_head sp_all_threads; /* all server threads */
int sp_nwaking; /* number of threads woken but not yet active */
+ struct svc_pool_stats sp_stats; /* statistics on pool operation */
} ____cacheline_aligned_in_smp;

/*
@@ -396,6 +406,7 @@ struct svc_serv * svc_create_pooled(str
sa_family_t, void (*shutdown)(struct svc_serv *),
svc_thread_fn, struct module *);
int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int);
+int svc_pool_stats_open(struct svc_serv *serv, struct file *file);
void svc_destroy(struct svc_serv *);
int svc_process(struct svc_rqst *);
int svc_register(const struct svc_serv *, const unsigned short,
Index: bfields/net/sunrpc/svc_xprt.c
===================================================================
--- bfields.orig/net/sunrpc/svc_xprt.c
+++ bfields/net/sunrpc/svc_xprt.c
@@ -318,6 +318,8 @@ void svc_xprt_enqueue(struct svc_xprt *x
goto out_unlock;
}

+ pool->sp_stats.packets++;
+
/* Mark transport as busy. It will remain in this state until
* the provider calls svc_xprt_received. We update XPT_BUSY
* atomically because it also guards against trying to enqueue
@@ -355,6 +357,7 @@ void svc_xprt_enqueue(struct svc_xprt *x
if (pool->sp_nwaking >= SVC_MAX_WAKING) {
/* too many threads are runnable and trying to wake up */
thread_avail = 0;
+ pool->sp_stats.overloads_avoided++;
}

if (thread_avail) {
@@ -374,11 +377,13 @@ void svc_xprt_enqueue(struct svc_xprt *x
atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
rqstp->rq_waking = 1;
pool->sp_nwaking++;
+ pool->sp_stats.threads_woken++;
BUG_ON(xprt->xpt_pool != pool);
wake_up(&rqstp->rq_wait);
} else {
dprintk("svc: transport %p put into queue\n", xprt);
list_add_tail(&xprt->xpt_ready, &pool->sp_sockets);
+ pool->sp_stats.sockets_queued++;
BUG_ON(xprt->xpt_pool != pool);
}

@@ -591,6 +596,7 @@ int svc_recv(struct svc_rqst *rqstp, lon
int pages;
struct xdr_buf *arg;
DECLARE_WAITQUEUE(wait, current);
+ long time_left;

dprintk("svc: server %p waiting for data (to = %ld)\n",
rqstp, timeout);
@@ -676,12 +682,14 @@ int svc_recv(struct svc_rqst *rqstp, lon
add_wait_queue(&rqstp->rq_wait, &wait);
spin_unlock_bh(&pool->sp_lock);

- schedule_timeout(timeout);
+ time_left = schedule_timeout(timeout);

try_to_freeze();

spin_lock_bh(&pool->sp_lock);
remove_wait_queue(&rqstp->rq_wait, &wait);
+ if (!time_left)
+ pool->sp_stats.threads_timedout++;

xprt = rqstp->rq_xprt;
if (!xprt) {
@@ -1103,3 +1111,93 @@ int svc_xprt_names(struct svc_serv *serv
return totlen;
}
EXPORT_SYMBOL_GPL(svc_xprt_names);
+
+
+/*----------------------------------------------------------------------------*/
+
+static void *svc_pool_stats_start(struct seq_file *m, loff_t *pos)
+{
+ unsigned int pidx = (unsigned int)*pos;
+ struct svc_serv *serv = m->private;
+
+ dprintk("svc_pool_stats_start, *pidx=%u\n", pidx);
+
+ lock_kernel();
+ /* bump up the pseudo refcount while traversing */
+ svc_get(serv);
+ unlock_kernel();
+
+ if (!pidx)
+ return SEQ_START_TOKEN;
+ return (pidx > serv->sv_nrpools ? NULL : &serv->sv_pools[pidx-1]);
+}
+
+static void *svc_pool_stats_next(struct seq_file *m, void *p, loff_t *pos)
+{
+ struct svc_pool *pool = p;
+ struct svc_serv *serv = m->private;
+
+ dprintk("svc_pool_stats_next, *pos=%llu\n", *pos);
+
+ if (p == SEQ_START_TOKEN) {
+ pool = &serv->sv_pools[0];
+ } else {
+ unsigned int pidx = (pool - &serv->sv_pools[0]);
+ if (pidx < serv->sv_nrpools-1)
+ pool = &serv->sv_pools[pidx+1];
+ else
+ pool = NULL;
+ }
+ ++*pos;
+ return pool;
+}
+
+static void svc_pool_stats_stop(struct seq_file *m, void *p)
+{
+ struct svc_serv *serv = m->private;
+
+ lock_kernel();
+ /* this function really, really should have been called svc_put() */
+ svc_destroy(serv);
+ unlock_kernel();
+}
+
+static int svc_pool_stats_show(struct seq_file *m, void *p)
+{
+ struct svc_pool *pool = p;
+
+ if (p == SEQ_START_TOKEN) {
+ seq_puts(m, "# pool packets-arrived sockets-enqueued threads-woken overloads-avoided threads-timedout\n");
+ return 0;
+ }
+
+ seq_printf(m, "%u %lu %lu %lu %lu %lu\n",
+ pool->sp_id,
+ pool->sp_stats.packets,
+ pool->sp_stats.sockets_queued,
+ pool->sp_stats.threads_woken,
+ pool->sp_stats.overloads_avoided,
+ pool->sp_stats.threads_timedout);
+
+ return 0;
+}
+
+static const struct seq_operations svc_pool_stats_seq_ops = {
+ .start = svc_pool_stats_start,
+ .next = svc_pool_stats_next,
+ .stop = svc_pool_stats_stop,
+ .show = svc_pool_stats_show,
+};
+
+int svc_pool_stats_open(struct svc_serv *serv, struct file *file)
+{
+ int err;
+
+ err = seq_open(file, &svc_pool_stats_seq_ops);
+ if (!err)
+ ((struct seq_file *) file->private_data)->private = serv;
+ return err;
+}
+EXPORT_SYMBOL(svc_pool_stats_open);
+
+/*----------------------------------------------------------------------------*/

--
Greg Banks, P.Engineer, SGI Australian Software Group.
the brightly coloured sporks of revolution.
I don't speak for SGI.


2009-02-19 06:46:03

by Greg Banks

Subject: Re: [patch 3/3] knfsd: add file to export stats about nfsd pools

J. Bruce Fields wrote:
> On Tue, Jan 13, 2009 at 09:26:36PM +1100, Greg Banks wrote:
>
>> Add /proc/fs/nfsd/pool_stats to export to userspace various
>> statistics about the operation of rpc server thread pools.
>>
>
> Could you explain why these specific statistics (total packets,
> sockets_queued, threads_woken, overloads_avoided, threads_timedout) are
> the important ones to capture? Could you give examples of what sort of
> problems could be solved using them?
>
Actually I originally added these stats to help debug the
overload-avoiding patch. Then I thought to use them to drive a
userspace control loop for controlling the number of nfsds, which I
never finished writing.
> As you said, an important question for the sysadmin is "should I
> configure more nfsds?" How do they answer that?
>
You can work that out, but it's not obvious, i.e. not human-friendly.
Firstly, you need to rate-convert all the stats.

The total_packets stat tells you how many NFS packets are arriving on
each thread pool. This is your primary load metric, i.e. with more load
you want more nfsd threads.

The sockets_queued stat tells you that calls are arriving which are not
being immediately serviced by threads, i.e. you're either thread-limited
or CPU-limited rather than network-limited and you might get better
throughput if there were more nfsd threads.

Conversely the overloads_avoided stat tells you if there are more
threads than can usefully be made runnable on the available CPUs, so
that adding more nfsd threads is unlikely to be helpful.

The threads_timedout stat will give you a first-level approximation of
whether there are threads that are completely idle, i.e. don't see any
calls for the svc_recv() timeout (which I reduced to IIRC 10 sec as part
of the original version of this patch). This is a clue that you can now
reduce the number of threads.
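
To make the rate conversion concrete, here's a rough sketch of the kind
of sampling script I mean (purely illustrative, not something I'm
proposing for merging; field order as in the pool_stats header line):

  #!/bin/sh
  # Illustrative sketch only: sample pool_stats twice and print per-pool
  # rates.  Field order follows the header line in pool_stats.
  STATS=/proc/fs/nfsd/pool_stats
  INTERVAL=10

  grep -v '^#' "$STATS" > /tmp/pool_stats.before
  sleep "$INTERVAL"
  grep -v '^#' "$STATS" > /tmp/pool_stats.after

  paste /tmp/pool_stats.before /tmp/pool_stats.after |
  awk -v dt="$INTERVAL" '{
      # before sample: $1..$6, after sample: $7..$12
      printf "pool %s: %.1f pkts/s  %.1f queued/s  %.1f overloads/s  %.1f timeouts/s\n",
          $1, ($8-$2)/dt, ($9-$3)/dt, ($11-$5)/dt, ($12-$6)/dt
  }'

A high queued/s rate relative to pkts/s is the "add more nfsds" signal;
sustained overloads/s or timeouts/s is the "you have enough, or too
many" signal, as described above.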

--
Greg Banks, P.Engineer, SGI Australian Software Group.
the brightly coloured sporks of revolution.
I don't speak for SGI.


2009-02-19 07:07:44

by Greg Banks

Subject: Re: [patch 3/3] knfsd: add file to export stats about nfsd pools

Kevin Constantine wrote:
> On 02/12/09 09:11, J. Bruce Fields wrote:
>>
>>
>> As you said, an important question for the sysadmin is "should I
>> configure more nfsds?" How do they answer that?
>>
>
> I typically use the "th" line to determine whether to add more threads
> or not by looking at the distribution of values across the histogram.
> If things are weighted more towards the 90-100% group, I'll add more
> threads and watch the traffic patterns.
That seems to have been more or less the idea behind the histogram.
However, to get any useful indication you first need to rate-convert the
histogram. The values are counters of jiffies which wrap every million
seconds (11.6 days) and there's no way to zero them.

Also, at a call rate above 1/jiffy (a trivially low NFS load) most of
the updates are adding 0 to the histogram. Every now and again a call
will cross a jiffy boundary and actually add 1 to the histogram. If
your load is high but not steady, the number of threads busy when that
happens is unpredictable. In other words there's a sampling effect which
in the worst case could make the histogram values more or less
completely random.

>
> Usually, the question of how many to add is answered by trial and error.
>
> echo 32 > /proc/fs/nfsd/threads
> Did that improve my throughput? yes?
> echo 128 > /proc/fs/nfsd/threads
> Did that improve my throughput? no it actually decreased.
> rinse... repeat...

This procedure is what I would recommend admins do today, assuming their
load is steady or reproducible. It's not only simple, but it uses a
real-world performance measure, which is "does my application go
faster?". There's no point adding nfsds if you're limited by the
application, or some filesystem effect that causes the same load to go
slower. It is of course rather tedious :-)
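
If the load is scriptable you can of course automate the tedium; a
rough sketch, where run_benchmark is a placeholder for whatever
reproducible throughput test you have:

  #!/bin/sh
  # Illustrative sketch: walk the thread count up while a reproducible
  # benchmark keeps improving.  run_benchmark is a placeholder for your
  # own test and should print a single throughput number.
  best=0
  for n in 16 32 64 128 256; do
      echo "$n" > /proc/fs/nfsd/threads
      cur=$(run_benchmark)
      echo "threads=$n throughput=$cur"
      awk -v a="$cur" -v b="$best" 'BEGIN { exit !(a > b) }' || break
      best=$cur
  done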

--
Greg Banks, P.Engineer, SGI Australian Software Group.
the brightly coloured sporks of revolution.
I don't speak for SGI.


2009-02-12 17:12:19

by J. Bruce Fields

Subject: Re: [patch 3/3] knfsd: add file to export stats about nfsd pools

On Tue, Jan 13, 2009 at 09:26:36PM +1100, Greg Banks wrote:
> Add /proc/fs/nfsd/pool_stats to export to userspace various
> statistics about the operation of rpc server thread pools.

Could you explain why these specific statistics (total packets,
sockets_queued, threads_woken, overloads_avoided, threads_timedout) are
the important ones to capture? Could you give examples of what sort of
problems could be solved using them?

As you said, an important question for the sysadmin is "should I
configure more nfsds?" How do they answer that?

--b.

> [...]

2009-02-13 02:02:51

by Kevin Constantine

Subject: Re: [patch 3/3] knfsd: add file to export stats about nfsd pools

On 02/12/09 09:11, J. Bruce Fields wrote:
> On Tue, Jan 13, 2009 at 09:26:36PM +1100, Greg Banks wrote:
>> Add /proc/fs/nfsd/pool_stats to export to userspace various
>> statistics about the operation of rpc server thread pools.
>
> Could you explain why these specific statistics (total packets,
> sockets_queued, threads_woken, overloads_avoided, threads_timedout) are
> the important ones to capture? Could you give examples of what sort of
> problems could be solved using them?
>
> As you said, an important question for the sysadmin is "should I
> configure more nfsds?" How do they answer that?
>

I typically use the "th" line to determine whether to add more threads
or not by looking at the distribution of values across the histogram.
If things are weighted more towards the 90-100% group, I'll add more
threads and watch the traffic patterns.

Usually, the question of how many to add is answered by trial and error.

echo 32 > /proc/fs/nfsd/threads
Did that improve my throughput? yes?
echo 128 > /proc/fs/nfsd/threads
Did that improve my throughput? no it actually decreased.
rinse... repeat...

> --b.
>
>> [...]

2009-03-15 21:25:35

by J. Bruce Fields

Subject: Re: [patch 3/3] knfsd: add file to export stats about nfsd pools

On Thu, Feb 19, 2009 at 05:42:49PM +1100, Greg Banks wrote:
> J. Bruce Fields wrote:
> > On Tue, Jan 13, 2009 at 09:26:36PM +1100, Greg Banks wrote:
> >
> >> Add /proc/fs/nfsd/pool_stats to export to userspace various
> >> statistics about the operation of rpc server thread pools.
> >>
> >
> > Could you explain why these specific statistics (total packets,
> > sockets_queued, threads_woken, overloads_avoided, threads_timedout) are
> > the important ones to capture? Could you give examples of what sort of
> > problems could be solved using them?
> >
> Actually I originally added these stats to help debug the
> overload-avoiding patch. Then I thought to use them to drive a
> userspace control loop for controlling the number of nfsds, which I
> never finished writing.
> > As you said, an important question for the sysadmin is "should I
> > configure more nfsds?" How do they answer that?
> >
> You can work that out, but it's not obvious, i.e. not human-friendly.
> Firstly, you need to rate convert all the stats.
>
> The total_packets stat tells you how many NFS packets are arriving on
> each thread pool. This is your primary load metric, i.e. with more load
> you want more nfsd threads.
>
> The sockets_queued stat tells you that calls are arriving which are not
> being immediately serviced by threads, i.e. you're either thread-limited
> or CPU-limited rather than network-limited and you might get better
> throughput if there were more nfsd threads.
>
> Conversely the overloads_avoided stat tells you if there are more
> threads than can usefully be made runnable on the available CPUs, so
> that adding more nfsd threads is unlikely to be helpful.
>
> The threads_timedout stat will give you a first-level approximation of
> whether there are threads that are completely idle, i.e. don't see any
> calls for the svc_recv() timeout (which I reduced to IIRC 10 sec as part
> of the original version of this patch). This is a clue that you can now
> reduce the number of threads.

OK, thanks, that makes sense, but: it would be really fantastic if we
could update and/or replace some of the howto's and faq's that are out
there. The above would be useful, as would be some of the material from
your nfs-tuning presentation.

Queued up this and the old-stats removal for 2.6.30.

--b.

2009-03-16 03:16:20

by Greg Banks

Subject: Re: [patch 3/3] knfsd: add file to export stats about nfsd pools

J. Bruce Fields wrote:
> On Thu, Feb 19, 2009 at 05:42:49PM +1100, Greg Banks wrote:
>
>> J. Bruce Fields wrote:
>>
>>> On Tue, Jan 13, 2009 at 09:26:36PM +1100, Greg Banks wrote:
>>>
>>>
>>> As you said, an important question for the sysadmin is "should I
>>> configure more nfsds?" How do they answer that?
>>>
>>>
>> You can work that out, [...]
>>
>
> OK, thanks, that makes sense, but: it would be really fantastic if we
> could update and/or replace some of the howto's and faq's that are out
> there.

Fair enough.

After I composed the above reply, I wondered whether the right thing to
do would have been to add that text to a new file somewhere in
Documentation/. Would that suit as an interim measure while I look at
howtos and faqs? I mention this because I'll be adding more new stats
in the next tranche of patches.
>
> Queued up this and the old-stats removal for 2.6.30.
>

Thanks.

--
Greg Banks, P.Engineer, SGI Australian Software Group.
the brightly coloured sporks of revolution.
I don't speak for SGI.


2009-03-16 13:37:33

by J. Bruce Fields

Subject: Re: [patch 3/3] knfsd: add file to export stats about nfsd pools

On Mon, Mar 16, 2009 at 02:21:02PM +1100, Greg Banks wrote:
> J. Bruce Fields wrote:
> > On Thu, Feb 19, 2009 at 05:42:49PM +1100, Greg Banks wrote:
> >
> >> J. Bruce Fields wrote:
> >>
> >>> On Tue, Jan 13, 2009 at 09:26:36PM +1100, Greg Banks wrote:
> >>>
> >>>
> >>> As you said, an important question for the sysadmin is "should I
> >>> configure more nfsds?" How do they answer that?
> >>>
> >>>
> >> You can work that out, [...]
> >>
> >
> > OK, thanks, that makes sense, but: it would be really fantastic if we
> > could update and/or replace some of the howto's and faq's that are out
> > there.
>
> Fair enough.
>
> After I composed the above reply, I wondered whether the right thing to
> do would have been to add that text to a new file somewhere in
> Documentation/. Would that suit as an interim measure while I look at
> howtos and faqs?

Yep, I think that's a great idea.

> I mention this because I'll be adding more new stats
> in the next tranche of patches.

OK!

--b.