G'day,
Five more patches which fix various issues found in knfsd,
mostly directly performance-related.
1 of 5 knfsd: make readahead params cache SMP-friendly
the racache is a global contention point and performance bottleneck in
some read-dominated workloads, alleviate that
2 of 5 knfsd: cache ipmap per TCP socket
for TCP we can avoid doing a hash lookup on the ip_map
cache for every RPC call
3 of 5 knfsd: avoid nfsd CPU scheduler overload
avoid enormous load averages and CPU scheduler overload
with workloads which generate extremely high call rates
4 of 5 knfsd: add RPC pool thread stats
add some statistics to the svc_pool
5 of 5 knfsd: remove nfsd threadstats
remove the 'th' stats line from /proc/net/rpc/nfsd, it's
a global contention point in high call rate workloads
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
On Wed, Aug 09, 2006 at 12:37:36PM +1000, Greg Banks wrote:
> On Wed, 2006-08-09 at 01:49, J. Bruce Fields wrote:
> > It'd be nice if we could avoid ripping out a working user interface that
> > someone might be using....
>
> I agree, but I don't believe a) it's working or b) anyone could
> be getting any use out of it.
OK. So the advice at
http://nfs.sourceforge.net/nfs-howto/ar01s05.html#nfsd_daemon_instances
is wrong?
> An earlier version of the patch left the data structures in place
> and just reported them as zeros in the /proc file. Would that
> be preferable?
I'm not sure that makes any difference.
--b.
On Tue, 2006-08-15 at 07:43, J. Bruce Fields wrote:
> On Wed, Aug 09, 2006 at 12:37:36PM +1000, Greg Banks wrote:
> > On Wed, 2006-08-09 at 01:49, J. Bruce Fields wrote:
> > > It'd be nice if we could avoid ripping out a working user interface that
> > > someone might be using....
> >
> > I agree, but I don't believe a) it's working or b) anyone could
> > be getting any use out of it.
>
> OK. So the advice at
>
> http://nfs.sourceforge.net/nfs-howto/ar01s05.html#nfsd_daemon_instances
>
> is wrong?
Let's see.
> Most startup scripts, Linux and otherwise, start 8 instances of nfsd.
> In the early days of NFS, Sun decided on this number as a rule of
> thumb, and everyone else copied. There are no good measures of how
> many instances are optimal, but a more heavily-trafficked server may
> require more.
Correct.
> You should use at the very least one daemon per processor, but four to
> eight per processor may be a better rule of thumb.
Wrong. This rule might work up to 4 or 8 cpus, but only by
coincidence. Try running knfsd on a 512 cpu machine; you don't
need anything like 512 to 4096 nfsd threads. A better rule of
thumb would be 1 to 4 nfsds per simultaneously active client.
Of course that number is a lot harder to measure with the server
as it stands today.
> If you are using a 2.4 or higher kernel and you want to see how
> heavily each nfsd thread is being used, you can look at the file
> /proc/net/rpc/nfsd. The last ten numbers on the th line in that file
> indicate the number of seconds that the thread usage was at that
> percentage of the maximum allowable. If you have a large number in the
> top three deciles, you may wish to increase the number of nfsd
> instances.
This is true, except that
1. the numbers are undercounted (the mechanism tends to err
towards incrementing a lower bucket), and
2. the numbers are never reset and are scaled to the number of
nfsd daemons, so to tell whether your change in the number
of nfsds was helpful you need to reload the nfsd module or
reboot.
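(For reference, the check the HOWTO describes boils down to something
like the sketch below, which just takes the last ten numbers on the
th line at face value and sums the top three deciles; the caveats
above still apply.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: pull the last ten numbers off the "th" line of
 * /proc/net/rpc/nfsd and sum the top three deciles, which is the
 * test the HOWTO recommends for "do I need more nfsds?". */
int main(void)
{
    FILE *f = fopen("/proc/net/rpc/nfsd", "r");
    char line[512];

    if (f == NULL) {
        perror("/proc/net/rpc/nfsd");
        return 1;
    }
    while (fgets(line, sizeof(line), f) != NULL) {
        double v[64];
        int n = 0;
        char *tok;

        if (strncmp(line, "th ", 3) != 0)
            continue;
        tok = strtok(line + 3, " \n");
        while (tok != NULL && n < 64) {
            v[n++] = atof(tok);
            tok = strtok(NULL, " \n");
        }
        if (n >= 10)
            printf("top three deciles: %.3f seconds\n",
                   v[n - 1] + v[n - 2] + v[n - 3]);
    }
    fclose(f);
    return 0;
}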
> This is done upon starting nfsd using the number of instances as the
> command line option,
Or by echoing a number into /proc/fs/nfsd/threads.
> and is specified in the NFS startup script (/etc/rc.d/init.d/nfs on
> Red Hat) as RPCNFSDCOUNT. See the nfsd(8) man page for more
> information.
On SUSE the file is /etc/sysconfig/nfs and the variable is
USE_KERNEL_NFSD_NUMBER.
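(A minimal sketch of doing the same from a program, which just writes
the desired count to /proc/fs/nfsd/threads:)

#include <stdio.h>
#include <stdlib.h>

/* Sketch only: set the number of nfsd threads at runtime by writing
 * to /proc/fs/nfsd/threads (needs the nfsd filesystem mounted and
 * root privileges). */
static int set_nfsd_threads(int n)
{
    FILE *f = fopen("/proc/fs/nfsd/threads", "w");

    if (f == NULL)
        return -1;
    fprintf(f, "%d\n", n);
    return fclose(f);
}

int main(int argc, char **argv)
{
    int n = (argc > 1) ? atoi(argv[1]) : 8;

    if (set_nfsd_threads(n) != 0) {
        perror("set_nfsd_threads");
        return 1;
    }
    return 0;
}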
> > An earlier version of the patch left the data structures in place
> > and just reported them as zeros in the /proc file. Would that
> > be preferable?
>
> I'm not sure that makes any difference.
Fair enough.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
On Tuesday August 8, [email protected] wrote:
> G'day,
>
> Five more patches which fix various issues found in knfsd,
> mostly directly performance-related.
Thanks. I'll have a bit more of a look in a day or two, but just
quickly:
>
> 1 of 5 knfsd: make readahead params cache SMP-friendly
> the racache is a global contention point and performance
> bottleneck in some read-dominated workloads, alleviate that
>
Looks good, though might become irrelevant if the 'new' readahead code
goes ahead (the racache can just be discarded).
> 2 of 5 knfsd: cache ipmap per TCP socket
> for TCP we can avoid doing a hash lookup on the ip_map
> cache for every RPC call
Looks sane, but I'm wondering why:
+static inline int cache_valid(struct cache_head *h)
+{
+ return (h->expiry_time != 0 && test_bit(CACHE_VALID, &h->flags));
+}
+
we need to test expiry_time here. Is not the test on CACHE_VALID
enough?
>
> 3 of 5 knfsd: avoid nfsd CPU scheduler overload
> avoid enormous load averages and CPU scheduler overload
> with workloads which generate extremely high call rates
Not sure I quite understand all the implications of this yet.
There would have to be lots of active sockets because each socket can
only be waking one thread. But that is entirely possible with TCP.
So a request comes in on a TCP socket and we decide not to queue it to
a process just yet, so it stays on the pool queue.
That means it doesn't get handled until some process in the queue
finishes its request.
This seems to suggest quite some room for unfairness in request
handling.
That may not be a problem, but I'm not sure...
Maybe when one thread wakes up it should kick the next one to wake
up???
Or am I missing something?
>
> 4 of 5 knfsd: add RPC pool thread stats
> add some statistics to the svc_pool
Could we get a paragraph or 3 explaining how to interpret these stats?
And we have to do something about svc_put. With this patch as it
stands, if you:
write some sockets to 'portlist' (with no threads running)
look at the pool stats
try to start a thread
the ports will have been forgotten because looking at the pool
stats destroyed the ports...
>
> 5 of 5 knfsd: remove nfsd threadstats
> remove the 'th' stats line from /proc/net/rpc/nfsd, it's
> a global contention point in high call rate workloads
>
Yes..... I can appreciate all the problems that you identify.
But before removing this I would like to have something to replace it.
Something that gives some sort of vague idea about whether you have a
suitable number of threads or not.
Does your pool stats provide that at all?
What I would really like is auto-scaling of the number of threads to
match the load. I wonder if that is a good idea?
NeilBrown
> Greg.
> --
> Greg Banks, R&D Software Engineer, SGI Australian Software Group.
> I don't speak for SGI.
>
On Tue, 2006-08-08 at 18:01, Neil Brown wrote:
> On Tuesday August 8, [email protected] wrote:
> > G'day,
> >
> > Five more patches which fix various issues found in knfsd,
> > mostly directly performance-related.
>
> Thanks. I'll have a bit more of a look in a day or two,
Suddenly we've all got a lot of reading to do ;-)
> but just
> quickly:
>
> >
> > 1 of 5 knfsd: make readahead params cache SMP-friendly
> > the racache is a global contention point and performance
> > bottleneck in some read-dominated workloads, alleviate that
> >
>
> Looks good, though might become irrelevant if the 'new' readahead code
> goes ahead (the racache can just be discarded).
I haven't looked at the new readahead code. Does it no longer
store anything attached to struct file?
>
> > 2 of 5 knfsd: cache ipmap per TCP socket
> > for TCP we can avoid doing a hash lookup on the ip_map
> > cache for every RPC call
>
> Looks sane, but I'm wondering why:
>
> +static inline int cache_valid(struct cache_head *h)
> +{
> + return (h->expiry_time != 0 && test_bit(CACHE_VALID, &h->flags));
> +}
> +
>
> we need to test expiry_time here. Is not the test on CACHE_VALID
> enough?
Hmm, it seemed necessary when I wrote that line.
Looking at current cache.c code, when a cache entry is replaced
with a new one in sunrpc_cache_update(), the old one is left in
a state where CACHE_VALID is set but h->expiry_time == 0. It's
in that state while it's waiting for the last ref to be dropped.
Before the patch, that state was transient and we would soon after
drop that reference. After the patch we keep a ref alive for a
long time attached to the svc_sock, so if the ip_map enters that
state we need to drop that ref and look up again. That was why I
needed to check for both the VALID flag and a zero expiry time.
The test case that caused the addition of that logic was a 2nd
mount from the same IP address. In the codebase before your
cache rewrite, that would cause ip_map_lookup() to replace the
existing valid ip_map as described. I haven't followed the
current logic of ip_map_lookup() to see what it does, but it
does still seem to be an issue.
> > 3 of 5 knfsd: avoid nfsd CPU scheduler overload
> > avoid enormous load averages and CPU scheduler overload
> > with workloads which generate extremely high call rates
>
> Not sure I quite understand all the implications of this yet.
> There would have to be lots of active sockets because each socket can
> only be waking one thread. But that is entirely possible with TCP.
It's real easy when you have 200 active clients on TCP. It's a
whole lot easier when you have 2000.
> So a request comes in on a TCP socket and we decide not to queue it to
> a process just yet, so it stays on the pool queue.
> That means it doesn't get handled until some process in the queue
> finishes its request.
> This seems to suggest quite some room for unfairness in request
> handling.
> That may not be a problem, but I'm not sure...
I don't think it changes the order in which sockets get serviced,
just how many we try to do at the same time. Maybe I've missed
some subtle side effect.
> Maybe when one thread wakes up it should kick the next one to wake
> up???
?? Not sure what you mean.
We want to keep the number of threads woken to a minimum.
Ideally, a small subset (around 5) of the threads on a pool
are hot in cache and handle all the requests when the calls
are quickly satisfied from memory, and the others kick in only
when calls need to block on disk traffic.
> Or am I missing something?
>
> >
> > 4 of 5 knfsd: add RPC pool thread stats
> > add some statistics to the svc_pool
>
> Could we get a paragraph or 3 explaining how to interpret these stats?
For each pool, we see
id <poolid>
ps <packets> <socks-queued> <threads-woken> <overloads-avoided> <threads-timedout>
<poolid>
the pool id, i.e. 0..npools-1
<packets>
how many times more data arrived on an NFS socket
(more precisely, how many times svc_sock_enqueue was called)
<socks-queued>
how many times a socket was queued, because there were not
enough nfsd threads to service it
<threads-woken>
how many times an nfsd thread was woken to handle a socket
<overloads-avoided>
how many times it was necessary not to wake a thread, because
too many threads had been woken recently for the cpus in this
pool to run them
<threads-timedout>
how many times a thread timed out waiting for a socket to
need handling
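A minimal userspace sketch that reads those fields back (assuming the
id/ps layout above; illustrative only, not part of the patch):

#include <stdio.h>

/* Sketch only: print the per-pool counters described above from
 * /proc/fs/nfsd/pool_stats. */
int main(void)
{
    FILE *f = fopen("/proc/fs/nfsd/pool_stats", "r");
    char line[256];
    unsigned int pool = 0;
    unsigned long packets, queued, woken, avoided, timedout;

    if (f == NULL) {
        perror("pool_stats");
        return 1;
    }
    while (fgets(line, sizeof(line), f) != NULL) {
        if (sscanf(line, "id %u", &pool) == 1)
            continue;
        if (sscanf(line, "ps %lu %lu %lu %lu %lu",
                   &packets, &queued, &woken,
                   &avoided, &timedout) == 5)
            printf("pool %u: %lu packets, %lu sockets queued, "
                   "%lu threads woken, %lu overloads avoided, "
                   "%lu thread timeouts\n",
                   pool, packets, queued, woken, avoided, timedout);
    }
    fclose(f);
    return 0;
}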
> And we have to do something about svc_put. With this patch as it
> stands, if you:
> write some sockets to 'portlist' (with no threads running)
> look at the pool stats
> try to start thread
> the ports will have been forgotten because looking at the pool
> stats destroyed the ports...
>
> >
> > 5 of 5 knfsd: remove nfsd threadstats
> > remove the 'th' stats line from /proc/net/rpc/nfsd, it's
> > a global contention point in high call rate workloads
> >
>
> Yes..... I can appreciate all the problems that you identify.
> But before removing this I would like to have something to replace it.
> Something that gives some sort of vague idea about whether you have a
> suitable number of threads or not.
> Does your pool stats provide that at all?
This patch, no. Probably the most useful stat in this patch is
the packets per pool, which gives you some idea of how balanced
across pools your traffic is.
I'm still working on another patch (it only recently became possible
with the ktime_t code) which replaces threadstats fully. That patch
adds nanosecond resolution busy and idle counters to each thread,
then aggregates them per-pool when you read the pool_stats file.
After export to userspace and rate conversion, these numbers tell
us directly what percentage of the nfsds are currently being used.
That patch worked on two platforms some weeks ago, but I'm not sure
it's ready for primetime. I can post it for comment if you like?
> What I would really like is auto-scaling of the number of threads to
> match the load. I wonder if that is a good idea?
Yes it is.
The original purpose of the pool_stats file was to drive a userspace
balancing daemon to do that, and I have written most of such a daemon.
The trouble is that measuring demand for nfsds from what meagre stats
it was possible to measure in a 2.6.9 kernel proved difficult. The
new pool idle counter provides a direct measure of the excess nfsd
capacity, which it will then be easy to write a control loop for.
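The decision step of such a loop could be as simple as the sketch
below. Here idle_pct comes from rate-converting the idle counter
(delta nanosec-idle / delta wall-clock nanoseconds * 100), and the
100% target and one-thread step size are just illustrative choices,
not something the patch dictates:

#include <stdio.h>

/* Sketch only: one decision step of a thread-count control loop
 * driven by the %idle measure (100% == one fully idle thread).
 * Thresholds and step size are illustrative. */
static int adjust_threads(int nthreads, double idle_pct)
{
    if (idle_pct < 50.0)                    /* too little spare capacity */
        return nthreads + 1;
    if (idle_pct > 150.0 && nthreads > 1)   /* too much spare capacity */
        return nthreads - 1;
    return nthreads;                        /* close enough to target */
}

int main(void)
{
    /* e.g. 16 threads reporting 1273% idle would be trimmed */
    printf("%d\n", adjust_threads(16, 1273.0));  /* prints 15 */
    /* while 8 threads reporting 20% idle would grow */
    printf("%d\n", adjust_threads(8, 20.0));     /* prints 9 */
    return 0;
}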
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
On Tuesday August 8, [email protected] wrote:
> On Tue, 2006-08-08 at 18:01, Neil Brown wrote:
> > On Tuesday August 8, [email protected] wrote:
> > > G'day,
> > >
> > > Five more patches which fix various issues found in knfsd,
> > > mostly directly performance-related.
> >
> > Thanks. I'll have a bit more of a look in a day or two,
>
> Suddenly we've all got a lot of reading to do ;-)
>
I've taken to reading kids' fantasy novels recently. Nice and light,
yet still entertaining (e.g. The Quentaris Chronicles).
>
> I'm still working on another patch (it only recently became possible
> with the ktime_t code) which replaces threadstats fully. That patch
> adds nanosecond resolution busy and idle counters to each thread,
> then aggregates them per-pool when you read the pool_stats file.
> After export to userspace and rate conversion, these numbers tell
> us directly what percentage of the nfsds are currently being used.
>
> That patch worked on two platforms some weeks ago, but I'm not sure
> it's ready for primetime. I can post it for comment if you like?
Please!
I always feel a bit guilty complaining about a patch that has been
thoroughly tested and polished. Seeing things early removes the guilt :-)
>
> > What I would really like is auto-scaling of the number of threads to
> > match the load. I wonder if that is a good idea?
>
> Yes it is.
>
> The original purpose of the pool_stats file was to drive a userspace
> balancing daemon to do that, and I have written most of such a daemon.
> The trouble is that measuring demand for nfsds from what meagre stats
> it was possible to measure in a 2.6.9 kernel proved difficult. The
> new pool idle counter provides a direct measure of the excess nfsd
> capacity, which it will then be easy to write a control loop for.
sounds grand!
NeilBrown
On Tue, Aug 08, 2006 at 08:22:44PM +1000, Greg Banks wrote:
> I'm still working on another patch (it only recently became possible
> with the ktime_t code) which replaces threadstats fully. That patch
> adds nanosecond resolution busy and idle counters to each thread,
> then aggregates them per-pool when you read the pool_stats file.
> After export to userspace and rate conversion, these numbers tell
> us directly what percentage of the nfsds are currently being used.
Would it be possible to use that to continue to support the existing
interface (the "th" line)?
It'd be nice if we could avoid ripping out a working user interface that
someone might be using....
--b.
On Wed, 2006-08-09 at 01:49, J. Bruce Fields wrote:
> On Tue, Aug 08, 2006 at 08:22:44PM +1000, Greg Banks wrote:
> > I'm still working on another patch (it only recently became possible
> > with the ktime_t code) which replaces threadstats fully. That patch
> > adds nanosecond resolution busy and idle counters to each thread,
> > then aggregates them per-pool when you read the pool_stats file.
> > After export to userspace and rate conversion, these numbers tell
> > us directly what percentage of the nfsds are currently being used.
>
> Would it be possible to use that to continue to support the existing
> interface (the "th" line)?
No, the formats are really quite different.
> It'd be nice if we could avoid ripping out a working user interface that
> someone might be using....
I agree, but I don't believe a) it's working or b) anyone could
be getting any use out of it.
An earlier version of the patch left the data structures in place
and just reported them as zeros in the /proc file. Would that
be preferable?
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
On Tue, 2006-08-08 at 21:25, Neil Brown wrote:
> On Tuesday August 8, [email protected] wrote:
> > I'm still working on another patch (it only recently became possible
> > with the ktime_t code) which replaces threadstats fully. That patch
> > adds nanosecond resolution busy and idle counters to each thread,
> > then aggregates them per-pool when you read the pool_stats file.
> > After export to userspace and rate conversion, these numbers tell
> > us directly what percentage of the nfsds are currently being used.
> >
> > That patch worked on two platforms some weeks ago, but I'm not sure
> > it's ready for primetime. I can post it for comment if you like?
>
> Please!
> I always feel a bit guilty complaining about a patch that has been
> thoroughly tested and polished. Seeing things early removes the guilt :-)
Ok, for comment only... this patch won't apply for several reasons
but hopefully you can get the gist of what I was trying to do.
--
knfsd: Maintain per-thread nanosecond resolution counters for time
nfsd threads spend idle (i.e. queued on the pool or waking up) and
busy (doing cpu work to service requests or blocking waiting for
disk I/O). Accumulate those counters per-pool on demand and export
them via /proc/fs/nfsd/pool_stats.
The expectation is that userspace will rate-convert the counters
from nanoseconds to nanoseconds-per-second and then scale down to a
percentage (possibly a multiple of 100%), by analogy with CPU
utilisation stats. For
example, a 4 CPU machine can report:
150% user 100% sys 50% intr 100% idle
where the total is Ncpus * 100%. By analogy, a machine with 4 nfsd
threads can report:
300% busy 100% idle
(where the total is Nthreads * 100%) meaning that on average 3 of
the threads are doing useful NFS work and 1 is not being used.
By presenting thread usage as just two numbers, we make it easy
for a sysadmin to understand whether we have too many threads for
the currently running workload. For example, a machine with 16
threads which reports 327% busy 1273% idle has about 12 threads
more than it needs to satisfy the workload.
Also, the %idle number is potentially an ideal input for an
automatic control loop. That control loop would attempt to keep
the %idle number at some predetermined small level (I like 100%),
by adding new threads (if the %idle falls too low) or removing
threads (if %idle climbs too high).
Please consider this patch a beta release only.
Signed-off-by: Greg Banks <[email protected]>
---
fs/nfsd/stats.c | 25 ++++++++++++------
include/linux/sunrpc/svc.h | 14 ++++++++++
net/sunrpc/svc.c | 17 ++++++++++++
net/sunrpc/svcsock.c | 45 +++++++++++++++++++++++++++++++---
4 files changed, 89 insertions(+), 12 deletions(-)
Index: linux/fs/nfsd/stats.c
===================================================================
--- linux.orig/fs/nfsd/stats.c 2006-06-20 19:52:02.854477556 +1000
+++ linux/fs/nfsd/stats.c 2006-06-21 17:11:45.117989613 +1000
@@ -423,16 +423,25 @@ void nfsd_stats_update_op(struct svc_rqs
}
}
-static inline ktime_t ktime_get(void)
+static inline ktime_t svc_elapsed_ktime(struct svc_rqst *rqstp, int newstate)
{
- struct timespec ts;
- ktime_get_ts(&ts);
- return timespec_to_ktime(ts);
+ ktime_t now, elapsed;
+ int oldstate = rqstp->rq_state;
+
+ now = ktime_get();
+ elapsed = ktime_sub(now, rqstp->rq_timestamp);
+ rqstp->rq_timestamp = now;
+
+ rqstp->rq_times[oldstate] = ktime_add(rqstp->rq_times[oldstate],
+ elapsed);
+ rqstp->rq_state = newstate;
+
+ return elapsed;
}
void nfsd_stats_pre(struct svc_rqst *rqstp)
{
- rqstp->rq_timestamp = ktime_get();
+ svc_elapsed_ktime(rqstp, /*going busy*/1);
rqstp->rq_export_stats = NULL;
rqstp->rq_client_stats = NULL;
}
@@ -458,15 +467,13 @@ static inline int time_bucket(const ktim
void nfsd_stats_post(struct svc_rqst *rqstp)
{
int tb = -1;
- ktime_t now, svctime;
+ ktime_t svctime;
if (rqstp->rq_export_stats == NULL && rqstp->rq_client_stats == NULL)
return;
/* calculate service time and update the stats */
- now = ktime_get();
- svctime = ktime_sub(now, rqstp->rq_timestamp);
- rqstp->rq_timestamp = now;
+ svctime = svc_elapsed_ktime(rqstp, /*going idle*/0);
tb = time_bucket(svctime);
Index: linux/include/linux/sunrpc/svc.h
===================================================================
--- linux.orig/include/linux/sunrpc/svc.h 2006-06-20 19:30:48.083987201 +1000
+++ linux/include/linux/sunrpc/svc.h 2006-06-21 17:13:49.625163996 +1000
@@ -30,6 +30,11 @@ struct svc_pool_stats {
unsigned long threads_woken;
unsigned long overloads_avoided;
unsigned long threads_timedout;
+ ktime_t dead_times[2]; /* Counts accumulated idle
+ * and busy times for threads
+ * which have died, to avoid
+ * the /proc counters going
+ * backward. */
};
/*
@@ -251,6 +256,8 @@ struct svc_rqst {
wait_queue_head_t rq_wait; /* synchronization */
struct task_struct *rq_task; /* service thread */
int rq_waking; /* 1 if thread is being woken */
+ int rq_state; /* 0=idle 1=busy */
+ ktime_t rq_times[2]; /* cumulative time spent idle,busy */
ktime_t rq_timestamp; /* time of last idle<->busy transition */
struct nfsd_stats_hentry *rq_export_stats;
struct nfsd_stats_hentry *rq_client_stats;
@@ -440,5 +447,12 @@ static inline struct svc_pool *svc_pool_
#endif
}
+static inline ktime_t ktime_get(void)
+{
+ struct timespec ts;
+ ktime_get_ts(&ts);
+ return timespec_to_ktime(ts);
+}
+
#endif /* SUNRPC_SVC_H */
Index: linux/net/sunrpc/svcsock.c
===================================================================
--- linux.orig/net/sunrpc/svcsock.c 2006-06-20 19:52:02.858383300 +1000
+++ linux/net/sunrpc/svcsock.c 2006-06-21 17:25:43.140054463 +1000
@@ -1733,19 +1733,58 @@ static void svc_pool_stats_stop(struct s
static int svc_pool_stats_show(struct seq_file *m, void *p)
{
struct svc_pool *pool = p;
+ ktime_t times[2], now;
+ struct list_head *iter;
+ int state;
+ unsigned nthreads;
if (p == (void *)1) {
- seq_puts(m, "# pool packets-arrived sockets-enqueued threads-woken overloads-avoided threads-timedout\n");
+ seq_puts(m, "# pool packets-arrived sockets-enqueued "
+ "threads-woken overloads-avoided threads-timedout "
+ "num-threads nanosec-idle nanosec-busy\n");
return 0;
}
- seq_printf(m, "%u %lu %lu %lu %lu %lu\n",
+ /*
+ * Here we accumulate the times for each thread in the pool,
+ * and only print accumulated times rather than each thread's
+ * time. Experience shows that the per-pool numbers are useful,
+ * but the per-thread numbers are Too Much Information.
+ */
+ nthreads = 0;
+ now = ktime_get();
+
+ /* take sp_lock to traverse sp_all_threads; this also
+ * prevents threads from transitioning busy<->idle */
+ spin_lock(&pool->sp_lock);
+ /* initialise accumulators to time accumulated by dead threads */
+ times[0] = pool->sp_stats.dead_times[0];
+ times[1] = pool->sp_stats.dead_times[1];
+
+ list_for_each(iter, &pool->sp_all_threads) {
+ struct svc_rqst *rqstp =
+ list_entry(iter, struct svc_rqst, rq_all);
+ times[0] = ktime_add(times[0], rqstp->rq_times[0]);
+ times[1] = ktime_add(times[1], rqstp->rq_times[1]);
+ /* interpolate time from last change to now */
+ state = rqstp->rq_state;
+ times[state] = ktime_add(times[state],
+ ktime_sub(now, rqstp->rq_timestamp));
+ nthreads++;
+ }
+
+ spin_unlock(&pool->sp_lock);
+
+ seq_printf(m, "%u %lu %lu %lu %lu %lu %u %lu %lu\n",
pool->sp_id,
pool->sp_stats.packets,
pool->sp_stats.sockets_queued,
pool->sp_stats.threads_woken,
pool->sp_stats.overloads_avoided,
- pool->sp_stats.threads_timedout);
+ pool->sp_stats.threads_timedout,
+ nthreads,
+ ktime_to_ns(times[0]),
+ ktime_to_ns(times[1]));
return 0;
}
Index: linux/net/sunrpc/svc.c
===================================================================
--- linux.orig/net/sunrpc/svc.c 2006-06-20 19:28:40.198213294 +1000
+++ linux/net/sunrpc/svc.c 2006-06-21 17:24:25.476421936 +1000
@@ -395,6 +395,8 @@ __svc_create_thread(svc_thread_fn func,
spin_unlock_bh(&pool->sp_lock);
rqstp->rq_server = serv;
rqstp->rq_pool = pool;
+ rqstp->rq_state = 0; /* idle */
+ rqstp->rq_timestamp = ktime_get();
#if SVC_HAVE_MULTIPLE_POOLS
if (serv->sv_nrpools > 1)
@@ -540,12 +542,27 @@ svc_exit_thread(struct svc_rqst *rqstp)
{
struct svc_serv *serv = rqstp->rq_server;
struct svc_pool *pool = rqstp->rq_pool;
+ ktime_t now;
svc_release_buffer(rqstp);
kfree(rqstp->rq_resp);
kfree(rqstp->rq_argp);
kfree(rqstp->rq_auth_data);
+
+ /* interpolate time since last state change */
+ now = ktime_get();
+ rqstp->rq_times[rqstp->rq_state] = ktime_add(
+ rqstp->rq_times[rqstp->rq_state],
+ ktime_sub(now,
+ rqstp->rq_timestamp));
spin_lock_bh(&pool->sp_lock);
+ /* remember accumulated time in dead_times */
+ pool->sp_stats.dead_times[0] = ktime_add(
+ pool->sp_stats.dead_times[0],
+ rqstp->rq_times[0]);
+ pool->sp_stats.dead_times[1] = ktime_add(
+ pool->sp_stats.dead_times[1],
+ rqstp->rq_times[1]);
list_del(&rqstp->rq_all);
spin_unlock_bh(&pool->sp_lock);
kfree(rqstp);
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.