2004-03-08 16:55:26

by Olaf Kirch

Subject: Chuck's iostat patch

Hi Chuck, hi all,

I took Chuck's iostat patch and played with it a little. Attached
you can find the results of this playing around.

This patch changes Chuck's work in two ways - one, all iostat sampling
happens in the RPC layer now, rather than having to add "nfs_count_call"
to every NFS function. This changes some things, e.g. the way in/out
bytes are counted. Chuck's patch counts the payload bytes of a read/write
request only, while my approach includes the RPC overhead.

Second, I ported it to 2.6, and made it use a seqfile for reading the
proc file.

There is still the open question whether the proc file interface to
retrieving and zapping the iostat counters is appropriate. It's
probably not too hard to extend this stuff to have a timer printk
the iostat values every second or so - the question is do we really
want this, or isn't the procfs approach sufficient?

Comments?

Olaf
--
Olaf Kirch | Stop wasting entropy - start using predictable
[email protected] | tempfile names today!
---------------+


Attachments:
(No filename) (1.00 kB)
nfs-iostat (32.13 kB)

2004-03-09 14:24:15

by Olaf Kirch

Subject: Re: RE: Chuck's iostat patch

On Tue, Mar 09, 2004 at 05:58:46AM -0800, Lever, Charles wrote:
> i don't have a 2.6 version nor have i started any modification of
> iostat or sar. i simply wanted to get agreement on APIs and exactly
> what metrics we are interested in providing. as such this is only
> a prototype. there are some ugly things that will change in the
> final version (nfs_count_this_op, i believe, is going away).

That is good. I very much prefer doing all the iostat stuff
inside the RPC layer, without having to modify NFS very much
(except for a single call to enable iostat collection for the
rpc_clnt).

> # optype op count bytes retrans errors

I would like to see separate in/out byte counts.

> there are a few problems with this solution. one is that zeroing
> these counters is not atomic. another is that umount is racy and
> can leave these data structures in a not so friendly state that
> could result in an oops.

I've addressed that by putting the rpc_clnt pointer into the proc dir
entry, bumping its refcount when the file is opened and calling
rpc_release_client() when the file is closed.
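
The open/close pinning described above can be illustrated with a small
user-space toy model. This is only a sketch of the refcounting idea, not
kernel code: the Client and StatFile classes and their method names are
hypothetical and do not correspond to the actual rpc_clnt API.

```python
class Client:
    """Toy stand-in for a refcounted object such as an rpc_clnt."""

    def __init__(self):
        self.refcount = 1        # reference held by the mount itself
        self.destroyed = False

    def get(self):
        """Take an extra reference; fails if the object is already gone."""
        if self.destroyed:
            raise RuntimeError("client already destroyed")
        self.refcount += 1
        return self

    def put(self):
        """Drop one reference; the last put destroys the object."""
        self.refcount -= 1
        if self.refcount == 0:
            self.destroyed = True


class StatFile:
    """Toy stat file that pins its client between open and close."""

    def __init__(self, client):
        self.client = client.get()      # open: bump the refcount

    def read(self):
        if self.client.destroyed:
            raise RuntimeError("stats read after client was destroyed")
        return "stats for a live client"

    def close(self):
        self.client.put()               # close: release the reference


clnt = Client()
stat_file = StatFile(clnt)   # a reader opens the stat file
clnt.put()                   # umount drops the mount's reference
text = stat_file.read()      # still safe: the open file holds a reference
stat_file.close()            # last reference gone, client is destroyed now
```

In this model a umount that races with an open stat file only drops the
mount's reference; the data stays valid until the reader closes the file.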

> and i think we all agree that using
> seq_file is a much better way to export these metrics to user space.
> in the final version we will use sysfs instead of /proc. it's
> not clear how to name the stat files uniquely -- today i'm using
> minor numbers, which is a hack.

It's a reasonable compromise. You can even use a simple monotonic counter
or the %p representation of the rpc_clnt pointer as the file name -
it should be up to iostat/sar to display something reasonable to the
user. You can help it by making nfs_show_options() give a hint which
iostat file corresponds to which mount.

> for use in the stat files. in 2.6 we have superblock sharing, and
> i'm not sure what exactly that will mean for the "per-mountpoint"
> nature of this implementation.

Well, that is not an issue in this case I think. Only super blocks
with the same file handle get shared.

Olaf
--
Olaf Kirch | Stop wasting entropy - start using predictable
[email protected] | tempfile names today!
---------------+


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-03-08 18:12:20

by Lever, Charles

Subject: RE: Chuck's iostat patch

hi olaf-

> I took Chuck's iostat patch and played with it a little. Attached
> you can find the results of this playing around.
>
> This patch changes Chuck's work in two ways - one, all iostat sampling
> happens in the RPC layer now, rather than having to add "nfs_count_call"
> to every NFS function. This changes some things, e.g. the way in/out
> bytes are counted. Chuck's patch counts the payload bytes of a read/write
> request only, while my approach includes the RPC overhead.
>
> Second, I ported it to 2.6, and made it use a seqfile for reading the
> proc file.

trond has already ported my 2.4 patch up to 2.6. i'm still
waiting to see his work, as he has defined the metrics API
the way he likes it, and i'd like to build on that.

it sounds like you may have an old version of that patch
anyway; i've already made similar changes to the 2.4
version of this patch.

> There is still the open question whether the proc file interface to
> retrieving and zapping the iostat counters is appropriate. It's
> probably not too hard to extend this stuff to have a timer printk
> the iostat values every second or so - the question is do we really
> want this, or isn't the procfs approach sufficient?

we are planning to use sysfs instead of proc. zapping the counters
is an important feature, and that is planned for inclusion.

i don't like the "timer printk" approach. think of multiple
performance sampling tools running at the same time: iostat,
and say, sar. these need to have the absolute counters available
to them, and they can compute the averages based on time-period
samples. that's all done in user space.

did i misunderstand you?
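
The point about absolute counters can be made concrete with a short
user-space sketch. It assumes a single monotonically increasing counter that
no reader resets; the Sample type, the rate() helper and the sample values
are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of an absolute counter and the time it was taken."""
    seconds: float
    count: int


def rate(prev: Sample, cur: Sample) -> float:
    """Average events per second between two readings of one counter."""
    elapsed = cur.seconds - prev.seconds
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (cur.count - prev.count) / elapsed


# Hypothetical readings of the same absolute "read ops" counter.
# Tool A polls every second, tool B polls every five seconds.
tool_a = [Sample(0.0, 1000), Sample(1.0, 1200), Sample(2.0, 1250)]
tool_b = [Sample(0.0, 1000), Sample(5.0, 2000)]

# Each tool diffs its own consecutive samples, independently of the other.
rates_a = [rate(p, c) for p, c in zip(tool_a, tool_a[1:])]
rate_b = rate(tool_b[0], tool_b[1])
```

Because neither tool modifies the counter, each one can pick its own
sampling period and still get consistent per-interval averages.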


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-03-09 14:08:02

by Lever, Charles

Subject: RE: RE: Chuck's iostat patch

> I too have done something similar to this for 2.4 and 2.6. I believe I
> announced it a while ago. You may want to take a look at it here.
> http://www.mcnicoll.ca/iostat
>
> This is a project for a prof of mine who originally did it for his PhD,
> way back in 2.0.34. I am just implementing it in 2.4 and 2.6 for NFS.
> The patches are on the website.
>
> Where are the patches for Chuck's work?

http://plymouth.citi.umich.edu/cel/nfs-client/rhel-3.0/linux-2.4.21-nfs_metrics.patch

i don't have a 2.6 version nor have i started any modification of
iostat or sar. i simply wanted to get agreement on APIs and exactly
what metrics we are interested in providing. as such this is only
a prototype. there are some ugly things that will change in the
final version (nfs_count_this_op, i believe, is going away).

it creates files under /proc/fs/nfs/stats, one file for each mount
point on the client. the stat files look something like:

-------- cut here --------
nfs/stats format version: 1.0
hostname: %s
nfs version: %d

mounted %lu seconds ago
transport idle time: %lu seconds
major timeouts: %lu
transport partial writes: %lu
write_space callbacks: %lu
transport socket type: tcp
connect attempts: %lu
total connect wait time: %Lu usecs
or
transport socket type: udp

# optype op count bytes retrans errors
read:
write:
commit:
getattr:
lookup:
readdir:
symlink:
readlnk:
remove:
other:

ticks/sec: %Lu
# optype rtt total ticks sum rtt ** 2 execute total ticks sum execute ** 2
read:
write:
commit:
getattr:
lookup:
readdir:
symlink:
readlnk:
remove:
other:

# optype slot util backlog sndq util
read:
write:
commit:
getattr:
lookup:
readdir:
symlink:
readlnk:
remove:
other:
-------- cut here --------

the idea being that this file would export raw counts that show
the number of bytes that have been transferred along with errors
and retransmissions, and the RPC latencies and queue utilizations,
with enough data to compute running averages and standard deviation.
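
As a sketch of why the raw sums are enough: given an op count, the sum of
the per-op values and the sum of their squares, a user-space tool can derive
the mean and the (population) standard deviation. The function names and the
sample numbers below are hypothetical, and the snapshot layout
(count, total, total of squares) is an assumption, not the file format.

```python
import math


def mean_and_stddev(count, total, total_sq):
    """Mean and population standard deviation from raw running sums.

    count    -- number of operations observed
    total    -- sum of the per-operation values (e.g. rtt ticks)
    total_sq -- sum of the squared per-operation values
    """
    if count == 0:
        return 0.0, 0.0
    mean = total / count
    # E[x^2] - E[x]^2, clamped at zero to absorb rounding error.
    variance = max(total_sq / count - mean * mean, 0.0)
    return mean, math.sqrt(variance)


def interval_stats(prev, cur):
    """Statistics for the ops that happened between two snapshots.

    Each snapshot is a (count, total, total_sq) tuple of absolute counters.
    """
    deltas = tuple(c - p for p, c in zip(prev, cur))
    return mean_and_stddev(*deltas)


# Hypothetical rtt samples 2, 4, 4, 4, 5, 5, 7, 9 ticks:
# count = 8, sum = 40, sum of squares = 232.
since_mount = mean_and_stddev(8, 40, 232)

# Two more ops of 6 ticks each arrive before the next snapshot.
last_interval = interval_stats((8, 40, 232), (10, 52, 304))
```

Dividing the results by the ticks/sec value from the stat file would convert
them to seconds.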

the user land tools (iostat and sar, at least) would then be able
to compute and display byte rates, average RPC latencies (per op type),
and so on. these stats are per-mountpoint and per op type so we
can detect when a server is slow at writes and fast at reads, or
vice versa.
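
A rough sketch of what such a tool might do with the first table in the stat
file. It assumes each row carries four whitespace-separated integers after
the op name, in the order given by the "# optype" header (op count, bytes,
retrans, errors); that row layout, the helper names and the sample values
are assumptions made for illustration.

```python
COLUMNS = ("count", "bytes", "retrans", "errors")


def parse_op_table(text):
    """Parse rows such as 'read: 10 40960 0 0' into {op: {column: value}}."""
    table = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip comments, lines without a colon, and rows of the wrong width.
        if line.startswith("#") or not sep or len(fields) != len(COLUMNS):
            continue
        try:
            values = [int(field) for field in fields]
        except ValueError:
            continue  # not a counter row
        table[name.strip()] = dict(zip(COLUMNS, values))
    return table


def byte_rates(before, after, seconds):
    """Bytes per second for each op type present in both snapshots."""
    return {
        op: (after[op]["bytes"] - before[op]["bytes"]) / seconds
        for op in after
        if op in before
    }


# Two hypothetical snapshots of one mount's stat file, taken 10 seconds apart.
first = """# optype op count bytes retrans errors
read: 10 40960 0 0
write: 5 20480 1 0
"""
second = """# optype op count bytes retrans errors
read: 30 122880 0 0
write: 6 24576 1 0
"""

rates = byte_rates(parse_op_table(first), parse_op_table(second), 10.0)
```

Keeping the rows per op type is what lets the tool report read and write
throughput separately for each mount point.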

i'd also like to provide cache hit rate statistics, and some info
about RPC scheduling (like how many times, on average, an RPC task
is moved from queue to queue or put to sleep).

there are a few problems with this solution. one is that zeroing
these counters is not atomic. another is that umount is racy and
can leave these data structures in a not so friendly state that
could result in an oops. and i think we all agree that using
seq_file is a much better way to export these metrics to user space.
in the final version we will use sysfs instead of /proc. it's
not clear how to name the stat files uniquely -- today i'm using
minor numbers, which is a hack. the export pathname is available only
in the vfsmount structures, so it is entirely unavailable today
for use in the stat files. in 2.6 we have superblock sharing, and
i'm not sure what exactly that will mean for the "per-mountpoint"
nature of this implementation.

there's a lot of work left to be done.

