2004-03-31 16:34:33

by Lever, Charles

[permalink] [raw]
Subject: [PATCH] 1/3 - RPC metrics support

This patch provides a generic interface to the RPC client for tallying
per RPC request measurements. The interface can be used by any
in-kernel application that calls the RPC client. The API is:

rpc_alloc_tally
rpc_clear_tally

Provides a way for ULPs (upper level protocols) to allocate and free
that data structure that will hold the measurement totals. This
structure is passed in to any RPC request that the ULP wants to have a
measurement for.

rpc_print_tally

Passing a seq_file and an RPC tally structure to this routine causes a
formatted human-readable byte stream representation of the tally struct
to appear on the seq_file.

rpc_clear_tally

Clear the counter fields in the RPC tally structure.

- Chuck Lever
--
corporate: <cel at netapp dot com>
personal: <chucklever at bigfoot dot com>


Attachments:
06-rpc_metrics.patch (19.09 kB)
06-rpc_metrics.patch

2004-03-31 16:46:19

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] 1/3 - RPC metrics support


My experience with gettimeofday() was that it slows down and eventually
hangs the machine when you use it inside bottom halves. That's why I had
to revert to using 'jiffies' instead. While I agree that precision may
be important, a measurement that so impacts on what it is measuring is
not particularly useful either...

Could you also move EXPORT_SYMBOL() definitions to after their function
definition, as per the 2.6.x recommendations.

Cheers,
Trond

--
Trond Myklebust <[email protected]>


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-03-31 16:57:28

by Lever, Charles

[permalink] [raw]
Subject: RE: [PATCH] 1/3 - RPC metrics support

> My experience with gettimeofday() was that it slows down and=20
> eventually hangs the machine when you use it inside bottom=20
> halves. That's why I had to revert to using 'jiffies'=20
> instead. While I agree that precision may be important, a=20
> measurement that so impacts on what it is measuring is not=20
> particularly useful either...

using jiffies makes these measurements useless for anything
faster than a millisecond (or 10 milliseconds on current
x86), which is most everything on 100Mb or faster, these
days. isn't there some other solution?

> Could you also move EXPORT_SYMBOL() definitions to after=20
> their function definition, as per the 2.6.x recommendations.

i would prefer that we did this in a separate patch, and
moved all the symbols at once.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-03-31 17:19:13

by Trond Myklebust

[permalink] [raw]
Subject: RE: [PATCH] 1/3 - RPC metrics support

On Wed, 2004-03-31 at 11:57, Lever, Charles wrote:
> > My experience with gettimeofday() was that it slows down and
> > eventually hangs the machine when you use it inside bottom
> > halves. That's why I had to revert to using 'jiffies'
> > instead. While I agree that precision may be important, a
> > measurement that so impacts on what it is measuring is not
> > particularly useful either...
>
> using jiffies makes these measurements useless for anything
> faster than a millisecond (or 10 milliseconds on current
> x86), which is most everything on 100Mb or faster, these
> days. isn't there some other solution?

In 2.6. jiffies have 1 ms resolution, not 10 on x86. There may be some
alternative generic timing solution in Linux that does better than that,
and that can work within bottom halves without side-effects but I'm not
aware of such a solution.

You can of course use do_gettimeofday() for the non bh-safe parts if you
want to, but using it inside things like *data_ready() etc. was clearly
unacceptable.

The only generic solution I see is therefore to use statistics...

Cheers,
Trond


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-03-31 18:21:51

by Jeremy McNicoll

[permalink] [raw]
Subject: Re: RE: [PATCH] 1/3 - RPC metrics support

On Wed, 2004-03-31 at 12:19, Trond Myklebust wrote:
> On Wed, 2004-03-31 at 11:57, Lever, Charles wrote:
> > > My experience with gettimeofday() was that it slows down and
> > > eventually hangs the machine when you use it inside bottom
> > > halves. That's why I had to revert to using 'jiffies'
> > > instead. While I agree that precision may be important, a
> > > measurement that so impacts on what it is measuring is not
> > > particularly useful either...
> >
> > using jiffies makes these measurements useless for anything
> > faster than a millisecond (or 10 milliseconds on current
> > x86), which is most everything on 100Mb or faster, these
> > days. isn't there some other solution?
>
> In 2.6. jiffies have 1 ms resolution, not 10 on x86. There may be some
> alternative generic timing solution in Linux that does better than that,
> and that can work within bottom halves without side-effects but I'm not
> aware of such a solution.
>
> You can of course use do_gettimeofday() for the non bh-safe parts if you
> want to, but using it inside things like *data_ready() etc. was clearly
> unacceptable.
>
> The only generic solution I see is therefore to use statistics...
>

I am the middle of the "fun" part of my school project that involves
this stuff, documentation that is!!! But If I recall correctly there is
a patch that gives you better resolution jiffies.... I cant recall off
the top of my head. But I have to agree with the jiffies solution. It
worked really well for me. It is just a matter of converting jiffies to
MS in user space.

I am using Robert Love's book right now, for a reference while I
describe what I accomplished. [ (http://www.mcnicoll.ca/iostat/) as soon
as the documentation is completed I will post it there.] Which has
good info on timing in chapter 9 , this may help you out.



-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-01 08:14:13

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH] 1/3 - RPC metrics support

G'day,

[cc trimmed somewhat]

"Lever, Charles" wrote:
>
> -------------------------------------------------------------------------------------
> Name: 06-rpc_metrics.patch
> 06-rpc_metrics.patch Type: unspecified type (application/octet-stream)
> Encoding: base64
> Description: 06-rpc_metrics.patch
> -------------------------------------------------------------------------------------
> Name: 07-nfs-iostat.patch
> 07-nfs-iostat.patch Type: unspecified type (application/octet-stream)
> Encoding: base64
> Description: 07-nfs-iostat.patch
> -------------------------------------------------------------------------------------
> Name: 08-rpc_call.patch
> 08-rpc_call.patch Type: unspecified type (application/octet-stream)
> Encoding: base64
> Description: 08-rpc_call.patch


These patches look great, I'm very happy to see extended stats
on the client and I expect it will be very useful debugging
performance and other problems. Thanks! Just a few comments...


I don't understand why you have an rpc_tally pointer in rpc_message.
This just seems to complicate matters by forcing you to initialise
the pointer to the rpc_tally in the nfs_server on every call. AFAICT
all the place where you need the rpc_tally pointer you already have
an rpc_clnt pointer which is initialised directly from the nfs_server
and could be made to hold the rpc_tally* instead. With that reorg
you could drop most of the 08-rpc_call.patch and all the changes
to nfs3proc.c and proc.c in nfs-iostat.patch and replace them with
a small change to nfs_create_client(). That would greatly reduce
the code perturbation without affecting performance or functionality.


I don't see any way for the userspace program which reads these to
figure out which of the iostat files corresponds to which mount.
We get the hostname in the filename and again in the contents but
nowhere do we get the mount path, only the device minor number in
the filename. How about adding a line to nfs_iostat_show() to
print nfss->mnt_path?


The minor dimension of the 2d array of struct rpc_metrics_totals
pointed to by rpc_tally is per-CPU. To get correct per-CPU
separation you will want to cacheline-align the per-CPU parts
(so that cachelines which straddle the per-CPU parts don't end
up bouncing between CPUs). For example, in rpc_alloc_tally(),

size = NR_CPUS * ROUNDUP(ops * sizeof(struct rpc_metric_totals), L1_CACHE_BYTES);

and change the stride of the tl_totals[] initialisation loop.


As others have mentioned, having at least four do_gettimeofday()
calls per RPC call is evil; we've seen major problems scaling
do_gettimeofday() on large CPU count Altix machines. This problem
also affects some NIC drivers because the network stack does
a do_gettimeofday() to timestamp every incoming ethernet packet;
David Miller's proposed solution to that is to define a new
scalable high-resolution timestamp API. I don't know if that's
gone anywhere, you might try asking on the netdev list.


So when the mount is unmounted, all stats are lost? How does this
work on clients which do intermittent traffic on automount mounts?
Would it be possible to get some global stats too?


The accumulation-over-CPU loops in nfs_iostat_show() and rpc_print_tally()
would be simpler if all the fields in struct nfs_iostat and struct
rpc_metric_totals were a consistent size, so you could just treat the
structs as an array of that type during the accumulation step.


In struct nfs_iostat, it would be nice to gather the number of times
calls retried because they received EJUKEBOX from the server, the number
of SETATTRs which cause a truncate, the number of WRITEs which cause
a file extension, and the number of WRITEs which are not aligned to
the wsize.


nfs_iostat_file_name() does an unbounded sprintf() of an unbounded string
that comes from userspace (from an unprivileged user if the site runs
automounter) into a buffer only NFS_PROC_NAMELEN=256 bytes long.


Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-01 15:15:45

by Lever, Charles

[permalink] [raw]
Subject: RE: [PATCH] 1/3 - RPC metrics support

hi greg-

thanks for the review.

> I don't understand why you have an rpc_tally pointer in rpc_message.
> This just seems to complicate matters by forcing you to initialise
> the pointer to the rpc_tally in the nfs_server on every call. AFAICT
> all the place where you need the rpc_tally pointer you already have
> an rpc_clnt pointer which is initialised directly from the nfs_server
> and could be made to hold the rpc_tally* instead. With that reorg
> you could drop most of the 08-rpc_call.patch and all the changes
> to nfs3proc.c and proc.c in nfs-iostat.patch and replace them with
> a small change to nfs_create_client(). That would greatly reduce
> the code perturbation without affecting performance or functionality.

trond wanted a mechanism that could be used for other things besides
per-mount statistics. for instance, you can allocate a different
tally structure for each inode on a file system, and collect stats
for each of them. or you can use the same tally struct for files
in the same directory. nailing the tally structure into the rpc_clnt
makes that kind of usage (which might be nice for debugging) impossible.

there was interest in removing rpc_call() anyway. this is just a
convenient excuse.

in terms of performance impact, this change is actually pretty
minor compared to the massive amount of data copying done by XDR
and the socket layer, and to the lock contention that occurs in
the RPC client. as i've inlined nfs3_rpc_wrapper, that more
than makes up for setting a single extra pointer, at least in
the NFSv3 synchronous path. i don't see that there are any data
dependencies that would cause a pipeline stall when rpc_talp is
set, do you?

> I don't see any way for the userspace program which reads these to
> figure out which of the iostat files corresponds to which mount.
> We get the hostname in the filename and again in the contents but
> nowhere do we get the mount path, only the device minor number in
> the filename. How about adding a line to nfs_iostat_show() to
> print nfss->mnt_path?

the problem there is namespace. each process can have its own
set of mount points (ie export to local directory pairings). we
still need to work out exactly how to provide that information.

i'm not too concerned about that, as iostat on local file systems
currently produces results based on devices, forcing an adminis-
trator to go back to /etc/fstab or df or mount to figure out
what's going on anyway.

> The minor dimension of the 2d array of struct rpc_metrics_totals
> pointed to by rpc_tally is per-CPU. To get correct per-CPU
> separation you will want to cacheline-align the per-CPU parts
> (so that cachelines which straddle the per-CPU parts don't end
> up bouncing between CPUs). For example, in rpc_alloc_tally(),
>=20
> size =3D NR_CPUS * ROUNDUP(ops * sizeof(struct=20
> rpc_metric_totals), L1_CACHE_BYTES);
>=20
> and change the stride of the tl_totals[] initialisation loop.

makes sense, will do.

> As others have mentioned, having at least four do_gettimeofday()
> calls per RPC call is evil; we've seen major problems scaling
> do_gettimeofday() on large CPU count Altix machines. This problem
> also affects some NIC drivers because the network stack does
> a do_gettimeofday() to timestamp every incoming ethernet packet;
> David Miller's proposed solution to that is to define a new
> scalable high-resolution timestamp API. I don't know if that's
> gone anywhere, you might try asking on the netdev list.

i asked on lkml yesterday. folks there say "no such API exists."
at this point i'm thinking of using jiffies and calling it a day.
at some later point we can come back and add support for finer
resolution timestamps. i've already added a timestamp resolution
indicator in the iostat file format, so changing later should be
no problem if user-level pays attention to it.

> So when the mount is unmounted, all stats are lost? How does this
> work on clients which do intermittent traffic on automount mounts?
> Would it be possible to get some global stats too?

global stats still exist in /proc/net/rpc/ -- anything you
think is missing there?

> The accumulation-over-CPU loops in nfs_iostat_show() and=20
> rpc_print_tally()
> would be simpler if all the fields in struct nfs_iostat and struct
> rpc_metric_totals were a consistent size, so you could just treat the
> structs as an array of that type during the accumulation step.

i'm trying to keep rpc_metric_totals small because there are a whole
lot of them. i'm not sure i understand exactly what you mean here.
can you give an example?

> In struct nfs_iostat, it would be nice to gather the number of times
> calls retried because they received EJUKEBOX from the server,=20
> the number
> of SETATTRs which cause a truncate, the number of WRITEs which cause
> a file extension, and the number of WRITEs which are not aligned to
> the wsize.

ok, good ideas.

> nfs_iostat_file_name() does an unbounded sprintf() of an=20
> unbounded string
> that comes from userspace (from an unprivileged user if the site runs
> automounter) into a buffer only NFS_PROC_NAMELEN=3D256 bytes long.

good catch.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-01 16:45:51

by Lever, Charles

[permalink] [raw]
Subject: RE: [PATCH] 1/3 - RPC metrics support

> The minor dimension of the 2d array of struct=20
> rpc_metrics_totals pointed to by rpc_tally is per-CPU. To=20
> get correct per-CPU separation you will want to=20
> cacheline-align the per-CPU parts (so that cachelines which=20
> straddle the per-CPU parts don't end up bouncing between=20
> CPUs). For example, in rpc_alloc_tally(),
>=20
> size =3D NR_CPUS * ROUNDUP(ops * sizeof(struct=20
> rpc_metric_totals), L1_CACHE_BYTES);
>=20
> and change the stride of the tl_totals[] initialisation loop.

that doesn't work, unfortunately, unless sizeof(struct rpc_tally)
is also an exact multiple of L1_CACHE_BYTES. otherwise the whole
metrics array starts on a non-cache-aligned boundary since these
are all cut from the same piece of memory.

what is the preferred mechanism these days for ensuring that
structures like these are sized in multiples of L1_CACHE_BYTES?
if i can get the C compiler to do this, that would be a cleaner
solution.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-01 17:05:34

by Lever, Charles

[permalink] [raw]
Subject: RE: [PATCH] 1/3 - RPC metrics support

> I don't see any way for the userspace program which reads=20
> these to figure out which of the iostat files corresponds to=20
> which mount. We get the hostname in the filename and again in=20
> the contents but nowhere do we get the mount path, only the=20
> device minor number in the filename. How about adding a line=20
> to nfs_iostat_show() to print nfss->mnt_path?

nfss->mnt_path is an NFSv4 only field at the moment.

let me make sure i understand exactly what it contains. is
this the mounted-on directory on the client, or the export
path on the server?

if we saved the export path somewhere in the nfs_server
structure, that would be mighty convenient for iostats
and i don't think that would break namespaces.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-01 23:58:35

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH] 1/3 - RPC metrics support

On Thu, Apr 01, 2004 at 07:15:34AM -0800, Lever, Charles wrote:
>
> trond wanted a mechanism that could be used for other things besides
> per-mount statistics. for instance, you can allocate a different
> tally structure for each inode on a file system, and collect stats
> for each of them. or you can use the same tally struct for files
> in the same directory. nailing the tally structure into the rpc_clnt
> makes that kind of usage (which might be nice for debugging) impossible.

Ah. I didn't realise that was a requirement.

> there was interest in removing rpc_call() anyway.

Why?

> in terms of performance impact, this change is actually pretty
> minor

Agreed, I saw no performance issues other than do_gettimeofday().

> > I don't see any way for the userspace program which reads these to
> > figure out which of the iostat files corresponds to which mount.
>
> the problem there is namespace. each process can have its own
> set of mount points (ie export to local directory pairings). we
> still need to work out exactly how to provide that information.

Actually I was interested in knowing the serverside mount path,
which doesn't depend on the client's namespace (although it's not
necessarily unique).

The obvious solution to the namespace issue is to just use the
namespace of the process doing the /proc read, which should be
current->namespace during nfs_iostat_show(). For almost every
case this will be the default namespace anyway. In the unlikely
case the the sb is not mounted in that namespace, return early
and don't print any stats.

> i'm not too concerned about that, as iostat on local file systems
> currently produces results based on devices, forcing an adminis-
> trator to go back to /etc/fstab or df or mount to figure out
> what's going on anyway.

This works because for local filesystems those sources give you
the device name. However (at least on my system) neither fstab
nor mount nor /proc/mounts will tell you anything about the device
number for NFS filesystems.

So what I see from "mount" is

server:/ on /mnt/foo type nfs (rw,addr=134.14.54.149)

and in /proc/mounts

server:/ /mnt/foo nfs rw,v3,rsize=32768,wsize=32768,hard,udp,lock,addr=server 0 0

So how am I supposed to correlate this back to a device minor?

> > As others have mentioned, having at least four do_gettimeofday()
> > calls per RPC call is evil;
>
> i asked on lkml yesterday. folks there say "no such API exists."
> at this point i'm thinking of using jiffies and calling it a day.

Fair enough.

> > So when the mount is unmounted, all stats are lost? How does this
> > work on clients which do intermittent traffic on automount mounts?
> > Would it be possible to get some global stats too?
>
> global stats still exist in /proc/net/rpc/ -- anything you
> think is missing there?

It would be nice to have all the same stats as are gathered per-mount,
in the same format so we only have to write and debug one userspace
parsing routine. In particular, the latency stats.

> > The accumulation-over-CPU loops in nfs_iostat_show() and
> > rpc_print_tally()
> > would be simpler if all the fields in struct nfs_iostat and struct
> > rpc_metric_totals were a consistent size, so you could just treat the
> > structs as an array of that type during the accumulation step.
>
> i'm trying to keep rpc_metric_totals small because there are a whole
> lot of them.

Sure, highly admirable.

> i'm not sure i understand exactly what you mean here.
> can you give an example?

struct rpc_metric_totals {
u64 mt_ops; /* count of operations */
u64 mt_ntrans; /* count of RPC transmissions */
u64 mt_cwnd; /* congestion window util */
u64 mt_cong; /* congestion window util */
u64 mt_slot_u; /* slot table utilization */
u64 mt_sndq_u; /* send queue utilization */
u64 mt_bklog_u; /* backlog utilization */
u64 mt_errors; /* count of EIO errors */

u64 mt_bytes_sent; /* count of bytes out */
u64 mt_bytes_recv; /* count of bytes in */
u64 mt_sendbuf; /* total sendbuf */

u64 mt_rtt; /* usecs for RPC RTT */
u64 mt_rtt_s; /* sum(rtt**2) */
u64 mt_execute; /* usecs for RPC execution */
u64 mt_execute_s; /* sum(execute**2) */
};

rpc_print_tally(...)
{
...
for (op = 0; op < vers->nrprocs; op++) {
memset(&total, 0, sizeof(struct rpc_metric_totals));
for (i = 0; i < NR_CPUS; i++) {
u64 *r = tally->tl_totals[i];
for (k = 0 ; k < sizeof(struct rpc_metric_totals)/sizeof(u64) ; k++)
((u64 *)&total)[k] += r[k];
}
}
}

The idea being that adding another field to rpc_metric_totals
becomes easier. Obviously the cost is higher memory use.
Just an idea.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-02 00:11:10

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH] 1/3 - RPC metrics support

On Thu, Apr 01, 2004 at 08:45:40AM -0800, Lever, Charles wrote:
> > The minor dimension of the 2d array of struct
> > rpc_metrics_totals pointed to by rpc_tally is per-CPU. To
> > get correct per-CPU separation you will want to
> > cacheline-align the per-CPU parts (so that cachelines which
> > straddle the per-CPU parts don't end up bouncing between
> > CPUs). For example, in rpc_alloc_tally(),
> >
> > size = NR_CPUS * ROUNDUP(ops * sizeof(struct
> > rpc_metric_totals), L1_CACHE_BYTES);
> >
> > and change the stride of the tl_totals[] initialisation loop.
>
> that doesn't work, unfortunately, unless sizeof(struct rpc_tally)
> is also an exact multiple of L1_CACHE_BYTES. otherwise the whole
> metrics array starts on a non-cache-aligned boundary since these
> are all cut from the same piece of memory.

That's right, you'll need to add ____cacheline_aligned like this:

struct rpc_tally {
} ____cacheline_aligned;

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-02 00:17:47

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH] 1/3 - RPC metrics support

On Thu, Apr 01, 2004 at 09:05:20AM -0800, Lever, Charles wrote:
> > I don't see any way for the userspace program which reads
> > these to figure out which of the iostat files corresponds to
> > which mount. We get the hostname in the filename and again in
> > the contents but nowhere do we get the mount path, only the
> > device minor number in the filename. How about adding a line
> > to nfs_iostat_show() to print nfss->mnt_path?
>
> nfss->mnt_path is an NFSv4 only field at the moment.
>
> let me make sure i understand exactly what it contains. is
> this the mounted-on directory on the client, or the export
> path on the server?
>
> if we saved the export path somewhere in the nfs_server
> structure, that would be mighty convenient for iostats
> and i don't think that would break namespaces.

Sounds good.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-02 00:34:16

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] 1/3 - RPC metrics support

On Thu, 2004-04-01 at 18:58, Greg Banks wrote:

> struct rpc_metric_totals {
> u64 mt_ops; /* count of operations */
> u64 mt_ntrans; /* count of RPC transmissions */
> u64 mt_cwnd; /* congestion window util */
> u64 mt_cong; /* congestion window util */
> u64 mt_slot_u; /* slot table utilization */
> u64 mt_sndq_u; /* send queue utilization */
> u64 mt_bklog_u; /* backlog utilization */
> u64 mt_errors; /* count of EIO errors */
>
> u64 mt_bytes_sent; /* count of bytes out */
> u64 mt_bytes_recv; /* count of bytes in */
> u64 mt_sendbuf; /* total sendbuf */
>
> u64 mt_rtt; /* usecs for RPC RTT */
> u64 mt_rtt_s; /* sum(rtt**2) */
> u64 mt_execute; /* usecs for RPC execution */
> u64 mt_execute_s; /* sum(execute**2) */
> };
>
> rpc_print_tally(...)
> {
> ...
> for (op = 0; op < vers->nrprocs; op++) {
> memset(&total, 0, sizeof(struct rpc_metric_totals));
> for (i = 0; i < NR_CPUS; i++) {
> u64 *r = tally->tl_totals[i];
> for (k = 0 ; k < sizeof(struct rpc_metric_totals)/sizeof(u64) ; k++)
> ((u64 *)&total)[k] += r[k];
> }
> }
> }

Let's not go there... 64-bit arithmetic on a 32-bit platform is still
not one of gcc's strong points, so let's not force everything into one
mold simply in order to save a couple of lines of code. Performance does
matter too...

Cheers,
Trond


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-02 00:36:33

by Trond Myklebust

[permalink] [raw]
Subject: RE: [PATCH] 1/3 - RPC metrics support

On Thu, 2004-04-01 at 12:05, Lever, Charles wrote:

> if we saved the export path somewhere in the nfs_server
> structure, that would be mighty convenient for iostats
> and i don't think that would break namespaces.

...but it would force us to introduce a new mount_data structure, and
hence a new "mount" ABI so that will have to wait until 2.7.x.

Cheers,
Trond


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-02 00:49:35

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH] 1/3 - RPC metrics support

On Thu, Apr 01, 2004 at 07:36:27PM -0500, Trond Myklebust wrote:
> On Thu, 2004-04-01 at 12:05, Lever, Charles wrote:
>
> > if we saved the export path somewhere in the nfs_server
> > structure, that would be mighty convenient for iostats
> > and i don't think that would break namespaces.
>
> ...but it would force us to introduce a new mount_data structure, and
> hence a new "mount" ABI so that will have to wait until 2.7.x.

Why not just save a copy of @devname in nfs_get_sb() ? It's
the same name that's stored in struct vfsmount and reported
in /proc/mounts.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-02 01:41:24

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] 1/3 - RPC metrics support

On Thu, 2004-04-01 at 19:49, Greg Banks wrote:

> Why not just save a copy of @devname in nfs_get_sb() ? It's
> the same name that's stored in struct vfsmount and reported
> in /proc/mounts.

That is very different. That we can do...

Please don't confuse the export path and the device name. nfss->mnt_path
is the former: it is part of the NFS mount ABI, and contains the path
that we use to look up the mountpoint on the NFSv4 server. It allows us,
for instance, to recover in the case where the server changes the
filehandle of the mountpoint (NFSv4 servers are allowed to have
filehandles that expire).
Adding the export path to the NFSv2/v3 mount structures would for
instance allow us to do things such as identifying whether or not the
user is mounting a parent/subdirectory of one of our current mount
points.


The device name OTOH is just whatever label the caller of sys_mount()
feels like passing down to the kernel. It will usually contain the name
of the server and the export path ('cos that's what the nfs-utils
"mount" does), but it does not have to, and we currently do not depend
on it taking any particular value.


Cheers,
Trond


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-02 02:33:18

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH] 1/3 - RPC metrics support

Trond Myklebust wrote:
>
> On Thu, 2004-04-01 at 18:58, Greg Banks wrote:
>
> > struct rpc_metric_totals {
> > u64 mt_ops; /* count of operations */
>
> Let's not go there... 64-bit arithmetic on a 32-bit platform is still
> not one of gcc's strong points,

This is of course not something I considered ;-)

> so let's not force everything into one
> mold simply in order to save a couple of lines of code. Performance does
> matter too...

Fair enough.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-02 02:42:42

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH] 1/3 - RPC metrics support

Trond Myklebust wrote:
>
> On Thu, 2004-04-01 at 19:49, Greg Banks wrote:
>
> > Why not just save a copy of @devname in nfs_get_sb() ?
>
> That is very different. That we can do...
>
> Please don't confuse the export path and the device name. nfss->mnt_path
> is the former:

> The device name OTOH is just whatever label the caller of sys_mount()
> feels like passing down to the kernel. It will usually contain the name
> of the server and the export path ('cos that's what the nfs-utils
> "mount" does), but it does not have to, and we currently do not depend
> on it taking any particular value.

If all we care about is identifying the sb for purposes of iostat and
post-mortem debugging (I for one would find this useful), the precise
semantics placed on the string by the mount program don't matter that
much. You could add a `devname' field to nfs_server and document
that it doesn't mean anything at all. Alternatively you could check
that the first part matches (modulo domain names) the `hostname' field
and is followed by a ':' and a path, and if all that matches, save the
path.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-04 02:36:01

by Lever, Charles

[permalink] [raw]
Subject: RE: [PATCH] 1/3 - RPC metrics support

> On Thu, Apr 01, 2004 at 07:36:27PM -0500, Trond Myklebust wrote:
> > On Thu, 2004-04-01 at 12:05, Lever, Charles wrote:
> >=20
> > > if we saved the export path somewhere in the nfs_server
> > > structure, that would be mighty convenient for iostats
> > > and i don't think that would break namespaces.
> >=20
> > ...but it would force us to introduce a new mount_data=20
> structure, and
> > hence a new "mount" ABI so that will have to wait until 2.7.x.
>=20
> Why not just save a copy of @devname in nfs_get_sb() ? It's
> the same name that's stored in struct vfsmount and reported
> in /proc/mounts.

unless i've missed something, there are only two NFS functions
that take vfsmount as an argument. one is nfs_show_options, and
the other is nfs_getattr.

the vfsmount struct is not available to the NFS client during a
mount operation.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-04 06:17:50

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH] 1/3 - RPC metrics support

"Lever, Charles" wrote:
>
> > Why not just save a copy of @devname in nfs_get_sb() ? It's
> > the same name that's stored in struct vfsmount and reported
> > in /proc/mounts.
>
> unless i've missed something, there are only two NFS functions
> that take vfsmount as an argument. one is nfs_show_options, and
> the other is nfs_getattr.
>
> the vfsmount struct is not available to the NFS client during a
> mount operation.

The devname string that fs/super.c:do_kern_mount() passes to
nfs_get_sb() via file_system_type->get_sb is passed to
alloc_vfsmount() and used there to initialise the vfsmount's
mnt_devname. So while NFS doesn't have the vfsmount it does
have the same devname string.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs