I've got an NFS server with client machines running User-mode Linux.
These UML instances open several large files on the NFS server at boot
time (root, /home, etc), and perform varying amounts of IO over the next
several months, or however long the UML instances are running.
The problem is that I really have no concrete way of determining the IO
load on these files. Nothing distinguishes the packets for one file from
the packets for another file coming from the same physical host machine;
nothing, that is, except the filehandle.
A while ago I wrote a script that captured every NFS packet and used the
periodic getattr RPC calls to build up a mapping between the filehandles
and inodes (and thus pathnames), and it mostly worked. However,
something has changed and those getattr calls have ceased, so the
script happily gathers read and write calls but has no means of
translating them.
My question: is there some means of divining the device/inode of a
filehandle? I looked around the NFS and VFS code and found nothing that
looked promising, but I don't really understand the VFS subsystem.
(aside: some net sources indicate that Solaris has a /var/nfs/fhtable
that it keeps up to date, but that's related to nfslogd and various
other Sun-isms)
Would it be possible to write a program to walk through kmem from some
known global kernel symbol to find the relevant structures necessary to
perform this translation (ideally passively, on an existing production
system)? If so, what symbol would be the base of such a trip? If
someone can explain roughly what the relevant structure tree looks like,
I can figure out how to write the necessary code, but until I have a
clue about the relevant VFS/NFS code, I'm lost.
TIA,
Omega
aka Erik Walthinsen
[email protected]
On Sat, May 08, 2004 at 04:33:28PM -0700, Erik Walthinsen wrote:
> I've got an NFS server with client machines running User-mode Linux.
>[...]
> My question: is there some means of divining the device/inode of a
> filehandle? I looked around the NFS and VFS code and found nothing that
> looked promising, but I don't really understand the VFS subsystem.
Generally the device and inode of the file are encoded in the
file handle itself rather than mapped in the server; this is
the easiest way of satisfying the requirement that the filehandles
be stable across server reboots.
If your NFS server is a Linux box, see the comments in
include/linux/nfsd/nfsfh.h for a description of the format
of Linux' file handles.
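As a rough illustration, here is a minimal Python sketch of pulling the
device and inode out of such a handle. It is based on the nfsfh.h
comments, but the flavour it assumes (version 1, auth_type 0, fsid_type 0,
fileid_type 1 or 2) and the byte order (a little-endian x86 server) are
assumptions, so check the offsets against a real capture before trusting
them:

import struct

def decode_fh(fh):
    """Return (major, minor, inode, generation) from raw filehandle bytes."""
    version, auth_type, fsid_type, fileid_type = struct.unpack_from('4B', fh, 0)
    if version != 1 or auth_type != 0 or fsid_type != 0:
        raise ValueError('unhandled filehandle flavour: %r' % fh[:4])
    # fsid_type 0: 16-bit major, 16-bit minor, 32-bit inode of the export point
    major, minor, export_ino = struct.unpack_from('<HHI', fh, 4)
    if fileid_type == 1:
        ino, gen = struct.unpack_from('<II', fh, 12)
    elif fileid_type == 2:
        # same as type 1, plus the 32-bit parent directory inode
        ino, gen, parent_ino = struct.unpack_from('<III', fh, 12)
    else:
        raise ValueError('unhandled fileid_type %d' % fileid_type)
    return major, minor, ino, gen

# e.g. decode_fh(bytes.fromhex(handle_hex)) on a handle pulled from a capture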
Also, ethereal will pick apart and display Linux file handles
(unless the underlying server fs is XFS).
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
On Sun, 2004-05-09 at 00:24, Greg Banks wrote:
> Generally the device and inode of the file are encoded in the
> file handle itself rather than mapped in the server; this is
> the easiest way of satisfying the requirement that the filehandles
> be stable across server reboots.
Oh. Duh. I was assuming the filehandles were randomly assigned by the
server and didn't think about the requirements that filehandle continuity
would impose... I suppose it's just plain cheaper too.
> If your NFS server is a Linux box, see the comments in
> include/linux/nfsd/nfsfh.h for a description of the format
> of Linux' file handles.
Heh, I found the inode in the filehandle in ethereal just by inspection;
I'll have to hack my script tomorrow to make use of it.
> Also, ethereal will pick apart and display Linux file handles
> (unless the underlying server fs is XFS).
The version I have (debian/sid ethereal-0.10.3) doesn't display
filehandles as anything but length and data, though I have used the mode
that does what my script was trying to do, which tries to find and cache
the filehandle-to-name translation. However, that only works when the
getattrs are present <g> I'm not sure whether running ethereal on the NFS
server itself would help, but it's got no head and no X anyway.
Well, I think my problem is solved, thanks for the help! ;-)
Now if someone just had as good an answer to the "what is a client?"
question.... You might imagine my interest in that is more than
academic, with 100's of *long*-duration filehandles in the GB+ range
each, yet over only a few physical client machines.
Thanks,
Omega
aka Erik Walthinsen
[email protected]
> Now if someone just had as good an answer to the "what is a client?"
> question.... You might imagine my interest in that is more than
> academic, with 100's of *long*-duration filehandles in the GB+ range
> each, yet over only a few physical client machines.
>
Any thoughts on making this program available anywhere? I'd be
interested to get some mappings to find out who my most abusive users
are.
I've run into some performance problems recently and I'd love to know if
I've got some specific users who are being abusive or if the machine is
just underpowered.
Thanks
-sv
On Sun, May 09, 2004 at 08:26:54PM -0400, seth vidal alleged:
>
> > Now if someone just had as good an answer to the "what is a client?"
> > question.... You might imagine my interest in that is more than
> > academic, with 100's of *long*-duration filehandles in the GB+ range
> > each, yet over only a few physical client machines.
> >
>
> Any thoughts on making this program available anywhere? I'd be
> interested to get some mappings to find out who my most abusive users
> are.
>
> I've run into some performance problems recently and I'd love to know if
> I've got some specific users who are being abusive or if the machine is
> just underpowered.
Come to think of it, I've found myself in the same situation... wondering who
is beating up my NFS servers. This would be a great tool to have.
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California
On Sun, 2004-05-09 at 20:00, Garrick Staples wrote:
> > Any thoughts on making this program available anywhere? I'd be
> > interested to get some mappings to find out who my most abusive users
> > are.
> Come to think of it, I've found myself in the same situation... wondering who
> is beating up my NFS servers. This would be a great tool to have.
Sure. In its current incarnation it's a Python script using a pcap
module I found in some package that I can no longer identify.... It's
also quite tied to my current file structure, though some of that also
has to do with the earlier assumptions I made about filehandles.
The biggest challenge for general usage is the inode -> pathname
translation. In my case there are currently only about 900 files, so I
can look them up before the packet capture starts and cache their inode
numbers. A general utility will require a different solution, starting
with the existing method of snooping getattr and other calls. Any other
suggestions on how I can generically convert inodes to pathnames without
massive overhead?
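For reference, a minimal sketch of that pre-scan approach, assuming the
export roots are known up front (the path below is hypothetical). With
only ~900 files the one-time walk is cheap; on a large server it is
exactly the overhead being asked about:

import os

EXPORTS = ['/srv/uml']   # hypothetical list of exported directory trees

def build_inode_map(roots):
    """Walk the exports once and cache (st_dev, st_ino) -> pathname."""
    inode_map = {}
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames + dirnames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)
                except OSError:
                    continue          # vanished or unreadable; skip it
                inode_map[(st.st_dev, st.st_ino)] = path
    return inode_map

# inode_map = build_inode_map(EXPORTS)
# inode_map.get((dev, ino), '<unknown inode %d>' % ino)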
Once I have my current code doing what I want I'll clean it up and post
it here so it can be hacked into a useful tool.
On Sun, 9 May 2004, Erik Walthinsen wrote:
> The biggest challenge for general usage is the inode -> pathname
> translation.
Can't this be done quicker on the server?
I'm not saying that the program should be run on the server for this
reason, only that if it's run on the server it might be more
efficient. I see two situations here:
- when you have several clients and one server and wonder who is
banging on the server, then you run the program on the server
- when you have several servers and wonder why a client which mounts
filesystems from all of them has such a high I/O-induced load, then you
run the program on the client
> Any other suggestions on how I can generically convert inodes to
> pathnames without massive overhead?
IMHO, the most important question is: do you want to convert _all_
inodes back to pathnames? For example, what I'm interested in is the
case where a process has one or a few large files open that are
accessed over and over again. I'm not at all interested in finding out
the pathnames of all the files in a source directory which will be
read exactly once during the build process, or those of the
corresponding object files that will be written to, also exactly once.
Each source/object file can be transferred as one or more NFS chunks,
depending on the file size and the NFS chunk size. So I propose some
kind of caching and counting of the filehandles: only if the same
filehandle was present in a certain number of NFS chunks should the
program attempt to get the corresponding pathname. However, by setting
this number to something low (even 1), the admin can still see such
accessed-once files.
Also, there should be a cache of already-converted filehandles, so
that accessing a file often does not result in many conversions; from
what you wrote, it appears that you are already doing this, only that
the cache is filled at startup (and probably never deleted from)
rather than dynamically at runtime. But be careful here: as the cache
grows, so does the time needed to check whether an entry is cached,
up to the point where it might be faster to do a new conversion.
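As a small sketch of that counting-plus-caching idea: resolve_path() is
only a placeholder for whatever lookup method ends up being used
(pre-scan table, snooped getattrs, a filesystem debugger), and the
threshold value is arbitrary.

THRESHOLD = 8        # arbitrary; 1 reports accessed-once files as well

hit_count = {}       # raw filehandle bytes -> number of NFS chunks seen
path_cache = {}      # raw filehandle bytes -> resolved pathname (or None)

def resolve_path(fh):
    """Placeholder: map a filehandle to a pathname by some external means."""
    return None

def account(fh, nbytes, stats):
    """Record one read/write chunk for filehandle fh into per-file stats."""
    hit_count[fh] = hit_count.get(fh, 0) + 1
    if path_cache.get(fh) is None and hit_count[fh] >= THRESHOLD:
        path_cache[fh] = resolve_path(fh)   # retried until something resolves
    name = path_cache.get(fh) or '<fh %s>' % fh.hex()
    stats[name] = stats.get(name, 0) + nbytes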
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [email protected]
On Mon, 2004-05-10 at 01:02, Bogdan Costescu wrote:
> Can't this be done quicker on the server ?
I never specified one way or another, but yes, this is done on the
server.
However, it doesn't change the problem that in order to do an
inode->pathname translation, barring some tricks I don't know about, you
have to stat() every single file on the entire filesystem. Yes, this is
faster on the server; I never considered running such a search from a
client.
The only hack I can think of would be, if the server is running ext3 or
some other filesystem with a debugger, to see whether it's possible to
perform the reverse mapping that way, quickly. Very much not my
favorite option, though.
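For what it's worth, e2fsprogs' debugfs does have an ncheck command that
performs exactly this reverse mapping for ext2/ext3. A hedged sketch of
wrapping it follows; the device name is hypothetical, it normally needs
root, the output parsing assumes a typical e2fsprogs layout, and answers
from a live mounted filesystem can be slightly stale.

import subprocess

def ncheck(device, inodes):
    """Ask debugfs to map inode numbers back to pathnames on an ext2/ext3 fs."""
    cmd = ['debugfs', '-R', 'ncheck ' + ' '.join(str(i) for i in inodes), device]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    paths = {}
    for line in out.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0].isdigit():   # skip the header line
            paths[int(fields[0])] = fields[1].strip()
    return paths

# e.g. ncheck('/dev/sda1', [12345, 67890])   # hypothetical device and inodes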
> IMHO, the most important question is: do you want to convert _all_
> inodes back to pathnames ? For example, what I'm interested in is the
> cases where a process has one or a few large files open that are
> accessed over and over again. I'm not at all interested in finding out
> the pathnames of all the files in a source directory which will be
> read exactly once during the build process and those of the
> corresponding object files that will be written to, also exactly once.
> Each source/object file can be transferred as one or more NFS chunks,
> depending on the file size and the NFS chunk size. So I propose to
> make some kind of caching and counting of the filehandles and only if
> the same filehandle was present in a certain number of NFS chunks
> should the program attempt to get the corresponding pathname.
> However, by setting this number to something low (even 1), the admin
> can also see such kind of accessed-once files.
OK, that can be done fairly easily, especially with Python.
> Also, there should be a cache of already converted filenames, such
> that accessing a file often does not result in many conversions; from
> what you wrote, it appears that you are already doing this, only that
> the cache is filled at startup (and probably never deleted from)
> rather than dynamically at runtime. But be careful here: as the cache
> grows, so is the time needed to check if the entry is cached or not up
> to the point that it might be faster to do a new conversion.
The cache I create is static because new files are *extremely* rare.
Because I don't have more than 1k files currently, performance
optimizations will likely be someone else's modification to the script
once I've got it built.