2007-03-07 10:24:00

by Simon Peter

Subject: Delays on "first" access to a NFS mount

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


Attachments:
capture.txt (9.34 kB)
(No filename) (345.00 B)
(No filename) (140.00 B)

2007-03-07 21:16:56

by J. Bruce Fields

Subject: Re: Delays on "first" access to a NFS mount

On Wed, Mar 07, 2007 at 04:07:39PM -0500, Talpey, Thomas wrote:
> I was actually trying to talk Simon out of trying, in case that wasn't
> obvious.

Probably good advice.

> But I'm really glad it bugs you! Keep thinking that way, maybe you can
> untangle it someday. :-)
>
> While you're thinking about it, what's the actual timeout on a given
> in-kernel export cache entry? There's a 120-second deadline on an
> unresolved cache miss being populated, but when, exactly, does an
> existing (resolved) entry go stale? I admit to having tried to figure it
> out once and wound up going in circles.

There's an expiry time that's passed down with each cache entry. In
this particular case it's 30 minutes. There's also a "flush" file you
can write to to ask that the whole cache be flushed. I don't remember
how this works in detail, though.

--b.

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

2007-03-07 21:23:38

by Talpey, Thomas

Subject: Re: Delays on "first" access to a NFS mount

At 04:17 PM 3/7/2007, J. Bruce Fields wrote:
>There's an expiry time that's passed down with each cache entry. In
>this particular case it's 30 minutes. There's also a "flush" file you
>can write to to ask that the whole cache be flushed. I don't remember
>how this works in detail, though.

Aha - so the time comes from mountd. There's some sort of refresh
timer that the kernel triggers though. So it's not a deadline of this
time (I think). Or is it?

The "flush" file lives in /proc/net/rpc/nfsd.export, and you write an
integer value to it. I *think* it then flushes any entries which are
more than that many seconds old.
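A minimal sketch of such a write in C. The helper name write_flush_value is made up for illustration, and the path "/proc/net/rpc/nfsd.export/flush" is only assembled from the description above, so treat both as assumptions. How the kernel interprets the integer is exactly the part this thread is unsure about, so the sketch just writes whatever value the caller passes.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write one integer, followed by a newline, to a cache "flush" file.
 * flush_path is supplied by the caller, e.g. (assumed, not verified)
 * "/proc/net/rpc/nfsd.export/flush".
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int write_flush_value(const char *flush_path, long value)
{
    FILE *f;

    assert(flush_path != NULL);

    f = fopen(flush_path, "w");
    if (f == NULL)
        return -1;

    if (fprintf(f, "%ld\n", value) < 0) {
        fclose(f);
        return -1;
    }

    /* fclose() flushes the buffer, so a failed write shows up here. */
    return fclose(f) == 0 ? 0 : -1;
}
```

On a real server this needs root, and the value to pass depends on the kernel's semantics, which should be checked against the sunrpc cache source first.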

The whole thing makes my head hurt and I try not to look at it.

Tom.



2007-03-07 21:41:08

by Simon Peter

Subject: Re: Delays on "first" access to a NFS mount

> > This sounds like a job for inotify. The mountd could stat the
> > export root and use inotify_add_watch(2) to keep an eye on it to
> > see if the stat contents changed.
> Hm. Would it be enough just to hold an open file descriptor for the
> directory? Is it safe to assume that for any filesystem (uh, any disk
> filesystem anyway) that if you have something open then stat() on it
> won't have to go to the disk?

I think what Tom had in mind was to stat all directories once, remember
their values, have inotify keep an eye on 'em and whenever they change,
update the remembered values. This way, disk access would only have to
be done whenever something changes, which is when the disk is spun up
anyway.
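A rough sketch of that idea in C, assuming Linux inotify. The names watched_stat, watched_stat_init and watched_stat_refresh are made up for illustration and are not part of mountd. The event mask is a guess at what matters for an export root, and as far as I know inotify would not report something being mounted over the directory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/inotify.h>
#include <sys/stat.h>
#include <unistd.h>

struct watched_stat {
    int ifd;            /* non-blocking inotify descriptor */
    const char *path;   /* directory being watched */
    struct stat st;     /* remembered stat() result */
};

/* Start watching path, then stat it once and remember the result. */
int watched_stat_init(struct watched_stat *w, const char *path)
{
    assert(w != NULL && path != NULL);

    w->path = path;
    w->ifd = inotify_init1(IN_NONBLOCK);
    if (w->ifd < 0)
        return -1;

    /* Add the watch before the stat so no change can slip in between. */
    if (inotify_add_watch(w->ifd, path,
                          IN_ATTRIB | IN_DELETE_SELF | IN_MOVE_SELF) < 0 ||
        stat(path, &w->st) != 0) {
        close(w->ifd);
        return -1;
    }
    return 0;
}

/*
 * Drain pending events without blocking. Only if something was reported
 * is stat() called again. Returns 1 if the remembered values were
 * refreshed, 0 if nothing changed, -1 on error.
 */
int watched_stat_refresh(struct watched_stat *w)
{
    char buf[4096];
    int changed = 0;
    ssize_t n;

    while ((n = read(w->ifd, buf, sizeof(buf))) > 0)
        changed = 1;
    if (n < 0 && errno != EAGAIN)
        return -1;

    if (changed && stat(w->path, &w->st) != 0)
        return -1;
    return changed;
}
```

With this shape the lookup path reads w->st and only touches the disk when watched_stat_refresh() has seen an event.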

Simon


2007-03-07 21:46:48

by Simon Peter

Subject: Re: Delays on "first" access to a NFS mount

> Though there's one point I'm unclear on: are the directories
> you're exporting mountpoints? That's the normal

Not all of my exported directories are mountpoints of the underlying
VFS of the server. Some are, though.

> - the filehandle->export mapping that this function tells the
> kernel about is cached by nfsd for a half-hour. That time
> is set a little later in nfsd_fh:
> qword_printint(f, time(0)+30*60);
> I don't think there would be any harm to just changing that
> time(0)+30*60 to 0x7FFFFFFF (== never expire)--nfs-utils
> should be invalidating that cache explicitly whenever it's
> needed. Maybe that should be the default.

I could try that for now.

Are you sure these are invalidated automatically, especially through
nfs-utils? If the kernel cache never expires, it should consequently
never ask for it, so nfs-utils would not be involved. Am I missing
something?

Simon


2007-03-07 21:53:37

by J. Bruce Fields

Subject: Re: Delays on "first" access to a NFS mount

On Wed, Mar 07, 2007 at 04:23:23PM -0500, Talpey, Thomas wrote:
> At 04:17 PM 3/7/2007, J. Bruce Fields wrote:
> >There's an expiry time that's passed down with each cache entry. In
> >this particular case it's 30 minutes. There's also a "flush" file you
> >can write to to ask that the whole cache be flushed. I don't remember
> >how this works in detail, though.
>
> Aha - so the time comes from mountd. There's some sort of refresh
> timer that the kernel triggers though. So it's not a deadline of this
> time (I think). Or is it?

The kernel sweeps through the cache every now and then and cleans out
expired entries. I think it also takes note of the earliest future
expiry it runs across in the process, and uses that to decide when to
check next. This is all in linux/net/sunrpc/cache.c:cache_clean().
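The sweep described above can be sketched in user-space C as follows. This is not the kernel's cache_clean(); the struct and the function name cache_sweep are invented for illustration, and the real code works on hash tables with locking rather than a flat array.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

struct cache_entry {
    time_t expiry;  /* absolute expiry time passed down with the entry */
    int valid;      /* nonzero while the entry is still in the cache */
};

/*
 * Drop every entry whose expiry is not in the future and return the
 * earliest expiry among the entries that remain, i.e. the time at which
 * the next sweep is worth doing. Returns 0 if no valid entries remain.
 */
time_t cache_sweep(struct cache_entry *tab, size_t n, time_t now)
{
    time_t next = 0;

    assert(tab != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!tab[i].valid)
            continue;
        if (tab[i].expiry <= now) {
            tab[i].valid = 0;           /* expired: clean it out */
            continue;
        }
        if (next == 0 || tab[i].expiry < next)
            next = tab[i].expiry;       /* earliest future expiry so far */
    }
    return next;
}
```

An entry given an expiry of 0x7FFFFFFF would simply never satisfy the `expiry <= now` test in such a sweep until 2038.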

> The "flush" file lives in /proc/net/rpc/nfsd.export, and you write an
> integer value to it. I *think* it then flushes any entries which are
> more than that many seconds old.

Right.

> The whole thing makes my head hurt and I try not to look at it.

Well, non-head-hurty ideas always welcomed. I've got two export-related
problems to fix:

- Our current NFSv4 pseudofs fsid=0 hack is a pain to administer
  and results in inconsistent paths across different NFS
  versions.

- The trick of using the pseudoflavor as a client name (so doing

      /export gss/krb5(rw)

  instead of

      /export *(sec=krb5,rw)

  ), is inconsistent with what other OSes do, and makes it
  impossible to specify restrictions based both on flavor and on
  IP network/DNS name/netgroup.

While I'm at it Trond and Christoph and others seem to be asking whether
we can't make some more fundamental changes, such as:

- Maintaining a static in-kernel exports table instead of
  loading it on demand from mountd, and

- divorcing the exports namespace completely from any local
  process namespace, to the extent that you could even just say
  "I want to export /dev/sda7 as /usr/local/bin" without first
  mounting /dev/sda7 someplace.

But I really need a better idea of the requirements on the exports
system. And some other examples to look at wouldn't hurt either. (Take
pity on me, Linux is all I know....)

--b.


2007-03-07 22:05:04

by J. Bruce Fields

Subject: Re: Delays on "first" access to a NFS mount

On Wed, Mar 07, 2007 at 10:46:24PM +0100, Simon Peter wrote:
> > Though there's one point I'm unclear on: are the directories
> > you're exporting mountpoints? That's the normal
>
> Not all of my exported directories are mountpoints of the underlying
> VFS of the server.

I'd be curious why. There's some hard-to-solve security problems
there--people can guess filehandles of unexported files and access them
directly without lookups. So some day I'd love to actually forbid (or
at least strongly discourage) what you're doing.... But clearly we'd
first need to understand why people do that and make sure there are
adequate alternatives.

> Some are, though.

Are the spinning-up delays happening only on those drives that have
exported directories that aren't mountpoints?

> > - the filehandle->export mapping that this function tells the
> > kernel about is cached by nfsd for a half-hour. That time
> > is set a little later in nfsd_fh:
> > qword_printint(f, time(0)+30*60);
> > I don't think there would be any harm to just changing that
> > time(0)+30*60 to 0x7FFFFFFF (== never expire)--nfs-utils
> > should be invalidating that cache explicitly whenever it's
> > needed. Maybe that should be the default.
>
> I could try that for now.
>
> Are you sure these are invalidated automatically, especially through
> nfs-utils? If the kernel cache never expires, it should consequently
> never ask for it, so nfs-utils would not be involved. Am I missing
> something?

There's also a mechanism by which nfs-utils can ask for the whole cache
to be flushed immediately on its own. So re-running exportfs to change
the exports, for example, should result in the cache being flushed. I
haven't checked whether that's done in all the places it should be, but
it probably is.

--b.


2007-03-07 23:20:07

by Simon Peter

Subject: Re: Delays on "first" access to a NFS mount

> > Not all of my exported directories are mountpoints of the underlying
> > VFS of the server.
> I'd be curious why. There's some hard-to-solve security problems
> there--people can guess filehandles of unexported files and access
> them directly without lookups. So some day I'd love to actually
> forbid (or at least strongly discourage) what you're doing.... But
> clearly we'd first need to understand why people do that and make
> sure there are adequate alternatives.

Well, I've actually done it for security (not knowing what you just
said about it). There are some directories on those disks that I don't
want people to poke around in, so I don't export the whole filesystem
of a disk. For some other directories, I have different access
constraints.

For example, there's one subdirectory that I export to two subnets and
one that is only exported to one of them. I do that because I have an
"access granting" security philosophy: At first, any access is denied
and then I grant access only to those people who can make use of their
granted resources. Since one of those directories is only useful to
the users of that one subnet, I only export it for that one.

> > Some are, though.
> Are the spinning-up delays happening only on those drives that have
> exported directories that aren't mountpoints?

I just noticed that I was wrong: none of my exports are mountpoints.
I'm sorry.

> > Are you sure these are invalidated automatically, especially through
> > nfs-utils? If the kernel cache never expires, it should consequently
> > never ask for it, so nfs-utils would not be involved. Am I missing
> > something?
> There's also a mechanism by which nfs-utils can ask for the whole
> cache to be flushed immediately on its own. So re-running exportfs
> to change the exports, for example, should result in the cache being
> flushed. I haven't checked whether that's done in all the places it
> should be, but it probably is.

Okay. So if we really only need major, minor and inode information,
like Neil said, then that would work. Because otherwise the data on
disk could change without the kernel noticing.

Simon


2007-03-08 13:27:35

by Olaf Kirch

Subject: Re: Delays on "first" access to a NFS mount

On Wednesday 07 March 2007 22:54, J. Bruce Fields wrote:
> - Maintaining a static in-kernel exports table instead of
> loading it on demand from mountd, and

Well, the original implementation did just that, and people kept
forgetting to re-run exportfs after changing the exports table,
and whatnot. Lots of gross inconsistencies. The addition of a
dynamic exports table was considered a sliced bread kind of
innovation... so it does feel like time warp when we talk about
a static export table now.

> - divorcing the exports namespace completely from any local
> process namespace, to the extent that you could even just say
> "I want to export /dev/sda7 as /usr/local/bin" without first
> mounting /dev/sda7 someplace.

Is that really a desirable goal? From an admin's point of view,
file names are usually more "natural" than using fs uuids or
retro stuff such as device file names (the udev people would
actually hit you with their "device numbers are smoke and mirrors"
bat now).

Users actually want things like "I export /mnt and then clients
can see the contents of the CD mounted on /mnt/cdrom"

Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
[email protected] | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax


2007-03-08 15:49:52

by Simon Peter

Subject: Re: Delays on "first" access to a NFS mount

> - the filehandle->export mapping that this function tells the
> kernel about is cached by nfsd for a half-hour. That time
> is set a little later in nfsd_fh:
> qword_printint(f, time(0)+30*60);
> I don't think there would be any harm to just changing that
> time(0)+30*60 to 0x7FFFFFFF (== never expire)--nfs-utils
> should be invalidating that cache explicitly whenever it's
> needed. Maybe that should be the default.

I've done so and it seems to work! Been using the changed version the
whole day.

Simon


2007-03-08 21:45:34

by J. Bruce Fields

Subject: Re: Delays on "first" access to a NFS mount

On Thu, Mar 08, 2007 at 02:27:08PM +0100, Olaf Kirch wrote:
> On Wednesday 07 March 2007 22:54, J. Bruce Fields wrote:
> > - Maintaining a static in-kernel exports table instead of
> > loading it on demand from mountd, and
>
> Well, the original implementation did just that, and people kept
> forgetting to re-run exportfs after changing the exports table,

What exactly changed? It's not the case today that you can expect
modifications to /etc/exports to be noticed automatically.

--b.


2007-03-09 13:02:48

by Simon Peter

Subject: Re: Delays on "first" access to a NFS mount

> - the filehandle->export mapping that this function tells the
> kernel about is cached by nfsd for a half-hour. That time
> is set a little later in nfsd_fh:
> qword_printint(f, time(0)+30*60);
> I don't think there would be any harm to just changing that
> time(0)+30*60 to 0x7FFFFFFF (== never expire)--nfs-utils
> should be invalidating that cache explicitly whenever it's
> needed. Maybe that should be the default.

Is there any way that we could see this change incorporated into
nfs-utils? I certainly would like to have it.

Simon


2007-03-09 14:58:41

by J. Bruce Fields

Subject: Re: Delays on "first" access to a NFS mount

On Fri, Mar 09, 2007 at 02:02:37PM +0100, Simon Peter wrote:
> > - the filehandle->export mapping that this function tells the
> > kernel about is cached by nfsd for a half-hour. That time
> > is set a little later in nfsd_fh:
> > qword_printint(f, time(0)+30*60);
> > I don't think there would be any harm to just changing that
> > time(0)+30*60 to 0x7FFFFFFF (== never expire)--nfs-utils
> > should be invalidating that cache explicitly whenever it's
> > needed. Maybe that should be the default.
>
> Is there any way that we could see this change incorporated into
> nfs-utils? I certainly would like to have it.

Make a diff showing that change (and nothing else), stick an explanation
of the change and why it's correct at the top, and mail it to Neil cc'd
to this list?....

--b.


2007-03-07 12:38:33

by Talpey, Thomas

Subject: Re: Delays on "first" access to a NFS mount

The delay seems to be on the server side. What is the server running?
Does it have any nameservice issues? Delays on the first response can
often be due to server exports checking, which usually requires reverse
name lookups.

Tom.

At 05:23 AM 3/7/2007, Simon Peter wrote:
>Hi,
>
>I get a good 10 second delay anytime I am accessing my NFS mounts from
>a client for the first time (or after a long time not accessing them --
>I suppose whenever the cache is cleared or something similar).
>
>I usually did not bother, even though it is very annoying, but this
>time I collected a network protocol capture, which is attached. Notice
>the big delay between packet #6 and #8, while #7 should show that it is
>not a network issue.
>
>I would be very glad if somebody could explain these delays.
>
>Thanks,
>Simon



2007-03-07 13:23:11

by Simon Peter

Subject: Re: Delays on "first" access to a NFS mount

The server runs Debian unstable:
Linux server 2.6.18-3-k7 #1 SMP Sun Dec 10 20:17:39 UTC 2006 i686
GNU/Linux

And I use classic NFSv3, exporting only to that subnet with options
(rw,sync). The client uses solely the intr option.

There shouldn't be any nameservice issues since the server is also
running the nameserver and is the authority for the subnet that I am
mounting from. I tried canonical and reverse lookups of the client's
name/IP from that server, all without delay. How would I check for
nameservice issues?
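One hedged way to check is to time a reverse lookup from the server itself through the system resolver. The sketch below uses the standard getnameinfo() call; the helper name time_reverse_lookup is made up for illustration. It only measures the resolver path and does not prove that mountd performs the same lookup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/*
 * Reverse-resolve a dotted-quad IPv4 address and return how long the
 * lookup took, in seconds. The resolved name is stored in host.
 * Returns -1.0 if the address does not parse or no name is found.
 */
double time_reverse_lookup(const char *ipv4, char *host, size_t hostlen)
{
    struct sockaddr_in sa;
    struct timespec t0, t1;

    assert(ipv4 != NULL && host != NULL && hostlen > 0);

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    if (inet_pton(AF_INET, ipv4, &sa.sin_addr) != 1)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* NI_NAMEREQD: fail instead of falling back to the numeric form. */
    if (getnameinfo((struct sockaddr *)&sa, sizeof(sa),
                    host, (socklen_t)hostlen, NULL, 0, NI_NAMEREQD) != 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling it with the client's address and printing the result would show whether the lookup alone accounts for a multi-second delay.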

The server has some disks with exported directories that spin down after
some idle time, but the disk of that particular mount point that I am
using is always online. Maybe the server somehow checks all exports all
the time and not just the particular requested one and thus spins up
all the other disks?

Simon

> The delay seems to be on the server side. What is the server running?
> Does it have any nameservice issues? Delays on the first response can
> often be due to server exports checking, which usually requires
> reverse name lookups.
>
> Tom.
>
> At 05:23 AM 3/7/2007, Simon Peter wrote:
> >Hi,
> >
> >I get a good 10 second delay anytime I am accessing my NFS mounts
> >from a client for the first time (or after a long time not accessing
> >them -- I suppose whenever the cache is cleared or something
> >similar).
> >
> >I usually did not bother, even though it is very annoying, but this
> >time I collected a network protocol capture, which is attached.
> >Notice the big delay between packet #6 and #8, while #7 should show
> >that it is not a network issue.
> >
> >I would be very glad if somebody could explain these delays.
> >
> >Thanks,
> >Simon



2007-03-07 15:06:47

by Simon Peter

Subject: Re: Delays on "first" access to a NFS mount

Hi again,

just verified that the server indeed spins up all disks before
answering the request. I thus suspect it is somehow checking all exports
whenever any one export is mounted. Is this correct behaviour?

Simon

> The delay seems to be on the server side. What is the server running?
> Does it have any nameservice issues? Delays on the first response can
> often be due to server exports checking, which usually requires
> reverse name lookups.
>
> Tom.
>
> At 05:23 AM 3/7/2007, Simon Peter wrote:
> >Hi,
> >
> >I get a good 10 second delay anytime I am accessing my NFS mounts
> >from a client for the first time (or after a long time not accessing
> >them -- I suppose whenever the cache is cleared or something
> >similar).
> >
> >I usually did not bother, even though it is very annoying, but this
> >time I collected a network protocol capture, which is attached.
> >Notice the big delay between packet #6 and #8, while #7 should show
> >that it is not a network issue.
> >
> >I would be very glad if somebody could explain these delays.
> >
> >Thanks,
> >Simon



2007-03-07 15:10:09

by Simon Peter

Subject: Re: Delays on "first" access to a NFS mount

That should have read "accessed", not "mounted". The export is already
mounted. :)

Simon

> Hi again,
>
> just verified that the server indeed spins up all disks before
> answering the request. I thus suspect it is somehow checking all
> exports whenever any one export is mounted. Is this correct behaviour?
>
> Simon
>
> > The delay seems to be on the server side. What is the server
> > running? Does it have any nameservice issues? Delays on the first
> > response can often be due to server exports checking, which usually
> > requires reverse name lookups.
> >
> > Tom.
> >
> > At 05:23 AM 3/7/2007, Simon Peter wrote:
> > >Hi,
> > >
> > >I get a good 10 second delay anytime I am accessing my NFS mounts
> > >from a client for the first time (or after a long time not
> > >accessing them -- I suppose whenever the cache is cleared or
> > >something similar).
> > >
> > >I usually did not bother, even though it is very annoying, but this
> > >time I collected a network protocol capture, which is attached.
> > >Notice the big delay between packet #6 and #8, while #7 should show
> > >that it is not a network issue.
> > >
> > >I would be very glad if somebody could explain these delays.
> > >
> > >Thanks,
> > >Simon
>



2007-03-07 15:42:09

by J. Bruce Fields

Subject: Re: Delays on "first" access to a NFS mount

On Wed, Mar 07, 2007 at 04:06:33PM +0100, Simon Peter wrote:
> just verified that the server indeed spins up all disks before
> answering the request. I thus suspect it is somehow checking all exports
> whenever any one export is mounted. Is this correct behaviour?

Hm. If you have the nfs-utils source, you can see there's a loop in

nfs-utils/utils/mountd/cache.c:nfsd_fh()

that stats the root of each export, in two places; the first it looks
like you shouldn't hit if you don't have the mountpoint export option
set:

    if (exp->m_export.e_mountpoint &&
        !is_mountpoint(exp->m_export.e_mountpoint[0]?
                       exp->m_export.e_mountpoint:
                       exp->m_export.e_path))
            dev_missing ++;

The second is to figure out which filesystem the filehandle that you passed in
that getattr is for:

    if (stat(exp->m_export.e_path, &stb) != 0)
            continue;
    if (fsidtype == 1 &&
        ((exp->m_export.e_flags & NFSEXP_FSID) == 0 ||
         exp->m_export.e_fsid != fsidnum))
            continue;
    if (fsidtype != 1) {
            if (stb.st_ino != inode)
                    continue;
            if (major != major(stb.st_dev) ||
                minor != minor(stb.st_dev))
                    continue;
    }
    /* It's a match !! */

You could stick a printf() in there somewhere or something to check whether
this is really where it's waiting.

Could we cache the stat information in the export and then double-check it if
necessary when there's a match? Or is there some way we could get the kernel
to keep that cached for us?
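A minimal sketch of the first option, caching the stat result per export and re-checking only the export that matches. The struct cached_export and both function names are hypothetical and do not exist in nfs-utils. The sketch also does not solve invalidation: if an export root is replaced while its cached values are stale, the lookup misses it until something refreshes the entry.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

struct cached_export {
    const char *path;   /* export root */
    dev_t dev;          /* remembered st_dev */
    ino_t ino;          /* remembered st_ino */
    int valid;          /* nonzero once dev/ino have been filled in */
};

/* stat() the export root and remember its device and inode. */
int export_cache_fill(struct cached_export *e)
{
    struct stat stb;

    assert(e != NULL && e->path != NULL);

    if (stat(e->path, &stb) != 0) {
        e->valid = 0;
        return -1;
    }
    e->dev = stb.st_dev;
    e->ino = stb.st_ino;
    e->valid = 1;
    return 0;
}

/*
 * Find the export whose root has the given device and inode. Entries
 * are compared against the cached values first, so disks holding
 * non-matching exports are not touched. Only a candidate is stat()ed
 * again to double-check it. Returns the index, or -1 if none matches.
 */
int export_find(struct cached_export *exps, size_t n, dev_t dev, ino_t ino)
{
    for (size_t i = 0; i < n; i++) {
        if (!exps[i].valid || exps[i].dev != dev || exps[i].ino != ino)
            continue;
        if (export_cache_fill(&exps[i]) == 0 &&
            exps[i].dev == dev && exps[i].ino == ino)
            return (int)i;
    }
    return -1;
}
```

The design choice is that a cache hit costs one stat() on the disk the client is asking about anyway, while every other export is rejected from memory.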

It seems reasonable to want to export filesystems from a bunch of disks without
necessarily keeping them all spun up all the time.

--b.


2007-03-07 18:44:31

by Simon Peter

Subject: Re: Delays on "first" access to a NFS mount

> > just verified that the server indeed spins up all disks before
> > answering the request. I thus suspect it is somehow checking all
> > exports whenever any one export is accessed. Is this correct
> > behaviour?
> Hm. If you have the nfs-utils source, you can see there's a loop in
> nfs-utils/utils/mountd/cache.c:nfsd_fh()
> that stats the root of each export, in two places; the first it looks
> like you shouldn't hit if you don't have the mountpoint export option
> set:

Correct. This one is never hit in my case.

> The second is to figure out which filesystem the filehandle that you
> passed in that getattr is for:
> if (stat(exp->m_export.e_path, &stb) != 0)
> continue;

This is where the wait for the respective disk to spin up occurs.

> Could we cache the stat information in the export and then
> double-check it if necessary when there's a match? Or is there some
> way we could get the kernel to keep that cached for us?

I could certainly cook up a patch for mountd to cache that information
on its own. I don't have too much clue about how the kernel does its
caching, though. If it's useful to do that directly in mountd, I could
get my hands on it.

> It seems reasonable to want to export filesystems from a bunch of
> disks without necessarily keeping them all spun up all the time.

This is at least what I would like to see...

Thanks,
Simon


2007-03-07 20:29:17

by J. Bruce Fields

Subject: Re: Delays on "first" access to a NFS mount

On Wed, Mar 07, 2007 at 07:44:18PM +0100, Simon Peter wrote:
> > > just verified that the server indeed spins up all disks before
> > > answering the request. I thus suspect it is somehow checking all
> > > exports whenever any one export is accessed. Is this correct
> > > behaviour?
> > Hm. If you have the nfs-utils source, you can see there's a loop in
> > nfs-utils/utils/mountd/cache.c:nfsd_fh()
> > that stats the root of each export, in two places; the first it looks
> > like you shouldn't hit if you don't have the mountpoint export option
> > set:
>
> Correct. This one is never hit in my case.
>
> > The second is to figure out which filesystem the filehandle that you
> > passed in that getattr is for:
> >         if (stat(exp->m_export.e_path, &stb) != 0)
> >                 continue;
>
> This is where the wait for the respective disk to spin up occurs.

OK, cool, so we understand the problem.

> > Could we cache the stat information in the export and then
> > double-check it if necessary when there's a match? Or is there some
> > way we could get the kernel to keep that cached for us?
>
> I could certainly cook up a patch for mountd to cache that information
> on its own. I don't have too much clue about how the kernel does its
> caching, though. If it's useful to do that directly in mountd, I
> could get my hands on it.

There are two caches involved:

    - the filesystem caches attributes so that subsequent stats of
      the exported directory can be answered without having to go to
      disk. I guess it's not surprising that that wouldn't be cached
      anymore if you hadn't touched the filesystem in a long time.
      Though there's one point I'm unclear on: are the directories
      you're exporting mountpoints? That's the normal
      configuration, and in that case I would've thought the inode
      for that directory would be pinned in memory so the stat
      wouldn't have to go to disk. I'm probably missing something.

    - the filehandle->export mapping that this function tells the
      kernel about is cached by nfsd for a half-hour. That time is
      set a little later in nfsd_fh:

          qword_printint(f, time(0)+30*60);

      I don't think there would be any harm in just changing that
      time(0)+30*60 to 0x7FFFFFFF (== never expire)--nfs-utils
      should be invalidating that cache explicitly whenever it's
      needed. Maybe that should be the default.
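
To make the two expiry choices concrete, here is a small sketch. The helper `fh_entry_expiry()` and the `NEVER_EXPIRE` macro are made up for illustration and are not part of nfs-utils; only the two values, time(0)+30*60 and 0x7FFFFFFF, come from the discussion above:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical name for the "never expire" value discussed above. */
#define NEVER_EXPIRE 0x7FFFFFFFL

/* Hypothetical helper: the expiry timestamp that would be passed down
 * to the kernel along with a filehandle->export cache entry. */
static long fh_entry_expiry(time_t now, int never_expire)
{
    assert(now >= 0);
    if (never_expire)
        return NEVER_EXPIRE;        /* rely on explicit cache flushes */
    return (long)now + 30 * 60;     /* current behaviour: half an hour */
}
```

With the first variant an idle client triggers a fresh upcall to mountd, and with it the stat() loop, once the half-hour has passed; with the second that would only happen after an explicit flush.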

--b.

2007-03-07 20:32:21

by Talpey, Thomas

Subject: Re: Delays on "first" access to a NFS mount

At 01:44 PM 3/7/2007, Simon Peter wrote:
>> Could we cache the stat information in the export and then
>> double-check it if necessary when there's a match? Or is there some
>> way we could get the kernel to keep that cached for us?
>
>I could certainly cook up a patch for mountd to cache that information
>on its own. I don't have too much clue about how the kernel does its
>caching, though. If it's useful to do that directly in mountd, I could
>get my hands on it.

This sounds like a job for inotify. The mountd could stat the export root
and use inotify_add_watch(2) to keep an eye on it to see if the stat
contents changed. Since the export already has a reference, it doesn't
seem offhand like it would change things much, operationally. Of course,
making mountd depend on an optional facility might be an issue, but it
could always fall back to the current behavior.
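
As a rough illustration of the inotify idea, and not a patch against mountd, something along these lines might work. The struct `export_stat_cache` and the functions `cache_poll()` and `cached_stat()` are hypothetical. The sketch assumes a non-blocking inotify descriptor (for example from inotify_init1(IN_NONBLOCK)) and assumes that watching IN_ATTRIB, IN_DELETE_SELF and IN_MOVE_SELF is enough to notice the changes that matter. Passing a negative descriptor gives the fallback to the current behaviour, a stat() on every call:

```c
#include <assert.h>
#include <sys/inotify.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical per-export cache slot; not an nfs-utils structure. */
struct export_stat_cache {
    const char *path;   /* export root */
    struct stat stb;    /* last stat() result */
    int wd;             /* inotify watch descriptor, -1 if none */
    int valid;          /* nonzero while stb can be trusted */
};

/* Drain pending events (ifd must be non-blocking) and invalidate the
 * slot if any of them belongs to its watch. */
static void cache_poll(int ifd, struct export_stat_cache *c)
{
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t n;

    while ((n = read(ifd, buf, sizeof(buf))) > 0) {
        for (char *p = buf; p < buf + n; ) {
            struct inotify_event *ev = (struct inotify_event *)p;

            if (ev->wd == c->wd) {
                c->valid = 0;
                if (ev->mask & IN_IGNORED)  /* kernel dropped the watch */
                    c->wd = -1;
            }
            p += sizeof(*ev) + ev->len;
        }
    }
}

/* stat() the export root only when the cached copy may be stale.
 * With ifd < 0 the copy is never trusted, so every call stats. */
static int cached_stat(int ifd, struct export_stat_cache *c,
                       struct stat *out)
{
    assert(c != NULL && c->path != NULL && out != NULL);

    if (ifd >= 0)
        cache_poll(ifd, c);
    if (!c->valid) {
        /* Watch first, then stat, so a change in between is not missed. */
        if (ifd >= 0 && c->wd < 0)
            c->wd = inotify_add_watch(ifd, c->path,
                        IN_ATTRIB | IN_DELETE_SELF | IN_MOVE_SELF);
        if (stat(c->path, &c->stb) != 0)
            return -1;
        c->valid = (c->wd >= 0);
    }
    *out = c->stb;
    return 0;
}
```

A slot would start out with wd set to -1 and valid set to 0, so the first call always goes to the filesystem.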

You probably don't want to sign up for enhancing the in-kernel export
cache. :-) Let's just say it's a bit mysterious, especially its interaction
with mountd.

Tom.


2007-03-07 20:49:47

by J. Bruce Fields

Subject: Re: Delays on "first" access to a NFS mount

On Wed, Mar 07, 2007 at 03:31:49PM -0500, Talpey, Thomas wrote:
> At 01:44 PM 3/7/2007, Simon Peter wrote:
> >> Could we cache the stat information in the export and then
> >> double-check it if necessary when there's a match? Or is there some
> >> way we could get the kernel to keep that cached for us?
> >
> >I could certainly cook up a patch for mountd to cache that information
> >on its own. I don't have too much clue about how the kernel does its
> >caching, though. If it's useful to do that directly in mountd, I could
> >get my hands on it.
>
> This sounds like a job for inotify. The mountd could stat the export root
> and use inotify_add_watch(2) to keep an eye on it to see if the stat
> contents changed.

Hm. Would it be enough just to hold an open file descriptor for the
directory? Is it safe to assume that for any filesystem (uh, any disk
filesystem anyway) that if you have something open then stat() on it
won't have to go to the disk?

> You probably don't want to sign up for enhancing the in-kernel export
> cache. :-) Let's just say it's a bit mysterious, especially its interaction
> with mountd.

I sympathize, though this is actually one of the few mysteries of our
nfs implementation that I feel like I understand, at least on alternate
Thursdays.... What's bugging me a lot these days is I don't understand
well enough why it's the way it is and what might need to be done to
make it better.

--b.

2007-03-07 21:08:14

by Talpey, Thomas

Subject: Re: Delays on "first" access to a NFS mount

At 03:50 PM 3/7/2007, J. Bruce Fields wrote:
>> You probably don't want to sign up for enhancing the in-kernel export
>> cache. :-) Let's just say it's a bit mysterious, especially its interaction
>> with mountd.
>
>I sympathize, though this is actually one of the few mysteries of our
>nfs implementation that I feel like I understand, at least on alternate
>Thursdays.... What's bugging me a lot these days is I don't understand
>well enough why it's the way it is and what might need to be done to
>make it better.

I was actually trying to talk Simon out of trying, in case that wasn't
obvious.

But I'm really glad it bugs you! Keep thinking that way, maybe you can
untangle it someday. :-)

While you're thinking about it, what's the actual timeout on a given
in-kernel export cache entry? There's a 120-second deadline on an
unresolved cache miss being populated, but when, exactly, does an
existing (resolved) entry go stale? I admit to having tried to figure it
out once and wound up going in circles.

Tom.

