2004-04-12 20:38:18

by David Mansfield

Subject: NFS file handle cached incorrectly


Hi Trond,

I have a problem with the Fedora Core 1 kernels which, as I understand it,
carry a patch you originally wrote that adds a function called
nfs_cached_lookup, as well as handling for READDIRPLUS.

I'm guessing that this is the cause, after looking at diffs between
systems that work and those that don't.

It would appear this patch hasn't made it to the mainline 2.4 kernels, and
I haven't tested 2.6 for this problem, so I understand if this is
something you cannot help me with.

(I have filed a Bugzilla report with Red Hat, bug #118922, but it has been
languishing in the NEW state for a while now.)

For this test, client is FC1, NFS server is RHEL3, 'othermachine' is any
other NFS client that also mounts the same network home directory.
Current working directory is home directory (NFS mounted). You'll need an
ssh-agent assisted session to get the timing right.

To reproduce (all as one command to get the timing right):

rm foo; date >foo; sleep 1; cat foo; \
ssh -x othermachine 'touch foo.new; rm foo; mv foo.new foo'; cat foo; date

Basically, what happens is that the inode (on server) of the file 'foo'
changes 'out from under' the client (by another client, 'othermachine',
that has mounted same directory). Client then uses the cached file handle
and gets a 'stale nfs handle' error visible to user space for the last
'cat foo'.

If more than one second passes between the rename on 'othermachine' and
the 'cat foo' on the client, the problem doesn't appear.

In reality, this is adversely affecting 'ssh' with X11 forwarding (funny
things happening with the xauth program on either end of the ssh).

Could we retry the LOOKUP if the fh comes from the cache and we get a
stale file handle error automatically?

Any other ideas?

David

--
/==============================\
| David Mansfield |
| [email protected] |
\==============================/


2004-04-12 21:02:04

by Trond Myklebust

Subject: Re: NFS file handle cached incorrectly

On Mon, 12/04/2004 at 13:38, David Mansfield wrote:

> rm foo; date >foo; sleep 1; cat foo; \
> ssh -x othermachine 'touch foo.new; rm foo; mv foo.new foo'; cat foo; date
>
> Basically, what happens is that the inode (on server) of the file 'foo'
> changes 'out from under' the client (by another client, 'othermachine',
> that has mounted same directory). Client then uses the cached file handle
> and gets a 'stale nfs handle' error visible to user space for the last
> 'cat foo'.
>
> If more than one second passes between the rename on 'othermachine' and
> the 'cat foo' on the client, the problem doesn't appear.
>
> In reality, this is adversely affecting 'ssh' with X11 forwarding (funny
> things happening with the xauth program on either end of the ssh).
>
> Could we retry the LOOKUP if the fh comes from the cache and we get a
> stale file handle error automatically?

No! The file handle is assumed to be correct if the directory
revalidated without any errors. It is cached until it is needed. The
ESTALE occurs long after we've exited from LOOKUP.

> Any other ideas?

I am sorry: you are simply violating the NFS caching premises. This is
something that is not *ever* guaranteed to work whether or not you have
READDIRPLUS enabled.
The problem here is rather that you are making remote modifications to
the NFS server's directory within < 1second (which is the resolution on
"mtime" on Linux 2.4.x) of the previous modification. Linux (and all
other NFS clients that I'm aware of) uses the mtime in order to decide
whether or not a file/directory/... has been modified since the cache
was last updated (unless it is a modification that was made by this
client).

The only "solution" to your problem here is to upgrade the *server* to
Linux-2.6.x: the latter has 1 nanosecond resolution on the "mtime", and
so can register modifications that are far smaller than 1second.
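
The effect of 1-second mtime resolution can be illustrated with a small
shell sketch. This is not kernel code: `mtime_1s` and `change_visible` are
hypothetical helper names, and the timestamps are plain millisecond
integers chosen only for illustration.

```sh
# Truncate a millisecond timestamp to whole seconds, as a 1-second mtime would.
mtime_1s() {
    echo $(( $1 / 1000 ))
}

# Exit status 0 if a client comparing 1-second mtimes would notice that the
# directory changed between the two given modification times (in ms).
change_visible() {
    [ "$(mtime_1s "$1")" -ne "$(mtime_1s "$2")" ]
}

# Two modifications 500 ms apart, both inside the same second.
if change_visible 10000 10500; then
    echo "10.0s -> 10.5s: change detected"
else
    echo "10.0s -> 10.5s: change missed"
fi

# The same 500 ms gap, but straddling a second boundary.
if change_visible 10700 11200; then
    echo "10.7s -> 11.2s: change detected"
else
    echo "10.7s -> 11.2s: change missed"
fi
```

The first case prints "change missed" and the second "change detected":
with whole-second timestamps, whether the client can see the second
modification depends only on whether a second boundary falls between the
two changes.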

Cheers,
Trond

2004-04-12 21:07:33

by Trond Myklebust

Subject: Re: NFS file handle cached incorrectly

On Mon, 12/04/2004 at 14:01, Trond Myklebust wrote:
> The problem here is rather that you are making remote modifications to
> the NFS server's directory within < 1second (which is the resolution on
> "mtime" on Linux 2.4.x) of the previous modification. Linux (and all
> other NFS clients that I'm aware of) uses the mtime in order to decide
> whether or not a file/directory/... has been modified since the cache
> was last updated (unless it is a modification that was made by this
> client).

Clarification: the problem is IOW the fact that the server will not
update mtime for any changes that are made within 1 second of one
another. The same client will work fine with any server that has better
resolution on mtime. Hence the suggestion:

> The only "solution" to your problem here is to upgrade the *server* to
> Linux-2.6.x: the latter has 1 nanosecond resolution on the "mtime", and
> so can register modifications that are far smaller than 1second.

Cheers,
Trond

2004-04-13 14:25:48

by David Mansfield

Subject: Re: NFS file handle cached incorrectly

On Mon, 12 Apr 2004, Trond Myklebust wrote:

> On Mon, 12/04/2004 at 14:01, Trond Myklebust wrote:
> > The problem here is rather that you are making remote modifications to
> > the NFS server's directory within < 1second (which is the resolution on
> > "mtime" on Linux 2.4.x) of the previous modification. Linux (and all
> > other NFS clients that I'm aware of) uses the mtime in order to decide
> > whether or not a file/directory/... has been modified since the cache
> > was last updated (unless it is a modification that was made by this
> > client).
>
> Clarification: the problem is IOW the fact that the server will not
> update mtime for any changes that are made within 1 second of one
> another. The same client will work fine with any server that has better
> resolution on mtime. Hence the suggestion:
>
> > The only "solution" to your problem here is to upgrade the *server* to
> > Linux-2.6.x: the latter has 1 nanosecond resolution on the "mtime", and
> > so can register modifications that are far smaller than 1second.
>

I don't think this is quite correct. The 1 second or less gap is not
between two modifications of the directory. It is between the initial
lookup and a remote modification. The mtime IS being updated, it's
just not being checked. ie.

time t:        file foo is created on client2 (no lookup happens on client1),
               directory mtime = t
time t+10:     file foo is accessed on client1, readdirplus, cache is 'as of'
               time t+10
time t+10.5:   file foo is replaced with different file on client2,
               directory mtime = t + 10 (only full second granularity)
time t+10.75:  file foo is accessed on client1, using stale handle.

So at time t+10.75, the mtime of foo has changed since initial access, and
the mtime of the directory has changed. Neither is checked because the
readdirplus happened within a second. The directory mtime is not even
checked. (I looked at tcpdump).
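
One way to read this timeline is as a freshness check on cached directory
attributes. The sketch below is only a model under that assumption, not
the client's actual logic: `needs_revalidate` is a hypothetical helper, and
the 1000 ms lifetime is an illustrative value picked to match the
"more than one second" observation, not a measured kernel setting.

```sh
# Exit status 0 if the client would go back to the server for directory
# attributes at time now_ms, given when they were last fetched and how long
# the cached copy is trusted (all values in milliseconds).
needs_revalidate() {
    now_ms=$1
    last_fetch_ms=$2
    ttl_ms=$3
    [ $(( now_ms - last_fetch_ms )) -ge "$ttl_ms" ]
}

# Timeline from above with t = 0:
#   10000  client1 looks up foo (attributes cached)
#   10500  client2 replaces foo
#   10750  client1 accesses foo again
if needs_revalidate 10750 10000 1000; then
    echo "t+10.75: directory revalidated, new handle looked up"
else
    echo "t+10.75: cache trusted, stale handle used"
fi

# The same access made 1.5 seconds after the first lookup.
if needs_revalidate 11500 10000 1000; then
    echo "t+11.5: directory revalidated, new handle looked up"
else
    echo "t+11.5: cache trusted, stale handle used"
fi
```

In this model the access at t+10.75 never consults the server about the
directory, so the directory mtime is irrelevant to the outcome regardless
of its resolution.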

Try this one:

#
# create file remotely. (don't cause lookup on client1)
# sleep any number of seconds
# access file on client1 (cause lookup)
# replace file remotely
# access file on client1 (no lookup): stale file handle visible to
# user space. (rh9 retries this)
#

ssh -x client2 'rm foo; date >foo; ls -i foo; stat .'; \
sleep 3; \
cat foo; \
ssh -x client2 'touch foo.new; rm foo; mv foo.new foo; ls -i foo; stat .';\
cat foo; \
date


Looking at redhat 9 nfs (which doesn't have the same problem) it looks
like it IS retrying the lookup when the cached file handle is stale. This
is a tcpdump of the last access that under FC1 generates the 'stale file
handle' visible to userspace:

10:20:09.359534 208.222.80.103.2205827529 > 208.222.80.60.2049: 120 getattr fh Unknown/1 (DF)
10:20:09.360185 208.222.80.60.2049 > 208.222.80.103.2205827529: reply ok 32 getattr ERROR: Stale NFS file handle (DF)
10:20:09.360202 208.222.80.103.800 > 208.222.80.60.nfs: . ack 733 win 63712 <nop,nop,timestamp 100962223 300082326> (DF)
10:20:09.360505 208.222.80.103.2222604745 > 208.222.80.60.2049: 124 lookup fh Unknown/1 "foo" (DF)
10:20:09.361176 208.222.80.60.2049 > 208.222.80.103.2222604745: reply ok 236 lookup fh Unknown/1 (DF)
10:20:09.396778 208.222.80.103.800 > 208.222.80.60.nfs: . ack 969 win 63712 <nop,nop,timestamp 100962227 300082327> (DF)

You can see the access using a cached filehandle, the stale file handle
reply, then a new lookup returning a new handle.

David

--
/==============================\
| David Mansfield |
| [email protected] |
\==============================/

2004-04-13 16:57:38

by Trond Myklebust

Subject: Re: NFS file handle cached incorrectly

On Tue, 13/04/2004 at 07:25, David Mansfield wrote:

> I don't think this is quite correct. The 1 second or less gap is not
> between two modifications of the directory. It is between the initial
> lookup and a remote modification. The mtime IS being updated, it's
> just not being checked. ie.

In my version of the patch (which I assume is what Steve grabbed) the
first thing nfs_cached_lookup() does is to call nfs_revalidate_inode().

It may well be that the directory attribute cache has not timed out yet.
If so, that will indeed cause the mtime not to be checked on the client.
If this really is a problem for you, then I suggest you make the
directory attribute caching less aggressive: see 'man 5 nfs' and read up
on "acdirmin/acdirmax".

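As a rough illustration of the suggestion above, the sketch below builds a
mount command that shortens directory attribute caching. The server name,
export path, and mount point are hypothetical placeholders, and the value 0
is just the most conservative choice, not a recommendation from this
thread; nfs(5) documents the defaults and the trade-offs.

```sh
# Hypothetical names: substitute your own server, export, and mount point.
SERVER=nfsserver
EXPORT=/export/home
MOUNTPOINT=/home

# acdirmin/acdirmax are the minimum and maximum times, in seconds, that
# directory attributes are cached (see nfs(5)). 0 is used here as the most
# conservative setting; it trades extra GETATTR traffic for fresher lookups.
OPTS="acdirmin=0,acdirmax=0"

# Print the mount command rather than running it, since mounting needs root.
nfs_mount_cmd() {
    echo "mount -t nfs -o $OPTS $SERVER:$EXPORT $MOUNTPOINT"
}

nfs_mount_cmd
```

Only the directory attribute timers are changed here, so file attribute
caching (acregmin/acregmax) keeps its defaults.
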
Cheers,
Trond