2005-10-06 12:57:26

by Leif Nixon

Subject: Cache invalidation bug in v3

Hi,

We have come across a bug where a v3 client fails to invalidate its
data cache for a file even though it realizes that the file attributes
have changed. We have been able to recreate the bug on a range of
kernel versions and different underlying file systems.

Here's a minimal way to reproduce the error (there seem to be some
timing issues involved, but this has worked at least 90% of the time):

  NFS client n1                 NFS client n2

  $ echo 1 > f
                                $ cat f
                                1
  $ touch .
  $ echo 2 > f
                                $ touch f
                                $ cat f
                                1

Now client n2 is stuck in a state where it uses its old cached data
forever (or at least for several hours):

  NFS client n1                 NFS client n2

  $ cat f
  2
                                $ cat f
                                1

However, stat(1) gives the same output on both clients. "touch f" on
either machine corrects the situation; n2 invalidates its data cache.
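
To make the mismatch visible side by side, a small helper can report both
the attributes and a digest of the data that a client currently sees. This
is only a sketch: the function name describe() is made up for illustration,
and it simply wraps stat and read as seen through the local mount.

```python
import hashlib
import os


def describe(path):
    """Return the size, mtime and content digest this client sees for path."""
    st = os.stat(path)
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return {"size": st.st_size, "mtime": st.st_mtime, "sha256": digest}
```

Running describe("f") on n1 and n2 in the stuck state should, if the
description above is right, give matching size and mtime but different
digests.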

We have seen this on a range of kernels between 2.6.9 and 2.6.13.2 on
Debian, CentOS, RHEL, Fedora and vanilla kernel.org, on both clients
and server. We have *not* been able to reproduce the bug with a
Solaris server. Underlying file systems have been ext3 and xfs (and
Solaris ufs). We have tried varying mount options, but to no avail;
the bug persists, even with "noac".


Hypothesis:

When n2 does "touch f" and wants to do SETATTR, it first has to do a
LOOKUP (because n1 has updated the attributes on cwd with "touch .").
It seems that when n2 receives the updated attributes for f as a part
of the LOOKUP reply, it updates its attribute cache without
invalidating its data cache, leading to the anomalous situation.

If the "touch ." is omitted, n2 receives the updated file attributes
via an explicit GETATTR on f, and then everything works properly.
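
The hypothesis can be illustrated with a toy model. This is not the kernel
code; the classes Server and Client and their methods are hypothetical names
invented for this sketch, and the only thing modelled is an attribute cache
that can be refreshed with or without dropping the cached data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attrs:
    mtime: int
    size: int


class Server:
    """Hypothetical stand-in for the NFS server, holding a single file f."""

    def __init__(self):
        self.data = b""
        self.mtime = 0

    def write(self, data):
        # Every write bumps mtime, as "echo 2 > f" would.
        self.data = data
        self.mtime += 1

    def attrs(self):
        return Attrs(self.mtime, len(self.data))


class Client:
    """Toy client with an attribute cache and a data cache for f."""

    def __init__(self, server):
        self.server = server
        self.attrs = None
        self.data = None

    def getattr_revalidate(self):
        # GETATTR path: changed attributes also invalidate the cached data.
        fresh = self.server.attrs()
        if fresh != self.attrs:
            self.data = None
        self.attrs = fresh

    def lookup_refresh(self):
        # Suspected buggy path (assumption): attributes from the LOOKUP
        # reply replace the cached ones, the data cache is left untouched.
        self.attrs = self.server.attrs()

    def cat(self):
        self.getattr_revalidate()
        if self.data is None:
            self.data = self.server.data  # READ from the server
        return self.data


def scenario(with_lookup):
    server = Server()
    n2 = Client(server)
    server.write(b"1\n")       # n1: echo 1 > f
    n2.cat()                   # n2: cat f
    server.write(b"2\n")       # n1: echo 2 > f
    if with_lookup:
        n2.lookup_refresh()    # n2: LOOKUP triggered by "touch f"
    return n2.cat()            # n2: cat f
```

In this model scenario(True) returns the stale b"1\n", because the later
GETATTR sees attributes that already match the cache, while scenario(False)
returns b"2\n".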

--
Leif Nixon - Systems expert
------------------------------------------------------------
National Supercomputer Centre - Linkoping University
------------------------------------------------------------


-------------------------------------------------------
This SF.Net email is sponsored by:
Power Architecture Resource Center: Free content, downloads, discussions,
and more. http://solutions.newsforge.com/ibmarch.tmpl
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-10-06 13:36:36

by Leif Nixon

Subject: Re: Cache invalidation bug in v3

Leif Nixon <[email protected]> writes:

> Here's a minimal way to reproduce the error (there seem to be some
> timing issues involved, but this has worked at least 90% of the time):
>
>   NFS client n1                 NFS client n2
>
>   $ echo 1 > f
>                                 $ cat f
>                                 1
>   $ touch .
>   $ echo 2 > f
>                                 $ touch f
>                                 $ cat f
>                                 1

Ah, yes, I forgot one strange point: the second write to the file must
not change the size of the file, otherwise the bug doesn't appear. So if
you change "echo 2 > f" to "echo foo > f", everything works just
fine.
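
One reading that would fit this observation, offered purely as an
assumption and not confirmed anywhere in this thread, is that the
LOOKUP-path update does notice a size change but ignores an mtime-only
change. The sketch below models just that; the functions lookup_refresh(),
read() and run() are hypothetical and use plain dicts in place of real
client state.

```python
def lookup_refresh(cache, fresh):
    """Assumed LOOKUP-path update: drop cached data only if the size changed."""
    if cache["attrs"] is not None and cache["attrs"]["size"] != fresh["size"]:
        cache["data"] = None
    cache["attrs"] = fresh


def read(cache, server):
    """GETATTR-style revalidation followed by a read through the cache."""
    fresh = {"mtime": server["mtime"], "size": len(server["data"])}
    if cache["attrs"] != fresh:
        cache["data"] = None
    cache["attrs"] = fresh
    if cache["data"] is None:
        cache["data"] = server["data"]
    return cache["data"]


def run(second_write):
    server = {"mtime": 1, "data": b"1\n"}            # n1: echo 1 > f
    cache = {"attrs": None, "data": None}
    read(cache, server)                              # n2: cat f
    server = {"mtime": 2, "data": second_write}      # n1: second write
    lookup_refresh(cache, {"mtime": 2, "size": len(second_write)})  # n2: touch f
    return read(cache, server)                       # n2: cat f
```

Under that assumption run(b"2\n") returns the stale b"1\n" (same size,
only mtime differs), while run(b"foo\n") returns b"foo\n".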

--
Leif Nixon - Systems expert
------------------------------------------------------------
National Supercomputer Centre - Linkoping University
------------------------------------------------------------



2005-10-06 21:36:20

by David Warren

Subject: Re: Cache invalidation bug in v3

This is the same thing I reported about a week and a half ago. While you
can't replicate it on a Solaris server, I could replicate it with a
Solaris client. I also cannot replicate it with NFSv4. I had several
configuration suggestions sent to me. None of them changed anything.

Leif Nixon wrote:

>Hi,
>
>We have come across a bug where a v3 client fails to invalidate its
>data cache for a file even though it realizes that the file attributes
>have changed. We have been able to recreate the bug on a range of
>kernel versions and different underlying file systems.
>
>Here's a minimal way to reproduce the error (there seem to be some
>timing issues involved, but this has worked at least 90% of the time):
>
>  NFS client n1                 NFS client n2
>
>  $ echo 1 > f
>                                $ cat f
>                                1
>  $ touch .
>  $ echo 2 > f
>                                $ touch f
>                                $ cat f
>                                1
>
>Now client n2 is stuck in a state where it uses its old cached data
>forever (or at least for several hours):
>
>  NFS client n1                 NFS client n2
>
>  $ cat f
>  2
>                                $ cat f
>                                1
>
>However, stat(1) gives the same output on both clients. "touch f" on
>either machine corrects the situation; n2 invalidates its data cache.
>
>We have seen this on a range of kernels between 2.6.9 and 2.6.13.2 on
>Debian, CentOS, RHEL, Fedora and vanilla kernel.org, on both clients
>and server. We have *not* been able to reproduce the bug with a
>Solaris server. Underlying file systems have been ext3 and xfs (and
>Solaris ufs). We have tried varying mount options, but to no avail;
>the bug persists, even with "noac".
>
>
>Hypothesis:
>
>When n2 does "touch f" and wants to do SETATTR, it first has to do a
>LOOKUP (because n1 has updated the attributes on cwd with "touch .").
>It seems that when n2 receives the updated attributes for f as a part
>of the LOOKUP reply, it updates its attribute cache without
>invalidating its data cache, leading to the anomalous situation.
>
>If the "touch ." is omitted, n2 receives the updated file attributes
>via an explicit GETATTR on f, and then everything works properly.
>
>
>


--
David Warren INTERNET: [email protected]
(206) 543-0945 Fax: (206) 543-0308
University of Washington
Dept of Atmospheric Sciences, Box 351640
Seattle, WA 98195-1640
-------------------------------------------------------------------------------
DECUS E-PUBS Library Committee representative
SeaLUG DECUS Chair




2005-10-07 12:33:08

by Leif Nixon

Subject: Re: Cache invalidation bug in v3

David Warren <[email protected]> writes:

> This is the same thing I reported about a week and a half ago.

Yes, or at least similar.

> While you can't replicate it on a Solaris server, I could replicate
> it with a Solaris client.

I have failed to replicate my problem with two Solaris clients against
a 2.6.11 Linux server.

Trond, you thought David's problem might be caused by re-use of
inodes, but in this case, i.e.:

  NFS client n1                 NFS client n2

  $ echo 1 > f
                                $ cat f
                                1
  $ touch .
  $ echo 2 > f
                                $ touch f
                                $ cat f
                                1

there is only a single inode involved.

--
Leif Nixon - Systems expert
------------------------------------------------------------
National Supercomputer Centre - Linkoping University
------------------------------------------------------------

