2005-03-12 11:56:27

by Junfeng Yang

Subject: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)


Hi,

We checked NFS on top of ext3 using FiSC (our file system model checker)
and found a case where NFS stat cache can contain inconsistent entries.

Basically, to trigger this inconsistency, just do the following steps:
1. create a file A1, write a few bytes to it, so A1 is 4 words
2. create a hard link A2, pointing to A1
3. stat on A2. A2's size is 4 words
4. truncate A1 to a larger size, write a few bytes at the end. now it's
1031 words.
5. stat on A2. its size is still 4 words, but it should be 1031 words
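
The five steps can be sketched in C roughly as follows. This is only an
illustration, not the crash.c test case linked below: the function name
stat_cache_repro and the file names passed to it are hypothetical, and
sizes are counted in bytes (4 and 1031) rather than the "words" used in
the steps. To exercise the problem, both paths would have to live on the
NFS mount.

```c
#define _XOPEN_SOURCE 700   /* for truncate() and link() under strict -std modes */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: runs steps 1-5 on the two given paths.
 * Returns 0 on success and stores the size of A2 seen in step 3 and
 * step 5. On a consistent file system both stats follow A1, so
 * *size_after should be 1031.
 */
int stat_cache_repro(const char *a1, const char *a2,
                     off_t *size_before, off_t *size_after)
{
    struct stat st;
    int fd;

    /* Start clean; failures are ignored because the files may not exist. */
    unlink(a1);
    unlink(a2);

    /* 1. create A1 and write a few bytes to it */
    fd = open(a1, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, "abcd", 4) != 4) {
        close(fd);
        return -1;
    }
    close(fd);

    /* 2. create a hard link A2 pointing to A1 */
    if (link(a1, a2) != 0)
        return -1;

    /* 3. stat A2 */
    if (stat(a2, &st) != 0)
        return -1;
    *size_before = st.st_size;

    /* 4. truncate A1 to a larger size, then write a few bytes at the end */
    if (truncate(a1, 1027) != 0)
        return -1;
    fd = open(a1, O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;
    if (write(fd, "wxyz", 4) != 4) {
        close(fd);
        return -1;
    }
    close(fd);

    /* 5. stat A2 again; a stale cache would still report the old size */
    if (stat(a2, &st) != 0)
        return -1;
    *size_after = st.st_size;
    return 0;
}
```

A caller would compare the two sizes: a second value that still equals
the first indicates the stale entry described above.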

We have a test case to re-create this warning. You can download it at
http://fisc.stanford.edu/bug16/crash.c. It includes some sudo commands
to mount nfs partitions, which you might want to change according to your
local settings.

cat /etc/exports shows:
/mnt/sbd0-export localhost(rw,sync)
/mnt/sbd1-export localhost(rw,sync)

Let me know if you have any problems reproducing the warning. We'd
appreciate any confirmations/clarifications.

-Junfeng


2005-03-13 05:05:32

by Trond Myklebust

Subject: Re: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)

On Sat, 12.03.2005 at 03:56 (-0800), Junfeng Yang wrote:
> Hi,
>
> We checked NFS on top of ext3 using FiSC (our file system model checker)
> and found a case where NFS stat cache can contain inconsistent entries.
>
> Basically, to trigger this inconsistency, just do the following steps:
> 1. create a file A1, write a few bytes to it, so A1 is 4 words
> 2. create a hard link A2, pointing to A1
> 3. stat on A2. A2's size is 4 words
> 4. truncate A1 to a larger size, write a few bytes at the end. now it's
> 1031 words.
> 5. stat on A2. its size is still 4 words, but it should be 1031 words
>
> We have a test case to re-create this warning. You can download it at
> http://fisc.stanford.edu/bug16/crash.c. It includes some sudo commands
> to mount nfs partitions, which you might want to change according to your
> local settings.
>
> cat /etc/exports shows:
> /mnt/sbd0-export localhost(rw,sync)
> /mnt/sbd1-export localhost(rw,sync)
>
> Let me know if you have any problems reproducing the warning. We'd
> appreciate any confirmations/clarifications.
>

This is a known problem. Turn off the (default - grrr) subtree checking
export option on the server, and it will all work properly. The subtree
checking option violates the NFS standards for filehandle generation in
so many ways, that it isn't even funny.

Cheers,
Trond

--
Trond Myklebust <[email protected]>

2005-03-13 06:17:18

by Junfeng Yang

Subject: Re: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)

> This is a known problem. Turn off the (default - grrr) subtree checking
> export option on the server, and it will all work properly. The subtree
> checking option violates the NFS standards for filehandle generation in
> so many ways, that it isn't even funny.

Thanks Trond. no_subtree_check fixes the problem.
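
For reference, a sketch of how the two export lines from the original
report might look with the option added. Only no_subtree_check is new;
the paths and the other options are the ones already shown in the
report.

```
/mnt/sbd0-export localhost(rw,sync,no_subtree_check)
/mnt/sbd1-export localhost(rw,sync,no_subtree_check)
```

After editing /etc/exports the server has to re-read it, typically with
"exportfs -ra"; whether a client remount is also needed is an assumption
that depends on the local setup.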

-Junfeng

2005-03-13 20:04:41

by Daniel Jacobowitz

Subject: Re: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)

On Sun, Mar 13, 2005 at 12:04:27AM -0500, Trond Myklebust wrote:
> On Sat, 12.03.2005 at 03:56 (-0800), Junfeng Yang wrote:
> > Hi,
> >
> > We checked NFS on top of ext3 using FiSC (our file system model checker)
> > and found a case where NFS stat cache can contain inconsistent entries.
> >
> > Basically, to trigger this inconsistency, just do the following steps:
> > 1. create a file A1, write a few bytes to it, so A1 is 4 words
> > 2. create a hard link A2, pointing to A1
> > 3. stat on A2. A2's size is 4 words
> > 4. truncate A1 to a larger size, write a few bytes at the end. now it's
> > 1031 words.
> > 5. stat on A2. its size is still 4 words, but it should be 1031 words
> >
> > We have a test case to re-create this warning. You can download it at
> > http://fisc.stanford.edu/bug16/crash.c. It includes some sudo commands
> > to mount nfs partitions, which you might want to change according to your
> > local settings.
> >
> > cat /etc/exports shows:
> > /mnt/sbd0-export localhost(rw,sync)
> > /mnt/sbd1-export localhost(rw,sync)
> >
> > Let me know if you have any problems reproducing the warning. We'd
> > appreciate any confirmations/clarifications.
> >
>
> This is a known problem. Turn off the (default - grrr) subtree checking
> export option on the server, and it will all work properly. The subtree
> checking option violates the NFS standards for filehandle generation in
> so many ways, that it isn't even funny.

I can't find any documentation about this, but it seems like the same
problem that has been causing me headaches lately; when I replace glibc
from the server side of an nfsroot, the client has a couple of
variously wrong reads before it sees the new files. If it breaks NFS
so badly, why is it the default for the Linux NFS server?

--
Daniel Jacobowitz
CodeSourcery, LLC

2005-03-13 20:42:51

by Trond Myklebust

Subject: Re: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)

On Sun, 13.03.2005 at 15:04 (-0500), Daniel Jacobowitz wrote:

> I can't find any documentation about this, but it seems like the same
> problem that has been causing me headaches lately; when I replace glibc
> from the server side of an nfsroot, the client has a couple of
> variously wrong reads before it sees the new files. If it breaks NFS
> so badly, why is it the default for the Linux NFS server?

No, that's a very different issue: you are violating the NFS cache
consistency rules if you are changing a file that is being held open by
other machines.
The correct way to do the above is to use GNU install with the '-b'
option: that will rename the version of glibc that is in use, and then
install the new glibc in a different inode.
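
The rename-then-install idea can be sketched in C as below. This is not
the implementation of GNU install: install_with_backup, write_file and
the ".new" temporary suffix are hypothetical names, and the "~" backup
suffix is only assumed to match what 'install -b' uses by default. The
point is that the old inode is never written to; it stays reachable
under the backup name while the new contents get a different inode.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` and write `len` bytes of `data`. */
static int write_file(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0755);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Replace `dest` without touching the inode that clients may still
 * have open: the old file is renamed to "<dest>~" and the new contents
 * are moved into place as a separate inode. Returns 0 on success.
 */
int install_with_backup(const char *dest, const char *data, size_t len)
{
    char backup[4096], tmp[4096];
    int n;

    n = snprintf(backup, sizeof backup, "%s~", dest);
    if (n < 0 || (size_t)n >= sizeof backup)
        return -1;
    n = snprintf(tmp, sizeof tmp, "%s.new", dest);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* Write the new version next to the destination first. */
    unlink(tmp);
    if (write_file(tmp, data, len) != 0)
        return -1;

    /* Keep the old inode alive under the backup name, if it exists. */
    if (rename(dest, backup) != 0 && errno != ENOENT) {
        unlink(tmp);
        return -1;
    }

    /* Move the new inode into place. */
    return rename(tmp, dest);
}
```

A client that still holds the old file open keeps reading the old
inode, while a fresh open() of the destination path resolves to the new
one.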

Cheers,
Trond
--
Trond Myklebust <[email protected]>

2005-03-13 20:46:52

by Trond Myklebust

Subject: Re: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)

On Sun, 13.03.2005 at 15:42 (-0500), Trond Myklebust wrote:

> No, that's a very different issue: you are violating the NFS cache
> consistency rules if you are changing a file that is being held open by
> other machines.
> The correct way to do the above is to use GNU install with the '-b'
> option: that will rename the version of glibc that is in use, and then
> install the new glibc in a different inode.

BTW: there is a more complete description of the NFS cache consistency
model in the NFS FAQ:

http://nfs.sourceforge.net/index.cel.php#faq_a8

Cheers,
Trond

--
Trond Myklebust <[email protected]>

2005-03-14 00:35:26

by Daniel Jacobowitz

Subject: Re: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)

On Sun, Mar 13, 2005 at 03:42:29PM -0500, Trond Myklebust wrote:
> On Sun, 13.03.2005 at 15:04 (-0500), Daniel Jacobowitz wrote:
>
> > I can't find any documentation about this, but it seems like the same
> > problem that has been causing me headaches lately; when I replace glibc
> > from the server side of an nfsroot, the client has a couple of
> > variously wrong reads before it sees the new files. If it breaks NFS
> > so badly, why is it the default for the Linux NFS server?
>
> No, that's a very different issue: you are violating the NFS cache
> consistency rules if you are changing a file that is being held open by
> other machines.
> The correct way to do the above is to use GNU install with the '-b'
> option: that will rename the version of glibc that is in use, and then
> install the new glibc in a different inode.

[closed and/or irrelevant lists removed from CC:]

No, the copy of glibc in question is not in use at the time. The next
attempt to open it on the client will sometimes generate a "stale NFS
handle" message, or if the open succeeds a read will sometimes return
EIO. But it sounds like this is a different problem than the original
poster was testing for.

I'm still curious about the answer to my question above :-)

--
Daniel Jacobowitz
CodeSourcery, LLC

2005-03-14 00:50:35

by Trond Myklebust

Subject: Re: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)

On Sun, 13.03.2005 at 19:35 (-0500), Daniel Jacobowitz wrote:
> On Sun, Mar 13, 2005 at 03:42:29PM -0500, Trond Myklebust wrote:
> > On Sun, 13.03.2005 at 15:04 (-0500), Daniel Jacobowitz wrote:
> >
> > > I can't find any documentation about this, but it seems like the same
> > > problem that has been causing me headaches lately; when I replace glibc
> > > from the server side of an nfsroot, the client has a couple of
> > > variously wrong reads before it sees the new files. If it breaks NFS
> > > so badly, why is it the default for the Linux NFS server?
> >
> > No, that's a very different issue: you are violating the NFS cache
> > consistency rules if you are changing a file that is being held open by
> > other machines.
> > The correct way to do the above is to use GNU install with the '-b'
> > option: that will rename the version of glibc that is in use, and then
> > install the new glibc in a different inode.
>
> [closed and/or irrelevant lists removed from CC:]
>
> No, the copy of glibc in question is not in use at the time. The next
> attempt to open it on the client will sometimes generate a "stale NFS
> handle" message, or if the open succeeds a read will sometimes return
> EIO. But it sounds like this is a different problem than the original
> poster was testing for.

Sorry, but you should _never_ have gotten an ESTALE error if the file
was not in use when you deleted the old copy of glibc. A fresh call to
open() will always result in a new lookup of the filehandle.
What may have happened in the case of the EIO error is that you may have
raced: i.e. a client starts reading the file while it is being copied
to.

You'll rather want to ask Neil Brown about why subtree_check is still
the default for knfsd. He is the NFS server maintainer.

Cheers,
Trond
--
Trond Myklebust <[email protected]>

2005-03-14 00:54:18

by Daniel Jacobowitz

Subject: Re: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)

On Sun, Mar 13, 2005 at 07:50:09PM -0500, Trond Myklebust wrote:
> Sorry, but you should _never_ have gotten an ESTALE error if the file
> was not in use when you deleted the old copy of glibc. A fresh call to
> open() will always result in a new lookup of the filehandle.
> What may have happened in the case of the EIO error is that you may have
> raced: i.e. a client starts reading the file while it is being copied
> to.

It is in a separate root filesystem, currently not used by anything on
the target. It is likely to be in cache, but I can absolutely
guarantee it isn't open. Hmm, server is x86_64 2.6.7, client is 2.6.10
MIPS. I should upgrade them and see if that helps.

Unfortunately I haven't found any smaller testcases than installing an
entire root FS.

--
Daniel Jacobowitz
CodeSourcery, LLC

2005-03-14 00:56:52

by NeilBrown

Subject: Re: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)

On Sunday March 13, [email protected] wrote:
>
> You'll rather want to ask Neil Brown about why subtree_check is still
> the default for knfsd. He is the NFS server maintainer.

Apathy?
No-one has complained loudly enough or long enough or sent me a patch,
and it simply isn't a priority for me.
(A patch would have to provide clear warning to the user of the change
in defaults, such as is currently done for sync/async (and it's
probably time to remove that warning).)

Note: this is purely a userspace issue. nfs-utils sets the default,
not the kernel.

NeilBrown