2007-04-19 15:43:47

by David Warren

Subject: NFS caching bug is back

A bug that we turned in a while ago is back in the 2.6.20 kernels, only
worse. I have found it in 2.6.20.6 and 2.6.20.7. It happens with both
NFS4 and NFS3 mounts. Clients don't see inode changes (delete and
recreate file):
Setup:
3 systems, all running:
Linux enkf1 2.6.20.7 #1 SMP Wed Apr 18 16:23:15 PDT 2007 x86_64 GNU/Linux
The file system is xfs, mounted on the server as:
/dev/sda3 /home/enkf xfs defaults 0 2
and exported with the options:
(fsid=0,nohide,rw,no_subtree_check,sync,no_root_squash)
On the clients it is NFS-mounted as:
enkf1:/ /home/enkf nfs4 rw,proto=tcp,rsize=32768,wsize=32768,hard 0 0
and also as:
enkf1:/ /home/enkf nfs rw,proto=tcp,rsize=8192,wsize=8192,hard 0 0
and as:
enkf1:/ /home/enkf nfs rw,proto=tcp,hard 0 0

Here are the mounts according to the mount command:
enkf1 -
/dev/sda3 on /home/enkf type xfs (rw)

enkf2 -
enkf1:/ on /home/enkf type nfs4
(rw,proto=tcp,rsize=32768,wsize=32768,hard,addr=128.95.175.220)

enkf3 -
enkf1:/ on /home/enkf type nfs4
(rw,proto=tcp,rsize=32768,wsize=32768,hard,addr=128.95.175.220)

Here is the sequence in time order. This is even easier to demonstrate
than the last time:
enkf1:~# echo 2 > /home/enkf/ddd

enkf2:~# cat /home/enkf/ddd
2

enkf3:~# cat /home/enkf/ddd
2

enkf1:~# rm /home/enkf/ddd
enkf1:~# echo 3 > /home/enkf/ddd

enkf2:~# cat /home/enkf/ddd
2

enkf3:~# cat /home/enkf/ddd
2

enkf1:~# ls -i /home/enkf/ddd
10872 /home/enkf/ddd

enkf2:~# ls -i /home/enkf/ddd
10855 /home/enkf/ddd

enkf3:~# ls -i /home/enkf/ddd
10855 /home/enkf/ddd
enkf3:~# touch /home/enkf/ddd
enkf3:~# ls -i /home/enkf/ddd
10872 /home/enkf/ddd

enkf2:~# ls -i /home/enkf/ddd
10855 /home/enkf/ddd



--
David Warren INTERNET: [email protected]
(206) 543-0945 Fax: (206) 543-0308
University of Washington
Dept of Atmospheric Sciences, Box 351640
Seattle, WA 98195-1640
-------------------------------------------------------------------------------
DECUS E-PUBS Library Committee representative
SeaLUG DECUS Chair


2007-04-19 16:25:50

by Bryan O'Sullivan

Subject: Re: NFS caching bug is back

David Warren wrote:
> A bug that we turned in a while ago is back in the 2.6.20 kernels, only
> worse. I have found it in 2.6.20.6 and 2.6.20.7.

These symptoms look similar to this bug:

http://bugzilla.kernel.org/show_bug.cgi?id=8305

which has been around since 2.6.17. Do you find that the problem
magically resolves itself after a little while?

<b


2007-04-19 16:36:16

by Trond Myklebust

Subject: Re: NFS caching bug is back

On Thu, 2007-04-19 at 08:43 -0700, David Warren wrote:
> A bug that we turned in a while ago is back in the 2.6.20 kernels,
> only worse. I have found it in 2.6.20.6 and 2.6.20.7. It happens with
> both NFS4 and NFS3 mounts. Clients don't see inode changes (delete and
> recreate file):

Interesting. Do you see an OPEN request being sent to the server when
you 'cat' the file on enkf2 or enkf3? You can check either using
ethereal/wireshark, or by comparing the values in the OPEN column
in /proc/self/mountstats on the client before and after issuing the
'cat' command.
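For example, something like this on enkf2 should show whether the count
moves (just a sketch, assuming /home/enkf is the only NFS mount on that
client):

enkf2:~# awk '/OPEN:/ {print $2}' /proc/self/mountstats   # OPEN ops count before
enkf2:~# cat /home/enkf/ddd
enkf2:~# awk '/OPEN:/ {print $2}' /proc/self/mountstats   # did it increase?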

Trond


2007-04-19 18:07:04

by David Warren

Subject: Re: NFS caching bug is back

No, it doesn't resolve itself. I just let the changed file sit for about
45 minutes and the inode still has not changed. It is very similar to a
bug I reported against 2.6.11 that had since been fixed.

I have also now verified that the same thing happens with a Solaris 10
client, so the problem is most likely on the server side.

In wireshark I see the client sending packets with:
PUTFH and GETATTR
and then, at the end:
PUTFH and ACCESS
The return values for the ACCESS are:
access: 0x2d
.... .1 = allow READ
.... 0. = not allow LOOKUP
...1 .. = allow MODIFY
..1. .. = allow EXTEND
.0.. .. = not allow DELETE
1... .. = allow EXECUTE
The request had
Supported: 0x1f
.... .1 = allow READ
.... 1. = allow LOOKUP
...1 .. = allow MODIFY
..1. .. = allow EXTEND
.1.. .. = allow DELETE
0... .. = allow EXECUTE
Access: 0x1f
.... .1 = allow READ
.... 1. = allow LOOKUP
...1 .. = allow MODIFY
..1. .. = allow EXTEND
.1.. .. = allow DELETE
0... .. = allow EXECUTE

I don't know that much about the inner workings of the NFS protocol, but
considering that the inode has been removed and replaced by a new one
shouldn't all the return values from the access request be 0? It seems
odd that read, modify, extend and execute are allowed for a nonexistent
object.
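For anyone who wants to decode these masks by hand, the ACCESS bit
values are defined in RFC 3530 (READ=0x01, LOOKUP=0x02, MODIFY=0x04,
EXTEND=0x08, DELETE=0x10, EXECUTE=0x20); a throwaway shell loop like the
following (just a sketch of mine, not from the capture) reproduces the
decode shown above:

mask=0x2d   # value returned by the server; try 0x1f for the request
for entry in READ:0x01 LOOKUP:0x02 MODIFY:0x04 EXTEND:0x08 DELETE:0x10 EXECUTE:0x20; do
    name=${entry%:*}; bit=${entry#*:}
    [ $(( mask & bit )) -ne 0 ] && echo "allow $name" || echo "not allow $name"
done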


2007-04-19 18:23:11

by Trond Myklebust

Subject: Re: NFS caching bug is back

On Thu, 2007-04-19 at 11:06 -0700, David Warren wrote:

> I don't know that much about the inner workings of the NFS protocol,
> but considering that the inode has been removed and replaced by a new
> one shouldn't all the return values from the access request be 0? It
> seems odd that read, modify, extend and execute are allowed for a
> nonexistent object.

The filehandle should normally be invalidated and any attempt by the
client to use it should result in an ESTALE error. The exception would
be if a hard link to the file still exists somewhere on the filesystem
(which didn't seem to be the case in your test).

Irrespective of whether or not the file still exists somewhere else, the
mtime on the parent directory _will_ change when you unlink the file.
The client is supposed to pick up on this and re-issue a LOOKUP and/or
OPEN for the file, at which point the server should reply with ENOENT
or, in a testcase like yours, with the new file and its filehandle.
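
A quick way to see whether that directory mtime change is actually
making it to the client would be something like this (a sketch, using
the hosts and paths from your report):

enkf1:~# stat -c '%Y %y' /home/enkf   # parent directory mtime on the server
enkf2:~# stat -c '%Y %y' /home/enkf   # the same value should appear on the client
enkf1:~# rm /home/enkf/ddd            # this should bump the directory mtime
enkf2:~# stat -c '%Y %y' /home/enkf   # does the client see the new mtime?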

My immediate advice would be to take the whole filesystem offline and
fsck it, just to be sure that there is no corruption that might be
confusing the NFS server.
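
Since the exported filesystem is xfs, note that fsck.xfs itself is a
no-op; a rough outline of the offline check (just a sketch, adjust the
export handling to whatever is in your /etc/exports) would be:

enkf1:~# exportfs -ua              # temporarily withdraw the exports
enkf1:~# umount /home/enkf
enkf1:~# xfs_repair -n /dev/sda3   # -n = read-only check, repairs nothing
enkf1:~# mount /home/enkf          # uses the fstab entry from your report
enkf1:~# exportfs -ra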

Cheers
Trond


2007-04-19 19:31:48

by David Warren

Subject: Re: NFS caching bug is back

I did the fsck, which found no problems. However, I have found a couple
of other interesting things. The directory mtime does not update on the
client, and the link count for the file shows as 0:
server:
drwxr-xr-x 8 root root 89 2007-04-19 12:02 .
drwxr-xr-x 10 root root 105 2007-04-19 08:00 ..
-rw-r--r-- 1 root root 3 2007-04-19 12:02 ddd
client:
drwxr-xr-x 8 root root 89 2007-04-19 11:38 .
drwxr-xr-x 8 root root 77 2007-04-19 08:06 ..
-rw-r--r-- 0 root root 3 2007-04-19 12:01 ddd


Note: 11:38 is actually prior to my unexporting, fscking and
re-exporting the filesystem.

Another discovery:
on a 32-bit client we see an occasional delay of 1-5 seconds before it
picks up the change, but it does eventually pick it up. The 64-bit
clients never pick it up. Also, if the server reuses the same inode
number, the 32-bit systems see the change immediately.
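
In case it is relevant: that 1-5 second window is in the same range as
the default attribute cache timeout (acregmin is 3 seconds unless it is
overridden on the mount). The values in effect on a client can be
checked with, for example:

nfsstat -m                 # lists NFS mounts with the options in effect
grep ' nfs' /proc/mounts   # acregmin/acregmax appear here when set explicitly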


2007-04-19 19:37:09

by David Warren

Subject: Re: NFS caching bug is back

One more thing.
If you create a new file on the server (or on any system writing to it),
the client sees that the directory has changed and it sees the new file,
but it still doesn't see that the old file has changed.

e.g.
rm ddd
echo 1234 > ddd
echo 12345 > yyy

The client sees the directory update, the new yyy and the old ddd.


2007-04-19 19:52:55

by David Warren

Subject: Re: NFS caching bug is back

One last really odd observation:
enkf3:~# ls -al /home/enkf
total 8
drwxr-xr-x 8 root root 89 Apr 19 12:02 .
drwxr-xr-x 8 root root 77 Apr 19 08:06 ..
-rw-r--r-- 0 root root 3 Apr 19 12:01 ddd
drwxr-xr-x 15 enkf enkf 4096 Apr 19 11:14 enkf
drwxrwsr-x 6 daemon daemon 38 Apr 13 11:05 job
drwxr-xr-x 2 root root 6 Apr 19 11:41 lost+found
drwxr-xr-x 3 torn root 28 Apr 18 16:02 torn
drwxr-xr-x 8 warren root 151 Apr 18 11:08 warren
drwxrwxr-x 4 enkf enkf 34 Apr 16 11:15 wrf2

At this point I removed the file on the server, yet the client can
still cat it:

enkf3:~# cat /home/enkf/ddd
45

but the client does know the file is gone:
enkf3:~# ls -al /home/enkf
total 4
drwxr-xr-x 8 root root 89 Apr 19 12:48 .
drwxr-xr-x 8 root root 77 Apr 19 08:06 ..
-rw-r--r-- 1 root root 0 Apr 19 12:39 dd2
drwxr-xr-x 15 enkf enkf 4096 Apr 19 11:14 enkf
drwxrwsr-x 6 daemon daemon 38 Apr 13 11:05 job
drwxr-xr-x 2 root root 6 Apr 19 11:41 lost+found
drwxr-xr-x 3 torn root 28 Apr 18 16:02 torn
drwxr-xr-x 8 warren root 151 Apr 18 11:08 warren
drwxrwxr-x 4 enkf enkf 34 Apr 16 11:15 wrf2

and yet it can still cat it:
enkf3:~# cat /home/enkf/ddd
45


2007-04-19 21:30:06

by David Warren

Subject: Re: NFS caching bug is back - We think we found it

After more testing, we think we have the answer. It looks like the only
servers that exhibit this problem are the ones that have gfs disks
attached. Systems with identical kernels, except with no gfs, gfs2 or
dlm modules loaded, do not seem to do this. So something in the gfs
modules must be trashing a kernel structure that the nfs server uses,
even though the exported file system is not gfs.
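
For anyone who wants to check whether their own server falls into the
same category, a comparison along these lines should do (othersrv is
just a placeholder for one of the unaffected machines):

enkf1:~# lsmod | egrep 'gfs|dlm'      # server that shows the bug
othersrv:~# lsmod | egrep 'gfs|dlm'   # server that behaves correctly
# unloading those modules (if nothing is using them) and retesting
# would be one way to confirm they really are the trigger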
