2005-09-26 19:13:54

by David Warren

Subject: NFS caching problem

I have discovered a weird problem with NFSv3 on Linux.
I have 3 machines:
machines A and B both mount a disk D from machine C.
The mount options are tcp, rw, hard, and intr.

Program test runs on machine A writing to D:
(fortran)
program test
do i=1,10
call system("/bin/rm t")
open (10, file='t', status='new')
write(10,*)i
write(6,*)i
close(10)
call sleep(1)
enddo
end

program t2 runs on machine B reading from D:
(c, but doesn't have to be)
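#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char in[80];
    int file;
    ssize_t len;

    while (1) {
        file = open("t", O_RDONLY);
        if (file < 0)
            continue;   /* t is briefly missing between the rm and the create */
        len = read(file, in, 79);
        if (len < 0)
            len = 0;    /* treat a failed read as empty data */
        in[len] = '\0';
        printf("%s\n", in);
        close(file);
    }
}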

While machine A counts from 1 to 10 and writes each number into file
t, machine B keeps reading 1 from file t, then after a while it
switches to another number and reads that one for a while. In my first
version of this, I was opening and rewriting the same file. In that
version, machine B always read 1's. Now that I am creating new inodes
all the time, it changes every few minutes while I repeatedly rerun test
on machine A.

Now for the other interesting facts:
Reading this file from an unrelated Sun during this produces the same
result as machine B.
The same test under NFSv4 does not do this. It works exactly as one
would expect: as soon as the file is written, the reader sees the
new data.

Any ideas what I could have done wrong in my NFSv3 setup? Is there some
kernel parameter that needs tweaking? Is there some mount option I should
have?
Thanks.

--
David Warren INTERNET: [email protected]
(206) 543-0945 Fax: (206) 543-0308
University of Washington
Dept of Atmospheric Sciences, Box 351640
Seattle, WA 98195-1640




-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server.
Download it for free - -and be entered to win a 42" plasma tv or your very
own Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-09-26 19:37:03

by Peter Staubach

Subject: Re: NFS caching problem

What version(s) of Linux are on machines A, B, and C? What is the local file
system on disk D of machine C?

Thanx...

ps



2005-09-26 20:02:19

by Trond Myklebust

Subject: Re: NFS caching problem

When I see that problem on my test-rig, it appears to be due to the
reuse of inode numbers by my server. In other words, the file created by
the Fortran program always ends up having the same inode number (check
this using 'ls -i t').

In that case, the client is indeed expected to have problems recognising
that the file has changed w.r.t. the cache.
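
The 'ls -i t' check can also be done from a program. The following is only
a sketch, not code from this thread: the helper names get_inode and
print_inode are hypothetical, and the only system interface assumed is the
POSIX stat() call. Seeing the same number after every pass of the writer
would be consistent with the inode reuse described above.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: store the inode number of path in *ino.
 * Returns 0 on success, -1 if stat() fails (e.g. the file is missing). */
static int get_inode(const char *path, unsigned long long *ino)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *ino = (unsigned long long)st.st_ino;
    return 0;
}

/* Hypothetical helper: print the inode number, much like 'ls -i path'.
 * Returns 0 on success, -1 on failure. */
static int print_inode(const char *path)
{
    unsigned long long ino;

    if (get_inode(path, &ino) != 0) {
        perror(path);
        return -1;
    }
    printf("%llu %s\n", ino, path);
    return 0;
}
```

Calling print_inode("t") from main() after each rerun of the writer gives a
quick record of whether the number ever changes.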

Cheers,
Trond




2005-09-26 20:11:39

by Peter Staubach

Subject: Re: NFS caching problem

Trond Myklebust wrote:

>When I see that problem on my test-rig, it appears to be due to the
>reuse of inode numbers by my server. IOW: the file created by the
>fortran program always ends up having the same inode number (check this
>using 'ls -i t').
>
>In that case, the client is indeed expected to have problems recognising
>that the file has changed w.r.t. the cache.
>

Actually, the server is responsible for changing the file handle when the
inode is reused. There is a semantic called a "generation count" which is
used to differentiate between instances of files (re)using the same inode
number. If the server does not do this, then it is broken and relays this
breakage on to clients as demonstrated here.
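
The generation count idea can be modelled in a few lines of C. This is a
simplified sketch under stated assumptions, not any real server's handle
format: NFS file handles are opaque to clients and their layout is
server-specific, so the struct sketch_fh and both function names are
hypothetical. It only shows why two files that share an inode number still
get distinct handles when the generation is bumped on reuse.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified file handle contents (not a real layout). */
struct sketch_fh {
    uint64_t fsid;        /* which exported file system */
    uint64_t ino;         /* inode number; may be reused after an unlink */
    uint32_t generation;  /* changed each time the inode is reallocated */
};

/* Two handles name the same file only if every field matches. */
static bool fh_same_file(const struct sketch_fh *a, const struct sketch_fh *b)
{
    return a->fsid == b->fsid &&
           a->ino == b->ino &&
           a->generation == b->generation;
}

/* Model the server reusing an inode for a newly created file:
 * the inode number stays the same, the generation does not. */
static struct sketch_fh fh_reuse_inode(struct sketch_fh old)
{
    old.generation++;
    return old;
}
```

With this model, a handle obtained before the 'rm' no longer compares equal
to the handle of the recreated file, even though 'ls -i' would show the same
inode number for both.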

ps



2007-04-20 17:35:42

by David Warren

Subject: Re: NFS caching problem

I was a little too quick to blame it on gfs. I can now replicate it on
systems that have never been in the presence of a gfs or dlm module. It
is very consistent under V4. The only catch is you do have to get a new
inode when you recreate the file. The client does seem to pick up
changes in the same inode number. However, I have still not been able to
replicate it under V3 on these systems.

It seems like this is something that everyone should be seeing, so I am
guessing that we must have something odd in our setup. Any suggestions?
Here are all the NFS options from my config:
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_NFS_DIRECTIO=y
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_NFSD_TCP=y
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y

--
David Warren INTERNET: [email protected]
(206) 543-0945 Fax: (206) 543-0308
University of Washington
Dept of Atmospheric Sciences, Box 351640
Seattle, WA 98195-1640
-------------------------------------------------------------------------------
DECUS E-PUBS Library Committee representative
SeaLUG DECUS Chair

