2006-08-20 21:55:27

by Chris Worley

Subject: File has different contents on different clients

I'm using NFS on a cluster, and one file keeps showing different
contents on different NFS clients, even though its attributes are
correct and identical on all clients.

Here, the user queries all clients and greps for a value in the file:

bash-3.00$ pbs2pdsh 21647 grep n_pess /...PATH.../par | dshbak -c
----------------
c-21-[01,04,06-07,09-11,16,18-20,22-30],c-22-[01-30],c-23-[01-04,06-27,29-30],c-24-[01-04,06-30],c-26-[01-30],c-27-[01-30],c-28-[01-30],c-29-[01-30],c-30-[01-12,14-17,19]
----------------
n_pess = 4
----------------
c-23-05,c-30-18
----------------
n_pess = 8

Most clients see the correct value of "4"; two clients see
the old value of "8".
Yet all clients show the same size and modification time for the file:

[root@hn ~]# pdsh -a ls -l /...PATH/par | dshbak -c
----------------
c-21-[01-30],c-22-[01-30],c-23-[01-27,29-30],c-24-[01-04,06-30],c-26-[01-30],c-27-[01-30],c-28-[01-30],c-29-[01-30],c-30-[01-12,14-19]
----------------
-rw-r--r-- 1 anaraiki anaraiki 753 Aug 19 13:52 /...PATH.../par

The checksum is also different on many clients:

[root@hn ~]# pdsh -a sum /...PATH.../par | dshbak -c
----------------
c-21-[01,04,06-07,09-11,16-30],c-22-[01-30],c-23-[01-04,06-27,29-30],c-24-[01-04,06-30],c-26-[01-30],c-27-[01-30],c-28-[01-30],c-29-[01-30],c-30-[01-12,14-17,19]
----------------
27851 1
----------------
c-21-[02-03,05,08,12-15],c-23-05,c-30-18
----------------
44233 1

If I "touch" the file... all clients get back in sync. This is the
second time it's happened to the same file, but different NFS clients
got out of sync.

We're running a 2.6.9-22 kernel w/ RHEL4U2.

Any ideas?

Thanks,

Chris

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2006-08-21 01:54:13

by NeilBrown

Subject: Re: File has different contents on different clients

On Sunday August 20, [email protected] wrote:
> I'm using NFS on a cluster, and one file seems to keep getting
> different contents on different NFS clients, while having the proper
> attributes on all clients:
>
....
>
> If I "touch" the file... all clients get back in sync. This is the
> second time it's happened to the same file, but different NFS clients
> got out of sync.
>
> We're running a 2.6.9-22 kernel w/ RHEL4U2.
>
> Any ideas?

My guess is that the file is getting changed more than once within one
second. As the mtime on many filesystems has a resolution of one
second, the later change leaves the mtime as it was, so an NFS client
that cached the earlier contents does not notice the change.

Maybe you could try mounting with '-o noac', but that might hurt your
performance. Or you could try a filesystem with high-resolution
timestamps on the server - I think xfs has this.
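
To make the guess above concrete, here is a small Python sketch of the
idea. It is a simplified model, not the Linux NFS client's actual logic:
the names ServerFile, CachingClient and simulate are made up for
illustration, and the only assumption is that a client keeps its cached
data as long as the (mtime, size) pair it sees is unchanged.

```python
from dataclasses import dataclass

NS = 1_000_000_000  # nanoseconds per second


@dataclass
class ServerFile:
    """Hypothetical stand-in for a file on the server."""
    data: bytes
    mtime_ns: int  # full-precision modification time

    def attrs(self, resolution_ns):
        # What a client sees: mtime truncated to the filesystem's
        # timestamp resolution, plus the file size.
        coarse = self.mtime_ns - self.mtime_ns % resolution_ns
        return (coarse, len(self.data))


class CachingClient:
    """Hypothetical client that revalidates its cache by (mtime, size)."""

    def __init__(self):
        self.cached_attrs = None
        self.cached_data = None

    def read(self, f, resolution_ns):
        attrs = f.attrs(resolution_ns)
        if attrs != self.cached_attrs:
            # Attributes changed: drop the cache and refetch the data.
            self.cached_attrs = attrs
            self.cached_data = f.data
        return self.cached_data


def simulate(resolution_ns):
    # First write at t = 1000.2 s; the client reads and caches it.
    f = ServerFile(b"n_pess = 8\n", 1000 * NS + 200_000_000)
    client = CachingClient()
    first = client.read(f, resolution_ns)

    # Second write at t = 1000.7 s, same second and same size.
    f.data = b"n_pess = 4\n"
    f.mtime_ns = 1000 * NS + 700_000_000
    second = client.read(f, resolution_ns)

    # A later "touch" at t = 1005 s changes only the mtime.
    f.mtime_ns = 1005 * NS
    third = client.read(f, resolution_ns)
    return first, second, third


if __name__ == "__main__":
    print("1 s resolution: ", simulate(NS))
    print("1 ns resolution:", simulate(1))
```

With one-second resolution the second read in this model still returns
the old "8" and only the "touch" brings the client back in sync, which
matches the symptoms reported. With nanosecond resolution the second
read already returns "4".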

NeilBrown
