2005-05-25 23:03:01

by Anton Starikov

Subject: NFS cache problem

I have a file server exporting /home via NFSv3 to a few desktops
and a small cluster. The file server has two NICs, one for the cluster and
one for the desktops. The kernel version is 2.6.5.


Export options (exportfs -v):
/home 192.168.211.0/24(rw,async,no_root_squash)

Mount options (cat /proc/mounts):
192.168.211.240:/home /home nfs rw,sync,v3,rsize=32768,wsize=32768,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,intr,tcp,noac,lock,addr=192.168.211.240 0 0
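The client-side mount line can also be checked programmatically. A minimal sketch in Python, assuming the usual whitespace-separated /proc/mounts layout (device, mount point, type, options, dump, pass) with the wrapped line joined back into one; parse_mount_line and attr_cache_disabled are hypothetical helper names. It confirms what the options say: noac is set and all four attribute-cache timeouts are zero, so attribute caching is disabled on this client.

```python
# Attribute-cache timeout options documented for Linux NFS mounts.
AC_OPTIONS = ("acregmin", "acregmax", "acdirmin", "acdirmax")


def parse_mount_line(line):
    """Split one /proc/mounts line into device, mount point, fs type and an options dict."""
    device, mountpoint, fstype, opts = line.split()[:4]
    options = {}
    for opt in opts.split(","):
        key, sep, value = opt.partition("=")
        # Flags such as "noac" or "hard" carry no value; record them as True.
        options[key] = value if sep else True
    return device, mountpoint, fstype, options


def attr_cache_disabled(options):
    """True if noac is set or every attribute-cache timeout is zero."""
    if options.get("noac") is True:
        return True
    return all(options.get(name) == "0" for name in AC_OPTIONS)


# The mount line quoted above, rejoined into a single line.
sample = ("192.168.211.240:/home /home nfs rw,sync,v3,rsize=32768,wsize=32768,"
          "acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,intr,tcp,noac,lock,"
          "addr=192.168.211.240 0 0")
device, mountpoint, fstype, options = parse_mount_line(sample)
print(fstype, attr_cache_disabled(options))  # nfs True
```

On a live client the same functions can be applied to each line of /proc/mounts whose type is nfs.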

The clocks on all clients (desktops and cluster) are synchronized via NTP.

But from time to time I see a strange situation: you rewrite a file on one
host, and for a long time (up to a few hours!!!) some hosts read the new
file while other hosts read the old one.

I tried playing with all the options, without any result. Exporting with
"sync" seems to be a solution, but performance is really very low
(because of this, "sync" wasn't really tested). And I don't need
"sync", because I'm not interested in saving to "real" media; the
file cache is good enough for me. And, in principle, "sync/async" on the
server side should be irrelevant in this case (at least on the client side I
have "sync", which is enough).

To avoid further discussion: there is nothing "cluster specific". The cluster
is very small and there is nothing like concurrent read/write.
Basically, there is only one specific thing: the same data can be accessible
from different hosts, and usually not even at the same time.
Say you write a file on one host, and a couple of minutes later you read it
from a different host (you prepare input data on a client host or the master
node and then submit a job to the queue). That should be OK, but in
my case, for up to a few hours, clients see chaos, reading different
versions of the file.

Does anybody have any ideas how to solve the problem?

BTW, the hardware configuration:
Server:
3ware SATA RAID, 2x Xeon CPUs (NFSD started with 8 threads), Intel and
Broadcom GbE NICs.

Clients: mostly dual Opteron machines.

Actually, I have a strong feeling that the problem became much more
"visible" when I added a second CPU to the server. It existed before, but
usually for no longer than 10 minutes. Now my users report hours.
This is incredible. Basically, work in my group is partly paralysed
now :(

Of course there are things like Lustre, PVFS and so on. But I
believe my case isn't the right one for starting to use such filesystems.
NFS should be more than enough.

Thanks,
Anton Starikov.



-------------------------------------------------------
SF.Net email is sponsored by: GoToMeeting - the easiest way to collaborate
online with coworkers and clients while avoiding the high cost of travel and
communications. There is no equipment to buy and you can meet as often as
you want. Try it free.http://ads.osdn.com/?ad_id=7402&alloc_id=16135&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-05-25 23:34:28

by Trond Myklebust

Subject: Re: NFS cache problem

On Thursday 26.05.2005 at 01:01 (+0200), Anton Starikov wrote:
> Does anybody have any ideas how to solve the problem?

The NFS cache consistency model is discussed in the FAQ. Please see
http://nfs.sourceforge.net/#faq_a8

Cheers,
Trond



2005-05-25 23:44:30

by Anton Starikov

Subject: Re: NFS cache problem

Trond Myklebust wrote:
> The NFS cache consistency model is discussed in the FAQ. Please see
> http://nfs.sourceforge.net/#faq_a8

I did, a long time ago, and I've reread it now.
My usage fits it: I have "noac" and "sync" on the client side. What more is needed? :)

Could ACLs be a problem?

Best regards,
Anton



2005-05-26 00:12:45

by Trond Myklebust

Subject: Re: NFS cache problem

On Thursday 26.05.2005 at 01:44 (+0200), Anton Starikov wrote:
> Trond Myklebust wrote:
> > The NFS cache consistency model is discussed in the FAQ. Please see
> > http://nfs.sourceforge.net/#faq_a8
>
> I did, a long time ago, and I've reread it now.
> My usage fits it: I have "noac" and "sync" on the client side. What more is needed? :)

"noac" just means that the client rereads the attributes every time it
reads or writes.
If the size and the mtime on the file haven't changed, then the client
assumes the data in the file itself also hasn't changed. That is usually
a good assumption on something like XFS (which has nanosecond precision
on mtime). It can be rather inaccurate on something like EXT3 (which has
a precision of only 1 second on mtime).
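The revalidation rule described above can be made concrete with a toy model. This is a simplified sketch written for illustration, not the Linux NFS client's actual code; Attrs, truncate_mtime and cache_still_valid are hypothetical names. It shows that two same-size rewrites landing within the same second look identical when the server filesystem keeps mtime to 1-second precision, so a client applying the size+mtime test keeps its cached data. It illustrates the mechanism only; it does not by itself account for the multi-hour delays reported in this thread.

```python
from dataclasses import dataclass

NS_PER_SECOND = 1_000_000_000


@dataclass(frozen=True)
class Attrs:
    size: int      # file size in bytes, as reported by the server
    mtime_ns: int  # modification time in nanoseconds, as reported by the server


def truncate_mtime(mtime_ns: int, granularity_ns: int) -> int:
    """Round a timestamp down to the precision the server filesystem keeps."""
    return mtime_ns - mtime_ns % granularity_ns


def cache_still_valid(cached: Attrs, fresh: Attrs) -> bool:
    """Simplified rule: reuse cached data if size and mtime both match."""
    return cached.size == fresh.size and cached.mtime_ns == fresh.mtime_ns


# Two rewrites of a 100-byte file, 300 ms apart, within the same second.
first_write = 5 * NS_PER_SECOND + 100_000_000
second_write = 5 * NS_PER_SECOND + 400_000_000

for label, granularity in (("1 s mtime", NS_PER_SECOND), ("1 ns mtime", 1)):
    cached = Attrs(100, truncate_mtime(first_write, granularity))
    fresh = Attrs(100, truncate_mtime(second_write, granularity))
    print(label, "-> stale data reused:", cache_still_valid(cached, fresh))
```

With 1-second precision the second write is invisible to this test (True); with nanosecond precision it is detected (False).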

If you seriously need uncached reads and writes, then you should rather
consider using O_DIRECT.
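For reference, an uncached read with O_DIRECT looks roughly like this from Python. A minimal sketch, assuming Linux (os.O_DIRECT is not defined on every platform) and a 4096-byte alignment for the buffer and transfer size; the real alignment requirement depends on the filesystem and kernel, and read_direct and round_up are hypothetical helpers. An anonymous mmap is used as the buffer because it is page-aligned, which an ordinary bytearray is not guaranteed to be.

```python
import mmap
import os

ALIGN = 4096  # assumed alignment; the actual requirement is filesystem/kernel specific


def round_up(n: int, align: int) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def read_direct(path: str, size: int, align: int = ALIGN) -> bytes:
    """Read up to `size` bytes from the start of `path`, bypassing the page cache."""
    length = max(align, round_up(size, align))
    buf = mmap.mmap(-1, length)  # anonymous mapping: page-aligned, writable
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        try:
            nread = os.readv(fd, [buf])  # fills the aligned buffer directly
        finally:
            os.close(fd)
        return buf[:min(nread, size)]
    finally:
        buf.close()
```

Every program that reads the shared files would need a change along these lines, which is why O_DIRECT is only practical for code you control.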

Cheers,
Trond




2005-05-26 00:29:13

by Anton Starikov

Subject: Re: NFS cache problem

Trond Myklebust wrote:
> "noac" just means that the client rereads the attributes every time it
> reads or writes.
> If the size and the mtime on the file haven't changed, then the client
> assumes the data in the file itself also hasn't changed. That is usually
> a good assumption on something like XFS (which has nanosecond precision
> on mtime). It can be rather inaccurate on something like EXT3 (which has
> a precision of only 1 second on mtime).
I use ReiserFS.
And here we're not talking about seconds... we're sometimes talking about hours.
That seems too strange to me. For seconds I can find plenty of
explanations :)
In principle, even if it were one minute, I'd be much happier than
now.

> If you seriously need uncached reads and writes, then you should rather
> consider using O_DIRECT.
Unfortunately this is not a trivial problem. A lot of software is
involved here. And I believe that NFS should be able to work properly under
these conditions. At least it did with Solaris in a similar environment.

Best,
Anton.



2005-05-26 01:21:58

by Lever, Charles

Subject: RE: NFS cache problem

hi anton-

> I use ReiserFS.
> And here we're not talking about seconds... we're sometimes talking about hours.
> That seems too strange to me. For seconds I can find plenty of
> explanations :)
> In principle, even if it were one minute, I'd be much happier than
> now.

> > If you seriously need uncached reads and writes, then you should rather
> > consider using O_DIRECT.
> Unfortunately this is not a trivial problem. A lot of software is
> involved here. And I believe that NFS should be able to work properly under
> these conditions. At least it did with Solaris in a similar environment.

if you have a way of reproducing this condition, maybe you could capture
a network trace on the server while running your test case... "tcpdump
-s0 -w dumpfile" and post it on the web so we can take a look at what's
going on.
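Before a trace is available, a cheap way to narrow things down is to record what each client actually sees for the same file. A minimal sketch with hypothetical helper names: run snapshot() on the same path on two clients and compare the tuples with classify(). If the content hashes differ while size and mtime match, the reading client is reusing cached data because the attributes gave it no reason to revalidate; if the clients disagree about the attributes themselves, that points at attribute caching instead.

```python
import hashlib
import os


def snapshot(path, chunk_size=1 << 16):
    """Return (size, mtime_ns, sha256 hex digest) of a file as this client sees it."""
    st = os.stat(path)
    digest = hashlib.sha256()
    # Read through the normal (cached) path: we want what applications see.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return st.st_size, st.st_mtime_ns, digest.hexdigest()


def classify(a, b):
    """Compare two snapshots of the same file taken on different clients."""
    if a[2] == b[2]:
        return "consistent"
    if a[:2] == b[:2]:
        return "stale data with identical attributes"
    return "attributes differ"
```

Logging these tuples with a timestamp on each host also gives a concrete moment at which to start the tcpdump capture.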



2005-05-26 01:28:19

by Anton Starikov

Subject: Re: NFS cache problem

Lever, Charles wrote:
> if you have a way of reproducing this condition, maybe you could capture
> a network trace on the server while running your test case... "tcpdump
> -s0 -w dumpfile" and post it on the web so we can take a look at what's
> going on.

OK, I'll do it as soon as one of my users reports the problem.

Anton.



2005-05-26 02:12:34

by Trond Myklebust

Subject: Re: NFS cache problem

On Thursday 26.05.2005 at 02:29 (+0200), Anton Starikov wrote:
> Trond Myklebust wrote:
> > "noac" just means that the client rereads the attributes every time it
> > reads or writes.
> > If the size and the mtime on the file haven't changed, then the client
> > assumes the data in the file itself also hasn't changed. That is usually
> > a good assumption on something like XFS (which has nanosecond precision
> > on mtime). It can be rather inaccurate on something like EXT3 (which has
> > a precision of only 1 second on mtime).
> I use ReiserFS.
> And here we're not talking about seconds... we're sometimes talking about hours.
> That seems too strange to me. For seconds I can find plenty of
> explanations :)
> In principle, even if it were one minute, I'd be much happier than
> now.

I repeat: if the file attributes do not change, then the NFS client will
not update its data cache. That is true of Solaris too.

> > If you seriously need uncached reads and writes, then you should rather
> > consider using O_DIRECT.
> Unfortunately this is not a trivial problem. A lot of software is
> involved here. And I believe that NFS should be able to work properly under
> these conditions. At least it did with Solaris in a similar environment.

Mind describing _what_ kind of environment it works under? In other
words, what is the Solaris client doing that the Linux client is not
doing when running under the _same_ conditions (i.e. against the same
server and running the same applications)?

Cheers,
Trond


