2009-08-21 18:22:11

by Stefan Egli

Subject: NFS v3 cached directory content out of sync

Hi there,

I'm experiencing a problem with NFS v3. After a rush of data written
to a particular NFS partition (to/from a NetApp), I noticed that
directory listings on three different NFS v3 clients (each doing an
'ls' in the same directory) showed different content. The 'rush of
data' amounted to approx. 400GB and took a few hours. The
inconsistency between the three clients still persisted 4 hours after
the data rush.

I would 'somehow understand' that such a 'data storm' could maybe
overwhelm the NFS clients' caches - and thus would 'accept' (although
not like) a delay between the clients' cache updates. But an
inconsistency that still exists 4 hours afterwards is IMHO a plain
bug.

Question 1: Would you agree that this is a bug or is it 'NFS as designed'?

What I did after seeing this was create a new file in that directory -
and voila, all clients immediately saw the right directory
content. This calmed my nerves again, as in 'NFS still works'.
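
The workaround described above (creating a file in the affected directory) can be sketched as a small script. This is only an illustration of what the poster observed, not a documented NFS mechanism: the helper name `nudge_directory` and the scratch-file prefix are made up for this example, and whether other clients revalidate promptly still depends on their attribute cache settings.

```python
import os
import tempfile


def nudge_directory(path):
    """Create and remove a scratch file so the directory's mtime changes.

    Returns the directory mtime (in nanoseconds) before and after.
    """
    before = os.stat(path).st_mtime_ns
    # mkstemp picks a unique name, so existing entries are never touched.
    fd, scratch = tempfile.mkstemp(prefix=".nfs-nudge-", dir=path)
    os.close(fd)
    os.unlink(scratch)
    after = os.stat(path).st_mtime_ns
    return before, after
```

Run on any one client with write access, this leaves the directory content unchanged while still modifying the directory itself, which is the kind of change the other clients can notice on their next attribute check.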

Having said that - I then did some more testing with this NFS cache
delay and noticed that file content updates between the three NFS
clients easily take a few seconds - up to 10-15 seconds.

Question 2: Is such a file content delay in NFS 'as designed' - I'm
assuming that fiddling with NFS mount parameters could put a 'defined
maximum' to such a delay? Or is there no such maximum and under 'bad
luck' situations it can go infinitely high (which would be Q1)?

Question 3: What is a suggested best-practice NFS mount parameter set
for complying with the following requirements:
  * lots of reads - tons of files - reads often from different files
  * few writes - but if written it should propagate to all NFS
    clients 'immediately'
  * high load situations (as with the 400GB read/write stuff above) -
    and after or even during this doing a 'ls' in a directory should
    produce consistent results on different attached NFS clients

Question 4: If we'd somehow manually detected such a directory
content inconsistency - would there be something like a 'hey NFS
client, flush all NFS caches NOW' thing?

Question 5: any of this related to commit
37d9d76d8b3a2ac5817e1fa3263cfe0fdb439e51 (NFS: flush cached directory
information slightly more readily)?

Thanks for answers to any or all of above! :)

MUCH appreciated!

Regards,
Stefan


2009-08-22 17:36:10

by Trond Myklebust

Subject: Re: NFS v3 cached directory content out of sync

On Fri, 2009-08-21 at 12:15 -0700, Simon Kirby wrote:
> On Fri, Aug 21, 2009 at 09:08:01PM +0200, Stefan Egli wrote:
>
> > >> Question 4: If we'd somehow manually detected such a directory
> > >> content inconsistency - would there be something like a 'hey NFS
> > >> client, flush all NFS caches NOW' thing?
> > >
> > > No.
> >
> > unmount / mount would - but that's obviously not feasible. bugger
> > there's nothing for that...
>
> Wouldn't (admittedly sledgehammer) echo 3 > /proc/sys/vm/drop_caches
> accomplish this?

Only if you are certain that there are no processes actually using that
directory or any of its subdirs/files.
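
For reference, the `echo 3 > /proc/sys/vm/drop_caches` suggestion quoted above could be scripted roughly as follows. This is a minimal sketch: the function name and the `path` parameter are introduced here for illustration (the parameter mainly allows a dry run against an ordinary file), and writing to the real control file needs root. The caveat above still applies: entries that processes are actively using are not discarded.

```python
import os


def drop_caches(level=3, path="/proc/sys/vm/drop_caches"):
    """Write `level` to the drop_caches control file.

    1 = page cache, 2 = dentries and inodes, 3 = both.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    # Flush dirty data first; drop_caches only discards clean cache entries.
    os.sync()
    with open(path, "w") as f:
        f.write(f"{level}\n")
```

This drops caches for every filesystem on the machine, not only the NFS mount, so it stays the sledgehammer it was described as.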

Cheers
Trond


2009-08-21 18:49:53

by Trond Myklebust

Subject: Re: NFS v3 cached directory content out of sync

On Fri, 2009-08-21 at 20:22 +0200, Stefan Egli wrote:
> Question 1: Would you agree that this is a bug or is it 'NFS as designed' ?

It sounds like a bug. You don't mention which client you are using.

> Question 2: Is such a file content delay in NFS 'as designed' - I'm
> assuming that fiddling with NFS mount parameters could put a 'defined
> maximum' to such a delay? Or is there no such maximum and under 'bad
> luck' situations it can go infinitely high (which would be Q1) ?

You can tune the amount of time the client caches information by using
the 'ac*' mount options. In this case, you will want to adjust the
values of 'acdirmin' and 'acdirmax', probably setting them to zero.

'man 5 nfs' should provide you with more information.
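
As a rough illustration of that suggestion, the sketch below rewrites a mount option string so that both directory attribute cache timeouts are zero. The helper name is hypothetical, and the sample options are the ones the original poster reports elsewhere in this thread; the resulting string would still have to be applied via mount or /etc/fstab.

```python
def with_dir_cache_disabled(options):
    """Return a mount option string with acdirmin/acdirmax forced to 0."""
    kept = [
        opt for opt in options.split(",")
        # Drop any existing acdirmin/acdirmax setting so it is not duplicated.
        if opt and opt.split("=", 1)[0] not in ("acdirmin", "acdirmax")
    ]
    return ",".join(kept + ["acdirmin=0", "acdirmax=0"])


print(with_dir_cache_disabled("rw,tcp,rsize=8192,wsize=8192,nfsvers=3"))
# prints: rw,tcp,rsize=8192,wsize=8192,nfsvers=3,acdirmin=0,acdirmax=0
```

Zero timeouts trade extra attribute requests to the server for fresher directory information, which may matter with "lots of reads - tons of files".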

> Question 3: What is a suggested best-practice NFS mount parameter set
> for complying with the following requirements:
> * lots of reads - tons of files - reads often from different files
> * few writes - but if written it should propagate to all NFS
> clients 'immediately'
> * high load situations (as with the 400GB read/write stuff above) -
> and after or even during this doing a 'ls' in a directory should
> produce consistent results on different attached NFS clients

See above

> Question 4: If we'd somehow manually detected such a directory
> content inconsistency - would there be something like a 'hey NFS
> client, flush all NFS caches NOW' thing?

No.

> Question 5: any of this related to commit
> 37d9d76d8b3a2ac5817e1fa3263cfe0fdb439e51: NFS: flush cached directory
> information slightly more readily. ?

Your client should be seeing mtime changes when you are creating new
files, so it shouldn't need to look at the ctime.

The only time ctime changes are relevant is if you use 'rsync' to
copy the files without specifying --omit-dir-times.
IOW: if something is explicitly setting the mtime on the directory.
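
The mtime/ctime distinction can be reproduced on a local POSIX filesystem with a short sketch. Everything here is illustrative: the function name and the fixed timestamp are made up, a temporary local directory stands in for the NFS export, and `os.utime` stands in for a tool that restores directory times after copying.

```python
import os
import tempfile

OLD_MTIME_NS = 1_000_000_000 * 10**9  # arbitrary timestamp in 2001


def directory_time_demo():
    """Return (mtime_ns, ctime_ns) of a scratch directory at three points."""
    with tempfile.TemporaryDirectory() as d:
        start = os.stat(d)
        # Creating an entry updates both mtime and ctime of the directory.
        open(os.path.join(d, "new_file"), "w").close()
        created = os.stat(d)
        # Explicitly setting the times puts mtime back in the past;
        # only ctime still records that the directory was changed.
        os.utime(d, ns=(OLD_MTIME_NS, OLD_MTIME_NS))
        restored = os.stat(d)
    return [(s.st_mtime_ns, s.st_ctime_ns) for s in (start, created, restored)]
```

After the last step the directory has new content but an old mtime, so a client comparing only mtime would have no reason to discard its cached listing.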

Cheers
Trond


2009-08-21 19:08:00

by Stefan Egli

Subject: Re: NFS v3 cached directory content out of sync

Hi Trond,

> It sounds like a bug. You don't mention which client you are using.

from uname:
Linux (Ubuntu) 2.6.24-24-generic #1 SMP Tue Jun 30 19:54:36 UTC 2009
x86_64 GNU/Linux

from mount:
type nfs (ro,tcp,rsize=8192,wsize=8192,nfsvers=3,addr=192.168.XX.YY)

> You can tune the amount of time the client caches information by using
> the 'ac*' mount options. In this case, you will want to adjust the
> values of 'acdirmin' and 'acdirmax', probably setting them to zero.
>
> 'man 5 nfs' should provide you with more information.

Ok, so setting acdirmin=0 and acdirmax=0 would mean no directory
content caching, right?

>> Question 4: If we'd somehow manually detected such a directory
>> content inconsistency - would there be something like a 'hey NFS
>> client, flush all NFS caches NOW' thing?
>
> No.

unmount / mount would - but that's obviously not feasible. bugger
there's nothing for that...

>> Question 5: any of this related to commit
>> 37d9d76d8b3a2ac5817e1fa3263cfe0fdb439e51: NFS: flush cached directory
>> information slightly more readily. ?
>
> Your client should be seeing mtime changes when you are creating new
> files, so it shouldn't need to look at the ctime.
>
> The only time ctime changes are relevant is if you use 'rsync' to
> copy the files without specifying --omit-dir-times.
> IOW: if something is explicitly setting the mtime on the directory.

I would have to check if that applies in our case. We're doing
backup/restore from Tivoli (TSM) - maybe that is to blame for not
correctly updating mtime/ctime?

Cheers,
Stefan

2009-08-21 19:09:32

by Stefan Egli

Subject: Re: NFS v3 cached directory content out of sync

> from uname:
> Linux (Ubuntu) 2.6.24-24-generic #1 SMP Tue Jun 30 19:54:36 UTC 2009
> x86_64 GNU/Linux
>
> from mount:
> type nfs (ro,tcp,rsize=8192,wsize=8192,nfsvers=3,addr=192.168.XX.YY)

sorry, that was wrong - it should read:
(rw,tcp,rsize=8192,wsize=8192,nfsvers=3,addr=192.168.XX.YY)

Cheers,
Stefan

2009-08-21 19:15:11

by Simon Kirby

Subject: Re: NFS v3 cached directory content out of sync

On Fri, Aug 21, 2009 at 09:08:01PM +0200, Stefan Egli wrote:

> >> Question 4: If we'd somehow manually detected such a directory
> >> content inconsistency - would there be something like a 'hey NFS
> >> client, flush all NFS caches NOW' thing?
> >
> > No.
>
> unmount / mount would - but that's obviously not feasible. bugger
> there's nothing for that...

Wouldn't (admittedly sledgehammer) echo 3 > /proc/sys/vm/drop_caches
accomplish this?

Simon-