2007-11-17 22:12:01

by Timo Sirainen

Subject: Re: [NFS] Cache flushing

On Sat, 2007-11-17 at 15:41 -0500, Trond Myklebust wrote:
> On Sat, 2007-11-17 at 22:12 +0200, Timo Sirainen wrote:
> > On 17.11.2007, at 21.46, Trond Myklebust wrote:
> > > Why is this needed?
> >
> > Do you mean why flushing the attribute cache is needed, or why this
> > particular way of flushing it is needed?
> >
> > I need to be able to find out if a file has changed, so I need to get
> > its attribute cache flushed. fchown()ing it to (-1, -1) would work
> > safely in all situations, because it's guaranteed not to change the
> > file in any way.
>
> Why can't you simply close(), and then re-open() the file? That is _the_
> standard way to force an attribute cache revalidation on all NFS
> versions. The close-to-open caching model, which is implemented on most
> NFS clients, guarantees this.

Interesting. Too bad this is the first time I have heard of it (whereas
I've seen fchown()/chown() suggested in several mailing lists before). I
understood the NFS FAQ's close-to-open caching description to cover only
how data caching is handled.

close()+open() would have been difficult to handle because open() can
fail, but it looks like opening another file descriptor and closing it
works just as well. It also seems to work for flushing a directory's
attribute cache (though that doesn't seem to work on FreeBSD).

There's one potential problem with closing a file descriptor, though: it
drops all fcntl locks on that file, from all fds. I'm not sure whether
that's a problem for me.
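
As a rough sketch of that dummy-descriptor trick (my own code, assuming
a POSIX system with a Linux NFS client; the helper name is made up):

#include <fcntl.h>
#include <unistd.h>

/*
 * Force the NFS client to revalidate path's cached attributes by briefly
 * opening a second descriptor: under close-to-open semantics the open()
 * triggers a fresh GETATTR. The descriptor we actually work with stays
 * open, so its fcntl locks survive. A no-op fchown(fd, -1, -1) is the
 * alternative mentioned earlier in the thread.
 */
static int nfs_attr_cache_flush(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    return close(fd);
}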

> > Also O_DIRECT is a bit too much for my use case. I do want the file
> > to be cached for the most part, but there are occasions when parts
> > of it can be overwritten, and I need to make sure that in those
> > situations the newest data is read.
> >
> > If you want a wider description of what I'm trying to do: I'm
> > developing Dovecot IMAP server. A lot of people store mails on NFS
> > and want to have multiple IMAP servers be able to access the mails.
> > Dovecot uses somewhat complex index files to speed up accessing the
> > mailboxes, and it's mainly for these index files that I need this
> > explicit control over caching. If two servers are accessing the same
> > mailbox at the same time, the index files get easily corrupted if I
> > can't control the caching.
>
> So how are you ensuring that both servers don't try writing to the same
> locations? You must have some form of synchronisation scheme for this to
> work.

Well, there's no simple answer for that. :) There are three different
kinds of index files with completely different locking behavior, because
I try to avoid long-lasting locks. I do use write locks, but reads are
mainly lockless. There's a transaction log file that tells me when
something has changed, so I know when the data cache needs to be flushed.

Anyway, all of this is working already; I'd just like to improve the
performance a bit on Linux by avoiding those unnecessary lock+unlock
sequences.



2007-11-17 23:52:16

by Trond Myklebust

Subject: Re: [NFS] Cache flushing


On Sun, 2007-11-18 at 00:11 +0200, Timo Sirainen wrote:
> On Sat, 2007-11-17 at 15:41 -0500, Trond Myklebust wrote:
> > Why can't you simply close(), and then re-open() the file? That is _the_
> > standard way to force an attribute cache revalidation on all NFS
> > versions. The close-to-open caching model, which is implemented on most
> > NFS clients, guarantees this.
>
> Interesting. Too bad this is the first time I have heard of it (whereas
> I've seen fchown()/chown() suggested in several mailing lists before). I
> understood the NFS FAQ's close-to-open caching description to cover only
> how data caching is handled.

No, it covers attribute caching as well. In NFSv2/v3 the mtime is used
to decide whether or not the data has changed; in NFSv4 it is the change
attribute. In either case, you need to force an attribute cache
update...
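
Concretely, an application-level check built on those semantics might
look like this (illustrative code only, not from the thread; the mtime
comparison inherits the granularity problem discussed below):

#include <stdbool.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Has the file changed since we last looked? The open()+close() first
 * forces an attribute cache update; the comparison then mirrors what the
 * client itself does (mtime for v2/v3), with size and inode added for
 * good measure.
 */
static bool file_changed(const char *path, struct stat *prev)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd != -1)
        close(fd);
    if (stat(path, &st) < 0)
        return true; /* deleted counts as changed */
    if (st.st_mtime == prev->st_mtime &&
        st.st_size == prev->st_size &&
        st.st_ino == prev->st_ino)
        return false;
    *prev = st;
    return true;
}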

> close()+open() would have been difficult to handle because open() can
> fail, but it looks like opening another file descriptor and closing it
> works just as well. It also seems to work for flushing a directory's
> attribute cache (though that doesn't seem to work on FreeBSD).
>
> There's one potential problem with closing a file descriptor, though: it
> drops all fcntl locks on that file, from all fds. I'm not sure whether
> that's a problem for me.

Right. I understood that you were not using fcntl() locks. Those will in
any case ensure cache revalidation, so you wouldn't have to use anything
else.
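
For comparison, the lock+unlock sequence in question is just this (a
sketch, assuming whole-file POSIX locks; on the Linux client, acquiring
the lock revalidates the file's caches as a side effect):

#include <fcntl.h>
#include <unistd.h>

/*
 * Take and immediately drop a whole-file read lock. On the Linux NFS
 * client this revalidates the file's caches, so it doubles as a
 * (comparatively expensive) cache flush -- the cost being avoided on
 * the read path.
 */
static int nfs_revalidate_via_lock(int fd)
{
    struct flock fl = {
        .l_type = F_RDLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0, /* 0 means "to end of file" */
    };

    if (fcntl(fd, F_SETLKW, &fl) == -1)
        return -1;
    fl.l_type = F_UNLCK;
    return fcntl(fd, F_SETLK, &fl);
}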

> > > Also O_DIRECT is a bit too much for my use case. I do want the file
> > > to be cached for the most part, but there are occasions when parts
> > > of it can be overwritten, and I need to make sure that in those
> > > situations the newest data is read.
> > >
> > > If you want a wider description of what I'm trying to do: I'm
> > > developing Dovecot IMAP server. A lot of people store mails on NFS
> > > and want to have multiple IMAP servers be able to access the mails.
> > > Dovecot uses somewhat complex index files to speed up accessing the
> > > mailboxes, and it's mainly for these index files that I need this
> > > explicit control over caching. If two servers are accessing the same
> > > mailbox at the same time, the index files get easily corrupted if I
> > > can't control the caching.
> >
> > So how are you ensuring that both servers don't try writing to the same
> > locations? You must have some form of synchronisation scheme for this to
> > work.
>
> Well, there's no simple answer for that. :) There are three different
> kinds of index files with completely different locking behavior, because
> I try to avoid long-lasting locks. I do use write locks, but reads are
> mainly lockless. There's a transaction log file that tells me when
> something has changed, so I know when the data cache needs to be flushed.
>
> Anyway, all of this is working already; I'd just like to improve the
> performance a bit on Linux by avoiding those unnecessary lock+unlock
> sequences.

Again, you can use the close-to-open trick of simply reopening the file
whenever you need to revalidate the data cache.

The problem, however, is that on the most common Linux filesystems
(ext2/ext3, reiserfs, ...) the time resolution of the mtime is 1 second.
If your NFS server is running one of those filesystems, then the data
cache revalidation may fail to detect a write that happens within 1
second of the previous write.
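
A standalone illustration (my code; run it on a filesystem with
1-second timestamps and the two mtimes compare equal, so an mtime-based
check misses the second write):

#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat st1, st2;
    int fd = open("mtime-test.tmp", O_CREAT | O_TRUNC | O_WRONLY, 0600);

    if (fd == -1)
        return 1;
    (void)write(fd, "a", 1);
    fstat(fd, &st1);
    (void)write(fd, "b", 1); /* typically lands within the same second */
    fstat(fd, &st2);
    close(fd);
    unlink("mtime-test.tmp");

    printf("mtime %s\n", st1.st_mtime == st2.st_mtime ?
           "identical: the second write is invisible to mtime checks" :
           "differs");
    return 0;
}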

Cheers
Trond




2007-11-20 02:14:54

by Timo Sirainen

Subject: Re: [NFS] Cache flushing

On Sun, 2007-11-18 at 00:11 +0200, Timo Sirainen wrote:
> > Why can't you simply close(), and then re-open() the file? That is _the_
> > standard way to force an attribute cache revalidation on all NFS
> > versions. The close-to-open caching model, which is implemented on most
> > NFS clients, guarantees this.
..
> close()+open() would have been difficult to handle because open() can
> fail, but it looks like opening another file descriptor and closing it
> works just as well. It also seems to work for flushing a directory's
> attribute cache (though that doesn't seem to work on FreeBSD).

Actually, it works for flushing a directory's attribute cache in
v2.6.17-rc2, but not in v2.6.22. chown() still works in v2.6.22. Is there
a reason for this change? I guess it means I'm back to using chown() for
flushing a directory's attribute cache anyway.

The reason for flushing a directory's attribute cache is so that I can
see with stat() whether an open file under it has been replaced (i.e.
whether its inode has changed).
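
Roughly like this (my code, not Dovecot's; it assumes the directory's
attribute cache has just been flushed, so the stat() result is fresh):

#include <stdbool.h>
#include <sys/stat.h>

/*
 * Compare the inode behind the path with the inode of the descriptor we
 * already hold. A mismatch means the file was replaced (e.g. rename()d
 * over) and needs to be reopened.
 */
static bool file_was_replaced(const char *path, int fd)
{
    struct stat path_st, fd_st;

    if (stat(path, &path_st) < 0 || fstat(fd, &fd_st) < 0)
        return true; /* gone counts as replaced */
    return path_st.st_ino != fd_st.st_ino ||
           path_st.st_dev != fd_st.st_dev;
}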

http://dovecot.org/tools/nfstest.c can be used to easily test what works
and what doesn't for cache flushes.



2007-11-20 23:46:59

by Trond Myklebust

Subject: Re: [NFS] Cache flushing


On Tue, 2007-11-20 at 04:14 +0200, Timo Sirainen wrote:
> On Sun, 2007-11-18 at 00:11 +0200, Timo Sirainen wrote:
> > > Why can't you simply close(), and then re-open() the file? That is _the_
> > > standard way to force an attribute cache revalidation on all NFS
> > > versions. The close-to-open caching model, which is implemented on most
> > > NFS clients, guarantees this.
> ..
> > close()+open() would have been difficult to handle because open() can
> > fail, but it looks like opening another file descriptor and closing it
> > works just as well. It also seems to work for flushing a directory's
> > attribute cache (though that doesn't seem to work on FreeBSD).
>
> Actually, it works for flushing a directory's attribute cache in
> v2.6.17-rc2, but not in v2.6.22. chown() still works in v2.6.22. Is there
> a reason for this change? I guess it means I'm back to using chown() for
> flushing a directory's attribute cache anyway.

close-to-open caching works fine for me in v2.6.22. It does indeed send
a GETATTR and revalidate the inode.

Trond

