(Corrected Chris Wedgwood's name and email.)
My friend Akkana followed my advice to use noatime on one of her
machines, but discovered that mutt was unusable because it always
thought that new messages had arrived since the last time it had
checked a folder (mbox format). I thought this was a bummer, so I
wrote a "relative lazy atime" patch which only updates the atime if
the old atime was less than the ctime or mtime. This is not the same
as the lazy atime patch of yore[1], which maintained a list of inodes
with dirty atimes and wrote them out on unmount.
Patch below. Current version (plus test program) is at:
http://infohost.nmt.edu/~val/patches.html#lazyatime
Thanks to everyone who discussed this with me, especially the folks on
#linuxfs on irc.oftc.net.
-VAL
[1] http://www.ussg.iu.edu/hypermail/linux/kernel/0005.0/0284.html
---
Relative lazy atime - only update atime if the old atime was older
than the ctime or mtime. We are maintaining atime relative to
mtime/ctime.
Not for inclusion (yet).
Written by Val Henson <[email protected]>
fs/inode.c | 16 +++++++++++++++-
fs/namespace.c | 5 ++++-
include/linux/fs.h | 1 +
include/linux/mount.h | 1 +
4 files changed, 21 insertions(+), 2 deletions(-)
diff -Nrup a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c 2006-08-02 23:10:56 -07:00
+++ b/fs/inode.c 2006-08-02 23:10:56 -07:00
@@ -1200,10 +1200,24 @@ void touch_atime(struct vfsmount *mnt, s
 		return;
 
 	now = current_fs_time(inode->i_sb);
-	if (!timespec_equal(&inode->i_atime, &now)) {
+	if (timespec_equal(&inode->i_atime, &now))
+		return;
+	/*
+	 * With lazy atime, only update atime if the previous atime is
+	 * earlier than either the ctime or mtime.
+	 */
+	if (!mnt ||
+	    !(mnt->mnt_flags & MNT_LAZYATIME) ||
+	    (timespec_compare(&inode->i_atime, &inode->i_mtime) < 0) ||
+	    (timespec_compare(&inode->i_atime, &inode->i_ctime) < 0)) {
 		inode->i_atime = now;
 		mark_inode_dirty_sync(inode);
 	}
+	/*
+	 * We could update the in-memory atime here if we wanted, but
+	 * that makes it possible for atime to revert if we evict the
+	 * inode from memory.
+	 */
 }
 
 EXPORT_SYMBOL(touch_atime);
diff -Nrup a/fs/namespace.c b/fs/namespace.c
--- a/fs/namespace.c 2006-08-02 23:10:56 -07:00
+++ b/fs/namespace.c 2006-08-02 23:10:56 -07:00
@@ -376,6 +376,7 @@ static int show_vfsmnt(struct seq_file *
 		{ MNT_NOEXEC, ",noexec" },
 		{ MNT_NOATIME, ",noatime" },
 		{ MNT_NODIRATIME, ",nodiratime" },
+		{ MNT_LAZYATIME, ",lazyatime" },
 		{ 0, NULL }
 	};
 	struct proc_fs_info *fs_infop;
@@ -1413,9 +1414,11 @@ long do_mount(char *dev_name, char *dir_
 		mnt_flags |= MNT_NOATIME;
 	if (flags & MS_NODIRATIME)
 		mnt_flags |= MNT_NODIRATIME;
+	if (flags & MS_LAZYATIME)
+		mnt_flags |= MNT_LAZYATIME;
 
 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
-		   MS_NOATIME | MS_NODIRATIME);
+		   MS_NOATIME | MS_NODIRATIME | MS_LAZYATIME);
 
 	/* ... and get the mountpoint */
 	retval = path_lookup(dir_name, LOOKUP_FOLLOW, &nd);
diff -Nrup a/include/linux/fs.h b/include/linux/fs.h
--- a/include/linux/fs.h 2006-08-02 23:10:56 -07:00
+++ b/include/linux/fs.h 2006-08-02 23:10:56 -07:00
@@ -119,6 +119,7 @@ extern int dir_notify_enable;
#define MS_PRIVATE (1<<18) /* change to private */
#define MS_SLAVE (1<<19) /* change to slave */
#define MS_SHARED (1<<20) /* change to shared */
+#define MS_LAZYATIME (1<<21) /* Lazily update on-disk access times. */
#define MS_ACTIVE (1<<30)
#define MS_NOUSER (1<<31)
diff -Nrup a/include/linux/mount.h b/include/linux/mount.h
--- a/include/linux/mount.h 2006-08-02 23:10:56 -07:00
+++ b/include/linux/mount.h 2006-08-02 23:10:56 -07:00
@@ -27,6 +27,7 @@ struct namespace;
#define MNT_NOEXEC 0x04
#define MNT_NOATIME 0x08
#define MNT_NODIRATIME 0x10
+#define MNT_LAZYATIME 0x20
#define MNT_SHRINKABLE 0x100
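The rule the patch adds to touch_atime() can be tried out in userspace. The following is only a sketch of that rule, not kernel code: struct file_times, ts_compare(), lazy_atime_needs_update() and lazy_read() are hypothetical names made up for illustration, and the "atime equals now" shortcut from the patch is left out.

```c
#include <stdbool.h>
#include <time.h>

/* Returns <0, 0 or >0, in the spirit of the kernel's timespec_compare(). */
static int ts_compare(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/* Hypothetical stand-in for the three timestamps kept in an inode. */
struct file_times {
	struct timespec atime, mtime, ctime;
};

/*
 * Relative lazy atime: an update is needed only while the stored atime
 * is earlier than the mtime or the ctime.
 */
static bool lazy_atime_needs_update(const struct file_times *t)
{
	return ts_compare(&t->atime, &t->mtime) < 0 ||
	       ts_compare(&t->atime, &t->ctime) < 0;
}

/* Simulate a read at time `now`; returns true if atime was rewritten. */
static bool lazy_read(struct file_times *t, struct timespec now)
{
	if (!lazy_atime_needs_update(t))
		return false;
	t->atime = now;
	return true;
}
```

In this model the first read after a write moves atime past mtime/ctime, and later reads leave it alone until the file is modified again, which is the property mutt relies on for mbox folders.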
On Wed, Aug 02, 2006 at 11:36:22PM -0700, Valerie Henson wrote:
> (Corrected Chris Wedgwood's name and email.)
>
> My friend Akkana followed my advice to use noatime on one of her
> machines, but discovered that mutt was unusable because it always
> thought that new messages had arrived since the last time it had
> checked a folder (mbox format). I thought this was a bummer, so I
This is why people normally recommend "nodiratime" ...
On Wed, Aug 02, 2006 at 11:36:22PM -0700, Valerie Henson wrote:
> (Corrected Chris Wedgwood's name and email.)
>
> My friend Akkana followed my advice to use noatime on one of her
> machines, but discovered that mutt was unusable because it always
> thought that new messages had arrived since the last time it had
> checked a folder (mbox format). I thought this was a bummer, so I
> wrote a "relative lazy atime" patch which only updates the atime if
> the old atime was less than the ctime or mtime. This is not the same
> as the lazy atime patch of yore[1], which maintained a list of inodes
> with dirty atimes and wrote them out on unmount.
Another idea, similar to how atime updates work in xfs currently might
be interesting: Always update atime in core, but don't start a
transaction just for it - instead only flush it when you'd do it anyway,
that is another transaction or evicting the inode.
Christoph Hellwig wrote:
> Another idea, similar to how atime updates work in xfs currently might
> be interesting: Always update atime in core, but don't start a
> transaction just for it - instead only flush it when you'd do it anyway,
> that is another transaction or evicting the inode.
this is sort of having a "dirty" and "dirty atime" split for the inode I suppose..
shouldn't be impossible to do with a bit of vfs support..
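A toy model may make the "dirty" versus "dirty atime" split more concrete. Everything below is an assumption made for illustration: the DEMO_* flags, struct demo_inode and the demo_* helpers are hypothetical and do not correspond to any existing VFS or XFS interface.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical dirty states; not taken from any kernel header. */
enum {
	DEMO_DIRTY       = 1 << 0,	/* inode must be written back */
	DEMO_DIRTY_ATIME = 1 << 1,	/* only atime differs from disk */
};

struct demo_inode {
	time_t atime;		/* in-core value */
	time_t disk_atime;	/* value last written to disk */
	unsigned int state;
	unsigned int writes;	/* simulated disk writes so far */
};

/* A read updates atime in core but does not ask for a write. */
static void demo_touch_atime(struct demo_inode *inode, time_t now)
{
	if (inode->atime == now)
		return;
	inode->atime = now;
	inode->state |= DEMO_DIRTY_ATIME;
}

/* Any other change makes the inode dirty in the usual sense. */
static void demo_mark_dirty(struct demo_inode *inode)
{
	inode->state |= DEMO_DIRTY;
}

/* One write carries every pending change, including the atime. */
static void demo_flush(struct demo_inode *inode)
{
	inode->disk_atime = inode->atime;
	inode->state = 0;
	inode->writes++;
}

/* Regular writeback skips inodes that are only atime-dirty. */
static bool demo_writeback(struct demo_inode *inode)
{
	if (!(inode->state & DEMO_DIRTY))
		return false;
	demo_flush(inode);
	return true;
}

/* Eviction has to write out an inode that is still atime-dirty. */
static void demo_evict(struct demo_inode *inode)
{
	if (inode->state)
		demo_flush(inode);
}
```

In this sketch the atime rides along for free whenever the inode is written for another reason, and the only extra write happens when an atime-only dirty inode is evicted.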
On Sat, 2006-08-05 at 14:25 +0200, Christoph Hellwig wrote:
> On Wed, Aug 02, 2006 at 11:36:22PM -0700, Valerie Henson wrote:
> > (Corrected Chris Wedgwood's name and email.)
> >
> > My friend Akkana followed my advice to use noatime on one of her
> > machines, but discovered that mutt was unusable because it always
> > thought that new messages had arrived since the last time it had
> > checked a folder (mbox format). I thought this was a bummer, so I
> > wrote a "relative lazy atime" patch which only updates the atime if
> > the old atime was less than the ctime or mtime. This is not the same
> > as the lazy atime patch of yore[1], which maintained a list of inodes
> > with dirty atimes and wrote them out on unmount.
>
> Another idea, similar to how atime updates work in xfs currently might
> be interesting: Always update atime in core, but don't start a
> transaction just for it - instead only flush it when you'd do it anyway,
> that is another transaction or evicting the inode.
Hmm. That adds a cost to evicting what the vfs considers a clean inode.
It seems wrong, but if that's what xfs does, it must not be a problem.
Shaggy
--
David Kleikamp
IBM Linux Technology Center
On Sat, 2006-08-05 at 11:58 -0500, Dave Kleikamp wrote:
> On Sat, 2006-08-05 at 14:25 +0200, Christoph Hellwig wrote:
> > On Wed, Aug 02, 2006 at 11:36:22PM -0700, Valerie Henson wrote:
> > > (Corrected Chris Wedgwood's name and email.)
> > >
> > > My friend Akkana followed my advice to use noatime on one of her
> > > machines, but discovered that mutt was unusable because it always
> > > thought that new messages had arrived since the last time it had
> > > checked a folder (mbox format). I thought this was a bummer, so I
> > > wrote a "relative lazy atime" patch which only updates the atime if
> > > the old atime was less than the ctime or mtime. This is not the same
> > > as the lazy atime patch of yore[1], which maintained a list of inodes
> > > with dirty atimes and wrote them out on unmount.
> >
> > Another idea, similar to how atime updates work in xfs currently might
> > be interesting: Always update atime in core, but don't start a
> > transaction just for it - instead only flush it when you'd do it anyway,
> > that is another transaction or evicting the inode.
>
> Hmm. That adds a cost to evicting what the vfs considers a clean inode.
the vfs shouldn't consider it clean, it should consider it "atime-only
dirty".. with that many of the vfs interaction issues ought to go away
On Sat, Aug 05, 2006 at 07:04:34PM +0200, Arjan van de Ven wrote:
> the vfs shouldn't consider it clean, it should consider it
> "atime-only dirty".. with that many of the vfs interaction issues
> ought to go away
should it be atime-dirty or non-critical-dirty? (ie. make it more
generic to cover cases where we might have other non-critical fields
to flush if we can but can tolerate loss if we don't)
admittedly atime is the only one i can think of now
On Sat, Aug 05, 2006 at 11:36:09AM -0700, Chris Wedgwood wrote:
> should it be atime-dirty or non-critical-dirty? (ie. make it more
> generic to cover cases where we might have other non-critical fields
> to flush if we can but can tolerate loss if we don't)
So, just to be sure - we're fine with atime being lost due to crashes,
errors, etc?
I don't see why not, but I figure it'd be good to make sure there's some
consensus on that.
If that is in fact the case, OCFS2 could do the same thing as XFS and
update disk only when we're going there for some other reason. The only
thing that we would have to add on top of that is a disk write when we're
dropping a cluster lock and the inode is still 'atime-dirty'.
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
[email protected]
On Sat, 5 Aug 2006, Mark Fasheh wrote:
> On Sat, Aug 05, 2006 at 11:36:09AM -0700, Chris Wedgwood wrote:
>> should it be atime-dirty or non-critical-dirty? (ie. make it more
>> generic to cover cases where we might have other non-critical fields
>> to flush if we can but can tolerate loss if we don't)
> So, just to be sure - we're fine with atime being lost due to crashes,
> errors, etc?
at least as an optional mode of operation, yes.
I'm sure someone will want/need the existing 'update atime immediately', and
there are people who don't care about atime at all (and use noatime), but there
is a large middle ground between them where atime is helpful, but doesn't need
the real-time update or crash protection.
David Lang
On Sat, 5 Aug 2006, David Lang wrote:
> On Sat, 5 Aug 2006, Mark Fasheh wrote:
>
> > On Sat, Aug 05, 2006 at 11:36:09AM -0700, Chris Wedgwood wrote:
> > > should it be atime-dirty or non-critical-dirty? (ie. make it more
> > > generic to cover cases where we might have other non-critical fields
> > > to flush if we can but can tolerate loss if we don't)
> > So, just to be sure - we're fine with atime being lost due to crashes,
> > errors, etc?
>
> at least as an optional mode of operation, yes.
>
> I'm sure someone will want/need the existing 'update atime immediately', and
> there are people who don't care about atime at all (and use noatime), but
> there is a large middle ground between them where atime is helpful, but
> doesn't need the real-time update or crash protection.
i can't understand when atime is *ever* reliable... root doing backups
with something like rsync will cause atimes to change. (and it can't
save/restore the atime without race conditions.)
you can work around mutt's silly dependency on atime by configuring it
with --enable-buffy-size. so far mutt is the only program i've discovered
which cares about atime.
also -- i wasn't aware that xfs tried to do a better job with atime
updates... i'm not sure it's really that effective. i've got a
busy shell/mail/web server, and here's a typical 60s sample with
noatime,nodiratime on xfs:
Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00    0.77  13.72  25.94  418.34  328.72     18.84      2.21  55.75   3.63  14.40
and a typical 60s sample with atime,diratime:
sda        0.07    0.58  15.82  35.87  472.13  412.52     17.12      0.70  13.56   3.54  18.30
that's been my experience in general... an extra 15 to 20% iops required
to maintain atime... just for mutt... no thanks :)
(btw there's nvram underneath sda, so the await change isn't too
surprising.)
-dean
p.s. lazyatime sounds like a nice hack to make mutt work too.
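The overhead in these two samples can be worked out directly from the r/s, w/s, rsec/s and wsec/s columns. The sketch below only redoes that arithmetic; struct io_sample and the helper names are made up for illustration. For this one pair of samples, requests per second rise by roughly 30% and sectors per second by roughly 18%, so the "15 to 20%" figure appears to match the sector counts more closely than the request counts.

```c
/* One iostat line, reduced to the four columns used here. */
struct io_sample {
	double reads_per_s, writes_per_s;	/* r/s, w/s */
	double rsec_per_s, wsec_per_s;		/* rsec/s, wsec/s */
};

/* Values copied from the two 60s samples quoted above. */
static const struct io_sample noatime_sample = {
	.reads_per_s = 13.72, .writes_per_s = 25.94,
	.rsec_per_s = 418.34, .wsec_per_s = 328.72,
};
static const struct io_sample atime_sample = {
	.reads_per_s = 15.82, .writes_per_s = 35.87,
	.rsec_per_s = 472.13, .wsec_per_s = 412.52,
};

/* Percentage by which `other` exceeds `base`. */
static double percent_increase(double base, double other)
{
	return (other - base) / base * 100.0;
}

/* Extra requests per second (reads plus writes), in percent. */
static double iops_increase(const struct io_sample *off,
			    const struct io_sample *on)
{
	return percent_increase(off->reads_per_s + off->writes_per_s,
				on->reads_per_s + on->writes_per_s);
}

/* Extra sectors per second (read plus written), in percent. */
static double sector_increase(const struct io_sample *off,
			      const struct io_sample *on)
{
	return percent_increase(off->rsec_per_s + off->wsec_per_s,
				on->rsec_per_s + on->wsec_per_s);
}
```

Two 60s samples from a live server are of course not a controlled benchmark, so these percentages describe only the numbers shown.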
On Sat, Aug 05, 2006 at 04:28:29PM -0700, dean gaudet wrote:
> i can't understand when atime is *ever* reliable... root doing
> backups with something like rsync will cause atimes to change.
well, rsync and friends could use O_NOATIME, but usually that isn't
worth the pain
> you can work around mutt's silly dependency on atime by configuring
> it with --enable-buffy-size. so far mutt is the only program i've
> discovered which cares about atime.
i've seen other applications use it, but most are pretty tolerant
about it not working the way you might think it would
> also -- i wasn't aware that xfs tried to do a better job with atime
> updates... i'm not sure it's really that effective. i've got a
> busy shell/mail/web server
OT, you might find ikeep helps a little
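For reference, a backup-style reader could use O_NOATIME roughly as follows. This is a minimal sketch, not what rsync does: open_noatime() and read_noatime() are hypothetical helpers. The EPERM fallback covers part of the "pain": on Linux the flag is refused unless the caller owns the file or is privileged.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* O_NOATIME is a Linux extension */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_NOATIME
#define O_NOATIME 0		/* flag unavailable: plain open only */
#endif

/*
 * Open a file for reading without touching its atime where possible.
 * O_NOATIME fails with EPERM unless the caller owns the file or is
 * privileged, so fall back to an ordinary open in that case.
 */
static int open_noatime(const char *path)
{
	int fd = open(path, O_RDONLY | O_NOATIME);

	if (fd < 0 && errno == EPERM)
		fd = open(path, O_RDONLY);
	return fd;
}

/* Read up to len bytes from path; returns the byte count or -1. */
static ssize_t read_noatime(const char *path, void *buf, size_t len)
{
	int fd = open_noatime(path);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = read(fd, buf, len);
	close(fd);
	return n;
}
```

When the fallback path is taken the read updates atime as usual, so a tool that needs a guarantee would have to report that case instead of hiding it.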
On Sat, Aug 05, 2006 at 04:06:47PM -0700, David Lang wrote:
> On Sat, 5 Aug 2006, Mark Fasheh wrote:
>
> >On Sat, Aug 05, 2006 at 11:36:09AM -0700, Chris Wedgwood wrote:
> >>should it be atime-dirty or non-critical-dirty? (ie. make it more
> >>generic to cover cases where we might have other non-critical fields
> >>to flush if we can but can tolerate loss if we don't)
> >So, just to be sure - we're fine with atime being lost due to crashes,
> >errors, etc?
>
> at least as an optional mode of operation, yes.
Well, it certainly doesn't sound like XFS is making this sort of guarantee.
Another method I've seen file systems use is to only update atime if the
difference between current time and the current inode atime is greater than
some timeout value. I'm not a huge fan of that approach as it seems less
predictable than Val's, and doesn't (theoretically) preserve performance
like the XFS approach.
That said, it's not like OCFS2 doesn't have to care just because other file
systems don't, but I don't see much complaining about their methods.
> I'm sure someone will want/need the existing 'update atime immediately', and
> there are people who don't care about atime at all (and use noatime), but
> there is a large middle ground between them where atime is helpful, but
> doesn't need the real-time update or crash protection.
For OCFS2, the performance hit of 'immediate' atime updates could be
considerable. It's essentially turning a bunch of read only cluster locks
into exclusive ones, and forcing a disk update. This means that other nodes
who have locks on the object would have to invalidate their cache. The node
doing the update will have to flush its journal before giving the lock
up, incurring additional cost.
We could certainly give OCFS2 an "immediate atime" mode, but maybe that
could be done as a different mount option? Say, "immatime"? That way the
default is to get the large middle ground. Those who want more performance
could use 'noatime', and those concerned with absolute 100% correctness of
atime in the event of a crash could use the new option.
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
[email protected]
On Sat, Aug 05, 2006 at 04:28:29PM -0700, dean gaudet wrote:
> you can work around mutt's silly dependency on atime by configuring it
> with --enable-buffy-size. so far mutt is the only program i've discovered
> which cares about atime.
For the shell, atime is the difference between 'you have mail' and 'you
have new mail'.
I still don't understand though, how much does this really buy us over
nodiratime?
On Sat, Aug 05, 2006 at 09:01:47PM -0600, Matthew Wilcox wrote:
> On Sat, Aug 05, 2006 at 04:28:29PM -0700, dean gaudet wrote:
> > you can work around mutt's silly dependency on atime by configuring it
> > with --enable-buffy-size. so far mutt is the only program i've discovered
> > which cares about atime.
>
> For the shell, atime is the difference between 'you have mail' and 'you
> have new mail'.
>
> I still don't understand though, how much does this really buy us over
> nodiratime?
Lazy atime buys us a reduction in writes over nodiratime for any
workload which reads files, such as grep -r, a kernel compile, or
backup software. Do I misunderstand the question?
-VAL
On Tue, 8 August 2006 23:39:49 -0700, Valerie Henson wrote:
> On Sat, Aug 05, 2006 at 09:01:47PM -0600, Matthew Wilcox wrote:
> > On Sat, Aug 05, 2006 at 04:28:29PM -0700, dean gaudet wrote:
> > > you can work around mutt's silly dependency on atime by configuring it
> > > with --enable-buffy-size. so far mutt is the only program i've discovered
> > > which cares about atime.
> >
> > For the shell, atime is the difference between 'you have mail' and 'you
> > have new mail'.
> >
> > I still don't understand though, how much does this really buy us over
> > nodiratime?
>
> Lazy atime buys us a reduction in writes over nodiratime for any
> workload which reads files, such as grep -r, a kernel compile, or
> backup software. Do I misunderstand the question?
At the risk of stating the obvious, let me try to explain what each
method does:
1. standard
Every read access to a file/directory causes an atime update.
2. nodiratime
Every read access to a non-directory causes an atime update.
3. lazy atime
The first read access to a file/directory causes an atime update.
4. noatime
No read access to a file/directory causes an atime update.
In comparison, lazy atime will cause more atime updates for
directories and vastly fewer for non-directories. Effectively atime
is turned into little more than a flag, stating whether the file was
ever read since the last write to it. And it appears as if neither
mutt nor the shell use atime for more than this flagging purpose, so I
am rather fond of the idea.
Jörn
--
The cheapest, fastest and most reliable components of a computer
system are those that aren't there.
-- Gordon Bell, DEC laboratories
On Wed, 2006-08-09 at 14:21 +0200, Jörn Engel wrote:
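The four modes summarized above can be modelled as a small policy function. This is a deliberately simplified sketch of that summary, not of the kernel: enum atime_mode, struct demo_file and the helpers are hypothetical, and the model ignores the case where atime already equals the current time.

```c
#include <stdbool.h>

enum atime_mode {
	ATIME_STANDARD,
	ATIME_NODIRATIME,
	ATIME_LAZY,
	ATIME_NOATIME,
};

/* Hypothetical file state, reduced to what the policies look at. */
struct demo_file {
	bool is_dir;
	bool read_since_write;	/* atime already newer than mtime/ctime */
};

/* Does a read of `f` cause an atime update under `mode`? */
static bool read_updates_atime(enum atime_mode mode, struct demo_file *f)
{
	switch (mode) {
	case ATIME_STANDARD:
		return true;
	case ATIME_NODIRATIME:
		return !f->is_dir;
	case ATIME_LAZY:
		if (f->read_since_write)
			return false;
		f->read_since_write = true;
		return true;
	case ATIME_NOATIME:
	default:
		return false;
	}
}

/* Count the atime updates caused by `reads` reads after one write. */
static int count_atime_updates(enum atime_mode mode, bool is_dir, int reads)
{
	struct demo_file f = { .is_dir = is_dir, .read_since_write = false };
	int updates = 0;

	for (int i = 0; i < reads; i++)
		if (read_updates_atime(mode, &f))
			updates++;
	return updates;
}
```

For 100 reads after a single write, this model gives 100 updates for a regular file under the standard and nodiratime modes, one under lazy atime, and none under noatime. For a directory it gives one update under lazy atime and none under nodiratime.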
> 1. standard
> Every read access to a file/directory causes an atime update.
>
> 2. nodiratime
> Every read access to a non-directory causes an atime update.
>
> 3. lazy atime
> The first read access to a file/directory causes an atime update.
>
> 4. noatime
> No read access to a file/directory causes an atime update.
>
> In comparison, lazy atime will cause more atime updates for
> directories and vastly fewer for non-directories.
Using nodiratime and lazy atime together would probably be the best
option for those that only want atime for mutt/shell mail notification.
> Effectively atime
> is turned into little more than a flag, stating whether the file was
> ever read since the last write to it. And it appears as if neither
> mutt nor the shell use atime for more than this flagging purpose, so I
> am rather fond of the idea.
>
> Jörn
>
--
David Kleikamp
IBM Linux Technology Center
On Sat, Aug 05, 2006 at 06:22:50AM -0700, Arjan van de Ven wrote:
> Christoph Hellwig wrote:
> >Another idea, similar to how atime updates work in xfs currently might
> >be interesting: Always update atime in core, but don't start a
> >transaction just for it - instead only flush it when you'd do it anyway,
> >that is another transaction or evicting the inode.
>
> this is sort of having a "dirty" and "dirty atime" split for the inode I
> suppose..
> shouldn't be impossible to do with a bit of vfs support..
This is certainly another possibility. There may be other uses for
the idea of a half-dirty inode.
However, one thing I want to avoid is an event that would cause the
build-up and subsequent write-out of a big list of half-dirty inodes -
think about the worst case: grep -r of the entire file system,
followed by some kind of memory pressure or an unmount. Would we then
flush out a write to every inode in the file system? Ew. (This is
worse than having atime on because with full atime, the writes would
be spread out during the execution of the grep -r command.)
-VAL
BTW, there may be another atime mode to consider, which I believe is what
Windows XP does w/ NTFS: only update the atime if it's newer than the
on-disk file's atime by N seconds (defaults to one hour, but I believe it's
configurable). There could be scenarios in which this mode is preferable.
Erez.
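A threshold rule of this kind can be sketched in a few lines. The code is an illustration of the idea only and makes no claim about how NTFS or any other file system implements it; the names and the one-hour constant are assumptions taken from the description above.

```c
#include <stdbool.h>
#include <time.h>

/* One hour, the default mentioned above (assumed, not verified). */
#define DEMO_ATIME_THRESHOLD ((time_t)(60 * 60))

/* Update only when the stored atime is at least `threshold` seconds old. */
static bool threshold_needs_update(time_t stored, time_t now, time_t threshold)
{
	return now - stored >= threshold;
}

/*
 * Replay `reads` reads spaced `interval` seconds apart, starting from a
 * stored atime of `atime`, and return how many of them rewrite the atime.
 */
static int threshold_count_updates(time_t atime, time_t interval, int reads,
				   time_t threshold)
{
	time_t now = atime;
	int updates = 0;

	for (int i = 0; i < reads; i++) {
		now += interval;
		if (threshold_needs_update(atime, now, threshold)) {
			atime = now;
			updates++;
		}
	}
	return updates;
}
```

With one read every ten minutes for three hours (18 reads) and a one-hour threshold, this model performs 3 atime updates instead of 18.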
On Wed, Aug 09, 2006 at 02:21:34PM +0200, Jörn Engel wrote:
> At the risk of stating the obvious, let me try to explain what each
> method does:
>
> 1. standard
> Every read access to a file/directory causes an atime update.
>
> 2. nodiratime
> Every read access to a non-directory causes an atime update.
>
> 3. lazy atime
> The first read access to a file/directory causes an atime update.
>
> 4. noatime
> No read access to a file/directory causes an atime update.
5. lazy atime writeout
To reduce the pain of a fully functional atime only flush "atime-dirty"
inodes when the on-disk/in-core atime difference becomes big enough
(e.g. by maintaining an "atime dirtyness" level for the in-core inode).
I haven't seen anyone mentioning it but properly written cleanup programs
for /tmp et al. do depend on atimes. When a system crashes after a long
time then (3) and (4) will probably cause /tmp to be wiped out because
at the next boot all atimes will be really old.
--
Frank
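The kind of check such a cleanup program performs can be sketched as follows. This is an assumed, simplified version for illustration, not the logic of any particular tool: atime_is_stale() and path_is_stale() are hypothetical names, and the only system interface used is stat(2).

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* A file is stale if it was last accessed more than max_age seconds ago. */
static bool atime_is_stale(time_t atime, time_t now, time_t max_age)
{
	return now - atime > max_age;
}

/* Apply the same test to the atime recorded for `path`. */
static bool path_is_stale(const char *path, time_t now, time_t max_age)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;	/* cannot tell: leave the file alone */
	return atime_is_stale(st.st_atime, now, max_age);
}
```

If the stored atime stops advancing, a file last stamped on day 1 looks stale to a 10-day cleaner on day 30 even if it was read every day in between.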
Valerie Henson wrote:
> On Sat, Aug 05, 2006 at 06:22:50AM -0700, Arjan van de Ven wrote:
>
>> Christoph Hellwig wrote:
>>
>>> Another idea, similar to how atime updates work in xfs currently might
>>> be interesting: Always update atime in core, but don't start a
>>> transaction just for it - instead only flush it when you'd do it anyway,
>>> that is another transaction or evicting the inode.
>>>
>> this is sort of having a "dirty" and "dirty atime" split for the inode I
>> suppose..
>> shouldn't be impossible to do with a bit of vfs support..
>>
>
> This is certainly another possibility. There may be other uses for
> the idea of a half-dirty inode.
>
> However, one thing I want to avoid is an event that would cause the
> build-up and subsequent write-out of a big list of half-dirty inodes -
> think about the worst case: grep -r of the entire file system,
> followed by some kind of memory pressure or an unmount. Would we then
> flush out a write to every inode in the file system? Ew. (This is
> worse than having atime on because with full atime, the writes would
> be spread out during the execution of the grep -r command.)
>
An unmount will flush out anything that is dirty anyway. Memory
pressure is easier, it usually doesn't have to flush everything,
only enough to satisfy the memory requests.
Of course you can have some mechanism that trickles out writes
of half-dirty stuff. Similar to how dirty stuff is flushed, but with
a much lower frequency.
Helge Hafting
Frank van Maarseveen <[email protected]> wrote:
> I haven't seen anyone mentioning it but properly written cleanup programs
> for /tmp et al. do depend on atimes. When a system crashes after a long
> time then (3) and (4) will probably cause /tmp to be wiped out because
> at the next boot all atimes will be really old.
s-/tmp-/var/tmp-, since you should expect /tmp to be wiped after a reboot
(especially if it's tmpfs).
--
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.
http://david.woodhou.se/why-not-spf.html
Valerie Henson wrote:
> On Sat, Aug 05, 2006 at 09:01:47PM -0600, Matthew Wilcox wrote:
>> On Sat, Aug 05, 2006 at 04:28:29PM -0700, dean gaudet wrote:
>>> you can work around mutt's silly dependency on atime by configuring it
>>> with --enable-buffy-size. so far mutt is the only program i've discovered
>>> which cares about atime.
>> For the shell, atime is the difference between 'you have mail' and 'you
>> have new mail'.
>>
>> I still don't understand though, how much does this really buy us over
>> nodiratime?
>
> Lazy atime buys us a reduction in writes over nodiratime for any
> workload which reads files, such as grep -r, a kernel compile, or
> backup software. Do I misunderstand the question?
I mentioned lazy atime about a year ago, and have played with a patch to
do what I (personally) had in mind. My thinking is that for files the
atime is almost always used in one of two ways, as part of system
administration to see if a file is being used, and to sort files by
atime to identify recently accessed files, such as the one you read just
before the weekend.
So in that light, I proposed that a filesystem might have a mount option
such that atime was only updated when an open or close was done on the
file. In many cases this will both reduce inode writes and still
preserve information "current enough" to be useful, which is unavailable
with noatime. And since noatime is thought useful as an attribute, lazy
atime probably would be, as well.
--
Bill Davidsen <[email protected]>
Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a
normal user and is setuid root, with the "vi" line edit mode selected,
and the character set is "big5," an off-by-one error occurs during
wildcard (glob) expansion.