Hi Andrew!
according to todays vfs strategy (hope it hasn't changed
again), here is the first patch, which adds the mount
flags propagation, fixes the /proc display, and implements
noatime and nodiratime per mountpoint ...
please consider for inclusion ...
best,
Herbert
On Tue, Apr 06, 2004 at 04:55:44PM +0200, Herbert Poetzl wrote:
>
> Hi Andrew!
>
> according to todays vfs strategy (hope it hasn't changed
> again), here is the first patch, which adds the mount
> flags propagation, fixes the /proc display, and implements
> noatime and nodiratime per mountpoint ...
>
> please consider for inclusion ...
noatime/nodiratime: OK, but we still have direct modifications of i_atime
that need to be taken care of.
massage of ->show(): more or less OK. However, we don't need to keep
MS_NOATIME and MS_NODIRATIME in flags at all -
> + if (flags & MS_NOATIME)
> + mnt_flags |= MNT_NOATIME;
> + if (flags & MS_NODIRATIME)
> + mnt_flags |= MNT_NODIRATIME;
> flags &= ~(MS_NOSUID|MS_NOEXEC|MS_NODEV);
should remove them from flags in the last line, same way we do that for
nosuid/noexec/nodev, with obvious consequences for ->show().
Note that we don't need to keep MS_NOATIME check in update_atime() - that
animal is purely per-mountpoint now.
> + if (MNT_IS_NOATIME(mnt))
> + return;
> + if (S_ISDIR(inode->i_mode) && MNT_IS_NODIRATIME(mnt))
> + return;
Do we need those to be macros? AFAICS, this is the only place where we
do such checks and we shouldn't get new callers. IOW, keeping them
separate doesn't buy us anything and only obfuscates the code.
> -#define MNT_NOSUID 1
> -#define MNT_NODEV 2
> -#define MNT_NOEXEC 4
> +#define MNT_RDONLY 1
> +#define MNT_NOSUID 2
> +#define MNT_NODEV 4
> +#define MNT_NOEXEC 8
> +#define MNT_NOATIME 16
> +#define MNT_NODIRATIME 32
*ugh*
a) what's the point of reordering them (rdonly shifting the existing ones)?
b) since MNT_RDONLY doesn't do anything at that point, why introduce it
(and associated confusion) now? As it is, your /proc/mounts will pretend
that per-mountpoint r/o works right now. Since it doesn't...
> +#define MNT_IS_RDONLY(m) ((m) && ((m)->mnt_flags & MNT_RDONLY))
> +#define MNT_IS_NOATIME(m) ((m) && ((m)->mnt_flags & MNT_NOATIME))
> +#define MNT_IS_NODIRATIME(m) ((m) && ((m)->mnt_flags & MNT_NODIRATIME))
See above. Besides, are we ever planning to pass NULL to these guys?
On Tue, Apr 06, 2004 at 09:48:44PM +0100, [email protected] wrote:
> noatime/nodiratime: OK, but we still have direct modifications of i_atime
> that need to be taken care of.
Particulary interesting one is in tty_io.c. There we
1) unconditionally touch i_atime on read()
2) do not touch it on write()
3) never mark the inode dirty.
Note that the last one means that doing stat() in a loop will sometimes
give atime going backwards. We also completely ignore noatime here.
There are similar places in some other char drivers. Obvious step would
be to have them do file_accessed() instead; however, I'd really like to
hear the rationale for existing behaviour. Comments?
On Wed, Apr 07, 2004 at 12:11:36AM +0100, [email protected] wrote:
> Note that the last one means that doing stat() in a loop will sometimes
> give atime going backwards. We also completely ignore noatime here.
>
> There are similar places in some other char drivers. Obvious step would
> be to have them do file_accessed() instead; however, I'd really like to
> hear the rationale for existing behaviour. Comments?
I believe its so that we update the data in the cache, and avoid writing
it back to disk unnecessarily - consider the case where you have a lot
of tty activity (which updates atime). You don't particularly want to
be committing atime updates to disk every, what, 5 seconds, or performing
the NFS operations for the same.
The above is my understanding of the situation, which comes from when I
looked into these issues back in 2.0.3x days on a root-NFS machine and
asked (iirc) Alan Cox about it. - in other words, don't attach too much
reliability on it. 8)
[And for those who don't know - why are tty atimes updated in the
first place? For 'w' 'finger' etc which report login idle times
( := now - tty atime ).]
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
On Wed, Apr 07, 2004 at 12:35:06AM +0100, Russell King wrote:
> I believe its so that we update the data in the cache, and avoid writing
> it back to disk unnecessarily - consider the case where you have a lot
> of tty activity (which updates atime). You don't particularly want to
> be committing atime updates to disk every, what, 5 seconds, or performing
> the NFS operations for the same.
OK, but at least we want to dirty the inode at some point (e.g. final
close), so that atime would be monotonous - as it is, we get it reset
when inode goes out of cache and is reread again...
On Tue, Apr 06, 2004 at 09:48:44PM +0100, [email protected] wrote:
> On Tue, Apr 06, 2004 at 04:55:44PM +0200, Herbert Poetzl wrote:
> >
> > Hi Andrew!
> >
> > according to todays vfs strategy (hope it hasn't changed
> > again), here is the first patch, which adds the mount
> > flags propagation, fixes the /proc display, and implements
> > noatime and nodiratime per mountpoint ...
> >
> > please consider for inclusion ...
>
> noatime/nodiratime: OK, but we still have direct modifications of i_atime
> that need to be taken care of.
okay, will look at it ...
> massage of ->show(): more or less OK. However, we don't need to keep
> MS_NOATIME and MS_NODIRATIME in flags at all -
> > + if (flags & MS_NOATIME)
> > + mnt_flags |= MNT_NOATIME;
> > + if (flags & MS_NODIRATIME)
> > + mnt_flags |= MNT_NODIRATIME;
> > flags &= ~(MS_NOSUID|MS_NOEXEC|MS_NODEV);
>
> should remove them from flags in the last line, same way we do that for
> nosuid/noexec/nodev, with obvious consequences for ->show().
>
> Note that we don't need to keep MS_NOATIME check in update_atime() - that
> animal is purely per-mountpoint now.
hmm, wasn't there a reason, for having them per inode
like for 'special' files, which I do not remember atm?
> > + if (MNT_IS_NOATIME(mnt))
> > + return;
> > + if (S_ISDIR(inode->i_mode) && MNT_IS_NODIRATIME(mnt))
> > + return;
>
> Do we need those to be macros? AFAICS, this is the only place where we
> do such checks and we shouldn't get new callers. IOW, keeping them
> separate doesn't buy us anything and only obfuscates the code.
okay, no problem with that ...
> > -#define MNT_NOSUID 1
> > -#define MNT_NODEV 2
> > -#define MNT_NOEXEC 4
> > +#define MNT_RDONLY 1
> > +#define MNT_NOSUID 2
> > +#define MNT_NODEV 4
> > +#define MNT_NOEXEC 8
> > +#define MNT_NOATIME 16
> > +#define MNT_NODIRATIME 32
>
> *ugh*
>
> a) what's the point of reordering them (rdonly shifting the existing ones)?
simple, to match the MS_* counterparts, something which
actually confused me in the first place (in the code)
> b) since MNT_RDONLY doesn't do anything at that point, why introduce it
> (and associated confusion) now? As it is, your /proc/mounts will pretend
> that per-mountpoint r/o works right now. Since it doesn't...
no, they won't. the required MNT_RDONLY flag in the proc
function isn't set, so only MS_RDONLY will be reported
(from the superblock) which is equivalent to what is
reported in the current version.
> > +#define MNT_IS_RDONLY(m) ((m) && ((m)->mnt_flags & MNT_RDONLY))
> > +#define MNT_IS_NOATIME(m) ((m) && ((m)->mnt_flags & MNT_NOATIME))
> > +#define MNT_IS_NODIRATIME(m) ((m) && ((m)->mnt_flags & MNT_NODIRATIME))
>
> See above. Besides, are we ever planning to pass NULL to these guys?
originally there was a 'comment' which said, the
(m) check can be removed, when we are sure that this
isn't called with mnt == NULL ...
so maybe a BUGON(!m) might be useful?
I'll add the 'updates' today ...
best,
Herbert
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On Wed, Apr 07, 2004 at 08:46:37AM +0200, Herbert Poetzl wrote:
> originally there was a 'comment' which said, the
> (m) check can be removed, when we are sure that this
> isn't called with mnt == NULL ...
>
> so maybe a BUGON(!m) might be useful?
Accessing m->mnt_flags will do just fine ;-)
> > Note that we don't need to keep MS_NOATIME check in update_atime() - that
> > animal is purely per-mountpoint now.
>
> hmm, wasn't there a reason, for having them per inode
> like for 'special' files, which I do not remember atm?
Yes, but that's S_NOATIME in inode->i_flags, not MS_NOATIME in sb->s_flags.
Actually, I've been wrong here - we *do* need that check, since there are
filesystems that force noatime or nodiratime.
>From grepping for that stuff it appears that
a) a bunch of filesystems force nodiratime and do not allow to override it
on remount. Same for noatime (BTW, noatime implies nodiratime, so setting
both is pointless).
b) some filesystems force nodiratime at mount time, but do not care to preserve
it on remount. Most of those are my fault - I've missed the remount side of
things in readdir patch. They should have nodiratime always on to match
the original behaviour. IMO, both (a) and (b) should be handled by a new
field - sb->s_forced_flags. do_remount_sb() would set those in flags before
doing anything else. Note that assignment to ->s_flags in do_remount_sb()
is not safe - e.g. ext[23] can get forced r/o by fs error and if that
happens after return from ->remount_sb() but before assignement, we are
screwed. IOW, do_remount_sb() will need more work.
c) XFS has an odd wankfest around MS_NOATIME - grep for XFSMNT_NOATIME and
XFS_MOUNT_NOATIME. AFAICS the only use is in xfs_ichgtime() and it's
redundant - we check IS_NOATIME() right after checking for XFS_MOUNT_NOATIME
there. However, that will become very interesting when it gets to
per-mountpoint r/o
/*
* We're not supposed to change timestamps in readonly-mounted
* filesystems. Throw it away if anyone asks us.
*/
if (unlikely(vp->v_vfsp->vfs_flag & VFS_RDONLY))
return;
in xfs_ichgtime() is too damn deep in call chains, so...
d) nfs_getattr() is very odd - it overloads semantics of noatime and nodiratime
for NFS in a strange way.
On Wed, Apr 07, 2004 at 09:47:22AM +0100, [email protected] wrote:
> On Wed, Apr 07, 2004 at 08:46:37AM +0200, Herbert Poetzl wrote:
> > originally there was a 'comment' which said, the
> > (m) check can be removed, when we are sure that this
> > isn't called with mnt == NULL ...
> >
> > so maybe a BUGON(!m) might be useful?
>
> Accessing m->mnt_flags will do just fine ;-)
right ;)
> > > Note that we don't need to keep MS_NOATIME check in update_atime() - that
> > > animal is purely per-mountpoint now.
> >
> > hmm, wasn't there a reason, for having them per inode
> > like for 'special' files, which I do not remember atm?
>
> Yes, but that's S_NOATIME in inode->i_flags, not MS_NOATIME in sb->s_flags.
> Actually, I've been wrong here - we *do* need that check, since there are
> filesystems that force noatime or nodiratime.
so we keep them?
> >From grepping for that stuff it appears that
>
> a) a bunch of filesystems force nodiratime and do not allow to override it
> on remount. Same for noatime (BTW, noatime implies nodiratime, so setting
> both is pointless).
>
> b) some filesystems force nodiratime at mount time, but do not care to preserve
> it on remount. Most of those are my fault - I've missed the remount side of
> things in readdir patch. They should have nodiratime always on to match
> the original behaviour. IMO, both (a) and (b) should be handled by a new
> field - sb->s_forced_flags. do_remount_sb() would set those in flags before
> doing anything else. Note that assignment to ->s_flags in do_remount_sb()
> is not safe - e.g. ext[23] can get forced r/o by fs error and if that
> happens after return from ->remount_sb() but before assignement, we are
> screwed. IOW, do_remount_sb() will need more work.
are you going to do this?
> c) XFS has an odd wankfest around MS_NOATIME - grep for XFSMNT_NOATIME and
> XFS_MOUNT_NOATIME. AFAICS the only use is in xfs_ichgtime() and it's
> redundant - we check IS_NOATIME() right after checking for XFS_MOUNT_NOATIME
> there. However, that will become very interesting when it gets to
> per-mountpoint r/o
> /*
> * We're not supposed to change timestamps in readonly-mounted
> * filesystems. Throw it away if anyone asks us.
> */
> if (unlikely(vp->v_vfsp->vfs_flag & VFS_RDONLY))
> return;
> in xfs_ichgtime() is too damn deep in call chains, so...
>
> d) nfs_getattr() is very odd - it overloads semantics of noatime and nodiratime
> for NFS in a strange way.
open issues:
>>> a) what's the point of reordering them (rdonly shifting the existing ones)?
>> simple, to match the MS_* counterparts, something which
>> actually confused me in the first place (in the code)
so is this okay? actually I'd prefer to use the same
values for the MNT_NOATIME and MNT_NODIRATIME too ...
(as in the previous version I did)
best,
Herbert
On Wed, Apr 07, 2004 at 12:19:12PM +0200, Herbert Poetzl wrote:
> > Yes, but that's S_NOATIME in inode->i_flags, not MS_NOATIME in sb->s_flags.
> > Actually, I've been wrong here - we *do* need that check, since there are
> > filesystems that force noatime or nodiratime.
>
> so we keep them?
yes.
> > it on remount. Most of those are my fault - I've missed the remount side of
> > things in readdir patch. They should have nodiratime always on to match
> > the original behaviour. IMO, both (a) and (b) should be handled by a new
> > field - sb->s_forced_flags. do_remount_sb() would set those in flags before
> > doing anything else. Note that assignment to ->s_flags in do_remount_sb()
> > is not safe - e.g. ext[23] can get forced r/o by fs error and if that
> > happens after return from ->remount_sb() but before assignement, we are
> > screwed. IOW, do_remount_sb() will need more work.
>
> are you going to do this?
Yes - for now I'll do a fix that doesn't change API (IOW, add/update
->remount_fs() to filesystems with forced flags, forced r/o has the
same problems); we'll see what should be done in the long run when
the things clean up a bit. I'd rather avoid struct super_block changes
that would have to be reverted soon.
> >> simple, to match the MS_* counterparts, something which
> >> actually confused me in the first place (in the code)
>
> so is this okay? actually I'd prefer to use the same
> values for the MNT_NOATIME and MNT_NODIRATIME too ...
> (as in the previous version I did)
Hmm... Potentially we are breaking ABI for no good reason, since these
defines are visible to out-of-tree code. I don't think that we should
care about matching MS_... stuff, simply because MS_... encoding is ugly
as hell and there's no reason to use MNT_... and MS_... in the same
context.
On Wed, Apr 07, 2004 at 01:46:41PM +0100, [email protected] wrote:
> > >> simple, to match the MS_* counterparts, something which
> > >> actually confused me in the first place (in the code)
> >
> > so is this okay? actually I'd prefer to use the same
> > values for the MNT_NOATIME and MNT_NODIRATIME too ...
> > (as in the previous version I did)
>
> Hmm... Potentially we are breaking ABI for no good reason, since these
> defines are visible to out-of-tree code. I don't think that we should
> care about matching MS_... stuff, simply because MS_... encoding is ugly
> as hell and there's no reason to use MNT_... and MS_... in the same
> context.
hmm, well breaking ABI here should not hurt anywhere, as
they are 'just' defined flags, and no code should rely on the
actual values (or if it does, it is broken anyway, right?)
but if you 'think' that it will break something, it's okay
for me to keep the 'old' values ... and 'just' add new ones
for RDONLY, NOATIME, and NODIRATIME ...
best,
Herbert
On Wed, 7 Apr 2004 [email protected] wrote:
>
> On Tue, Apr 06, 2004 at 09:48:44PM +0100, [email protected] wrote:
>
> > noatime/nodiratime: OK, but we still have direct modifications of i_atime
> > that need to be taken care of.
>
> Particulary interesting one is in tty_io.c. There we
> 1) unconditionally touch i_atime on read()
> 2) do not touch it on write()
> 3) never mark the inode dirty.
All of 1-3 are correct.
They all mean that:
- atime never gets written out for tty devices, because there is no
point, and we shouldn't cause disk activity (or worse, network
activity) just because somebody is typing at the keyboard. Thus #3.
- atime is maintained properly for "last effective read" purposes, which
is what "ps"/"w" and friends want to see for idle time reporting.
Thus both #1 and #2 are important.
- atime is only valid as long as the tty is open (again, think "idle
time" - nobody actually cares what the atime is after the device has
been closed). Thus "atime potentially going backwards" due to #3 is a
non-issue.
So yes, tty atime updates are strange, but they are strange on PURPOSE.
> Note that the last one means that doing stat() in a loop will sometimes
> give atime going backwards. We also completely ignore noatime here.
Ignoring noatime is potentially the only one we should look at, but since
tty's really _are_ "noatime" as far as the filesystem is concerned, I
think it makes sense in the situation we are in anyway. The real reason
for "noatime" is to avoid unnecessary filesystem activity, not that we
necessarily want a constant atime.
> There are similar places in some other char drivers. Obvious step would
> be to have them do file_accessed() instead; however, I'd really like to
> hear the rationale for existing behaviour. Comments?
I don't know about other char drivers, those may well be wrong. But for
tty's in particular, idle time calculations really do pretty much require
the behaviour (apart from #3 - and #3 is, I think, effectively required by
not wanting to touch the disk on keyboard accesses).
Doing effectively a update_atime() on final tty close might be ok just to
avoid the backwards-running time, but you'd have to open-code it to avoid
the "inode_times_differ()" check. Not worth it, I feel, since atime on a
tty that has been closed is irrelevant.
Linus
On Wed, Apr 14, 2004 at 08:14:18AM -0700, Linus Torvalds wrote:
> Ignoring noatime is potentially the only one we should look at, but since
> tty's really _are_ "noatime" as far as the filesystem is concerned, I
> think it makes sense in the situation we are in anyway. The real reason
> for "noatime" is to avoid unnecessary filesystem activity, not that we
> necessarily want a constant atime.
Another thing we are ignoring is r/o. Oh, well - the same arguments apply.
> > There are similar places in some other char drivers. Obvious step would
> > be to have them do file_accessed() instead; however, I'd really like to
> > hear the rationale for existing behaviour. Comments?
>
> I don't know about other char drivers, those may well be wrong. But for
> tty's in particular, idle time calculations really do pretty much require
> the behaviour (apart from #3 - and #3 is, I think, effectively required by
> not wanting to touch the disk on keyboard accesses).
AFAICS, they simply copy what tty_io.c does. Out of these guys busmouse.c
might have a similar excuse; qtronix.c and sonypi.c AFAICS have no reason
to touch atime at all. No idea what should usb/core/devio.c do...
Anyway, I'm going down right now; expect a patchbomb tonight after I get
some sleep...