2023-10-18 20:47:54

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing

On Wed, 2023-10-18 at 12:18 -0700, Linus Torvalds wrote:
> On Wed, 18 Oct 2023 at 10:41, Jeff Layton <[email protected]> wrote:
> >
> > One way to prevent this is to ensure that when we stamp a file with a
> > fine-grained timestamp, that we use that value to establish a floor for
> > any later timestamp update.
>
> I'm very leery of this.
>
> I don't like how it's using a global time - and a global fine-grained
> offset - when different filesystems will very naturally have different
> granularities. I also don't like how it's no using that global lock.
>
> Yes, yes, since the use of this all is then gated by the 'is_mgtime()'
> thing, any filesystem with big granularities will presumably never set
> FS_MGTIME in the first time, and that hides the worst pointless cases.
> But it still feels iffy to me.
>

Thanks for taking a look!

I'm not too crazy about the global lock either, but the global fine
grained value ensures that when we have mtime changes that occur across
filesystems that they appear to be in the correct order.

We could (hypothetically) track an offset per superblock or something,
but then you could see out-of-order timestamps in inodes across
different filesystems (even of the same type). I think it'd better not
to do that if we can get away with it.


> Also, the whole current_ctime() logic seems wrong. Later (in 4/9), you do this:
>
> static struct timespec64 current_ctime(struct inode *inode)
> {
> if (is_mgtime(inode))
> return current_mgtime(inode);
>
> and current_mgtime() does
>
> if (nsec & I_CTIME_QUERIED) {
> ktime_get_real_ts64(&now);
> return timestamp_truncate(now, inode);
> }
>
> so once the code has set I_CTIME_QUERIED, it will now use the
> expensive fine-grained time - even when it makes no sense.
>
> As far as I can tell, there is *never* a reason to get the
> fine-grained time if the old inode ctime is already sufficiently far
> away.
>
> IOW, my gut feel is that all this logic should always not only be
> guarded by FS_MGTIME (like you do seem to do), *and* by "has anybody
> even queried this time" - it should *also* always be guarded by "if I
> get the coarse-grained time, is that sufficient?"
>
> So I think the current_ctime() logic should be something along the lines of
>
> static struct timespec64 current_ctime(struct inode *inode)
> {
> struct timespec64 ts64 = current_time(inode);
> unsigned long old_ctime_sec = inode->i_ctime_sec;
> unsigned int old_ctime_nsec = inode->i_ctime_nsec;
>
> if (ts64.tv_sec != old_ctime_sec)
> return ts64;
>
> /*
> * No need to check is_mgtime(inode) - the I_CTIME_QUERIED
> * flag is only set for mgtime filesystems
> */
> if (!(old_ctime_nsec & I_CTIME_QUERIED))
> return ts64;
> old_ctime_nsec &= ~I_CTIME_QUERIED;
> if (ts64.tv_nsec > old_ctime_nsec + inode->i_sb->s_time_gran)
> return ts64;
>

Does that really do what you expect here? current_time will return a
value that has already been through timestamp_truncate. Regardless, I
don't think this function makes as big a difference as you might think.

>
> /* Ok, only *now* do we do a finegrained value */
> ktime_get_real_ts64(&ts64);
> return timestamp_truncate(ts64);
> }
>
> or whatever. Make it *very* clear that the finegrained timestamp is
> the absolute last option, after we have checked that the regular one
> isn't possible.

current_mgtime is calling ktime_get_real_ts64, which is an existing
interface that does not take the global spinlock and won't advance the
global offset. That call should be quite cheap.

The reason we can use that call here is because current_ctime and
current_mgtime are only called from inode_needs_update_time, which is
only called to check whether we need to get write access to the
inode.?What we do is look at the current clock and see whether the
timestamps would perceivably change if we were to do the update right
then.

If so, we get write access and then call inode_set_ctime_current(). That
will fetch its own timestamps and reevaluate what sort of update to do.
That's the only place that fetches an expensive fine-grained timestamp
that advances the offset.

So, I think this set already is only getting the expensive fine-grained
timestamps as a last resort.

This is probably an indicator that I need to document this code better
though. It may also be a good idea to reorganize
inode_needs_update_time, current_ctime and current_mgtime for better
clarity.

Many thanks for the comments!
--
Jeff Layton <[email protected]>


2023-10-18 21:32:03

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing

On Wed, 18 Oct 2023 at 13:47, Jeff Layton <[email protected]> wrote:
>
> > old_ctime_nsec &= ~I_CTIME_QUERIED;
> > if (ts64.tv_nsec > old_ctime_nsec + inode->i_sb->s_time_gran)
> > return ts64;
> >
>
> Does that really do what you expect here? current_time will return a
> value that has already been through timestamp_truncate.

Yeah, you're right.

I think you can actually remove the s_time_gran addition. Both the
old_ctime_nsec and the current ts64,tv_nsec are already rounded, so
just doing

if (ts64.tv_nsec > old_ctime_nsec)
return ts64;

would already guarantee that it's different enough.

> current_mgtime is calling ktime_get_real_ts64, which is an existing
> interface that does not take the global spinlock and won't advance the
> global offset. That call should be quite cheap.

Ahh, I did indeed mis-read that as the new one with the lock.

I did react to the fact that is_mgtime(inode) itself is horribly
expensive if it's not cached (following three quite possibly cold
pointers), which was part of that whole "look at I_CTIME_QUERIED
instead".

I see the pointer chasing as a huge VFS timesink in all my profiles,
although usually it's the disgusting security pointer (because selinux
creates separate security nodes for each inode, even when the contents
are often identical). So I'm a bit sensitive to "follow several
pointers from 'struct inode'" patterns from looking at too many
instruction profiles.

Linus