2022-08-23 14:37:56

by Jeff Layton

Subject: Re: [PATCH] iversion: update comments with info about atime updates

On Tue, 2022-08-23 at 21:38 +1000, NeilBrown wrote:
> On Tue, 23 Aug 2022, Jeff Layton wrote:
> > So, we can refer to that and simply say:
> >
> > "If the function updates the mtime or ctime on the inode, then the
> > i_version should be incremented. If only the atime is being updated,
> > then the i_version should not be incremented. The exception to this rule
> > is explicit atime updates via utimes() or a similar mechanism, which
> > should result in the i_version being incremented."
>
> Is that exception needed? utimes() updates ctime.
>
> https://man7.org/linux/man-pages/man2/utimes.2.html
>
> doesn't say that, but
>
> https://pubs.opengroup.org/onlinepubs/007904875/functions/utimes.html
>
> does, as does the code.
>

Oh, good point! I think we can leave that out. Even better!
--
Jeff Layton <[email protected]>


2022-08-23 22:28:12

by NeilBrown

Subject: Re: [PATCH] iversion: update comments with info about atime updates

On Tue, 23 Aug 2022, Jeff Layton wrote:
> On Tue, 2022-08-23 at 21:38 +1000, NeilBrown wrote:
> > On Tue, 23 Aug 2022, Jeff Layton wrote:
> > > So, we can refer to that and simply say:
> > >
> > > "If the function updates the mtime or ctime on the inode, then the
> > > i_version should be incremented. If only the atime is being updated,
> > > then the i_version should not be incremented. The exception to this rule
> > > is explicit atime updates via utimes() or a similar mechanism, which
> > > should result in the i_version being incremented."
> >
> > Is that exception needed? utimes() updates ctime.
> >
> > https://man7.org/linux/man-pages/man2/utimes.2.html
> >
> > doesn't say that, but
> >
> > https://pubs.opengroup.org/onlinepubs/007904875/functions/utimes.html
> >
> > does, as does the code.
> >
>
> Oh, good point! I think we can leave that out. Even better!

Further, implicit mtime updates (file_update_time()) also update ctime.
So all you need is:
  "If the function updates the ctime, then i_version should be
  incremented."
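As a rough illustration of that one-line rule, here is a userspace model (not kernel code; struct model_inode, the TOUCH_* flags and model_touch() are hypothetical names, and times are plain seconds for brevity). Any mtime change also sets ctime, and i_version is bumped exactly when ctime changes, so an atime-only update leaves it alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified inode: timestamps are plain seconds. */
struct model_inode {
	int64_t atime, mtime, ctime;
	uint64_t i_version;
};

/* Hypothetical flags describing which timestamps an operation touches. */
enum {
	TOUCH_ATIME = 0x1,
	TOUCH_MTIME = 0x2,
	TOUCH_CTIME = 0x4,
};

static void model_touch(struct model_inode *inode, unsigned int flags, int64_t now)
{
	/* Reject flags this model does not know about. */
	assert((flags & ~(unsigned int)(TOUCH_ATIME | TOUCH_MTIME | TOUCH_CTIME)) == 0);

	/* A data change (e.g. file_update_time()) updates mtime and ctime together. */
	if (flags & TOUCH_MTIME)
		flags |= TOUCH_CTIME;

	if (flags & TOUCH_ATIME)
		inode->atime = now;
	if (flags & TOUCH_MTIME)
		inode->mtime = now;
	if (flags & TOUCH_CTIME) {
		inode->ctime = now;
		/* The whole rule: i_version moves if and only if ctime moves. */
		inode->i_version++;
	}
}
```

An explicit atime change through utimes() would be modelled as TOUCH_ATIME | TOUCH_CTIME, which is why no separate exception is needed.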

And I have to ask: why not just use the ctime? Why maintain a separate,
parallel number?

Timestamps are updated at HZ (ktime_get_coarse), i.e. at most once
every millisecond.
xfs stores timestamps at nanosecond resolution, so about 20 bits are currently wasted.
We could put a counter like i_version in there that only increments
after it has been viewed; then we would get all the precision we need,
with exactly ctime semantics.
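The "only increments after it has been viewed" behaviour is what the existing i_version code does with its queried flag. A minimal userspace sketch of the idea with C11 atomics, loosely modelled on that scheme; the names are hypothetical and the layout (bit 0 is the "seen" flag, the counter lives in the upper bits) is an assumption:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 0 is the "seen" flag, bits 63..1 are the counter. */
#define SEEN_FLAG   UINT64_C(1)
#define COUNTER_ONE UINT64_C(2)

static_assert(COUNTER_ONE == SEEN_FLAG << 1, "counter starts just above the flag");

struct change_ctr {
	_Atomic uint64_t raw;
};

static void ctr_init(struct change_ctr *c)
{
	atomic_init(&c->raw, 0);
}

/* Reader: return the counter and mark it "seen" so the next change bumps it. */
static uint64_t ctr_query(struct change_ctr *c)
{
	uint64_t cur = atomic_load(&c->raw);

	while (!(cur & SEEN_FLAG)) {
		/* On failure, cur is reloaded and the loop re-checks the flag. */
		if (atomic_compare_exchange_weak(&c->raw, &cur, cur | SEEN_FLAG))
			break;
	}
	return cur >> 1;
}

/* Writer: increment only if the current value has been seen. */
static bool ctr_maybe_inc(struct change_ctr *c)
{
	uint64_t cur = atomic_load(&c->raw);

	do {
		if (!(cur & SEEN_FLAG))
			return false;	/* nobody saw it, so it still identifies this state */
	} while (!atomic_compare_exchange_weak(&c->raw, &cur,
					       (cur & ~SEEN_FLAG) + COUNTER_ONE));
	return true;
}
```

Unobserved changes cost no increment, which is what lets a small counter field go a long way per tick.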

The 64-bit change-id could comprise:
  35 bits of seconds (just over a millennium)
  16 bits of sub-seconds (just in case a higher-precision time is wanted
  one day)
  13 bits of counter (8192 changes per tick)
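A sketch of that layout as pack/unpack helpers. The field order (seconds in the top bits, counter in the bottom) is an assumption, chosen so that comparing two change-ids as plain integers orders them by time first and counter second; cid_pack() and the other names are hypothetical. For scale, 2^35 seconds is roughly 1089 years.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths from the proposal: 35 + 16 + 13 = 64 bits. */
#define CID_SEC_BITS   35
#define CID_FRAC_BITS  16
#define CID_CTR_BITS   13

static_assert(CID_SEC_BITS + CID_FRAC_BITS + CID_CTR_BITS == 64,
	      "fields must fill exactly 64 bits");

#define CID_CTR_MASK   ((UINT64_C(1) << CID_CTR_BITS) - 1)
#define CID_FRAC_MASK  ((UINT64_C(1) << CID_FRAC_BITS) - 1)
#define CID_SEC_MASK   ((UINT64_C(1) << CID_SEC_BITS) - 1)
#define CID_FRAC_SHIFT CID_CTR_BITS
#define CID_SEC_SHIFT  (CID_CTR_BITS + CID_FRAC_BITS)

/* Convert nanoseconds to 1/65536-second units (truncating). */
static uint32_t cid_frac_from_nsec(uint32_t nsec)
{
	assert(nsec < 1000000000u);
	return (uint32_t)(((uint64_t)nsec << CID_FRAC_BITS) / 1000000000u);
}

/* Seconds in the top bits, sub-seconds in the middle, counter at the bottom. */
static uint64_t cid_pack(uint64_t sec, uint32_t frac, uint32_t ctr)
{
	assert(sec <= CID_SEC_MASK);
	assert(frac <= CID_FRAC_MASK);
	assert(ctr <= CID_CTR_MASK);
	return (sec << CID_SEC_SHIFT) | ((uint64_t)frac << CID_FRAC_SHIFT) | ctr;
}

static uint64_t cid_sec(uint64_t cid)  { return cid >> CID_SEC_SHIFT; }
static uint32_t cid_frac(uint64_t cid) { return (uint32_t)((cid >> CID_FRAC_SHIFT) & CID_FRAC_MASK); }
static uint32_t cid_ctr(uint64_t cid)  { return (uint32_t)(cid & CID_CTR_MASK); }
```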

The value exposed in i_ctime would hide the counter and show only the
timestamp portion of what the filesystem stores. This would ensure that
changes to different files made in one order never leave timestamps in
the reverse order (the timestamps could be equal, but that is expected).
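The ordering guarantee follows from masking: clearing the low counter bits of an integer can make two values equal but can never reverse their order. A tiny sketch, assuming the 13 counter bits occupy the bottom of the 64-bit value (exposed_ctime() and check_order() are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the 13 counter bits sit at the bottom of the change-id. */
#define CTR_BITS 13
#define CTR_MASK ((UINT64_C(1) << CTR_BITS) - 1)

/* What would be reported as ctime: the stored value with the counter hidden. */
static uint64_t exposed_ctime(uint64_t change_id)
{
	return change_id & ~CTR_MASK;
}

/*
 * Masking low bits is monotonic: a <= b implies
 * exposed_ctime(a) <= exposed_ctime(b). Ties are possible, reversals are not.
 */
static void check_order(uint64_t a, uint64_t b)
{
	if (a <= b)
		assert(exposed_ctime(a) <= exposed_ctime(b));
	else
		assert(exposed_ctime(a) >= exposed_ctime(b));
}
```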

This scheme could be made to handle a sustained update rate of one
increment every 8 nanoseconds (if the counter were allowed to overflow
into unused bits of the sub-second field). That is one every 24 CPU
cycles. Incrementing a counter and making it visible to all CPUs can
probably be done in 24 cycles. Accessing it and setting the "seen" flag
as well might just fit with faster memory. Getting any other useful
work done while maintaining that rate on a single file seems unlikely.
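A quick check of those figures. The 3 GHz clock and the 1 ms tick (HZ=1000) are assumptions, not stated above; they are simply the values that make "8 ns" and "24 cycles" agree. The last two lines show why the counter has to borrow from the sub-second field, and that a 1 ms tick leaves enough sub-second bits to lend.

```latex
\[
\begin{aligned}
8\,\mathrm{ns} \times 3\,\mathrm{GHz} &= 8\times10^{-9}\,\mathrm{s} \times 3\times10^{9}\,\mathrm{s}^{-1} = 24 \ \text{cycles} \\
\frac{1\,\mathrm{ms}}{8\,\mathrm{ns}} &= 125\,000 \approx 2^{16.93} \;\Rightarrow\; 17 \ \text{counter bits per tick} \\
17 - 13 &= 4 \ \text{bits needed beyond the 13-bit counter} \\
16 - \lceil \log_2 1000 \rceil &= 16 - 10 = 6 \ \text{sub-second bits not needed at 1 ms resolution}
\end{aligned}
\]
```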

NeilBrown

2022-08-24 12:55:38

by Jeff Layton

Subject: Re: [PATCH] iversion: update comments with info about atime updates

On Wed, 2022-08-24 at 08:24 +1000, NeilBrown wrote:
> On Tue, 23 Aug 2022, Jeff Layton wrote:
> > On Tue, 2022-08-23 at 21:38 +1000, NeilBrown wrote:
> > > On Tue, 23 Aug 2022, Jeff Layton wrote:
> > > > So, we can refer to that and simply say:
> > > >
> > > > "If the function updates the mtime or ctime on the inode, then the
> > > > i_version should be incremented. If only the atime is being updated,
> > > > then the i_version should not be incremented. The exception to this rule
> > > > is explicit atime updates via utimes() or a similar mechanism, which
> > > > should result in the i_version being incremented."
> > >
> > > Is that exception needed? utimes() updates ctime.
> > >
> > > https://man7.org/linux/man-pages/man2/utimes.2.html
> > >
> > > doesn't say that, but
> > >
> > > https://pubs.opengroup.org/onlinepubs/007904875/functions/utimes.html
> > >
> > > does, as does the code.
> > >
> >
> > Oh, good point! I think we can leave that out. Even better!
>
> Further, implicit mtime updates (file_update_time()) also update ctime.
> So all you need is:
>   "If the function updates the ctime, then i_version should be
>   incremented."
>
> And I have to ask: why not just use the ctime? Why maintain a separate,
> parallel number?
>
> Timestamps are updated at HZ (ktime_get_coarse), i.e. at most once
> every millisecond.
> xfs stores timestamps at nanosecond resolution, so about 20 bits are currently wasted.
> We could put a counter like i_version in there that only increments
> after it has been viewed; then we would get all the precision we need,
> with exactly ctime semantics.
>
> The 64-bit change-id could comprise:
>   35 bits of seconds (just over a millennium)
>   16 bits of sub-seconds (just in case a higher-precision time is wanted
>   one day)
>   13 bits of counter (8192 changes per tick)

We'd need a "seen" flag too, so maybe only 4096 changes per tick...

>
> The value exposed in i_ctime would hide the counter and show only the
> timestamp portion of what the filesystem stores. This would ensure that
> changes to different files made in one order never leave timestamps in
> the reverse order (the timestamps could be equal, but that is expected).
>
> This scheme could be made to handle a sustained update rate of one
> increment every 8 nanoseconds (if the counter were allowed to overflow
> into unused bits of the sub-second field). That is one every 24 CPU
> cycles. Incrementing a counter and making it visible to all CPUs can
> probably be done in 24 cycles. Accessing it and setting the "seen" flag
> as well might just fit with faster memory. Getting any other useful
> work done while maintaining that rate on a single file seems unlikely.

This is an interesting idea.

So, for NFSv4 you'd just mask off the counter bits (and "seen" flag) to
get the ctime, and for the change attribute we'd just mask off the
"seen" flag and put it all in there.

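Roughly, the two NFSv4 derivations could look like this. The placement of the "seen" flag (bit 0, with the counter shrunk to 12 bits just above it) is an assumption, as are all the names; the sub-second field is assumed to be in 1/65536-second units.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed (hypothetical) layout with the "seen" flag taken out of the
 * 13 counter bits:
 *   bits 63..29  seconds (35 bits)
 *   bits 28..13  sub-seconds in units of 1/65536 s (16 bits)
 *   bits 12..1   counter (12 bits, 4096 changes per tick)
 *   bit  0       "seen" flag
 */
#define SEEN_BIT   UINT64_C(1)
#define CTR_SHIFT  1
#define CTR_BITS   12
#define CTR_MASK   (((UINT64_C(1) << CTR_BITS) - 1) << CTR_SHIFT)
#define FRAC_SHIFT 13
#define SEC_SHIFT  29

static_assert(CTR_SHIFT + CTR_BITS == FRAC_SHIFT, "counter ends where sub-seconds begin");

/* NFSv4 change attribute: everything except the "seen" flag. */
static uint64_t change_attr(uint64_t raw)
{
	return raw & ~SEEN_BIT;
}

/* ctime: drop the counter and the "seen" flag, then split into sec/nsec. */
static void ctime_from_raw(uint64_t raw, int64_t *sec, uint32_t *nsec)
{
	uint64_t ts = raw & ~(CTR_MASK | SEEN_BIT);
	uint64_t frac = (ts >> FRAC_SHIFT) & 0xffff;

	*sec = (int64_t)(ts >> SEC_SHIFT);
	*nsec = (uint32_t)((frac * 1000000000u) >> 16);
}
```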
Implementing that for all filesystems would be a huge project though.
If we were implementing the i_version counter from scratch, I'd
probably do something along these lines. Given that we already have
an existing i_version counter, would there be any real benefit to
pursuing this avenue instead?
--
Jeff Layton <[email protected]>