2023-09-28 17:49:39

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 86/87] fs: switch timespec64 fields in inode to discrete integers

On Thu, 2023-09-28 at 10:19 -0700, Darrick J. Wong wrote:
> On Thu, Sep 28, 2023 at 01:06:03PM -0400, Jeff Layton wrote:
> > On Thu, 2023-09-28 at 11:48 -0400, Arnd Bergmann wrote:
> > > On Thu, Sep 28, 2023, at 07:05, Jeff Layton wrote:
> > > > This shaves 8 bytes off struct inode, according to pahole.
> > > >
> > > > Signed-off-by: Jeff Layton <[email protected]>
> > >
> > > FWIW, this is similar to the approach that Deepa suggested
> > > back in 2016:
> > >
> > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > It was NaKed at the time because of the added complexity,
> > > though it would have been much easier to do it then,
> > > as we had to touch all the timespec references anyway.
> > >
> > > The approach still seems ok to me, but I'm not sure it's worth
> > > doing it now if we didn't do it then.
> > >
> >
> > I remember seeing those patches go by. I don't remember that change
> > being NaK'ed, but I wasn't paying close attention at the time
> >
> > Looking at it objectively now, I think it's worth it to recover 8 bytes
> > per inode and open a 4 byte hole that Amir can use to grow the
> > i_fsnotify_mask. We might even able to shave off another 12 bytes
> > eventually if we can move to a single 64-bit word per timestamp.
>
> I don't think you can, since btrfs timestamps utilize s64 seconds
> counting in both directions from the Unix epoch. They also support ns
> resolution:
>
> struct btrfs_timespec {
> __le64 sec;
> __le32 nsec;
> } __attribute__ ((__packed__));
>

Correct. We'd lose some fidelity in currently stored timestamps, but as
Linus and Ted pointed out, anything below ~100ns granularity is
effectively just noise, as that's the floor overhead for calling into
the kernel. It's hard to argue that any application needs that sort of
timestamp resolution, at least with contemporary hardware.

Doing that would mean that tests that store specific values in the
atime/mtime and expect to be able to fetch exactly that value back would
break though, so we'd have to be OK with that if we want to try it. The
good news is that it's relatively easy to experiment with new ways to
store timestamps with these wrappers in place.

--
Jeff Layton <[email protected]>


2023-09-29 00:30:36

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 86/87] fs: switch timespec64 fields in inode to discrete integers

On Thu, Sep 28, 2023 at 01:40:55PM -0400, Jeff Layton wrote:
>
> Correct. We'd lose some fidelity in currently stored timestamps, but as
> Linus and Ted pointed out, anything below ~100ns granularity is
> effectively just noise, as that's the floor overhead for calling into
> the kernel. It's hard to argue that any application needs that sort of
> timestamp resolution, at least with contemporary hardware.
>
> Doing that would mean that tests that store specific values in the
> atime/mtime and expect to be able to fetch exactly that value back would
> break though, so we'd have to be OK with that if we want to try it. The
> good news is that it's relatively easy to experiment with new ways to
> store timestamps with these wrappers in place.

The reason why we store 1ns granularity in ext4's on-disk format (and
accept that we only support times only a couple of centuries into the
future, as opposed shooting for an on-disk format good for several
millennia :-), was in case there was userspace that might try to store
a very fine-grained timestamp and want to be able to get it back
bit-for-bit identical.

For example, what if someone was trying to implement some kind of
steganographic scheme where they going store a secret message (or more
likely, a 256-bit AES key) in the nanosecond fields of the file's
{c,m,a,cr}time timestamps, "hiding in plain sight". Not that I think
that we have to support something like that, since the field is for
*timestamps* not cryptographic bits, so if we break someone who is
doing that, do we care?

I don't think anyone will complain about breaking the userspace API
--- especially since if, say, the CIA was using this for their spies'
drop boxes, they probably wouldn't want to admit it. :-)

- Ted

2023-09-29 01:27:50

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 86/87] fs: switch timespec64 fields in inode to discrete integers

On Thu, Sep 28, 2023, at 13:40, Jeff Layton wrote:
> On Thu, 2023-09-28 at 10:19 -0700, Darrick J. Wong wrote:
>>
>> > I remember seeing those patches go by. I don't remember that change
>> > being NaK'ed, but I wasn't paying close attention at the time
>> >
>> > Looking at it objectively now, I think it's worth it to recover 8 bytes
>> > per inode and open a 4 byte hole that Amir can use to grow the
>> > i_fsnotify_mask. We might even able to shave off another 12 bytes
>> > eventually if we can move to a single 64-bit word per timestamp.
>>
>> I don't think you can, since btrfs timestamps utilize s64 seconds
>> counting in both directions from the Unix epoch. They also support ns
>> resolution:
>>
>> struct btrfs_timespec {
>> __le64 sec;
>> __le32 nsec;
>> } __attribute__ ((__packed__));
>>
>
> Correct. We'd lose some fidelity in currently stored timestamps, but as
> Linus and Ted pointed out, anything below ~100ns granularity is
> effectively just noise, as that's the floor overhead for calling into
> the kernel. It's hard to argue that any application needs that sort of
> timestamp resolution, at least with contemporary hardware.

There are probably applications that have come up with creative
ways to use the timestamp fields of file systems that 94 bits
of data, with both the MSB of the seconds and the LSB of the
nanoseconds carrying information that they expect to be preserved.

Dropping any information in the nanoseconds other than the top two
bits would trivially change the 'ls -t' output when two files have
the same timestamp in one kernel but slightly different timestamps
in another one. For large values of 'tv_sec', there are fewer
obvious things that break, but if current kernels are able to
retrieve arbitrary times that were stored with utimensat(), then we
should probably make sure future kernels can see the same.

Arnd

2023-09-29 13:37:08

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 86/87] fs: switch timespec64 fields in inode to discrete integers


Jeff Layton <[email protected]> wrote:

> Correct. We'd lose some fidelity in currently stored timestamps, but as
> Linus and Ted pointed out, anything below ~100ns granularity is
> effectively just noise, as that's the floor overhead for calling into
> the kernel. It's hard to argue that any application needs that sort of
> timestamp resolution, at least with contemporary hardware.

Albeit with the danger of making Steve French very happy;-), would it make
sense to switch internally to Microsoft-style 64-bit timestamps with their
100ns granularity?

David