2014-05-30 20:01:24

by Arnd Bergmann

Subject: [RFC 00/32] making inode time stamps y2038 ready

Based on the recent discussion about 64-bit time_t for new
architectures, and for solving the year 2038 problem in general,
I decided to try out what it would take to solve part of the
kernel side of things.

This is proof-of-concept work to get us to the point where
two system calls (utimes and stat) provide a working interface
to user space to pass 64-bit inode time stamps in and out of
the kernel all the way to the file systems.

I picked this because it is a fairly isolated problem, as the
inode time stamps are rarely assigned to any other time values.
As a byproduct of this work, I documented for each of the file
systems we support how long the on-disk format can work[1].

Obviously we also need to convert all the other syscalls and
have a proper libc implementation using those for this to
be really useful, but it's a start and it can be tested
independently (I haven't done so yet, as I want to wait for
initial feedback).

All the interesting stuff is in the first five patches here,
the rest is the straightforward conversion of all file systems
that use 'timespec' values internally.

There are of course a number of open questions:

a) is this the right approach in general? The previous discussion
pointed this way, but there may be other opinions.
b) what type should we use internally to represent inode time
stamps? The code contains three different versions that would
all work, we just have to pick a good tradeoff between
efficiency and the range of times we want to cover.
c) Should we continue this way for all 32-bit platforms for
consistency, including future ones, or should we go to
different 64-bit types right away? My feeling is that the
second approach would complicate this work.

Arnd

[1] http://kernelnewbies.org/y2038

Arnd Bergmann (32):
fs: introduce new 'struct inode_time'
uapi: add struct __kernel_timespec{32,64}
fs: introduce sys_utimens64at
fs: introduce sys_newfstat64/sys_newfstatat64
arch: hook up new stat and utimes syscalls
isofs: fix timestamps beyond 2027
fs/nfs: convert to struct inode_time
fs/ceph: convert to 'struct inode_time'
fs/pstore: convert to struct inode_time
fs/coda: convert to struct inode_time
xfs: convert to struct inode_time
btrfs: convert to struct inode_time
ext3: convert to struct inode_time
ext4: convert to struct inode_time
cifs: convert to struct inode_time
ntfs: convert to struct inode_time
ubifs: convert to struct inode_time
ocfs2: convert to struct inode_time
fs/fat: convert to struct inode_time
afs: convert to struct inode_time
udf: convert to struct inode_time
fs: convert simple fs to inode_time
logfs: convert to struct inode_time
hfs, hfsplus: convert to struct inode_time
gfs2: convert to struct inode_time
reiserfs: convert to struct inode_time
jffs2: convert to struct inode_time
adfs: convert to struct inode_time
f2fs: convert to struct inode_time
fuse: convert to struct inode_time
scsi: fnic: use current_kernel_time() for timestamp
fs: use new inode_time definition unconditionally

arch/alpha/kernel/osf_sys.c | 2 +-
arch/arm/include/asm/unistd.h | 2 +-
arch/arm/include/uapi/asm/stat.h | 25 +++++++++++++++++
arch/arm/include/uapi/asm/unistd.h | 3 +++
arch/arm/kernel/calls.S | 3 +++
arch/arm64/include/asm/unistd32.h | 5 +++-
arch/x86/include/uapi/asm/stat.h | 28 +++++++++++++++++++
arch/x86/syscalls/syscall_32.tbl | 3 +++
drivers/block/rbd.c | 2 +-
drivers/firmware/efi/efi-pstore.c | 28 +++++++++----------
drivers/scsi/fnic/fnic_trace.c | 2 +-
drivers/tty/tty_io.c | 2 +-
drivers/usb/gadget/f_fs.c | 2 +-
fs/adfs/inode.c | 4 +--
fs/afs/afs.h | 6 ++---
fs/afs/fsclient.c | 2 +-
fs/attr.c | 8 +++---
fs/btrfs/file.c | 6 ++---
fs/btrfs/inode.c | 4 +--
fs/btrfs/ioctl.c | 4 +--
fs/btrfs/root-tree.c | 2 +-
fs/btrfs/transaction.c | 2 +-
fs/ceph/cache.c | 2 +-
fs/ceph/caps.c | 6 ++---
fs/ceph/file.c | 4 +--
fs/ceph/inode.c | 20 +++++++-------
fs/ceph/super.h | 8 +++---
fs/cifs/cache.c | 6 ++---
fs/cifs/cifsglob.h | 6 ++---
fs/cifs/cifsproto.h | 6 ++---
fs/cifs/cifssmb.c | 5 ++--
fs/cifs/inode.c | 2 +-
fs/cifs/netmisc.c | 15 ++++++-----
fs/coda/coda_linux.c | 18 ++++++++-----
fs/compat.c | 19 ++-----------
fs/configfs/inode.c | 6 ++---
fs/cramfs/inode.c | 2 +-
fs/ext3/inode.c | 4 +--
fs/ext4/ext4.h | 10 +++----
fs/ext4/extents.c | 2 +-
fs/f2fs/file.c | 6 ++---
fs/fat/dir.c | 2 +-
fs/fat/fat.h | 6 ++---
fs/fat/misc.c | 4 +--
fs/fat/namei_msdos.c | 8 +++---
fs/fat/namei_vfat.c | 10 +++----
fs/fuse/inode.c | 6 ++---
fs/gfs2/dir.c | 6 ++---
fs/gfs2/glops.c | 4 +--
fs/hfs/hfs_fs.h | 2 +-
fs/hfsplus/hfsplus_fs.h | 2 +-
fs/inode.c | 18 ++++++-------
fs/isofs/util.c | 2 +-
fs/jffs2/os-linux.h | 2 +-
fs/locks.c | 4 +--
fs/logfs/readwrite.c | 18 ++++++-------
fs/nfs/callback.h | 4 +--
fs/nfs/callback_xdr.c | 6 ++---
fs/nfs/file.c | 2 +-
fs/nfs/fscache-index.c | 8 +++---
fs/nfs/inode.c | 10 +++----
fs/nfs/internal.h | 4 +--
fs/nfs/netns.h | 2 +-
fs/nfs/nfs2xdr.c | 8 +++---
fs/nfs/nfs3xdr.c | 10 +++----
fs/nfs/nfs4xdr.c | 20 +++++++-------
fs/nfsd/nfs3xdr.c | 6 ++---
fs/nfsd/nfsfh.h | 4 +--
fs/nfsd/nfsxdr.c | 2 +-
fs/ntfs/inode.c | 12 ++++-----
fs/ntfs/time.h | 8 +++---
fs/ocfs2/dlmglue.c | 16 +++++------
fs/ocfs2/file.c | 6 ++---
fs/ocfs2/ocfs2.h | 2 +-
fs/pstore/inode.c | 2 +-
fs/pstore/internal.h | 2 +-
fs/pstore/platform.c | 2 +-
fs/pstore/ram.c | 18 +++++++------
fs/reiserfs/namei.c | 2 +-
fs/reiserfs/xattr.c | 4 +--
fs/stat.c | 55 ++++++++++++++++++++++++++++++++++++++
fs/ubifs/dir.c | 2 +-
fs/ubifs/file.c | 16 +++++------
fs/ubifs/misc.h | 2 +-
fs/udf/udf_i.h | 2 +-
fs/udf/udf_sb.h | 2 +-
fs/udf/udfdecl.h | 7 ++---
fs/udf/udftime.c | 7 ++---
fs/utimes.c | 47 +++++++++++++++++++++++++++-----
fs/xfs/time.h | 4 +--
fs/xfs/xfs_inode.c | 2 +-
fs/xfs/xfs_iops.c | 2 +-
fs/xfs/xfs_trans_inode.c | 6 ++---
include/linux/ceph/decode.h | 8 +++---
include/linux/ceph/osd_client.h | 4 +--
include/linux/compat.h | 2 +-
include/linux/fs.h | 32 +++++++++++-----------
include/linux/nfs_fs_sb.h | 2 +-
include/linux/nfs_xdr.h | 14 +++++-----
include/linux/pstore.h | 4 +--
include/linux/stat.h | 6 ++---
include/linux/syscalls.h | 9 ++++++-
include/linux/time.h | 44 +++++++++++++++++++++++++++---
include/uapi/asm-generic/stat.h | 29 ++++++++++++++++++--
include/uapi/asm-generic/unistd.h | 8 +++++-
include/uapi/linux/coda.h | 1 +
include/uapi/linux/time.h | 40 ++++++++++++++++++++++++++-
init/initramfs.c | 2 +-
kernel/audit.c | 2 +-
kernel/auditsc.c | 2 +-
kernel/time.c | 44 +++++++++++++++++++++++++-----
kernel/time/timekeeping.c | 16 +++++++++++
net/ceph/auth_x.c | 2 +-
net/ceph/osd_client.c | 4 +--
114 files changed, 642 insertions(+), 333 deletions(-)

--
1.8.3.2

Bcc: "J. Bruce Fields" <[email protected]>
Bcc: "Theodore Ts'o" <[email protected]>
Bcc: Adrian Hunter <[email protected]>
Bcc: Andreas Dilger <[email protected]>
Bcc: Andrew Morton <[email protected]>
Bcc: Anton Altaparmakov <[email protected]>
Bcc: Anton Vorontsov <[email protected]>
Bcc: Artem Bityutskiy <[email protected]>
Bcc: Brian Uchino <[email protected]>
Bcc: Chris Mason <[email protected]>
Bcc: Colin Cross <[email protected]>
Bcc: Dave Chinner <[email protected]>
Bcc: David Howells <[email protected]>
Bcc: David Woodhouse <[email protected]>
Bcc: Greg Kroah-Hartman <[email protected]>
Bcc: Hiral Patel <[email protected]>
Bcc: Jaegeuk Kim <[email protected]>
Bcc: Jan Harkes <[email protected]>
Bcc: Jan Kara <[email protected]>
Bcc: Joel Becker <[email protected]>
Bcc: Joern Engel <[email protected]>
Bcc: Josef Bacik <[email protected]>
Bcc: Kees Cook <[email protected]>
Bcc: Mark Fasheh <[email protected]>
Bcc: Miklos Szeredi <[email protected]>
Bcc: OGAWA Hirofumi <[email protected]>
Bcc: Prasad Joshi <[email protected]>
Bcc: Sage Weil <[email protected]>
Bcc: Steve French <[email protected]>
Bcc: Steven Whitehouse <[email protected]>
Bcc: Suma Ramars <[email protected]>
Bcc: Tony Luck <[email protected]>


2014-05-30 20:01:38

by Arnd Bergmann

Subject: [RFC 14/32] ext4: convert to struct inode_time

ext4fs uses unsigned 34-bit seconds for inode timestamps, which will work
for the next 500 years, but the VFS uses struct timespec for timestamps,
which is only good until 2038 on 32-bit CPUs.

This gets us one small step closer to lifting the VFS limit by using
struct inode_time in ext4.

Signed-off-by: Arnd Bergmann <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: [email protected]
---
fs/ext4/ext4.h | 10 +++++-----
fs/ext4/extents.c | 2 +-
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 92d9f1a..b60adc9 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -726,14 +726,14 @@ struct move_extent {
<= (EXT4_GOOD_OLD_INODE_SIZE + \
(einode)->i_extra_isize)) \

-static inline __le32 ext4_encode_extra_time(struct timespec *time)
+static inline __le32 ext4_encode_extra_time(struct inode_time *time)
{
return cpu_to_le32((sizeof(time->tv_sec) > 4 ?
(time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) |
((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK));
}

-static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra)
+static inline void ext4_decode_extra_time(struct inode_time *time, __le32 extra)
{
if (sizeof(time->tv_sec) > 4)
time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
@@ -879,9 +879,9 @@ struct ext4_inode_info {

/*
* File creation time. Its function is same as that of
- * struct timespec i_{a,c,m}time in the generic inode.
+ * struct inode_time i_{a,c,m}time in the generic inode.
*/
- struct timespec i_crtime;
+ struct inode_time i_crtime;

/* mballoc */
struct list_head i_prealloc_list;
@@ -1354,7 +1354,7 @@ static inline struct ext4_inode_info *EXT4_I(struct inode *inode)
return container_of(inode, struct ext4_inode_info, vfs_inode);
}

-static inline struct timespec ext4_current_time(struct inode *inode)
+static inline struct inode_time ext4_current_time(struct inode *inode)
{
return (inode->i_sb->s_time_gran < NSEC_PER_SEC) ?
current_fs_time(inode->i_sb) : CURRENT_TIME_SEC;
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ee14768..ed11d79 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4894,7 +4894,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
int ret = 0;
int flags;
ext4_lblk_t lblk;
- struct timespec tv;
+ struct inode_time tv;
unsigned int blkbits = inode->i_blkbits;

/* Return error if mode is not supported */
--
1.8.3.2
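
To illustrate the bit packing that ext4_encode_extra_time() performs in the
hunk above, here is a minimal user-space sketch. It assumes a 2-bit epoch
field (consistent with the "34-bit seconds" in the commit message) followed
by 30 bits of nanoseconds. The SKETCH_* macros and the helper names are made
up for this example, and the little-endian conversion (cpu_to_le32) is left
out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the 32-bit "extra" field: 2 epoch bits, 30 nsec bits. */
#define SKETCH_EPOCH_BITS 2
#define SKETCH_EPOCH_MASK ((1u << SKETCH_EPOCH_BITS) - 1)
#define SKETCH_NSEC_MASK  (~0u << SKETCH_EPOCH_BITS)

/* Pack bits 32..33 of the seconds value together with the nanoseconds. */
static inline uint32_t encode_extra_time(int64_t sec, uint32_t nsec)
{
	assert(nsec < 1000000000u);

	uint32_t epoch = (uint32_t)(((uint64_t)sec >> 32) & SKETCH_EPOCH_MASK);

	return epoch | ((nsec << SKETCH_EPOCH_BITS) & SKETCH_NSEC_MASK);
}

/* Recover the two high seconds bits from the packed field. */
static inline uint32_t extra_epoch(uint32_t extra)
{
	return extra & SKETCH_EPOCH_MASK;
}

/* Recover the nanoseconds from the packed field. */
static inline uint32_t extra_nsec(uint32_t extra)
{
	return (extra & SKETCH_NSEC_MASK) >> SKETCH_EPOCH_BITS;
}

/* Round-trip check: epoch bits 0b11 and a value just below one second. */
static inline void extra_time_demo(void)
{
	uint32_t extra = encode_extra_time((3LL << 32) | 5, 999999999u);

	assert(extra_epoch(extra) == 3);
	assert(extra_nsec(extra) == 999999999u);
}
```

The low 32 bits of the seconds value live in the regular on-disk timestamp
field and are not part of this sketch; only the extra field is shown.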

2014-05-30 20:01:37

by Arnd Bergmann

Subject: [RFC 13/32] ext3: convert to struct inode_time

ext3fs uses unsigned 32-bit seconds for inode timestamps, which will work
for the next 92 years, but the VFS uses struct timespec for timestamps,
which is only good until 2038 on 32-bit CPUs.

This gets us one small step closer to lifting the VFS limit by using
struct inode_time in ext3. The on-disk format limit is lifted in ext4,
which will work until 2514.

Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: [email protected]
---
fs/ext3/inode.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 4d32133..8b76f80 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -752,7 +752,7 @@ static int ext3_splice_branch(handle_t *handle, struct inode *inode,
struct ext3_block_alloc_info *block_i;
ext3_fsblk_t current_block;
struct ext3_inode_info *ei = EXT3_I(inode);
- struct timespec now;
+ struct inode_time now;

block_i = ei->i_block_alloc_info;
/*
@@ -793,7 +793,7 @@ static int ext3_splice_branch(handle_t *handle, struct inode *inode,

/* We are done with atomic stuff, now do the rest of housekeeping */
now = CURRENT_TIME_SEC;
- if (!timespec_equal(&inode->i_ctime, &now) || !where->bh) {
+ if (!inode_time_equal(&inode->i_ctime, &now) || !where->bh) {
inode->i_ctime = now;
ext3_mark_inode_dirty(handle, inode);
}
--
1.8.3.2
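
The years quoted in these commit messages (2038 for a signed 32-bit
seconds count, 92 years from 2014 for an unsigned 32-bit count, 2514 for an
unsigned 34-bit count) can be checked with a short calculation. The sketch
below is only an illustration: year_of_epoch_seconds() is a hypothetical
helper that uses a well-known days-to-civil-date conversion, assumes the
usual 1970 epoch, and ignores leap seconds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the Gregorian calendar year (UTC) containing the instant that lies
 * `sec` seconds after 1970-01-01 00:00:00. Only non-negative values are
 * handled in this sketch.
 */
static inline int64_t year_of_epoch_seconds(int64_t sec)
{
	assert(sec >= 0);

	int64_t z = sec / 86400 + 719468;  /* days since 0000-03-01 */
	int64_t era = z / 146097;          /* completed 400-year cycles */
	int64_t doe = z - era * 146097;    /* day of era, 0..146096 */
	int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
	int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
	int64_t mp = (5 * doy + 2) / 153;  /* month index, March = 0 */

	/* January and February belong to the following civil year. */
	return yoe + era * 400 + (mp >= 10 ? 1 : 0);
}

/* Last representable year for each seconds width mentioned in the patches. */
static inline void rollover_demo(void)
{
	assert(year_of_epoch_seconds(INT32_MAX) == 2038);        /* signed 32-bit */
	assert(year_of_epoch_seconds(UINT32_MAX) == 2106);       /* unsigned 32-bit */
	assert(year_of_epoch_seconds((1LL << 34) - 1) == 2514);  /* unsigned 34-bit */
}
```

2106 is 92 years after 2014, which matches the "next 92 years" figure under
the unsigned interpretation.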

2014-05-31 09:10:45

by H. Peter Anvin

Subject: Re: [RFC 13/32] ext3: convert to struct inode_time

On 05/30/2014 01:01 PM, Arnd Bergmann wrote:
> ext3fs uses unsigned 32-bit seconds for inode timestamps, which will work
> for the next 92 years, but the VFS uses struct timespec for timestamps,
> which is only good until 2038 on 32-bit CPUs.
>
> This gets us one small step closer to lifting the VFS limit by using
> struct inode_time in ext3. The on-disk format limit is lifted in ext4,
> which will work until 2514.
>

This may be what the spec says, but when I experimented with this just
now it does seem that both ext2 and ext3 actually interpret timestamps
as *signed* 32-bit seconds.

-hpa


2014-05-31 14:30:49

by Viacheslav Dubeyko

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

Hi Arnd,

On Fri, 2014-05-30 at 22:01 +0200, Arnd Bergmann wrote:

[snip]
>
> Arnd Bergmann (32):
> fs: introduce new 'struct inode_time'
> uapi: add struct __kernel_timespec{32,64}
> fs: introduce sys_utimens64at
> fs: introduce sys_newfstat64/sys_newfstatat64
> arch: hook up new stat and utimes syscalls
> isofs: fix timestamps beyond 2027
> fs/nfs: convert to struct inode_time
> fs/ceph: convert to 'struct inode_time'
> fs/pstore: convert to struct inode_time
> fs/coda: convert to struct inode_time
> xfs: convert to struct inode_time
> btrfs: convert to struct inode_time
> ext3: convert to struct inode_time
> ext4: convert to struct inode_time
> cifs: convert to struct inode_time
> ntfs: convert to struct inode_time
> ubifs: convert to struct inode_time
> ocfs2: convert to struct inode_time
> fs/fat: convert to struct inode_time
> afs: convert to struct inode_time
> udf: convert to struct inode_time
> fs: convert simple fs to inode_time
> logfs: convert to struct inode_time
> hfs, hfsplus: convert to struct inode_time
> gfs2: convert to struct inode_time
> reiserfs: convert to struct inode_time
> jffs2: convert to struct inode_time
> adfs: convert to struct inode_time
> f2fs: convert to struct inode_time
> fuse: convert to struct inode_time
> scsi: fnic: use current_kernel_time() for timestamp
> fs: use new inode_time definition unconditionally
>

By the way, what about NILFS2? Is NILFS2 ready for suggested approach
without any changes?

Thanks,
Vyacheslav Dubeyko.



2014-05-31 14:32:58

by Arnd Bergmann

Subject: Re: [RFC 13/32] ext3: convert to struct inode_time

On Saturday 31 May 2014 02:10:45 H. Peter Anvin wrote:
> On 05/30/2014 01:01 PM, Arnd Bergmann wrote:
> > ext3fs uses unsigned 32-bit seconds for inode timestamps, which will work
> > for the next 92 years, but the VFS uses struct timespec for timestamps,
> > which is only good until 2038 on 32-bit CPUs.
> >
> > This gets us one small step closer to lifting the VFS limit by using
> > struct inode_time in ext3. The on-disk format limit is lifted in ext4,
> > which will work until 2514.
> >
>
> This may be what the spec says, but when I experimented with this just
> now it does seem that both ext2 and ext3 actually interpret timestamps
> as *signed* 32-bit seconds.

Right, I can see that in ext3_iget() now:

inode->i_atime.tv_sec = (signed)le32_to_cpu(raw_inode->i_atime);

I may have just looked at ext3_do_update_inode(), which uses this
unsigned conversion:

raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec);

and didn't realize that this is only half of the story, and since it
converts from (potentially 64-bit) long to u32, it doesn't matter
whether that is signed or unsigned.

I may have to go through all of them again to see if I made the same
mistake in other file systems as well.

Arnd
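
The signedness point in the exchange above can be reproduced outside the
kernel. In this minimal sketch the helper names are hypothetical and the
endianness conversions (le32_to_cpu/cpu_to_le32) are omitted; it only shows
that the store path is the same for both interpretations while the load
path decides how a raw value with the top bit set is read back.

```c
#include <assert.h>
#include <stdint.h>

/* Load path with a signed cast, as in the quoted ext3_iget() line. */
static inline int64_t load_seconds_signed(uint32_t raw)
{
	/*
	 * Converting an out-of-range value to int32_t is
	 * implementation-defined in C; common compilers wrap around.
	 */
	return (int64_t)(int32_t)raw;
}

/* Load path that treats the on-disk field as unsigned. */
static inline int64_t load_seconds_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}

/* Store path: truncation to 32 bits is identical for both interpretations. */
static inline uint32_t store_seconds(int64_t sec)
{
	return (uint32_t)sec;
}

static inline void signedness_demo(void)
{
	/* One second past the signed 32-bit limit. */
	uint32_t raw = store_seconds(INT64_C(2147483648));

	assert(raw == 0x80000000u);
	assert(load_seconds_unsigned(raw) == INT64_C(2147483648));  /* in 2038 */
	assert(load_seconds_signed(raw) == -INT64_C(2147483648));   /* in 1901 */

	/* Different inputs that truncate to the same on-disk value. */
	assert(store_seconds(-1) == store_seconds(INT64_C(4294967295)));
}
```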

2014-05-31 14:51:15

by Richard Cochran

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote:
>
> I picked this because it is a fairly isolated problem, as the
> inode time stamps are rarely assigned to any other time values.
> As a byproduct of this work, I documented for each of the file
> systems we support how long the on-disk format can work[1].

Why are some of the time stamp expiration dates marked as "never"?

Thanks,
Richard

2014-05-31 15:23:02

by Arnd Bergmann

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Saturday 31 May 2014 16:51:15 Richard Cochran wrote:
> On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote:
> >
> > I picked this because it is a fairly isolated problem, as the
> > inode time stamps are rarely assigned to any other time values.
> > As a byproduct of this work, I documented for each of the file
> > systems we support how long the on-disk format can work[1].
>
> Why are some of the time stamp expiration dates marked as "never"?

It's an approximation:
with 64-bit timestamps, you can represent close to 300 billion
years, which is way past the time that our planet can sustain
life of any form[1].

Arnd

[1] http://en.wikipedia.org/wiki/Timeline_of_the_far_future

2014-05-31 16:20:43

by Geert Uytterhoeven

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Sat, May 31, 2014 at 5:23 PM, Arnd Bergmann <[email protected]> wrote:
> On Saturday 31 May 2014 16:51:15 Richard Cochran wrote:
>> On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote:
>> > I picked this because it is a fairly isolated problem, as the
>> > inode time stamps are rarely assigned to any other time values.
>> > As a byproduct of this work, I documented for each of the file
>> > systems we support how long the on-disk format can work[1].
>>
>> Why are some of the time stamp expiration dates marked as "never"?
>
> It's an approximation:
> with 64-bit timestamps, you can represent close to 300 billion
> years, which is way past the time that our planet can sustain
> life of any form[1].

FWIW, the 48-bit seconds limit of befs, marked as "never", is reached
sooner than the 32-bit days limit of affs, marked as Y11760870.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2014-05-31 18:22:37

by Richard Cochran

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
>
> It's an approximation:

(Approximately never ;)

> with 64-bit timestamps, you can represent close to 300 billion
> years, which is way past the time that our planet can sustain
> life of any form[1].

Did you mean 64 bits worth of seconds?

2^64 / (3600*24*365) = 584,942,417,355

That is more than 300 billion years, and still, it is not quite the
same as "never".

In any case, that term is not too helpful in the comparison table,
IMHO. One could think that some sort of clever running count relative
to the last mount time was implied.

Thanks,
Richard

[1] You are forgetting the immortal robotic overlords.
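
The range arithmetic from the last few messages can be written down as a
small sketch. It uses the same simplification as the calculation above
(365-day years, no leap days) and ignores each file system's epoch, so the
results are approximate ranges rather than calendar years. The function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECONDS_PER_YEAR (UINT64_C(3600) * 24 * 365)

/* Whole 365-day years covered by a counter of seconds up to max_seconds. */
static inline uint64_t years_of_seconds(uint64_t max_seconds)
{
	return max_seconds / SKETCH_SECONDS_PER_YEAR;
}

/* Whole 365-day years covered by a counter of days up to max_days. */
static inline uint64_t years_of_days(uint64_t max_days)
{
	return max_days / 365;
}

static inline void range_demo(void)
{
	/* Unsigned 64-bit seconds: the 584,942,417,355 figure quoted above. */
	assert(years_of_seconds(UINT64_MAX) == UINT64_C(584942417355));

	/* Signed 64-bit seconds: about 292 billion years in each direction. */
	assert(years_of_seconds((uint64_t)INT64_MAX) == UINT64_C(292471208677));

	/* A 48-bit seconds counter runs out before a 32-bit days counter. */
	assert(years_of_seconds((UINT64_C(1) << 48) - 1) <
	       years_of_days(UINT32_MAX));
}
```

With these assumptions a 48-bit seconds counter covers roughly 8.9 million
years and a 32-bit days counter roughly 11.8 million years.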

2014-05-31 19:34:12

by H. Peter Anvin

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

Typically they are using 64-bit signed seconds.

On May 31, 2014 11:22:37 AM PDT, Richard Cochran <[email protected]> wrote:
>On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
>>
>> It's an approximation:
>
>(Approximately never ;)
>
>> with 64-bit timestamps, you can represent close to 300 billion
>> years, which is way past the time that our planet can sustain
>> life of any form[1].
>
>Did you mean 64 bits worth of seconds?
>
> 2^64 / (3600*24*365) = 584,942,417,355
>
>That is more than 300 billion years, and still, it is not quite the
>same as "never".
>
>In any case, that term is not too helpful in the comparison table,
>IMHO. One could think that some sort of clever running count relative
>to the last mount time was implied.
>
>Thanks,
>Richard
>
>[1] You are forgetting the immortal robotic overlords.

--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

2014-06-01 04:44:36

by Richard Cochran

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
> On Saturday 31 May 2014 16:51:15 Richard Cochran wrote:
> >
> > Why are some of the time stamp expiration dates marked as "never"?
>
> It's an approximation:

Also, the term "never" might mean using arbitrarily long integers
as in ASN.1.

Thanks,
Richard

2014-06-01 04:46:09

by Richard Cochran

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Sat, May 31, 2014 at 12:34:12PM -0700, H. Peter Anvin wrote:
> Typically they are using 64-bit signed seconds.

Okay, that is what I wanted to know.

Thanks,
Richard

2014-06-02 13:52:19

by Joseph Myers

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Fri, 30 May 2014, Arnd Bergmann wrote:

> a) is this the right approach in general? The previous discussion
> pointed this way, but there may be other opinions.

The syscall changes seem like the sort of thing I'd expect, although
patches adding new syscalls or otherwise affecting the kernel/userspace
interface (as opposed to those relating to an individual filesystem)
should go to linux-api as well as other relevant lists.

--
Joseph S. Myers
[email protected]

2014-06-02 19:19:55

by Arnd Bergmann

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
> On Fri, 30 May 2014, Arnd Bergmann wrote:
>
> > a) is this the right approach in general? The previous discussion
> > pointed this way, but there may be other opinions.
>
> The syscall changes seem like the sort of thing I'd expect, although
> patches adding new syscalls or otherwise affecting the kernel/userspace
> interface (as opposed to those relating to an individual filesystem)
> should go to linux-api as well as other relevant lists.

Ok. Sorry about missing linux-api, I confused it with linux-arch, which
may not be as relevant here, except for the one question whether we
actually want to have the new ABI on all 32-bit architectures or only
as an opt-in for those that expect to stay around for another 24 years.

Two more questions for you:

- are you (and others) happy with adding this type of stat syscall
(fstatat64/fstat64) as opposed to the more generic xstat that has
been discussed in the past and that never made it through the bike-
shedding discussion?

- once we have enough buy-in from reviewers to merge this initial
series, should we proceed to define the rest of the syscall ABI
(minus driver ioctls) so glibc and kernel can do the conversion
on top of that, or should we better try to do things one syscall
family at a time and actually get the kernel to handle them
correctly internally?

Arnd

2014-06-02 19:26:22

by H. Peter Anvin

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On 06/02/2014 12:19 PM, Arnd Bergmann wrote:
> On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
>> On Fri, 30 May 2014, Arnd Bergmann wrote:
>>
>>> a) is this the right approach in general? The previous discussion
>>> pointed this way, but there may be other opinions.
>>
>> The syscall changes seem like the sort of thing I'd expect, although
>> patches adding new syscalls or otherwise affecting the kernel/userspace
>> interface (as opposed to those relating to an individual filesystem)
>> should go to linux-api as well as other relevant lists.
>
> Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> may not be as relevant here, except for the one question whether we
> actually want to have the new ABI on all 32-bit architectures or only
> as an opt-in for those that expect to stay around for another 24 years.
>
> Two more questions for you:
>
> - are you (and others) happy with adding this type of stat syscall
> (fstatat64/fstat64) as opposed to the more generic xstat that has
> been discussed in the past and that never made it through the bike-
> shedding discussion?
>
> - once we have enough buy-in from reviewers to merge this initial
> series, should we proceed to define rest of the syscall ABI
> (minus driver ioctls) so glibc and kernel can do the conversion
> on top of that, or should we better try to do things one syscall
> family at a time and actually get the kernel to handle them
> correctly internally?
>

The bit that is really going to hurt is every single ioctl that uses a
timespec.

Honestly, though, I really don't understand the point with "struct
inode_time". It seems like the zeroeth-order thing is to change the
kernel internal version of struct timespec to have a 64-bit time... it
isn't just about inodes. We then should be explicit about the external
uses of time, and use accessors.

-hpa




2014-06-02 19:55:52

by Arnd Bergmann

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Monday 02 June 2014 12:26:22 H. Peter Anvin wrote:
> On 06/02/2014 12:19 PM, Arnd Bergmann wrote:
> > On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
> >> On Fri, 30 May 2014, Arnd Bergmann wrote:
> >>
> >>> a) is this the right approach in general? The previous discussion
> >>> pointed this way, but there may be other opinions.
> >>
> >> The syscall changes seem like the sort of thing I'd expect, although
> >> patches adding new syscalls or otherwise affecting the kernel/userspace
> >> interface (as opposed to those relating to an individual filesystem)
> >> should go to linux-api as well as other relevant lists.
> >
> > Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> > may not be as relevant here, except for the one question whether we
> > actually want to have the new ABI on all 32-bit architectures or only
> > as an opt-in for those that expect to stay around for another 24 years.
> >
> > Two more questions for you:
> >
> > - are you (and others) happy with adding this type of stat syscall
> > (fstatat64/fstat64) as opposed to the more generic xstat that has
> > been discussed in the past and that never made it through the bike-
> > shedding discussion?
> >
> > - once we have enough buy-in from reviewers to merge this initial
> > series, should we proceed to define rest of the syscall ABI
> > (minus driver ioctls) so glibc and kernel can do the conversion
> > on top of that, or should we better try to do things one syscall
> > family at a time and actually get the kernel to handle them
> > correctly internally?
> >
>
> The bit that is really going to hurt is every single ioctl that uses a
> timespec.
>
> Honestly, though, I really don't understand the point with "struct
> inode_time". It seems like the zeroeth-order thing is to change the
> kernel internal version of struct timespec to have a 64-bit time... it
> isn't just about inodes. We then should be explicit about the external
> uses of time, and use accessors.

I picked these because they are fairly isolated from all other uses,
in particular since inode times are the only things where we really
care about times in the distant past or future (decades away as opposed
to things that happened between boot and shutdown).

For other kernel-internal uses, we may be better off migrating to
a completely different representation, such as nanoseconds since
boot or the architecture specific ktime_t, but this is really something
to decide for each subsystem.

I just tried building an arm32 kernel with a s64 time_t, and that
failed horribly, I get linker errors for missing 64-bit divides
and lots of warnings for code that expects time_t pointers to
functions taking a 'long' or vice versa. I also think the only
way to maintain ABI compatibility is to separate the internal uses
from the interface, which means auditing all code in the end.

Arnd
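
As a rough illustration of the "nanoseconds since boot" style of
representation mentioned above, the following user-space sketch splits a
64-bit nanosecond count into seconds and nanoseconds and estimates how many
365-day years a signed 64-bit nanosecond counter can cover. The struct and
function names are hypothetical. The plain `/` and `%` on 64-bit values are
fine in user space; in-kernel code on 32-bit architectures would normally
go through a helper such as div_u64_rem() instead, since a plain 64-bit
division there turns into a compiler support call that the kernel does not
provide, which is presumably related to the linker errors described above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC UINT64_C(1000000000)

/* Hypothetical split representation: whole seconds plus nanoseconds. */
struct split_time {
	uint64_t sec;
	uint32_t nsec;
};

/* Split a nanosecond count into seconds and the sub-second remainder. */
static inline struct split_time split_ns(uint64_t ns)
{
	struct split_time t;

	t.sec = ns / SKETCH_NSEC_PER_SEC;
	t.nsec = (uint32_t)(ns % SKETCH_NSEC_PER_SEC);
	return t;
}

/* Whole 365-day years that fit into a signed 64-bit nanosecond counter. */
static inline uint64_t years_in_signed_64bit_ns(void)
{
	return (uint64_t)INT64_MAX / SKETCH_NSEC_PER_SEC / (3600u * 24 * 365);
}

static inline void ns_demo(void)
{
	struct split_time t = split_ns(UINT64_C(1500000000));

	assert(t.sec == 1);
	assert(t.nsec == 500000000u);
	assert(years_in_signed_64bit_ns() == 292);
}
```

A range of about 292 years is plenty for intervals measured from boot, but
not for inode time stamps, which is the distinction drawn in the message
above.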

2014-06-02 21:02:15

by Joseph Myers

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Mon, 2 Jun 2014, Arnd Bergmann wrote:

> Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> may not be as relevant here, except for the one question whether we
> actually want to have the new ABI on all 32-bit architectures or only
> as an opt-in for those that expect to stay around for another 24 years.

For glibc I think it will make the most sense to add the support for
64-bit time_t across all architectures that currently have 32-bit time_t
(with the new interfaces having fallback support to implementation in
terms of the 32-bit kernel interfaces, if the 64-bit syscalls are
unavailable either at runtime or in the kernel headers against which glibc
is compiled - this fallback code will of course need to check for overflow
when passing a time value to the kernel, hopefully with error handling
consistent with whatever the kernel ends up doing when a filesystem can't
support a timestamp). If some architectures don't provide the new
interfaces in the kernel then that will mean the fallback code in glibc
can't be removed until glibc support for those architectures is removed
(as opposed to removing it when glibc no longer supports kernels predating
the kernel support).

> Two more questions for you:
>
> - are you (and others) happy with adding this type of stat syscall
> (fstatat64/fstat64) as opposed to the more generic xstat that has
> been discussed in the past and that never made it through the bike-
> shedding discussion?

I am.

> - once we have enough buy-in from reviewers to merge this initial
> series, should we proceed to define rest of the syscall ABI
> (minus driver ioctls) so glibc and kernel can do the conversion
> on top of that, or should we better try to do things one syscall
> family at a time and actually get the kernel to handle them
> correctly internally?

I don't have any comments on that ordering question.

--
Joseph S. Myers
[email protected]
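
The overflow check that the glibc fallback path described above would need
can be sketched as follows. This is only an illustration under stated
assumptions: struct ts64 and struct ts32 are hypothetical stand-ins for the
64-bit and 32-bit time structures, and returning -EOVERFLOW is an assumed
error convention, since the thread leaves open what the kernel will do when
a time stamp cannot be represented.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical 64-bit and 32-bit time structures for this sketch. */
struct ts64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

struct ts32 {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/*
 * Convert a 64-bit time value for use with a 32-bit interface.
 * Returns 0 on success, or -EOVERFLOW (an assumed convention) if the
 * seconds value does not fit into 32 bits.
 */
static inline int ts64_to_ts32(const struct ts64 *in, struct ts32 *out)
{
	if (in->tv_sec > INT32_MAX || in->tv_sec < INT32_MIN)
		return -EOVERFLOW;

	out->tv_sec = (int32_t)in->tv_sec;
	out->tv_nsec = (int32_t)in->tv_nsec;
	return 0;
}

static inline void fallback_demo(void)
{
	struct ts64 last_ok = { INT32_MAX, 5 };
	struct ts64 too_late = { INT64_C(2147483648), 0 };
	struct ts32 out = { 0, 0 };

	assert(ts64_to_ts32(&last_ok, &out) == 0);
	assert(out.tv_sec == INT32_MAX && out.tv_nsec == 5);

	/* One second later no longer fits; the output is left untouched. */
	assert(ts64_to_ts32(&too_late, &out) == -EOVERFLOW);
	assert(out.tv_sec == INT32_MAX);
}
```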

2014-06-02 21:57:26

by H. Peter Anvin

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
>>
>> The bit that is really going to hurt is every single ioctl that uses a
>> timespec.
>>
>> Honestly, though, I really don't understand the point with "struct
>> inode_time". It seems like the zeroeth-order thing is to change the
>> kernel internal version of struct timespec to have a 64-bit time... it
>> isn't just about inodes. We then should be explicit about the external
>> uses of time, and use accessors.
>
> I picked these because they are fairly isolated from all other uses,
> in particular since inode times are the only things where we really
> care about times in the distant past or future (decades away as opposed
> to things that happened between boot and shutdown).
>

If nothing else, I would expect to be able to set the system time to
weird values for testing. So I'm not so sure I agree with that...

> For other kernel-internal uses, we may be better off migrating to
> a completely different representation, such as nanoseconds since
> boot or the architecture specific ktime_t, but this is really something
> to decide for each subsystem.

Having a bunch of different time representations in the kernel seems
like a real headache...

-hpa

2014-06-03 12:21:29

by Arnd Bergmann

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Saturday 31 May 2014 18:30:49 Vyacheslav Dubeyko wrote:
> By the way, what about NILFS2? Is NILFS2 ready for suggested approach
> without any changes?

nilfs2 and a lot of other file systems don't need any changes for
this, because they don't assign the inode time stamp fields to
a 'struct timespec'.

FWIW, nilfs2 uses a 64-bit seconds value, which is always safe and
can represent the full range of user space timespec on all machines.

Arnd


2014-06-03 14:22:19

by Arnd Bergmann

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> >>
> >> The bit that is really going to hurt is every single ioctl that uses a
> >> timespec.
> >>
> >> Honestly, though, I really don't understand the point with "struct
> >> inode_time". It seems like the zeroeth-order thing is to change the
> >> kernel internal version of struct timespec to have a 64-bit time... it
> >> isn't just about inodes. We then should be explicit about the external
> >> uses of time, and use accessors.
> >
> > I picked these because they are fairly isolated from all other uses,
> > in particular since inode times are the only things where we really
> > care about times in the distant past or future (decades away as opposed
> > to things that happened between boot and shutdown).
> >
>
> If nothing else, I would expect to be able to set the system time to
> weird values for testing. So I'm not so sure I agree with that...

I think John Stultz and Thomas Gleixner have already started looking
at how the timekeeping code can be updated. Once that is done, we should
be able to add a functional 64-bit gettimeofday/settimeofday syscall
pair. While I definitely agree this is one of the most basic things to
have, it's also not an area of the kernel that is easy to change.

> > For other kernel-internal uses, we may be better off migrating to
> > a completely different representation, such as nanoseconds since
> > boot or the architecture specific ktime_t, but this is really something
> > to decide for each subsystem.
>
> Having a bunch of different time representations in the kernel seems
> like a real headache...

We already have time_t, ktime_t, timeval, timespec, compat_timespec,
clock_t, cputime_t, cputime64_t, tm, nanoseconds, jiffies, jiffies64,
and lots of driver or file system specific representations. I'm all for
removing a bunch of these from the kernel, but my feeling is that this is
one of the cases where we first have to add new ones in order to remove
those that are already there.
To complicate things further, we also have various time bases
(realtime/utc, realtime/tai, monotonic, monotonic_raw, boottime, ...),
and at least for the timespec values we pass around, it's not always
obvious which one is used, or if that's the right one.

We probably don't want to add a lot of new representations, and it's
possible that we can change most of the internal code we have to
ktime_t and then convert that to whatever user space wants at the
interfaces.

The possible uses I can see for non-ktime_t types in the kernel are:
* inodes need 96 bit timestamps to represent the full range of values
that can be stored in a file system, you made a convincing argument
for that. Almost everything else can fit into 64 bit on a 32-bit
kernel, in theory also on a 64-bit kernel if we want that.
* A number of interfaces pass relative timespecs: nanosleep(), poll(),
select(), sigtimedwait(), alarm(), futex() and probably more. There is
nothing wrong with the use of timespec here, and it may be good to
annotate that by using a new type (e.g. struct timeout) that is defined
as compatible with the current timespec.
* For new user interfaces, we need a new type such as the
__kernel_timespec64 I introduced, so it doesn't clash with the normal
user timespec that may be smaller, depending on the libc.
* A lot of drivers will need new ioctl commands, and for drivers that
just need time stamps (audio, v4l, sockets, ...) it may be more
efficient and more correct to use a new timestamp_t (e.g. boot time
64-bit nanoseconds) than __kernel_timespec64, which is not normally
monotonic and requires a normalization step. If we end up introducing
such a type in the user interface, we can also start using it in the
kernel.
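
As an illustration of the "normalization step" mentioned in the last item, here is a minimal sketch comparing a seconds/nanoseconds pair with a flat 64-bit nanosecond count. The names `timestamp_ns_t`, `struct ts64`, and the helper functions are hypothetical and do not come from the patch series; the sketch only shows why arithmetic on a single counter is simpler.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical flat timestamp: signed 64-bit nanoseconds (about +/-292 years). */
typedef int64_t timestamp_ns_t;

/* Hypothetical seconds/nanoseconds pair, similar in shape to a timespec. */
struct ts64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* Bring tv_nsec back into [0, NSEC_PER_SEC) after arithmetic. */
static struct ts64 ts64_normalize(struct ts64 t)
{
    t.tv_sec += t.tv_nsec / NSEC_PER_SEC;
    t.tv_nsec %= NSEC_PER_SEC;
    if (t.tv_nsec < 0) {
        t.tv_nsec += NSEC_PER_SEC;
        t.tv_sec -= 1;
    }
    return t;
}

/* Subtracting two pairs needs the normalization step above. */
static struct ts64 ts64_sub(struct ts64 a, struct ts64 b)
{
    struct ts64 d = { a.tv_sec - b.tv_sec, a.tv_nsec - b.tv_nsec };
    return ts64_normalize(d);
}

/* Convert a normalized pair to the flat representation. */
static timestamp_ns_t ts64_to_ns(struct ts64 t)
{
    assert(t.tv_nsec >= 0 && t.tv_nsec < NSEC_PER_SEC);
    return t.tv_sec * NSEC_PER_SEC + t.tv_nsec;
}

/* Subtracting two flat timestamps is a plain integer subtraction. */
static timestamp_ns_t ns_sub(timestamp_ns_t a, timestamp_ns_t b)
{
    return a - b;
}
```

The flat form trades range for simplicity: a signed 64-bit nanosecond count covers roughly 292 years in each direction, which is enough for times relative to boot but not for arbitrary inode time stamps.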

Arnd


2014-06-03 14:33:10

by Joseph Myers

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Tue, 3 Jun 2014, Arnd Bergmann wrote:

> I think John Stultz and Thomas Gleixner have already started looking
> at how the timekeeping code can be updated. Once that is done, we should
> be able to add a functional 64-bit gettimeofday/settimeofday syscall
> pair. While I definitely agree this is one of the most basic things to
> have, it's also not an area of the kernel that is easy to change.

64-bit clock_gettime / clock_settime instead of gettimeofday /
settimeofday should avoid the need for the kernel to have a 64-bit version
of struct timeval. (Userspace 64-bit gettimeofday / settimeofday would
need to use a combination of the syscalls if the tz pointer is non-NULL.)
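
A rough user-space sketch of that idea follows. It assumes a hypothetical `struct timeval64` and wrapper name `gettimeofday64`, neither of which is defined in this thread, and it uses the existing `clock_gettime()` as a stand-in for a future 64-bit variant. The legacy `gettimeofday()` is only consulted when the caller asks for time zone data.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical 64-bit timeval; not a type from the patch series. */
struct timeval64 {
    int64_t tv_sec;
    int64_t tv_usec;
};

/* Reduce a seconds/nanoseconds pair to timeval's microsecond resolution. */
static void timespec_to_timeval64(int64_t sec, long nsec, struct timeval64 *tv)
{
    assert(nsec >= 0 && nsec < 1000000000L);
    tv->tv_sec = sec;
    tv->tv_usec = nsec / 1000;
}

/* Sketch of a wrapper that combines two calls when tz is non-NULL. */
static int gettimeofday64(struct timeval64 *tv, struct timezone *tz)
{
    if (tv) {
        struct timespec ts;

        /* Stand-in for a 64-bit clock_gettime syscall. */
        if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
            return -1;
        timespec_to_timeval64((int64_t)ts.tv_sec, ts.tv_nsec, tv);
    }
    if (tz) {
        /* Only the time zone comes from the legacy call; its time is ignored. */
        struct timeval unused;

        if (gettimeofday(&unused, tz) != 0)
            return -1;
    }
    return 0;
}
```

With this split the kernel would only need a 64-bit `timespec` interface, and the `timeval` conversion stays entirely in the C library.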

--
Joseph S. Myers
[email protected]

2014-06-03 14:37:46

by Arnd Bergmann

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Tuesday 03 June 2014 14:33:10 Joseph S. Myers wrote:
> On Tue, 3 Jun 2014, Arnd Bergmann wrote:
>
> > I think John Stultz and Thomas Gleixner have already started looking
> > at how the timekeeping code can be updated. Once that is done, we should
> > be able to add a functional 64-bit gettimeofday/settimeofday syscall
> > pair. While I definitely agree this is one of the most basic things to
> > have, it's also not an area of the kernel that is easy to change.
>
> 64-bit clock_gettime / clock_settime instead of gettimeofday /
> settimeofday should avoid the need for the kernel to have a 64-bit version
> of struct timeval. (Userspace 64-bit gettimeofday / settimeofday would
> need to use a combination of the syscalls if the tz pointer is non-NULL.)

Yes, that's what I meant.

Arnd

2014-06-03 21:38:02

by Dave Chinner

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote:
> On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> > On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> The possible uses I can see for non-ktime_t types in the kernel are:
> * inodes need 96 bit timestamps to represent the full range of values
> that can be stored in a file system, you made a convincing argument
> for that. Almost everything else can fit into 64 bit on a 32-bit
> kernel, in theory also on a 64-bit kernel if we want that.

Just to be pedantic, inodes don't *need* 96 bit timestamps - some
filesystems can *support up to* 96 bit timestamps. If the kernel
only supports 64 bit timestamps and that's all the kernel can
represent, then the upper bits of the 96 bit on-disk inode
timestamps simply remain zero.

If you move the filesystem between kernels with different time
ranges, then the filesystem needs to be able to tell the kernel what
its supported range is. This is where having the VFS limit the
range of supported timestamps is important: the limit is the
min(kernel range, filesystem range). This allows the filesystems
to be independent of the kernel time representation, and the kernel
to be independent of the physical filesystem time encoding....
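
The min(kernel range, filesystem range) rule can be sketched as an interval intersection. The type `struct time_range` and both helpers are hypothetical names used only for illustration, and clamping out-of-range values is an assumption: returning an error instead would be an equally valid policy that this thread does not settle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a representable range, in seconds. */
struct time_range {
    int64_t min_sec;
    int64_t max_sec;
};

/* The usable range is the overlap of what the kernel and the fs can hold. */
static struct time_range effective_range(struct time_range kernel,
                                         struct time_range fs)
{
    struct time_range r;

    r.min_sec = kernel.min_sec > fs.min_sec ? kernel.min_sec : fs.min_sec;
    r.max_sec = kernel.max_sec < fs.max_sec ? kernel.max_sec : fs.max_sec;
    assert(r.min_sec <= r.max_sec);
    return r;
}

/* Assumed policy: clamp a time stamp into the usable range. */
static int64_t clamp_to_range(int64_t sec, struct time_range r)
{
    if (sec < r.min_sec)
        return r.min_sec;
    if (sec > r.max_sec)
        return r.max_sec;
    return sec;
}
```

For example, a kernel with 64-bit seconds mounting a file system with signed 32-bit on-disk seconds ends up with the file system's range, while a kernel limited to 32-bit seconds mounting a wider file system ends up with the kernel's range.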

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-06-04 15:03:47

by Arnd Bergmann

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Tuesday 03 June 2014, Dave Chinner wrote:
> On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote:
> > On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> > > On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> > The possible uses I can see for non-ktime_t types in the kernel are:
> > * inodes need 96 bit timestamps to represent the full range of values
> > that can be stored in a file system, you made a convincing argument
> > for that. Almost everything else can fit into 64 bit on a 32-bit
> > kernel, in theory also on a 64-bit kernel if we want that.
>
> Just to be pedantic, inodes don't need 96 bit timestamps - some
> filesystems can *support up to* 96 bit timestamps. If the kernel
> only supports 64 bit timestamps and that's all the kernel can
> represent, then the upper bits of the 96 bit on-disk inode
> timestamps simply remain zero.

I meant the reverse: since we have file systems that can store
96-bit timestamps when using 64-bit kernels, we need to extend
32-bit kernels to have the same internal representation so we
can actually read those file systems correctly.

> If you move the filesystem between kernels with different time
> ranges, then the filesystem needs to be able to tell the kernel what
> its supported range is. This is where having the VFS limit the
> range of supported timestamps is important: the limit is the
> min(kernel range, filesystem range). This allows the filesystems
> to be independent of the kernel time representation, and the kernel
> to be independent of the physical filesystem time encoding....

I agree it makes sense to let the kernel know about the limits
of the file system it accesses, but for the reverse, we're probably
better off just making the kernel representation large enough (i.e.
96 bits) so it can work with any known file system. We need another
check at the user space boundary to turn that into a value that the
user can understand, but that's another problem.

Arnd


2014-06-04 15:05:27

by Arnd Bergmann

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Monday 02 June 2014, Joseph S. Myers wrote:
> On Mon, 2 Jun 2014, Arnd Bergmann wrote:
>
> > Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> > may not be as relevant here, except for the one question whether we
> > actually want to have the new ABI on all 32-bit architectures or only
> > as an opt-in for those that expect to stay around for another 24 years.
>
> For glibc I think it will make the most sense to add the support for
> 64-bit time_t across all architectures that currently have 32-bit time_t
> (with the new interfaces having fallback support to implementation in
> terms of the 32-bit kernel interfaces, if the 64-bit syscalls are
> unavailable either at runtime or in the kernel headers against which glibc
> is compiled - this fallback code will of course need to check for overflow
> when passing a time value to the kernel, hopefully with error handling
> consistent with whatever the kernel ends up doing when a filesystem can't
> support a timestamp). If some architectures don't provide the new
> interfaces in the kernel then that will mean the fallback code in glibc
> can't be removed until glibc support for those architectures is removed
> (as opposed to removing it when glibc no longer supports kernels predating
> the kernel support).

Ok, that's a good reason to just provide the new interfaces on all
architectures right away. Thanks for the insight!
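
The overflow check in the fallback path described above might look roughly like the following. The struct names and the helper are hypothetical, and the choice of `EOVERFLOW` is an assumption: the quoted text only asks that error handling be consistent with whatever the kernel ends up doing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-facing 64-bit time value. */
struct timespec64_sketch {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* Hypothetical layout expected by a legacy 32-bit syscall. */
struct timespec32_sketch {
    int32_t tv_sec;
    int32_t tv_nsec;
};

/*
 * Narrow a 64-bit time value before calling a 32-bit kernel interface.
 * Returns 0 on success or an errno-style code; EOVERFLOW is an assumed
 * choice, not something decided in this thread.
 */
static int timespec64_to_32(const struct timespec64_sketch *in,
                            struct timespec32_sketch *out)
{
    assert(in != NULL && out != NULL);
    if (in->tv_nsec < 0 || in->tv_nsec >= 1000000000LL)
        return EINVAL;
    if (in->tv_sec < INT32_MIN || in->tv_sec > INT32_MAX)
        return EOVERFLOW;
    out->tv_sec = (int32_t)in->tv_sec;
    out->tv_nsec = (int32_t)in->tv_nsec;
    return 0;
}
```

A fallback wrapper would call a helper like this only after the 64-bit syscall turned out to be unavailable, and would pass the error back to the caller instead of silently truncating.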

Arnd

2014-06-04 17:30:32

by Nicolas Pitre

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Wed, 4 Jun 2014, Arnd Bergmann wrote:

> On Tuesday 03 June 2014, Dave Chinner wrote:
> > Just to be pedantic, inodes don't need 96 bit timestamps - some
> > filesystems can *support up to* 96 bit timestamps. If the kernel
> > only supports 64 bit timestamps and that's all the kernel can
> > represent, then the upper bits of the 96 bit on-disk inode
> > timestamps simply remain zero.
>
> I meant the reverse: since we have file systems that can store
> 96-bit timestamps when using 64-bit kernels, we need to extend
> 32-bit kernels to have the same internal representation so we
> can actually read those file systems correctly.
>
> > If you move the filesystem between kernels with different time
> > ranges, then the filesystem needs to be able to tell the kernel what
> > its supported range is. This is where having the VFS limit the
> > range of supported timestamps is important: the limit is the
> > min(kernel range, filesystem range). This allows the filesystems
> > to be independent of the kernel time representation, and the kernel
> > to be independent of the physical filesystem time encoding....
>
> I agree it makes sense to let the kernel know about the limits
> of the file system it accesses, but for the reverse, we're probably
> better off just making the kernel representation large enough (i.e.
> 96 bits) so it can work with any known file system.

Depends... 96 bit handling may get prohibitive on 32-bit archs.

The important point here is for the kernel to be able to represent the
time _range_ used by any known filesystem, not necessarily the time
_precision_.

For example, a 64 bit representation can be made of 40 bits for seconds
spanning 34865 years, and 24 bits for fractional seconds providing
precision down to 60 nanosecs. That ought to be plenty good on 32 bit
systems while still being cheap to handle.
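
A minimal sketch of such a 40/24 split is shown below. The function names are hypothetical, and treating the seconds field as unsigned is an assumption, since the choice of epoch and sign is not specified here. The numbers check out: 2^40 seconds is about 34865 years of 365 days, and one step of a 24-bit binary fraction is 10^9 / 2^24, or roughly 59.6 ns.

```c
#include <assert.h>
#include <stdint.h>

#define SEC_BITS      40
#define FRAC_BITS     24
#define FRAC_MASK     ((1ULL << FRAC_BITS) - 1)
#define NSEC_PER_SEC  1000000000ULL

/* Pack seconds and nanoseconds into one 64-bit word (40.24 fixed point). */
static uint64_t pack_time(uint64_t sec, uint32_t nsec)
{
    assert(sec < (1ULL << SEC_BITS));
    assert(nsec < NSEC_PER_SEC);

    /* Scale nanoseconds to a 24-bit binary fraction, rounding down. */
    uint64_t frac = ((uint64_t)nsec << FRAC_BITS) / NSEC_PER_SEC;

    return (sec << FRAC_BITS) | frac;
}

/* Recover the whole seconds. */
static uint64_t unpack_sec(uint64_t packed)
{
    return packed >> FRAC_BITS;
}

/* Recover nanoseconds; the result can be up to 60 ns below the input. */
static uint32_t unpack_nsec(uint64_t packed)
{
    return (uint32_t)(((packed & FRAC_MASK) * NSEC_PER_SEC) >> FRAC_BITS);
}
```

Comparing two packed values is a single 64-bit integer comparison, which is part of what makes this layout cheap on 32-bit systems.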


Nicolas


2014-06-04 19:24:42

by Arnd Bergmann

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Wednesday 04 June 2014 13:30:32 Nicolas Pitre wrote:
> On Wed, 4 Jun 2014, Arnd Bergmann wrote:
>
> > On Tuesday 03 June 2014, Dave Chinner wrote:
> > > Just to be pedantic, inodes don't need 96 bit timestamps - some
> > > filesystems can *support up to* 96 bit timestamps. If the kernel
> > > only supports 64 bit timestamps and that's all the kernel can
> > > represent, then the upper bits of the 96 bit on-disk inode
> > > timestamps simply remain zero.
> >
> > I meant the reverse: since we have file systems that can store
> > 96-bit timestamps when using 64-bit kernels, we need to extend
> > 32-bit kernels to have the same internal representation so we
> > can actually read those file systems correctly.
> >
> > > If you move the filesystem between kernels with different time
> > > ranges, then the filesystem needs to be able to tell the kernel what
> > > its supported range is. This is where having the VFS limit the
> > > range of supported timestamps is important: the limit is the
> > > min(kernel range, filesystem range). This allows the filesystems
> > > to be independent of the kernel time representation, and the kernel
> > > to be independent of the physical filesystem time encoding....
> >
> > I agree it makes sense to let the kernel know about the limits
> > of the file system it accesses, but for the reverse, we're probably
> > better off just making the kernel representation large enough (i.e.
> > 96 bits) so it can work with any known file system.
>
> Depends... 96 bit handling may get prohibitive on 32-bit archs.
>
> The important point here is for the kernel to be able to represent the
> time _range_ used by any known filesystem, not necessarily the time
> _precision_.
>
> For example, a 64 bit representation can be made of 40 bits for seconds
> spanning 34865 years, and 24 bits for fractional seconds providing
> precision down to 60 nanosecs. That ought to be plenty good on 32 bit
> systems while still being cheap to handle.

I have checked earlier that we don't do any computation on inode
time stamps in common code, we just pass them around, so there is
very little runtime overhead. There is a small bit of space overhead
(12 bytes) per inode, but that structure is already on the order of
500 bytes.
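
One way to arrive at the 12-byte figure, assuming three time stamps per inode (atime, mtime, ctime) that each grow from 32-bit to 64-bit seconds on a 32-bit kernel, is sketched below. The struct names and layouts are illustrative only and are not the definitions used in the patch series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative 32-bit layout: 32-bit seconds + 32-bit nanoseconds. */
struct old_time32 {
    int32_t sec;
    int32_t nsec;
};

/* Illustrative 96-bit layout: 64-bit seconds + 32-bit nanoseconds. */
struct wide_time96 {
    int64_t sec;
    int32_t nsec;
} __attribute__((packed));

static_assert(sizeof(struct old_time32) == 8, "old layout is 8 bytes");
static_assert(sizeof(struct wide_time96) == 12, "96-bit layout is 12 bytes");

/* atime, mtime and ctime. */
#define INODE_TIMESTAMPS 3

/* Extra bytes per inode when every time stamp is widened. */
static size_t inode_time_overhead(void)
{
    return INODE_TIMESTAMPS *
           (sizeof(struct wide_time96) - sizeof(struct old_time32));
}
```

Each time stamp grows by 4 bytes, so three of them add 12 bytes, which is small next to an inode of roughly 500 bytes. The `packed` attribute is used here only to make the size independent of the compiler's alignment rules.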

For other timekeeping stuff in the kernel, I agree that using some
64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
...) has advantages, that's exactly the point I was making earlier
against simply extending the internal time_t/timespec to 64-bit
seconds for everything.

Arnd

2014-06-05 00:10:24

by H. Peter Anvin

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On 06/04/2014 12:24 PM, Arnd Bergmann wrote:
>
> For other timekeeping stuff in the kernel, I agree that using some
> 64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
> ...) has advantages, that's exactly the point I was making earlier
> against simply extending the internal time_t/timespec to 64-bit
> seconds for everything.
>

How much of a performance issue is it to make time_t 64 bits, and for
the bits there are, how hard are they to fix?

-hpa


2014-06-10 09:54:14

by Arnd Bergmann

Subject: Re: [RFC 00/32] making inode time stamps y2038 ready

On Wednesday 04 June 2014 17:10:24 H. Peter Anvin wrote:
> On 06/04/2014 12:24 PM, Arnd Bergmann wrote:
> >
> > For other timekeeping stuff in the kernel, I agree that using some
> > 64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
> > ...) has advantages, that's exactly the point I was making earlier
> > against simply extending the internal time_t/timespec to 64-bit
> > seconds for everything.
> >
>
> How much of a performance issue is it to make time_t 64 bits, and for
> the bits there are, how hard are they to fix?

Probably very little overhead for most uses, it's more the regression
potential in the less common parts of the kernel I'm worried about.

There is a significant but not overwhelming number of uses of the
main problematic types in the kernel:

arnd@wuerfel:~/arm-soc$ git grep -wl time_t | wc
188 188 5566
arnd@wuerfel:~/arm-soc$ git grep -wl timeval | wc
320 320 10353
arnd@wuerfel:~/arm-soc$ git grep -wl timespec | wc
406 406 10886

I believe we have to audit all of them anyway if we want to change
the kernel to less problematic types and introduce new user
interfaces.

IMHO this work is helped if we change the uses to a new type
as we find the problems. This lets us do the work one subsystem
at a time and avoid accidental ABI changes. I don't care much what
type that will be, and having a 96-bit type will certainly work
well in a lot of cases, but I don't see a strong reason to use
that over other types, especially when they can be more efficient.

Arnd