Based on the recent discussion about 64-bit time_t for new
architectures, and for solving the year 2038 problem in general,
I decided to try out what it would take to solve part of the
kernel side of things.
This is a proof-of-concept work to get us to the point where
two system calls (utimes and stat) provide a working interface
to user space to pass 64-bit inode time stamps in and out of
the kernel all the way to the file systems.
I picked this because it is a fairly isolated problem, as the
inode time stamps are rarely assigned to any other time values.
As a byproduct of this work, I documented for each of the file
systems we support how long the on-disk format can work[1].
Obviously we also need to convert all the other syscalls and
have a proper libc implementation using those for this to
be really useful, but it's a start and it can be tested
independently (I have not done so yet, as I want to wait for initial
feedback).
All the interesting stuff is in the first five patches here,
the rest is the straightforward conversion of all file systems
that use 'timespec' values internally.
There are of course a number of open questions:
a) Is this the right approach in general? The previous discussion
pointed this way, but there may be other opinions.
b) What type should we use internally to represent inode time
stamps? The code contains three different versions that would
all work; we just have to pick a good tradeoff between
efficiency and the range of times we want to cover.
c) Should we continue this way for all 32-bit platforms for
consistency, including future ones, or should we go to
different 64-bit types right away? My feeling is that the
second approach would complicate this work.
Arnd
[1] http://kernelnewbies.org/y2038
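For question b), a minimal sketch of one possible representation may help frame the tradeoff. The field names tv_sec and tv_nsec and the helper names match what the conversion patches use, but the field widths and the helper bodies below are assumptions for illustration (modeled on timespec_equal() and timespec_compare()), not the definitions from the series:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 64-bit seconds even on 32-bit CPUs. */
struct inode_time {
	int64_t tv_sec;   /* seconds since the epoch */
	int32_t tv_nsec;  /* nanoseconds, 0..999999999 */
};

static_assert(sizeof(((struct inode_time *)0)->tv_sec) == 8,
	      "tv_sec must be 64 bits wide");

/* Nonzero when both time stamps are identical. */
static inline int inode_time_equal(const struct inode_time *a,
				   const struct inode_time *b)
{
	return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}

/* Returns <0, 0 or >0 when a is earlier than, equal to or later than b. */
static inline int inode_time_compare(const struct inode_time *a,
				     const struct inode_time *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}
```

With this layout the comparison helpers keep working for values past January 2038, at the cost of 64-bit arithmetic on 32-bit machines.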
Arnd Bergmann (32):
fs: introduce new 'struct inode_time'
uapi: add struct __kernel_timespec{32,64}
fs: introduce sys_utimens64at
fs: introduce sys_newfstat64/sys_newfstatat64
arch: hook up new stat and utimes syscalls
isofs: fix timestamps beyond 2027
fs/nfs: convert to struct inode_time
fs/ceph: convert to 'struct inode_time'
fs/pstore: convert to struct inode_time
fs/coda: convert to struct inode_time
xfs: convert to struct inode_time
btrfs: convert to struct inode_time
ext3: convert to struct inode_time
ext4: convert to struct inode_time
cifs: convert to struct inode_time
ntfs: convert to struct inode_time
ubifs: convert to struct inode_time
ocfs2: convert to struct inode_time
fs/fat: convert to struct inode_time
afs: convert to struct inode_time
udf: convert to struct inode_time
fs: convert simple fs to inode_time
logfs: convert to struct inode_time
hfs, hfsplus: convert to struct inode_time
gfs2: convert to struct inode_time
reiserfs: convert to struct inode_time
jffs2: convert to struct inode_time
adfs: convert to struct inode_time
f2fs: convert to struct inode_time
fuse: convert to struct inode_time
scsi: fnic: use current_kernel_time() for timestamp
fs: use new inode_time definition unconditionally
arch/alpha/kernel/osf_sys.c | 2 +-
arch/arm/include/asm/unistd.h | 2 +-
arch/arm/include/uapi/asm/stat.h | 25 +++++++++++++++++
arch/arm/include/uapi/asm/unistd.h | 3 +++
arch/arm/kernel/calls.S | 3 +++
arch/arm64/include/asm/unistd32.h | 5 +++-
arch/x86/include/uapi/asm/stat.h | 28 +++++++++++++++++++
arch/x86/syscalls/syscall_32.tbl | 3 +++
drivers/block/rbd.c | 2 +-
drivers/firmware/efi/efi-pstore.c | 28 +++++++++----------
drivers/scsi/fnic/fnic_trace.c | 2 +-
drivers/tty/tty_io.c | 2 +-
drivers/usb/gadget/f_fs.c | 2 +-
fs/adfs/inode.c | 4 +--
fs/afs/afs.h | 6 ++---
fs/afs/fsclient.c | 2 +-
fs/attr.c | 8 +++---
fs/btrfs/file.c | 6 ++---
fs/btrfs/inode.c | 4 +--
fs/btrfs/ioctl.c | 4 +--
fs/btrfs/root-tree.c | 2 +-
fs/btrfs/transaction.c | 2 +-
fs/ceph/cache.c | 2 +-
fs/ceph/caps.c | 6 ++---
fs/ceph/file.c | 4 +--
fs/ceph/inode.c | 20 +++++++-------
fs/ceph/super.h | 8 +++---
fs/cifs/cache.c | 6 ++---
fs/cifs/cifsglob.h | 6 ++---
fs/cifs/cifsproto.h | 6 ++---
fs/cifs/cifssmb.c | 5 ++--
fs/cifs/inode.c | 2 +-
fs/cifs/netmisc.c | 15 ++++++-----
fs/coda/coda_linux.c | 18 ++++++++-----
fs/compat.c | 19 ++-----------
fs/configfs/inode.c | 6 ++---
fs/cramfs/inode.c | 2 +-
fs/ext3/inode.c | 4 +--
fs/ext4/ext4.h | 10 +++----
fs/ext4/extents.c | 2 +-
fs/f2fs/file.c | 6 ++---
fs/fat/dir.c | 2 +-
fs/fat/fat.h | 6 ++---
fs/fat/misc.c | 4 +--
fs/fat/namei_msdos.c | 8 +++---
fs/fat/namei_vfat.c | 10 +++----
fs/fuse/inode.c | 6 ++---
fs/gfs2/dir.c | 6 ++---
fs/gfs2/glops.c | 4 +--
fs/hfs/hfs_fs.h | 2 +-
fs/hfsplus/hfsplus_fs.h | 2 +-
fs/inode.c | 18 ++++++-------
fs/isofs/util.c | 2 +-
fs/jffs2/os-linux.h | 2 +-
fs/locks.c | 4 +--
fs/logfs/readwrite.c | 18 ++++++-------
fs/nfs/callback.h | 4 +--
fs/nfs/callback_xdr.c | 6 ++---
fs/nfs/file.c | 2 +-
fs/nfs/fscache-index.c | 8 +++---
fs/nfs/inode.c | 10 +++----
fs/nfs/internal.h | 4 +--
fs/nfs/netns.h | 2 +-
fs/nfs/nfs2xdr.c | 8 +++---
fs/nfs/nfs3xdr.c | 10 +++----
fs/nfs/nfs4xdr.c | 20 +++++++-------
fs/nfsd/nfs3xdr.c | 6 ++---
fs/nfsd/nfsfh.h | 4 +--
fs/nfsd/nfsxdr.c | 2 +-
fs/ntfs/inode.c | 12 ++++-----
fs/ntfs/time.h | 8 +++---
fs/ocfs2/dlmglue.c | 16 +++++------
fs/ocfs2/file.c | 6 ++---
fs/ocfs2/ocfs2.h | 2 +-
fs/pstore/inode.c | 2 +-
fs/pstore/internal.h | 2 +-
fs/pstore/platform.c | 2 +-
fs/pstore/ram.c | 18 +++++++------
fs/reiserfs/namei.c | 2 +-
fs/reiserfs/xattr.c | 4 +--
fs/stat.c | 55 ++++++++++++++++++++++++++++++++++++++
fs/ubifs/dir.c | 2 +-
fs/ubifs/file.c | 16 +++++------
fs/ubifs/misc.h | 2 +-
fs/udf/udf_i.h | 2 +-
fs/udf/udf_sb.h | 2 +-
fs/udf/udfdecl.h | 7 ++---
fs/udf/udftime.c | 7 ++---
fs/utimes.c | 47 +++++++++++++++++++++++++++-----
fs/xfs/time.h | 4 +--
fs/xfs/xfs_inode.c | 2 +-
fs/xfs/xfs_iops.c | 2 +-
fs/xfs/xfs_trans_inode.c | 6 ++---
include/linux/ceph/decode.h | 8 +++---
include/linux/ceph/osd_client.h | 4 +--
include/linux/compat.h | 2 +-
include/linux/fs.h | 32 +++++++++++-----------
include/linux/nfs_fs_sb.h | 2 +-
include/linux/nfs_xdr.h | 14 +++++-----
include/linux/pstore.h | 4 +--
include/linux/stat.h | 6 ++---
include/linux/syscalls.h | 9 ++++++-
include/linux/time.h | 44 +++++++++++++++++++++++++++---
include/uapi/asm-generic/stat.h | 29 ++++++++++++++++++--
include/uapi/asm-generic/unistd.h | 8 +++++-
include/uapi/linux/coda.h | 1 +
include/uapi/linux/time.h | 40 ++++++++++++++++++++++++++-
init/initramfs.c | 2 +-
kernel/audit.c | 2 +-
kernel/auditsc.c | 2 +-
kernel/time.c | 44 +++++++++++++++++++++++++-----
kernel/time/timekeeping.c | 16 +++++++++++
net/ceph/auth_x.c | 2 +-
net/ceph/osd_client.c | 4 +--
114 files changed, 642 insertions(+), 333 deletions(-)
--
1.8.3.2
Bcc: "J. Bruce Fields" <[email protected]>
Bcc: "Theodore Ts'o" <[email protected]>
Bcc: Adrian Hunter <[email protected]>
Bcc: Andreas Dilger <[email protected]>
Bcc: Andrew Morton <[email protected]>
Bcc: Anton Altaparmakov <[email protected]>
Bcc: Anton Vorontsov <[email protected]>
Bcc: Artem Bityutskiy <[email protected]>
Bcc: Brian Uchino <[email protected]>
Bcc: Chris Mason <[email protected]>
Bcc: Colin Cross <[email protected]>
Bcc: Dave Chinner <[email protected]>
Bcc: David Howells <[email protected]>
Bcc: David Woodhouse <[email protected]>
Bcc: Greg Kroah-Hartman <[email protected]>
Bcc: Hiral Patel <[email protected]>
Bcc: Jaegeuk Kim <[email protected]>
Bcc: Jan Harkes <[email protected]>
Bcc: Jan Kara <[email protected]>
Bcc: Joel Becker <[email protected]>
Bcc: Joern Engel <[email protected]>
Bcc: Josef Bacik <[email protected]>
Bcc: Kees Cook <[email protected]>
Bcc: Mark Fasheh <[email protected]>
Bcc: Miklos Szeredi <[email protected]>
Bcc: OGAWA Hirofumi <[email protected]>
Bcc: Prasad Joshi <[email protected]>
Bcc: Sage Weil <[email protected]>
Bcc: Steve French <[email protected]>
Bcc: Steven Whitehouse <[email protected]>
Bcc: Suma Ramars <[email protected]>
Bcc: Tony Luck <[email protected]>
Cc: [email protected]
On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote:
>
> I picked this because it is a fairly isolated problem, as the
> inode time stamps are rarely assigned to any other time values.
> As a byproduct of this work, I documented for each of the file
> systems we support how long the on-disk format can work[1].
Why are some of the time stamp expiration dates marked as "never"?
Thanks,
Richard
Typically they are using 64-bit signed seconds.
On May 31, 2014 11:22:37 AM PDT, Richard Cochran <[email protected]> wrote:
>On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
>>
>> It's an approximation:
>
>(Approximately never ;)
>
>> with 64-bit timestamps, you can represent close to 300 billion
>> years, which is way past the time that our planet can sustain
>> life of any form[1].
>
>Did you mean 64 bits worth of seconds?
>
> 2^64 / (3600*24*365) = 584,942,417,355
>
>That is more than 300 billion years, and still, it is not quite the
>same as "never".
>
>In any case, that term is not too helpful in the comparison table,
>IMHO. One could think that some sort of clever running count relative
>to the last mount time was implied.
>
>Thanks,
>Richard
>
>[1] You are forgetting the immortal robotic overlords.
Hi Arnd,
On Fri, 2014-05-30 at 22:01 +0200, Arnd Bergmann wrote:
[snip]
>
> Arnd Bergmann (32):
> fs: introduce new 'struct inode_time'
> uapi: add struct __kernel_timespec{32,64}
> fs: introduce sys_utimens64at
> fs: introduce sys_newfstat64/sys_newfstatat64
> arch: hook up new stat and utimes syscalls
> isofs: fix timestamps beyond 2027
> fs/nfs: convert to struct inode_time
> fs/ceph: convert to 'struct inode_time'
> fs/pstore: convert to struct inode_time
> fs/coda: convert to struct inode_time
> xfs: convert to struct inode_time
> btrfs: convert to struct inode_time
> ext3: convert to struct inode_time
> ext4: convert to struct inode_time
> cifs: convert to struct inode_time
> ntfs: convert to struct inode_time
> ubifs: convert to struct inode_time
> ocfs2: convert to struct inode_time
> fs/fat: convert to struct inode_time
> afs: convert to struct inode_time
> udf: convert to struct inode_time
> fs: convert simple fs to inode_time
> logfs: convert to struct inode_time
> hfs, hfsplus: convert to struct inode_time
> gfs2: convert to struct inode_time
> reiserfs: convert to struct inode_time
> jffs2: convert to struct inode_time
> adfs: convert to struct inode_time
> f2fs: convert to struct inode_time
> fuse: convert to struct inode_time
> scsi: fnic: use current_kernel_time() for timestamp
> fs: use new inode_time definition unconditionally
>
By the way, what about NILFS2? Is NILFS2 ready for the suggested approach
without any changes?
Thanks,
Vyacheslav Dubeyko.
This makes the nfs client and server code use 'struct inode_time'
instead of 'struct timespec', to lift the time stamp limitation
on 32-bit systems. With NFS version 2 and 3, this means we can
represent years up until 2106 rather than 2038. With NFS version
4, the on-wire representation allows 64-bit seconds.
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: "J. Bruce Fields" <[email protected]>
Cc: [email protected]
---
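Note on the ranges quoted above: a signed 32-bit seconds count ends in January 2038 (2^31 - 1 seconds after 1970), while the same 32 wire bits read as unsigned reach February 2106 (2^32 - 1 seconds). The sketch below illustrates why widening the in-memory field is enough for NFSv2/v3. The helper names are hypothetical and byte order is ignored:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 32-bit seconds field on the wire. */
typedef uint32_t wire_seconds;

static_assert(sizeof(wire_seconds) == 4, "wire field must be 32 bits");

/* Encoding keeps the low 32 bits, like cpu_to_be32(timep->tv_sec). */
static wire_seconds encode_seconds(int64_t tv_sec)
{
	return (wire_seconds)tv_sec;
}

/* Decoding into a 64-bit field zero-extends: positive up to 2^32 - 1. */
static int64_t decode_seconds_64(wire_seconds wire)
{
	return (int64_t)wire;
}

/*
 * Decoding into a 32-bit signed time_t reinterprets the top bit as a
 * sign (implementation-defined in C, two's complement wrap in practice).
 */
static int32_t decode_seconds_32(wire_seconds wire)
{
	return (int32_t)wire;
}
```

A value of 2^31 seconds therefore survives a round trip through the 64-bit field but turns negative in the 32-bit one; anything at or beyond 2^32 is truncated on the wire either way.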
fs/nfs/callback.h | 4 ++--
fs/nfs/callback_xdr.c | 6 +++---
fs/nfs/file.c | 2 +-
fs/nfs/fscache-index.c | 8 ++++----
fs/nfs/inode.c | 10 +++++-----
fs/nfs/internal.h | 4 ++--
fs/nfs/netns.h | 2 +-
fs/nfs/nfs2xdr.c | 8 ++++----
fs/nfs/nfs3xdr.c | 10 +++++-----
fs/nfs/nfs4xdr.c | 20 ++++++++++----------
fs/nfsd/nfs3xdr.c | 6 +++---
fs/nfsd/nfsfh.h | 4 ++--
fs/nfsd/nfsxdr.c | 2 +-
include/linux/nfs_fs_sb.h | 2 +-
include/linux/nfs_xdr.h | 14 +++++++-------
15 files changed, 51 insertions(+), 51 deletions(-)
diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
index 84326e9..3a3e6b4 100644
--- a/fs/nfs/callback.h
+++ b/fs/nfs/callback.h
@@ -71,8 +71,8 @@ struct cb_getattrres {
uint32_t bitmap[2];
uint64_t size;
uint64_t change_attr;
- struct timespec ctime;
- struct timespec mtime;
+ struct inode_time ctime;
+ struct inode_time mtime;
};
struct cb_recallargs {
diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index f4ccfe6..177a6f7 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -596,7 +596,7 @@ static __be32 encode_attr_size(struct xdr_stream *xdr, const uint32_t *bitmap, u
return 0;
}
-static __be32 encode_attr_time(struct xdr_stream *xdr, const struct timespec *time)
+static __be32 encode_attr_time(struct xdr_stream *xdr, const struct inode_time *time)
{
__be32 *p;
@@ -608,14 +608,14 @@ static __be32 encode_attr_time(struct xdr_stream *xdr, const struct timespec *ti
return 0;
}
-static __be32 encode_attr_ctime(struct xdr_stream *xdr, const uint32_t *bitmap, const struct timespec *time)
+static __be32 encode_attr_ctime(struct xdr_stream *xdr, const uint32_t *bitmap, const struct inode_time *time)
{
if (!(bitmap[1] & FATTR4_WORD1_TIME_METADATA))
return 0;
return encode_attr_time(xdr,time);
}
-static __be32 encode_attr_mtime(struct xdr_stream *xdr, const uint32_t *bitmap, const struct timespec *time)
+static __be32 encode_attr_mtime(struct xdr_stream *xdr, const uint32_t *bitmap, const struct inode_time *time)
{
if (!(bitmap[1] & FATTR4_WORD1_TIME_MODIFY))
return 0;
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 4042ff5..9bdd210 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -772,7 +772,7 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
}
static int
-is_time_granular(struct timespec *ts) {
+is_time_granular(struct inode_time *ts) {
return ((ts->tv_sec == 0) && (ts->tv_nsec <= 1000));
}
diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index 7cf2c46..ae75bad 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -157,10 +157,10 @@ const struct fscache_cookie_def nfs_fscache_super_index_def = {
* cache object.
*/
struct nfs_fscache_inode_auxdata {
- struct timespec mtime;
- struct timespec ctime;
- loff_t size;
- u64 change_attr;
+ struct inode_time mtime;
+ struct inode_time ctime;
+ loff_t size;
+ u64 change_attr;
};
/*
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index c496f8a..99c9145 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1107,14 +1107,14 @@ static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr
/* If we have atomic WCC data, we may update some attributes */
if ((fattr->valid & NFS_ATTR_FATTR_PRECTIME)
&& (fattr->valid & NFS_ATTR_FATTR_CTIME)
- && timespec_equal(&inode->i_ctime, &fattr->pre_ctime)) {
+ && inode_time_equal(&inode->i_ctime, &fattr->pre_ctime)) {
memcpy(&inode->i_ctime, &fattr->ctime, sizeof(inode->i_ctime));
ret |= NFS_INO_INVALID_ATTR;
}
if ((fattr->valid & NFS_ATTR_FATTR_PREMTIME)
&& (fattr->valid & NFS_ATTR_FATTR_MTIME)
- && timespec_equal(&inode->i_mtime, &fattr->pre_mtime)) {
+ && inode_time_equal(&inode->i_mtime, &fattr->pre_mtime)) {
memcpy(&inode->i_mtime, &fattr->mtime, sizeof(inode->i_mtime));
if (S_ISDIR(inode->i_mode))
nfsi->cache_validity |= NFS_INO_INVALID_DATA;
@@ -1163,7 +1163,7 @@ static int nfs_check_inode_attributes(struct inode *inode, struct nfs_fattr *fat
invalid |= NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE;
/* Verify a few of the more important attributes */
- if ((fattr->valid & NFS_ATTR_FATTR_MTIME) && !timespec_equal(&inode->i_mtime, &fattr->mtime))
+ if ((fattr->valid & NFS_ATTR_FATTR_MTIME) && !inode_time_equal(&inode->i_mtime, &fattr->mtime))
invalid |= NFS_INO_INVALID_ATTR;
if (fattr->valid & NFS_ATTR_FATTR_SIZE) {
@@ -1185,7 +1185,7 @@ static int nfs_check_inode_attributes(struct inode *inode, struct nfs_fattr *fat
if ((fattr->valid & NFS_ATTR_FATTR_NLINK) && inode->i_nlink != fattr->nlink)
invalid |= NFS_INO_INVALID_ATTR;
- if ((fattr->valid & NFS_ATTR_FATTR_ATIME) && !timespec_equal(&inode->i_atime, &fattr->atime))
+ if ((fattr->valid & NFS_ATTR_FATTR_ATIME) && !inode_time_equal(&inode->i_atime, &fattr->atime))
invalid |= NFS_INO_INVALID_ATIME;
if (invalid != 0)
@@ -1199,7 +1199,7 @@ static int nfs_ctime_need_update(const struct inode *inode, const struct nfs_fat
{
if (!(fattr->valid & NFS_ATTR_FATTR_CTIME))
return 0;
- return timespec_compare(&fattr->ctime, &inode->i_ctime) > 0;
+ return inode_time_compare(&fattr->ctime, &inode->i_ctime) > 0;
}
static int nfs_size_need_update(const struct inode *inode, const struct nfs_fattr *fattr)
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 0e4e804..97e06f1 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -605,14 +605,14 @@ unsigned int nfs_page_array_len(unsigned int base, size_t len)
}
/*
- * Convert a struct timespec into a 64-bit change attribute
+ * Convert a struct inode_time into a 64-bit change attribute
*
* This does approximately the same thing as timespec_to_ns(),
* but for calculation efficiency, we multiply the seconds by
* 1024*1024*1024.
*/
static inline
-u64 nfs_timespec_to_change_attr(const struct timespec *ts)
+u64 nfs_time_to_change_attr(const struct inode_time *ts)
{
return ((u64)ts->tv_sec << 30) + ts->tv_nsec;
}
diff --git a/fs/nfs/netns.h b/fs/nfs/netns.h
index 8ee1fab..f665fbd 100644
--- a/fs/nfs/netns.h
+++ b/fs/nfs/netns.h
@@ -28,7 +28,7 @@ struct nfs_net {
int cb_users[NFS4_MAX_MINOR_VERSION + 1];
#endif
spinlock_t nfs_client_lock;
- struct timespec boot_time;
+ struct inode_time boot_time;
};
extern int nfs_net_id;
diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index 62db136..984b7cd 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -222,7 +222,7 @@ out_overflow:
* unsigned int useconds;
* };
*/
-static __be32 *xdr_encode_time(__be32 *p, const struct timespec *timep)
+static __be32 *xdr_encode_time(__be32 *p, const struct inode_time *timep)
{
*p++ = cpu_to_be32(timep->tv_sec);
if (timep->tv_nsec != 0)
@@ -240,14 +240,14 @@ static __be32 *xdr_encode_time(__be32 *p, const struct timespec *timep)
* Illustrated" by Brent Callaghan, Addison-Wesley, ISBN 0-201-32750-5.
*/
static __be32 *xdr_encode_current_server_time(__be32 *p,
- const struct timespec *timep)
+ const struct inode_time *timep)
{
*p++ = cpu_to_be32(timep->tv_sec);
*p++ = cpu_to_be32(1000000);
return p;
}
-static __be32 *xdr_decode_time(__be32 *p, struct timespec *timep)
+static __be32 *xdr_decode_time(__be32 *p, struct inode_time *timep)
{
timep->tv_sec = be32_to_cpup(p++);
timep->tv_nsec = be32_to_cpup(p++) * NSEC_PER_USEC;
@@ -315,7 +315,7 @@ static int decode_fattr(struct xdr_stream *xdr, struct nfs_fattr *fattr)
p = xdr_decode_time(p, &fattr->atime);
p = xdr_decode_time(p, &fattr->mtime);
xdr_decode_time(p, &fattr->ctime);
- fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
+ fattr->change_attr = nfs_time_to_change_attr(&fattr->ctime);
return 0;
out_uid:
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index fa6d721..09c40f2 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -477,21 +477,21 @@ static void zero_nfs_fh3(struct nfs_fh *fh)
}
/*
- * nfstime3
+ * nfstime3
*
* struct nfstime3 {
* uint32 seconds;
* uint32 nseconds;
* };
*/
-static __be32 *xdr_encode_nfstime3(__be32 *p, const struct timespec *timep)
+static __be32 *xdr_encode_nfstime3(__be32 *p, const struct inode_time *timep)
{
*p++ = cpu_to_be32(timep->tv_sec);
*p++ = cpu_to_be32(timep->tv_nsec);
return p;
}
-static __be32 *xdr_decode_nfstime3(__be32 *p, struct timespec *timep)
+static __be32 *xdr_decode_nfstime3(__be32 *p, struct inode_time *timep)
{
timep->tv_sec = be32_to_cpup(p++);
timep->tv_nsec = be32_to_cpup(p++);
@@ -675,7 +675,7 @@ static int decode_fattr3(struct xdr_stream *xdr, struct nfs_fattr *fattr)
p = xdr_decode_nfstime3(p, &fattr->atime);
p = xdr_decode_nfstime3(p, &fattr->mtime);
xdr_decode_nfstime3(p, &fattr->ctime);
- fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
+ fattr->change_attr = nfs_time_to_change_attr(&fattr->ctime);
fattr->valid |= NFS_ATTR_FATTR_V3;
return 0;
@@ -739,7 +739,7 @@ static int decode_wcc_attr(struct xdr_stream *xdr, struct nfs_fattr *fattr)
p = xdr_decode_size3(p, &fattr->pre_size);
p = xdr_decode_nfstime3(p, &fattr->pre_mtime);
xdr_decode_nfstime3(p, &fattr->pre_ctime);
- fattr->pre_change_attr = nfs_timespec_to_change_attr(&fattr->pre_ctime);
+ fattr->pre_change_attr = nfs_time_to_change_attr(&fattr->pre_ctime);
return 0;
out_overflow:
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 73ce8d4..a41265b 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -4073,7 +4073,7 @@ out_overflow:
return -EIO;
}
-static int decode_attr_time(struct xdr_stream *xdr, struct timespec *time)
+static int decode_attr_time(struct xdr_stream *xdr, struct inode_time *time)
{
__be32 *p;
uint64_t sec;
@@ -4084,7 +4084,7 @@ static int decode_attr_time(struct xdr_stream *xdr, struct timespec *time)
goto out_overflow;
p = xdr_decode_hyper(p, &sec);
nsec = be32_to_cpup(p);
- time->tv_sec = (time_t)sec;
+ time->tv_sec = sec;
time->tv_nsec = (long)nsec;
return 0;
out_overflow:
@@ -4092,7 +4092,7 @@ out_overflow:
return -EIO;
}
-static int decode_attr_time_access(struct xdr_stream *xdr, uint32_t *bitmap, struct timespec *time)
+static int decode_attr_time_access(struct xdr_stream *xdr, uint32_t *bitmap, struct inode_time *time)
{
int status = 0;
@@ -4106,11 +4106,11 @@ static int decode_attr_time_access(struct xdr_stream *xdr, uint32_t *bitmap, str
status = NFS_ATTR_FATTR_ATIME;
bitmap[1] &= ~FATTR4_WORD1_TIME_ACCESS;
}
- dprintk("%s: atime=%ld\n", __func__, (long)time->tv_sec);
+ dprintk("%s: atime=%lld\n", __func__, (long long)time->tv_sec);
return status;
}
-static int decode_attr_time_metadata(struct xdr_stream *xdr, uint32_t *bitmap, struct timespec *time)
+static int decode_attr_time_metadata(struct xdr_stream *xdr, uint32_t *bitmap, struct inode_time *time)
{
int status = 0;
@@ -4124,12 +4124,12 @@ static int decode_attr_time_metadata(struct xdr_stream *xdr, uint32_t *bitmap, s
status = NFS_ATTR_FATTR_CTIME;
bitmap[1] &= ~FATTR4_WORD1_TIME_METADATA;
}
- dprintk("%s: ctime=%ld\n", __func__, (long)time->tv_sec);
+ dprintk("%s: ctime=%lld\n", __func__, (long long)time->tv_sec);
return status;
}
static int decode_attr_time_delta(struct xdr_stream *xdr, uint32_t *bitmap,
- struct timespec *time)
+ struct inode_time *time)
{
int status = 0;
@@ -4141,7 +4141,7 @@ static int decode_attr_time_delta(struct xdr_stream *xdr, uint32_t *bitmap,
status = decode_attr_time(xdr, time);
bitmap[1] &= ~FATTR4_WORD1_TIME_DELTA;
}
- dprintk("%s: time_delta=%ld %ld\n", __func__, (long)time->tv_sec,
+ dprintk("%s: time_delta=%lld %ld\n", __func__, (long long)time->tv_sec,
(long)time->tv_nsec);
return status;
}
@@ -4196,7 +4196,7 @@ out_overflow:
return -EIO;
}
-static int decode_attr_time_modify(struct xdr_stream *xdr, uint32_t *bitmap, struct timespec *time)
+static int decode_attr_time_modify(struct xdr_stream *xdr, uint32_t *bitmap, struct inode_time *time)
{
int status = 0;
@@ -4210,7 +4210,7 @@ static int decode_attr_time_modify(struct xdr_stream *xdr, uint32_t *bitmap, str
status = NFS_ATTR_FATTR_MTIME;
bitmap[1] &= ~FATTR4_WORD1_TIME_MODIFY;
}
- dprintk("%s: mtime=%ld\n", __func__, (long)time->tv_sec);
+ dprintk("%s: mtime=%lld\n", __func__, (long long)time->tv_sec);
return status;
}
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index de6e39e..46d2eb1 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -30,14 +30,14 @@ static u32 nfs3_ftypes[] = {
* XDR functions for basic NFS types
*/
static __be32 *
-encode_time3(__be32 *p, struct timespec *time)
+encode_time3(__be32 *p, struct inode_time *time)
{
*p++ = htonl((u32) time->tv_sec); *p++ = htonl(time->tv_nsec);
return p;
}
static __be32 *
-decode_time3(__be32 *p, struct timespec *time)
+decode_time3(__be32 *p, struct inode_time *time)
{
time->tv_sec = ntohl(*p++);
time->tv_nsec = ntohl(*p++);
@@ -292,7 +292,7 @@ nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p,
p = decode_sattr3(p, &args->attrs);
if ((args->check_guard = ntohl(*p++)) != 0) {
- struct timespec time;
+ struct inode_time time;
p = decode_time3(p, &time);
args->guardtime = time.tv_sec;
}
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 2e89e70..a7eb5af 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -39,8 +39,8 @@ typedef struct svc_fh {
/* Pre-op attributes saved during fh_lock */
__u64 fh_pre_size; /* size before operation */
- struct timespec fh_pre_mtime; /* mtime before oper */
- struct timespec fh_pre_ctime; /* ctime before oper */
+ struct inode_time fh_pre_mtime; /* mtime before oper */
+ struct inode_time fh_pre_ctime; /* ctime before oper */
/*
* pre-op nfsv4 change attr: note must check IS_I_VERSION(inode)
* to find out if it is valid.
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
index 9c769a4..cfac45c 100644
--- a/fs/nfsd/nfsxdr.c
+++ b/fs/nfsd/nfsxdr.c
@@ -146,7 +146,7 @@ encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp,
{
struct dentry *dentry = fhp->fh_dentry;
int type;
- struct timespec time;
+ struct inode_time time;
u32 f;
type = (stat->mode & S_IFMT);
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 1150ea4..2370468 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -147,7 +147,7 @@ struct nfs_server {
struct nfs_fsid fsid;
__u64 maxfilesize; /* maximum file size */
- struct timespec time_delta; /* smallest time granularity */
+ struct inode_time time_delta; /* smallest time granularity */
unsigned long mount_time; /* when this fs was mounted */
struct super_block *super; /* VFS super block */
dev_t s_dev; /* superblock dev numbers */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 6fb5b23..ae27bf4 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -61,14 +61,14 @@ struct nfs_fattr {
struct nfs_fsid fsid;
__u64 fileid;
__u64 mounted_on_fileid;
- struct timespec atime;
- struct timespec mtime;
- struct timespec ctime;
+ struct inode_time atime;
+ struct inode_time mtime;
+ struct inode_time ctime;
__u64 change_attr; /* NFSv4 change attribute */
__u64 pre_change_attr;/* pre-op NFSv4 change attribute */
__u64 pre_size; /* pre_op_attr.size */
- struct timespec pre_mtime; /* pre_op_attr.mtime */
- struct timespec pre_ctime; /* pre_op_attr.ctime */
+ struct inode_time pre_mtime; /* pre_op_attr.mtime */
+ struct inode_time pre_ctime; /* pre_op_attr.ctime */
unsigned long time_start;
unsigned long gencount;
struct nfs4_string *owner_name;
@@ -137,7 +137,7 @@ struct nfs_fsinfo {
__u32 wtmult; /* writes should be multiple of this */
__u32 dtpref; /* pref. readdir transfer size */
__u64 maxfilesize;
- struct timespec time_delta; /* server time granularity */
+ struct inode_time time_delta; /* server time granularity */
__u32 lease_time; /* in seconds */
__u32 layouttype; /* supported pnfs layout driver */
__u32 blksize; /* preferred pnfs io block size */
@@ -745,7 +745,7 @@ struct nfs3_sattrargs {
struct nfs_fh * fh;
struct iattr * sattr;
unsigned int guard;
- struct timespec guardtime;
+ struct inode_time guardtime;
};
struct nfs3_diropargs {
--
1.8.3.2
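The comment in fs/nfs/internal.h describes the change attribute as seconds multiplied by 1024*1024*1024 plus nanoseconds. A minimal sketch of why the shift is a safe substitute for the exact multiplication: tv_nsec is always below 10^9, which is below 2^30, so the nanoseconds never spill into the seconds bits and ordering is preserved. The function names here are hypothetical stand-ins, not the kernel helpers:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Exact conversion: seconds * 10^9 + nanoseconds. */
static uint64_t time_to_ns(int64_t tv_sec, long tv_nsec)
{
	return (uint64_t)tv_sec * NSEC_PER_SEC + (uint64_t)tv_nsec;
}

/* Shift-based variant as in the patch: seconds * 2^30 + nanoseconds. */
static uint64_t time_to_change_attr(int64_t tv_sec, long tv_nsec)
{
	assert(tv_nsec >= 0 && tv_nsec < NSEC_PER_SEC);
	return ((uint64_t)tv_sec << 30) + (uint64_t)tv_nsec;
}
```

One consequence worth keeping in mind with a 64-bit tv_sec: only the low 34 bits of the seconds survive the shift into a u64, so the result is a change indicator and not a reversible encoding.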
On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
>
> It's an approximation:
(Approximately never ;)
> with 64-bit timestamps, you can represent close to 300 billion
> years, which is way past the time that our planet can sustain
> life of any form[1].
Did you mean 64 bits worth of seconds?
2^64 / (3600*24*365) = 584,942,417,355
That is more than 300 billion years, and still, it is not quite the
same as "never".
In any case, that term is not too helpful in the comparison table,
IMHO. One could think that some sort of clever running count relative
to the last mount time was implied.
Thanks,
Richard
[1] You are forgetting the immortal robotic overlords.
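The two figures in this exchange can be reconciled with a short calculation: 2^64 seconds is the unsigned range, while a signed 64-bit counter has 63 value bits and covers half of that, which is where "close to 300 billion" comes from. A small sketch, using the same 365-day year as the mail above (the function name is made up for this example):

```c
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_YEAR (3600ULL * 24 * 365)  /* 365-day year */

/* Whole years covered by a counter with 'bits' value bits (1..64). */
static uint64_t years_for_bits(unsigned bits)
{
	uint64_t max;

	assert(bits >= 1 && bits <= 64);
	max = (bits == 64) ? UINT64_MAX : ((1ULL << bits) - 1);
	return max / SECONDS_PER_YEAR;
}
```

years_for_bits(64) gives 584,942,417,355 and years_for_bits(63) gives 292,471,208,677.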
On Saturday 31 May 2014 16:51:15 Richard Cochran wrote:
> On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote:
> >
> > I picked this because it is a fairly isolated problem, as the
> > inode time stamps are rarely assigned to any other time values.
> > As a byproduct of this work, I documented for each of the file
> > systems we support how long the on-disk format can work[1].
>
> Why are some of the time stamp expiration dates marked as "never"?
It's an approximation:
with 64-bit timestamps, you can represent close to 300 billion
years, which is way past the time that our planet can sustain
life of any form[1].
Arnd
[1] http://en.wikipedia.org/wiki/Timeline_of_the_far_future
On Sat, May 31, 2014 at 5:23 PM, Arnd Bergmann <[email protected]> wrote:
> On Saturday 31 May 2014 16:51:15 Richard Cochran wrote:
>> On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote:
>> > I picked this because it is a fairly isolated problem, as the
>> > inode time stamps are rarely assigned to any other time values.
>> > As a byproduct of this work, I documented for each of the file
>> > systems we support how long the on-disk format can work[1].
>>
>> Why are some of the time stamp expiration dates marked as "never"?
>
> It's an approximation:
> with 64-bit timestamps, you can represent close to 300 billion
> years, which is way past the time that our planet can sustain
> life of any form[1].
FWIW, the 48-bit second limit of befs, marked "never", happens sooner
than the 32-bit day limit of affs, marked as Y11760870.
Gr{oetje,eeting}s,
Geert
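The comparison can be checked with rough arithmetic. This sketch ignores the epoch of either file system and uses a 365-day year, so it only shows the ordering of the two spans, not the exact expiry years from the table. The helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Years covered by a 'bits'-wide counter ticking units_per_year times a year. */
static uint64_t span_years(unsigned bits, uint64_t units_per_year)
{
	assert(bits >= 1 && bits < 64 && units_per_year > 0);
	return (1ULL << bits) / units_per_year;
}
```

span_years(48, 86400ULL * 365) is about 8.9 million years for 48-bit seconds, while span_years(32, 365) is about 11.8 million years for 32-bit days.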
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
On Mon, 2 Jun 2014, Arnd Bergmann wrote:
> Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> may not be as relevant here, except for the one question whether we
> actually want to have the new ABI on all 32-bit architectures or only
> as an opt-in for those that expect to stay around for another 24 years.
For glibc I think it will make the most sense to add the support for
64-bit time_t across all architectures that currently have 32-bit time_t
(with the new interfaces having fallback support to implementation in
terms of the 32-bit kernel interfaces, if the 64-bit syscalls are
unavailable either at runtime or in the kernel headers against which glibc
is compiled - this fallback code will of course need to check for overflow
when passing a time value to the kernel, hopefully with error handling
consistent with whatever the kernel ends up doing when a filesystem can't
support a timestamp). If some architectures don't provide the new
interfaces in the kernel then that will mean the fallback code in glibc
can't be removed until glibc support for those architectures is removed
(as opposed to removing it when glibc no longer supports kernels predating
the kernel support).
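A minimal sketch of the overflow check such fallback code would need before handing a 64-bit time value to a 32-bit kernel interface. The struct and function names are hypothetical, and returning EOVERFLOW is an assumption, since the error handling is still open in this thread:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical types: 64-bit user-space timespec and legacy 32-bit one. */
struct timespec64_sketch { int64_t tv_sec; long tv_nsec; };
struct timespec32_sketch { int32_t tv_sec; int32_t tv_nsec; };

/*
 * Narrow a 64-bit time stamp for a legacy 32-bit syscall.
 * Returns 0 on success, or -EOVERFLOW when the seconds do not fit.
 */
static int narrow_timespec(const struct timespec64_sketch *in,
			   struct timespec32_sketch *out)
{
	assert(in && out);
	if (in->tv_sec < INT32_MIN || in->tv_sec > INT32_MAX)
		return -EOVERFLOW;
	out->tv_sec = (int32_t)in->tv_sec;
	out->tv_nsec = (int32_t)in->tv_nsec;
	return 0;
}
```

The point of checking before the syscall is that a silently truncated value would land decades in the past instead of producing an error.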
> Two more questions for you:
>
> - are you (and others) happy with adding this type of stat syscall
> (fstatat64/fstat64) as opposed to the more generic xstat that has
> been discussed in the past and that never made it through the bike-
> shedding discussion?
I am.
> - once we have enough buy-in from reviewers to merge this initial
> series, should we proceed to define the rest of the syscall ABI
> (minus driver ioctls) so glibc and kernel can do the conversion
> on top of that, or should we better try to do things one syscall
> family at a time and actually get the kernel to handle them
> correctly internally?
I don't have any comments on that ordering question.
--
Joseph S. Myers
[email protected]
On Wednesday 04 June 2014 17:10:24 H. Peter Anvin wrote:
> On 06/04/2014 12:24 PM, Arnd Bergmann wrote:
> >
> > For other timekeeping stuff in the kernel, I agree that using some
> > 64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
> > ...) has advantages, that's exactly the point I was making earlier
> > against simply extending the internal time_t/timespec to 64-bit
> > seconds for everything.
> >
>
> How much of a performance issue is it to make time_t 64 bits, and for
> the bits there are, how hard are they to fix?
Probably very little overhead for most uses; it's more the regression
potential in the less common parts of the kernel I'm worried about.
There is a significant but not overwhelming number of uses of the
main problematic types in the kernel:
arnd@wuerfel:~/arm-soc$ git grep -wl time_t | wc
188 188 5566
arnd@wuerfel:~/arm-soc$ git grep -wl timeval | wc
320 320 10353
arnd@wuerfel:~/arm-soc$ git grep -wl timespec | wc
406 406 10886
I believe we have to audit all of them anyway if we want to change
the kernel to less problematic types and introduce new user
interfaces.
IMHO this work is helped if we change the uses to a new type
as we find the problems. This lets us do the work one subsystem
at a time and avoid accidental ABI changes. I don't care much what
type that will be, and having a 96-bit type will certainly work
well in a lot of cases, but I don't see a strong reason to use
that over other types, especially when they can be more efficient.
Arnd
On Tue, 3 Jun 2014, Arnd Bergmann wrote:
> I think John Stultz and Thomas Gleixner have already started looking
> at how the timekeeping code can be updated. Once that is done, we should
> be able to add a functional 64-bit gettimeofday/settimeofday syscall
> pair. While I definitely agree this is one of the most basic things to
> have, it's also not an area of the kernel that is easy to change.
64-bit clock_gettime / clock_settime instead of gettimeofday /
settimeofday should avoid the need for the kernel to have a 64-bit version
of struct timeval. (Userspace 64-bit gettimeofday / settimeofday would
need to use a combination of the syscalls if the tz pointer is non-NULL.)
--
Joseph S. Myers
On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
>>
>> The bit that is really going to hurt is every single ioctl that uses a
>> timespec.
>>
>> Honestly, though, I really don't understand the point with "struct
>> inode_time". It seems like the zeroeth-order thing is to change the
>> kernel internal version of struct timespec to have a 64-bit time... it
>> isn't just about inodes. We then should be explicit about the external
>> uses of time, and use accessors.
>
> I picked these because they are fairly isolated from all other uses,
> in particular since inode times are the only things where we really
> care about times in the distant past or future (decades away as opposed
> to things that happened between boot and shutdown).
>
If nothing else, I would expect to be able to set the system time to
weird values for testing. So I'm not so sure I agree with that...
> For other kernel-internal uses, we may be better off migrating to
> a completely different representation, such as nanoseconds since
> boot or the architecture specific ktime_t, but this is really something
> to decide for each subsystem.
Having a bunch of different time representations in the kernel seems
like a real headache...
-hpa
On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote:
> On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> > On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> The possible uses I can see for non-ktime_t types in the kernel are:
> * inodes need 96 bit timestamps to represent the full range of values
> that can be stored in a file system, you made a convincing argument
> for that. Almost everything else can fit into 64 bit on a 32-bit
> kernel, in theory also on a 64-bit kernel if we want that.
Just to be pedantic, inodes don't *need* 96 bit timestamps - some
filesystems can *support up to* 96 bit timestamps. If the kernel
only supports 64 bit timestamps and that's all the kernel can
represent, then the upper bits of the 96 bit on-disk inode
timestamps simply remain zero.
If you move the filesystem between kernels with different time
ranges, then the filesystem needs to be able to tell the kernel what
its supported range is. This is where having the VFS limit the
range of supported timestamps is important: the limit is the
min(kernel range, filesystem range). This allows the filesystems
to be independent of the kernel time representation, and the kernel
to be independent of the physical filesystem time encoding....
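
The min(kernel range, filesystem range) rule could be sketched roughly
as below. struct time_range, effective_range() and clamp_sec() are
hypothetical names used only for illustration, not existing VFS
interfaces. Clamping is also just one possible policy: returning an
error for out-of-range values would be an equally valid choice.

```c
#include <stdint.h>

/* Hypothetical range descriptor, in seconds since the epoch. */
struct time_range {
	int64_t min_sec;
	int64_t max_sec;
};

/* Intersection of what the kernel can represent and what the
 * filesystem can store on disk. */
static struct time_range effective_range(struct time_range kernel,
					 struct time_range fs)
{
	struct time_range r;

	r.min_sec = kernel.min_sec > fs.min_sec ? kernel.min_sec : fs.min_sec;
	r.max_sec = kernel.max_sec < fs.max_sec ? kernel.max_sec : fs.max_sec;
	return r;
}

/* Clamp a timestamp into the effective range. */
static int64_t clamp_sec(int64_t sec, struct time_range r)
{
	if (sec < r.min_sec)
		return r.min_sec;
	if (sec > r.max_sec)
		return r.max_sec;
	return sec;
}
```

With this split, the filesystem only has to publish its own limits and
the VFS applies them before the value reaches the on-disk encoder.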
Cheers,
Dave.
--
Dave Chinner
On Tuesday 03 June 2014 14:33:10 Joseph S. Myers wrote:
> On Tue, 3 Jun 2014, Arnd Bergmann wrote:
>
> > I think John Stultz and Thomas Gleixner have already started looking
> > at how the timekeeping code can be updated. Once that is done, we should
> > be able to add a functional 64-bit gettimeofday/settimeofday syscall
> > pair. While I definitely agree this is one of the most basic things to
> > have, it's also not an area of the kernel that is easy to change.
>
> 64-bit clock_gettime / clock_settime instead of gettimeofday /
> settimeofday should avoid the need for the kernel to have a 64-bit version
> of struct timeval. (Userspace 64-bit gettimeofday / settimeofday would
> need to use a combination of the syscalls if the tz pointer is non-NULL.)
Yes, that's what I meant.
Arnd
On 06/02/2014 12:19 PM, Arnd Bergmann wrote:
> On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
>> On Fri, 30 May 2014, Arnd Bergmann wrote:
>>
>>> a) is this the right approach in general? The previous discussion
>>> pointed this way, but there may be other opinions.
>>
>> The syscall changes seem like the sort of thing I'd expect, although
>> patches adding new syscalls or otherwise affecting the kernel/userspace
>> interface (as opposed to those relating to an individual filesystem)
>> should go to linux-api as well as other relevant lists.
>
> Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> may not be as relevant here, except for the one question whether we
> actually want to have the new ABI on all 32-bit architectures or only
> as an opt-in for those that expect to stay around for another 24 years.
>
> Two more questions for you:
>
> - are you (and others) happy with adding this type of stat syscall
> (fstatat64/fstat64) as opposed to the more generic xstat that has
> been discussed in the past and that never made it through the bike-
> shedding discussion?
>
> - once we have enough buy-in from reviewers to merge this initial
> series, should we proceed to define rest of the syscall ABI
> (minus driver ioctls) so glibc and kernel can do the conversion
> on top of that, or should we better try to do things one syscall
> family at a time and actually get the kernel to handle them
> correctly internally?
>
The bit that is really going to hurt is every single ioctl that uses a
timespec.
Honestly, though, I really don't understand the point with "struct
inode_time". It seems like the zeroeth-order thing is to change the
kernel internal version of struct timespec to have a 64-bit time... it
isn't just about inodes. We then should be explicit about the external
uses of time, and use accessors.
-hpa
On Fri, 30 May 2014, Arnd Bergmann wrote:
> a) is this the right approach in general? The previous discussion
> pointed this way, but there may be other opinions.
The syscall changes seem like the sort of thing I'd expect, although
patches adding new syscalls or otherwise affecting the kernel/userspace
interface (as opposed to those relating to an individual filesystem)
should go to linux-api as well as other relevant lists.
--
Joseph S. Myers
On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
> On Fri, 30 May 2014, Arnd Bergmann wrote:
>
> > a) is this the right approach in general? The previous discussion
> > pointed this way, but there may be other opinions.
>
> The syscall changes seem like the sort of thing I'd expect, although
> patches adding new syscalls or otherwise affecting the kernel/userspace
> interface (as opposed to those relating to an individual filesystem)
> should go to linux-api as well as other relevant lists.
Ok. Sorry about missing linux-api, I confused it with linux-arch, which
may not be as relevant here, except for the one question whether we
actually want to have the new ABI on all 32-bit architectures or only
as an opt-in for those that expect to stay around for another 24 years.
Two more questions for you:
- are you (and others) happy with adding this type of stat syscall
(fstatat64/fstat64) as opposed to the more generic xstat that has
been discussed in the past and that never made it through the bike-
shedding discussion?
- once we have enough buy-in from reviewers to merge this initial
series, should we proceed to define rest of the syscall ABI
(minus driver ioctls) so glibc and kernel can do the conversion
on top of that, or should we better try to do things one syscall
family at a time and actually get the kernel to handle them
correctly internally?
Arnd
On Monday 02 June 2014, Joseph S. Myers wrote:
> On Mon, 2 Jun 2014, Arnd Bergmann wrote:
>
> > Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> > may not be as relevant here, except for the one question whether we
> > actually want to have the new ABI on all 32-bit architectures or only
> > as an opt-in for those that expect to stay around for another 24 years.
>
> For glibc I think it will make the most sense to add the support for
> 64-bit time_t across all architectures that currently have 32-bit time_t
> (with the new interfaces having fallback support to implementation in
> terms of the 32-bit kernel interfaces, if the 64-bit syscalls are
> unavailable either at runtime or in the kernel headers against which glibc
> is compiled - this fallback code will of course need to check for overflow
> when passing a time value to the kernel, hopefully with error handling
> consistent with whatever the kernel ends up doing when a filesystem can't
> support a timestamp). If some architectures don't provide the new
> interfaces in the kernel then that will mean the fallback code in glibc
> can't be removed until glibc support for those architectures is removed
> (as opposed to removing it when glibc no longer supports kernels predating
> the kernel support).
Ok, that's a good reason to just provide the new interfaces on all
architectures right away. Thanks for the insight!
Arnd
On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
> On Saturday 31 May 2014 16:51:15 Richard Cochran wrote:
> >
> > Why are some of the time stamp expiration dates marked as "never"?
>
> It's an approximation:
Also, the term "never" might mean using arbitrarily long integers
as in ASN.1.
Thanks,
Richard
On Wed, 4 Jun 2014, Arnd Bergmann wrote:
> On Tuesday 03 June 2014, Dave Chinner wrote:
> > Just to be pedantic, inodes don't need 96 bit timestamps - some
> > filesystems can *support up to* 96 bit timestamps. If the kernel
> > only supports 64 bit timestamps and that's all the kernel can
> > represent, then the upper bits of the 96 bit on-disk inode
> > timestamps simply remain zero.
>
> I meant the reverse: since we have file systems that can store
> 96-bit timestamps when using 64-bit kernels, we need to extend
> 32-bit kernels to have the same internal representation so we
> can actually read those file systems correctly.
>
> > If you move the filesystem between kernels with different time
> > ranges, then the filesystem needs to be able to tell the kernel what
> > its supported range is. This is where having the VFS limit the
> > range of supported timestamps is important: the limit is the
> > min(kernel range, filesystem range). This allows the filesystems
> > to be independent of the kernel time representation, and the kernel
> > to be independent of the physical filesystem time encoding....
>
> I agree it makes sense to let the kernel know about the limits
> of the file system it accesses, but for the reverse, we're probably
> better off just making the kernel representation large enough (i.e.
> 96 bits) so it can work with any known file system.
Depends... 96 bit handling may get prohibitive on 32-bit archs.
The important point here is for the kernel to be able to represent the
time _range_ used by any known filesystem, not necessarily the time
_precision_.
For example, a 64 bit representation can be made of 40 bits for seconds
spanning 34865 years, and 24 bits for fractional seconds providing
precision down to 60 nanosecs. That ought to be plenty good on 32 bit
systems while still being cheap to handle.
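
A minimal sketch of such a 40/24 split, for illustration only. The
layout chosen here (signed seconds in the upper 40 bits, a binary
fraction in units of 2^-24 s in the lower 24 bits) is an assumption,
and pack_time() and unpack_time() are hypothetical names. 2^40 seconds
is about 34865 years of 365 days, and 2^-24 s is about 59.6 ns.

```c
#include <stdint.h>

#define FRAC_BITS	24
#define SEC_BITS	40
#define NSEC_PER_SEC	1000000000ULL
#define SEC_MAX		((INT64_C(1) << (SEC_BITS - 1)) - 1)
#define SEC_MIN		(-(INT64_C(1) << (SEC_BITS - 1)))

/* Pack seconds and nanoseconds into one 64-bit word.
 * Returns 0 on success, -1 if the input is out of range. */
static int pack_time(int64_t sec, uint32_t nsec, uint64_t *out)
{
	uint64_t frac;

	if (sec < SEC_MIN || sec > SEC_MAX || nsec >= NSEC_PER_SEC)
		return -1;

	/* Scale nanoseconds to 2^-24 s units, truncating. */
	frac = ((uint64_t)nsec << FRAC_BITS) / NSEC_PER_SEC;
	*out = ((uint64_t)sec << FRAC_BITS) | frac;
	return 0;
}

/* Unpack, sign-extending the 40-bit seconds field portably. */
static void unpack_time(uint64_t packed, int64_t *sec, uint32_t *nsec)
{
	uint64_t s = packed >> FRAC_BITS;
	uint64_t frac = packed & ((UINT64_C(1) << FRAC_BITS) - 1);
	uint64_t sign = UINT64_C(1) << (SEC_BITS - 1);

	*sec = (int64_t)(s ^ sign) - (int64_t)sign;
	*nsec = (uint32_t)((frac * NSEC_PER_SEC) >> FRAC_BITS);
}
```

A round trip loses at most about 60 ns of precision because both
conversions truncate.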
Nicolas
On Wednesday 04 June 2014 13:30:32 Nicolas Pitre wrote:
> On Wed, 4 Jun 2014, Arnd Bergmann wrote:
>
> > On Tuesday 03 June 2014, Dave Chinner wrote:
> > > Just to be pedantic, inodes don't need 96 bit timestamps - some
> > > filesystems can *support up to* 96 bit timestamps. If the kernel
> > > only supports 64 bit timestamps and that's all the kernel can
> > > represent, then the upper bits of the 96 bit on-disk inode
> > > timestamps simply remain zero.
> >
> > I meant the reverse: since we have file systems that can store
> > 96-bit timestamps when using 64-bit kernels, we need to extend
> > 32-bit kernels to have the same internal representation so we
> > can actually read those file systems correctly.
> >
> > > If you move the filesystem between kernels with different time
> > > ranges, then the filesystem needs to be able to tell the kernel what
> > > its supported range is. This is where having the VFS limit the
> > > range of supported timestamps is important: the limit is the
> > > min(kernel range, filesystem range). This allows the filesystems
> > > to be independent of the kernel time representation, and the kernel
> > > to be independent of the physical filesystem time encoding....
> >
> > I agree it makes sense to let the kernel know about the limits
> > of the file system it accesses, but for the reverse, we're probably
> > better off just making the kernel representation large enough (i.e.
> > 96 bits) so it can work with any known file system.
>
> Depends... 96 bit handling may get prohibitive on 32-bit archs.
>
> The important point here is for the kernel to be able to represent the
> time _range_ used by any known filesystem, not necessarily the time
> _precision_.
>
> For example, a 64 bit representation can be made of 40 bits for seconds
> spanning 34865 years, and 24 bits for fractional seconds providing
> precision down to 60 nanosecs. That ought to be plenty good on 32 bit
> systems while still being cheap to handle.
I checked earlier that we don't do any computation on inode
time stamps in common code; we just pass them around, so there is
very little runtime overhead. There is a small space overhead
(12 bytes) per inode, but that structure is already on the order of
500 bytes.
For other timekeeping stuff in the kernel, I agree that using some
64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
...) has advantages, that's exactly the point I was making earlier
against simply extending the internal time_t/timespec to 64-bit
seconds for everything.
Arnd
On 06/04/2014 12:24 PM, Arnd Bergmann wrote:
>
> For other timekeeping stuff in the kernel, I agree that using some
> 64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
> ...) has advantages, that's exactly the point I was making earlier
> against simply extending the internal time_t/timespec to 64-bit
> seconds for everything.
>
How much of a performance issue is it to make time_t 64 bits, and for
the bits there are, how hard are they to fix?
-hpa
On Sat, May 31, 2014 at 12:34:12PM -0700, H. Peter Anvin wrote:
> Typically they are using 64-bit signed seconds.
Okay, that is what I wanted to know.
Thanks,
Richard
On Tuesday 03 June 2014, Dave Chinner wrote:
> On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote:
> > On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> > > On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> > The possible uses I can see for non-ktime_t types in the kernel are:
> > * inodes need 96 bit timestamps to represent the full range of values
> > that can be stored in a file system, you made a convincing argument
> > for that. Almost everything else can fit into 64 bit on a 32-bit
> > kernel, in theory also on a 64-bit kernel if we want that.
>
> Just to be pedantic, inodes don't need 96 bit timestamps - some
> filesystems can *support up to* 96 bit timestamps. If the kernel
> only supports 64 bit timestamps and that's all the kernel can
> represent, then the upper bits of the 96 bit on-disk inode
> timestamps simply remain zero.
I meant the reverse: since we have file systems that can store
96-bit timestamps when using 64-bit kernels, we need to extend
32-bit kernels to have the same internal representation so we
can actually read those file systems correctly.
> If you move the filesystem between kernels with different time
> ranges, then the filesystem needs to be able to tell the kernel what
> its supported range is. This is where having the VFS limit the
> range of supported timestamps is important: the limit is the
> min(kernel range, filesystem range). This allows the filesystems
> to be independent of the kernel time representation, and the kernel
> to be independent of the physical filesystem time encoding....
I agree it makes sense to let the kernel know about the limits
of the file system it accesses, but for the reverse, we're probably
better off just making the kernel representation large enough (i.e.
96 bits) so it can work with any known file system. We need another
check at the user space boundary to turn that into a value that the
user can understand, but that's another problem.
Arnd
On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> >>
> >> The bit that is really going to hurt is every single ioctl that uses a
> >> timespec.
> >>
> >> Honestly, though, I really don't understand the point with "struct
> >> inode_time". It seems like the zeroeth-order thing is to change the
> >> kernel internal version of struct timespec to have a 64-bit time... it
> >> isn't just about inodes. We then should be explicit about the external
> >> uses of time, and use accessors.
> >
> > I picked these because they are fairly isolated from all other uses,
> > in particular since inode times are the only things where we really
> > care about times in the distant past or future (decades away as opposed
> > to things that happened between boot and shutdown).
> >
>
> If nothing else, I would expect to be able to set the system time to
> weird values for testing. So I'm not so sure I agree with that...
I think John Stultz and Thomas Gleixner have already started looking
at how the timekeeping code can be updated. Once that is done, we should
be able to add a functional 64-bit gettimeofday/settimeofday syscall
pair. While I definitely agree this is one of the most basic things to
have, it's also not an area of the kernel that is easy to change.
> > For other kernel-internal uses, we may be better off migrating to
> > a completely different representation, such as nanoseconds since
> > boot or the architecture specific ktime_t, but this is really something
> > to decide for each subsystem.
>
> Having a bunch of different time representations in the kernel seems
> like a real headache...
We already have time_t, ktime_t, timeval, timespec, compat_timespec,
clock_t, cputime_t, cputime64_t, tm, nanoseconds, jiffies, jiffies64,
and lots of driver or file system specific representations. I'm all for
removing a bunch of these from the kernel, but my feeling is that this is
one of the cases where we first have to add new ones in order to remove
those that are already there.
To complicate things further, we also have various time bases
(realtime/utc, realtime/tai, monotonic, monotonic_raw, boottime, ...),
and at least for the timespec values we pass around, it's not always
obvious which one is used, or if that's the right one.
We probably don't want to add a lot of new representations, and it's
possible that we can change most of the internal code we have to
ktime_t and then convert that to whatever user space wants at the
interfaces.
The possible uses I can see for non-ktime_t types in the kernel are:
* inodes need 96 bit timestamps to represent the full range of values
that can be stored in a file system, you made a convincing argument
for that. Almost everything else can fit into 64 bit on a 32-bit
kernel, in theory also on a 64-bit kernel if we want that.
* A number of interfaces pass relative timespecs: nanosleep(), poll(),
select(), sigtimedwait(), alarm(), futex() and probably more. There is
nothing wrong with the use of timespec here, and it may be good to
annotate that by using a new type (e.g. struct timeout) that is defined
as compatible with the current timespec.
* For new user interfaces, we need a new type such as the
__kernel_timespec64 I introduced, so it doesn't clash with the normal
user timespec that may be smaller, depending on the libc.
* A lot of drivers will need new ioctl commands, and for drivers that
just need time stamps (audio, v4l, sockets, ...) it may be more
efficient and more correct to use a new timestamp_t (e.g. boot time
64-bit nanoseconds) than __kernel_timespec64, which is not normally
monotonic and requires a normalization step. If we end up introducing
such a type in the user interface, we can also start using it in the
kernel.
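
To make the normalization step mentioned in the last item concrete,
here is a small sketch. struct ts64_sketch, ts64_normalize() and
ts64_to_ns() are hypothetical names and do not describe the layout of
__kernel_timespec64 from the patches. A signed 64-bit nanosecond count
covers roughly +/-292 years, so the conversion has to check the range.

```c
#include <stdint.h>

#define NSEC_PER_SEC	INT64_C(1000000000)

/* Hypothetical seconds/nanoseconds pair. */
struct ts64_sketch {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Bring tv_nsec into [0, NSEC_PER_SEC) by carrying whole seconds
 * into tv_sec. Overflow of tv_sec itself is not handled here. */
static void ts64_normalize(struct ts64_sketch *ts)
{
	int64_t carry = ts->tv_nsec / NSEC_PER_SEC;

	ts->tv_sec += carry;
	ts->tv_nsec -= carry * NSEC_PER_SEC;
	if (ts->tv_nsec < 0) {
		ts->tv_nsec += NSEC_PER_SEC;
		ts->tv_sec -= 1;
	}
}

/* Convert to a single 64-bit nanosecond count.
 * Returns 0 on success, -1 if the result would not fit. */
static int ts64_to_ns(struct ts64_sketch ts, int64_t *ns)
{
	ts64_normalize(&ts);
	if (ts.tv_sec > INT64_MAX / NSEC_PER_SEC - 1 ||
	    ts.tv_sec < INT64_MIN / NSEC_PER_SEC)
		return -1;

	*ns = ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
	return 0;
}
```

A plain nanosecond count needs neither the carry logic nor the range
check when two values are compared or subtracted, which is the
efficiency argument made above.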
Arnd
On Saturday 31 May 2014 18:30:49 Vyacheslav Dubeyko wrote:
> By the way, what about NILFS2? Is NILFS2 ready for suggested approach
> without any changes?
nilfs2 and a lot of other file systems don't need any changes for
this, because they don't assign the inode time stamp fields to
a 'struct timespec'.
FWIW, nilfs2 uses a 64-bit seconds value, which is always safe and
can represent the full range of user space timespec on all machines.
Arnd
On Monday 02 June 2014 12:26:22 H. Peter Anvin wrote:
> On 06/02/2014 12:19 PM, Arnd Bergmann wrote:
> > On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
> >> On Fri, 30 May 2014, Arnd Bergmann wrote:
> >>
> >>> a) is this the right approach in general? The previous discussion
> >>> pointed this way, but there may be other opinions.
> >>
> >> The syscall changes seem like the sort of thing I'd expect, although
> >> patches adding new syscalls or otherwise affecting the kernel/userspace
> >> interface (as opposed to those relating to an individual filesystem)
> >> should go to linux-api as well as other relevant lists.
> >
> > Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> > may not be as relevant here, except for the one question whether we
> > actually want to have the new ABI on all 32-bit architectures or only
> > as an opt-in for those that expect to stay around for another 24 years.
> >
> > Two more questions for you:
> >
> > - are you (and others) happy with adding this type of stat syscall
> > (fstatat64/fstat64) as opposed to the more generic xstat that has
> > been discussed in the past and that never made it through the bike-
> > shedding discussion?
> >
> > - once we have enough buy-in from reviewers to merge this initial
> > series, should we proceed to define rest of the syscall ABI
> > (minus driver ioctls) so glibc and kernel can do the conversion
> > on top of that, or should we better try to do things one syscall
> > family at a time and actually get the kernel to handle them
> > correctly internally?
> >
>
> The bit that is really going to hurt is every single ioctl that uses a
> timespec.
>
> Honestly, though, I really don't understand the point with "struct
> inode_time". It seems like the zeroeth-order thing is to change the
> kernel internal version of struct timespec to have a 64-bit time... it
> isn't just about inodes. We then should be explicit about the external
> uses of time, and use accessors.
I picked these because they are fairly isolated from all other uses,
in particular since inode times are the only things where we really
care about times in the distant past or future (decades away as opposed
to things that happened between boot and shutdown).
For other kernel-internal uses, we may be better off migrating to
a completely different representation, such as nanoseconds since
boot or the architecture specific ktime_t, but this is really something
to decide for each subsystem.
I just tried building an arm32 kernel with an s64 time_t, and that
failed horribly: I get linker errors for missing 64-bit divides
and lots of warnings for code that expects time_t pointers to
functions taking a 'long' or vice versa. I also think the only
way to maintain ABI compatibility is to separate the internal uses
from the interface, which means auditing all code in the end.
Arnd