2014-06-01 19:57:49

by Arnd Bergmann

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > readonly if not in reality then in practice.
>
> For those (legacy) filesystems with signed 32-bit timestamps, any
> attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
> (silently) clamped to 0x7fffffff and that value (the last representable
> time) used as an overflow indicator. The filesystem driver should
> convert that value into a corresponding overflow value for whatever
> kernel internal time representation being used when read back, and this
> should be propagated up to user space. It should not be a hard error
> otherwise, as you rightfully stated, everything non read-only would come
> to a halt on that day.

I don't think there is much of a difference between not being able to
write at all and all newly written files having the same timestamp,
causing random things to break differently.

The clamp to the maximum supported time stamp sounds like a reasonable
choice for 'utimens' and related syscalls for the case of someone
setting an arbitrary future date beyond what the file system can
represent. Then again, I don't see a reason why that shouldn't just
cause an error to be returned.
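
To make the two policies concrete, here is a minimal userspace sketch, not
kernel code. The names fs_time_policy, fs_store_time32 and fs_load_time32
are made up for illustration, and using INT64_MAX as the in-memory overflow
marker is an assumption, not something proposed in this thread.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical policy switch; nothing like this exists in the kernel. */
enum fs_time_policy { FS_TIME_CLAMP, FS_TIME_ERROR };

#define FS_TIME32_MIN INT32_MIN
#define FS_TIME32_MAX INT32_MAX	/* 0x7fffffff */

/*
 * Convert a 64-bit seconds value into a signed 32-bit on-disk field.
 * Returns 0 on success. If the value does not fit, FS_TIME_ERROR yields
 * -ERANGE and leaves *disk untouched, FS_TIME_CLAMP saturates instead.
 */
int fs_store_time32(int64_t sec, enum fs_time_policy policy, int32_t *disk)
{
	if (sec > FS_TIME32_MAX || sec < FS_TIME32_MIN) {
		if (policy == FS_TIME_ERROR)
			return -ERANGE;
		sec = sec > FS_TIME32_MAX ? FS_TIME32_MAX : FS_TIME32_MIN;
	}
	*disk = (int32_t)sec;
	return 0;
}

/*
 * Read side of the clamping idea: treat the largest on-disk value as
 * "overflowed" and map it to an (assumed) 64-bit marker.
 */
int64_t fs_load_time32(int32_t disk)
{
	return disk == FS_TIME32_MAX ? INT64_MAX : (int64_t)disk;
}
```

With FS_TIME_CLAMP every write after the limit stores the same value, which
is the "all new files have the same timestamp" case; with FS_TIME_ERROR the
caller sees the failure immediately.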

For actually running kernels beyond 2038, the best idea I've seen so
far is to disallow all broken code at compile time. I don't see
a choice but to audit the entire kernel for invalid uses on both
32 and 64 bit in the next few years. A lot of code will get changed
in the process so we can actually keep running 32-bit kernels and
file systems, but other code will likely go away:

* any system calls that pass a time_t, timeval or timespec on
  32-bit systems return -ENOSYS, to ensure all user land uses
  the replacements we will put into place
* The definition of 'time_t', 'timeval' and 'timespec' can be hidden
  from the kernel, and all code using it left out.
* ext2 and ext3 file system code will have to be disabled, but that's
  fine since ext4 can mount old file systems.
* until xfs gets extended, we can also disable it at build time.

For most users, we probably want to leave all that enabled by
default until we get much closer to 2038, but a compile time
option should allow us to test what works or doesn't, and it
can be set by embedded developers that want to ensure their
code keeps running for the next few decades.

Arnd


2014-06-01 20:29:41

by H. Peter Anvin

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

Perhaps we should make this a kernel command line option instead, with the settings: error out on dates outside the standard window, or a date indicating the earliest date that should be recognized and do windowing (0 for no windowing, 1970 for retconning the Unix epoch as unsigned...)
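
One possible reading of the windowing idea, as a userspace sketch: a 32-bit
on-disk value is mapped to the unique 64-bit time inside a 2^32-second
window that starts at a configurable base. The function name window_decode
and the parameter window_start (in seconds relative to 1970) are
assumptions made for this example.

```c
#include <stdint.h>

/*
 * Return the unique t with window_start <= t < window_start + 2^32
 * and t congruent to raw modulo 2^32.
 *
 * window_start == 0       : on-disk value read as unsigned (1970..2106)
 * window_start == -2^31   : on-disk value read as signed   (1901..2038)
 */
int64_t window_decode(uint32_t raw, int64_t window_start)
{
	/* Unsigned subtraction wraps modulo 2^32, which is what we want. */
	uint32_t delta = (uint32_t)(raw - (uint32_t)window_start);

	return window_start + (int64_t)delta;
}
```

For example, with window_start set to 946684800 (Jan 1 2000 UTC) an
on-disk value of 0 decodes to 4294967296, i.e. 2^32 seconds after 1970.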

But again, the kernel is probably the least problem here...

On June 1, 2014 12:56:52 PM PDT, Arnd Bergmann <[email protected]> wrote:
>On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
>> > readonly if not in reality then in practice.
>>
>> For those (legacy) filesystems with signed 32-bit timestamps, any
>> attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
>> (silently) clamped to 0x7fffffff and that value (the last representable
>> time) used as an overflow indicator. The filesystem driver should
>> convert that value into a corresponding overflow value for whatever
>> kernel internal time representation being used when read back, and this
>> should be propagated up to user space. It should not be a hard error
>> otherwise, as you rightfully stated, everything non read-only would come
>> to a halt on that day.
>
>I don't think there is much of a difference between not being able to
>write at all and all newly written files having the same timestamp,
>causing random things to break differently.
>
>The clamp to the maximum supported time stamp sounds like a reasonable
>choice for 'utimens' and related syscalls for the case of someone
>setting an arbitrary future date beyond what the file system can
>represent. Then again, I don't see a reason why that shouldn't just
>cause an error to be returned.
>
>For actually running kernels beyond 2038, the best idea I've seen so
>far is to disallow all broken code at compile time. I don't see
>a choice but to audit the entire kernel for invalid uses on both
>32 and 64 bit in the next few years. A lot of code will get changed
>in the process so we can actually keep running 32-bit kernels and
>file systems, but other code will likely go away:
>
>* any system calls that pass a time_t, timeval or timespec on
> 32-bit systems return -ENOSYS, to ensure all user land uses
> the replacements we will put into place
>* The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> from the kernel, and all code using it left out.
>* ext2 and ext3 file system code will have to be disabled, but that's
> fine since ext4 can mount old file systems.
>* until xfs gets extended, we can also disable it at build time.
>
>For most users, we probably want to leave all that enabled by
>default until we get much closer to 2038, but a compile time
>option should allow us to test what works or doesn't, and it
>can be set by embedded developers that want to ensure their
>code keeps running for the next few decades.
>
> Arnd

--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

2014-06-02 01:36:30

by Nicolas Pitre

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Sun, 1 Jun 2014, Arnd Bergmann wrote:

> On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > > readonly if not in reality then in practice.
> >
> > For those (legacy) filesystems with signed 32-bit timestamps, any
> > attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
> > (silently) clamped to 0x7fffffff and that value (the last representable
> > time) used as an overflow indicator. The filesystem driver should
> > convert that value into a corresponding overflow value for whatever
> > kernel internal time representation being used when read back, and this
> > should be propagated up to user space. It should not be a hard error
> > otherwise, as you rightfully stated, everything non read-only would come
> > to a halt on that day.
>
> I don't think there is much of a difference between not being able to
> write at all and all newly written files having the same timestamp,
> causing random things to break differently.

Well, in one case you have a crash certitude. In the other case you have
some probability that your system might still be usable.

> The clamp to the maximum supported time stamp sounds like a reasonable
> choice for 'utimens' and related syscalls for the case of someone
> setting an arbitrary future date beyond what the file system can
> represent. Then again, I don't see a reason why that shouldn't just
> cause an error to be returned.

Resilience is better than outright failure.

> For actually running kernels beyond 2038, the best idea I've seen so
> far is to disallow all broken code at compile time. I don't see
> a choice but to audit the entire kernel for invalid uses on both
> 32 and 64 bit in the next few years. A lot of code will get changed
> in the process so we can actually keep running 32-bit kernels and
> file systems, but other code will likely go away:
>
> * any system calls that pass a time_t, timeval or timespec on
> 32-bit systems return -ENOSYS, to ensure all user land uses
> the replacements we will put into place
> * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> from the kernel, and all code using it left out.
> * ext2 and ext3 file system code will have to be disabled, but that's
> fine since ext4 can mount old file systems.

Syscalls and libs can be "fixed". Existing filesystem content might
not. So if you need to mount some old media in read-write mode after
2038 and that happens to contain an ext2 or similarly limited filesystem
then it'd better just "work". Having the kernel refuse to modify the
filesystem would be unacceptable.


Nicolas

2014-06-02 02:22:37

by Dave Chinner

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Sun, Jun 01, 2014 at 09:36:26PM -0400, Nicolas Pitre wrote:
> On Sun, 1 Jun 2014, Arnd Bergmann wrote:
> > On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > For actually running kernels beyond 2038, the best idea I've seen so
> > far is to disallow all broken code at compile time. I don't see
> > a choice but to audit the entire kernel for invalid uses on both
> > 32 and 64 bit in the next few years. A lot of code will get changed
> > in the process so we can actually keep running 32-bit kernels and
> > file systems, but other code will likely go away:
> >
> > * any system calls that pass a time_t, timeval or timespec on
> > 32-bit systems return -ENOSYS, to ensure all user land uses
> > the replacements we will put into place
> > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> > from the kernel, and all code using it left out.
> > * ext2 and ext3 file system code will have to be disabled, but that's
> > fine since ext4 can mount old file systems.
>
> Syscalls and libs can be "fixed". Existing filesystem content might
> not. So if you need to mount some old media in read-write mode after
> 2038 and that happens to contain an ext2 or similarly limited filesystem
> then it'd better just "work". Having the kernel refuse to modify the
> filesystem would be unacceptable.

We can already tell the VFS/filesystems not to update timestamps:

	inode->i_flags |= S_NOATIME | S_NOCMTIME;

Just enforce that everywhere (i.e. notify_change()) rather than just
on the IO path and the "legacy filesystem timestamp" problem is
"solved".

New interfaces need to return errors when an out-of-range parameter
is set. And right now, >epoch dates are out of range for most
filesystems, and so we need to handle that condition appropriately.
Silent date overflow == filesystem corruption, and as such I'm going
to error out such conditions in the filesystem regardless of what
the userspace API says.

Filesystems place all sorts of userspace visible limits on storage -
ever tried to create a file >16TB on ext4? The on-disk format
doesn't support it, so it returns an out of range error (EFBIG, I
think) if you try. XFS, OTOH, handles this just fine and so it
continues to work. It's exactly the same with timestamps - there's a
physical limit to what can sanely be stored in any given filesystem
and it's an *error condition* to go beyond that limit....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-06-02 07:09:28

by Geert Uytterhoeven

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Mon, Jun 2, 2014 at 4:22 AM, Dave Chinner <[email protected]> wrote:
> Filesystems place all sorts of userspace visible limits on storage -
> ever tried to create a file >16TB on ext4? The on-disk format
> doesn't support it, so it returns an out of range error (EFBIG, I
> think) if you try. XFS, OTOH, handles this just fine and so it
> continues to work. It's exactly the same with timestamps - there's a
> physical limit to what can sanely be stored in any given filesystem
> and it's an *error condition* to go beyond that limit....

This comparison doesn't fly.
File sizes do not depend on the current time (except for the increase of
megapixels in your new camera ;-).
Writing a 15 GiB file to ext4 is not something that magically stops working
tomorrow.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2014-06-02 10:58:21

by Arnd Bergmann

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote:
>
> > For actually running kernels beyond 2038, the best idea I've seen so
> > far is to disallow all broken code at compile time. I don't see
> > a choice but to audit the entire kernel for invalid uses on both
> > 32 and 64 bit in the next few years. A lot of code will get changed
> > in the process so we can actually keep running 32-bit kernels and
> > file systems, but other code will likely go away:
> >
> > * any system calls that pass a time_t, timeval or timespec on
> > 32-bit systems return -ENOSYS, to ensure all user land uses
> > the replacements we will put into place
> > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> > from the kernel, and all code using it left out.
> > * ext2 and ext3 file system code will have to be disabled, but that's
> > fine since ext4 can mount old file systems.
>
> Syscalls and libs can be "fixed". Existing filesystem content might
> not. So if you need to mount some old media in read-write mode after
> 2038 and that happens to contain an ext2 or similarly limited filesystem
> then it'd better just "work". Having the kernel refuse to modify the
> filesystem would be unacceptable.

I think you misunderstood what I suggested: the intent is to avoid
seeing things break in 2038 by making them break much earlier. We have
a solution for ext2 file systems, it's called ext4, and we just need
to ensure that everybody knows they have to migrate eventually.

At some point before the mid-2030s, you should no longer be able to
build a kernel that has support for ext2 or any other module that will
run into bugs later. Until then (rather sooner than later), I'd like
to get to the point where you can choose whether to include those
modules at build time or not, and then get everybody to turn off that
option and fix the bugs they run into. You wouldn't need that for a
2014-generation long-term support distro (rhel 7, sles 12, debian 7,
ubuntu 14.04, ...), but perhaps for the next generation, or the
one after that.

Arnd

2014-06-02 11:02:28

by Arnd Bergmann

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Sunday 01 June 2014 13:26:03 H. Peter Anvin wrote:
> Perhaps we should make this a kernel command line option instead, with the
> settings: error out on outside the standard window, or a date indicating the
> earliest date that should be recognized and do windowing (0 for no windowing,
> 1970 for retconning the Unix epoch as unsigned...)

What's wrong with compile-time errors? We have a pretty good understanding
of how time values are passed in the kernel, and we know they will all break
in 2038 for 32-bit kernels unless we do something about it.

> But again, the kernel is probably the least problem here...

I agree the glibc side is harder than this, but we have to get the kernel
into shape first (at the minimum we have to do the APIs), and there is enough
work to do here.

Arnd

2014-06-02 11:58:22

by Theodore Ts'o

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
>
> I think you misunderstood what I suggested: the intent is to avoid
> seeing things break in 2038 by making them break much earlier. We have
> a solution for ext2 file systems, it's called ext4, and we just need
> to ensure that everybody knows they have to migrate eventually.
>
> At some point before the mid-2030s, you should no longer be able to
> build a kernel that has support for ext2 or any other module that will
> run into bugs later....

Even for ext4, it's not quite so simple as that. You only have
support for times post 2038 if you are using an inode size > 128
bytes. There are a very, very large number of machines which even
today, are using 128 byte inodes with ext4 for performance reasons.

The vast majority of those machines which I know of can probably move
to 256 byte inodes relatively easily, since hard drive replacement
cycles are order 5-6 years tops, so I'm not that concerned, but it
just goes to show this is a very complicated problem.

And even if we're talking about flash and embedded devices, the good
news is if you assume that 10 years is enough time for people to
update their embedded OS builds, and that the vast majority of
deployed devices will probably only be in service for 10-15 years, we
do have enough time to make file system format changes, although
admittedly we can't afford to dilly-dally.

Regards,

- Ted

2014-06-02 12:40:13

by Arnd Bergmann

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> >
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> >
> > At some point before the mid-2030s, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later....
>
> Even for ext4, it's not quite so simple as that. You only have
> support for times post 2038 if you are using an inode size > 128
> bytes. There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
>
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.

Ok, I see.

I also now noticed this comment above EXT4_FITS_IN_INODE():

"For new inodes we always reserve enough space for the kernel's known
extended fields, but for inodes created with an old kernel this might
not have been the case. None of the extended inode fields is critical
for correct filesystem operation."

Do we have to worry about this for inodes that contain extended
attributes and that get updated after 2038?
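
For readers who do not have that macro in front of them, the check is
roughly "does this field end before the space the inode actually reserved
for extended fields". A toy version, with a made-up struct that is NOT the
real struct ext4_inode and with hypothetical names throughout, might look
like this:

```c
#include <stddef.h>
#include <stdint.h>

#define GOOD_OLD_INODE_SIZE 128

/* Toy layout: 128 classic bytes followed by a few extended fields. */
struct toy_raw_inode {
	uint8_t  classic[GOOD_OLD_INODE_SIZE];
	uint16_t extra_isize;	/* bytes of extended fields in use */
	uint16_t checksum_hi;
	uint32_t ctime_extra;	/* extra epoch bits and nanoseconds */
	uint32_t mtime_extra;
};

/* True if 'field' lies entirely within 128 + extra_isize bytes. */
#define TOY_FITS_IN_INODE(extra_isize, field) \
	(offsetof(struct toy_raw_inode, field) + \
	 sizeof(((struct toy_raw_inode *)0)->field) \
	 <= (size_t)GOOD_OLD_INODE_SIZE + (extra_isize))

int toy_has_ctime_extra(uint16_t extra_isize)
{
	return TOY_FITS_IN_INODE(extra_isize, ctime_extra);
}

int toy_has_mtime_extra(uint16_t extra_isize)
{
	return TOY_FITS_IN_INODE(extra_isize, mtime_extra);
}
```

In this toy layout an inode written with extra_isize of 4 has no room for
either extra timestamp field, so only the low 32 bits would be available.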

Arnd

2014-06-02 12:52:49

by Arnd Bergmann

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> >
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> >
> > At some point before the mid-2030s, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later....
>
> Even for ext4, it's not quite so simple as that. You only have
> support for times post 2038 if you are using an inode size > 128
> bytes. There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
>
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.

One stupid question about the current code:

static inline void ext4_decode_extra_time(struct inode_time *time, __le32 extra)
{
	if (sizeof(time->tv_sec) > 4)
		time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
				<< 32;
	time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
}

#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode) \
do { \
	if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime)) \
		(einode)->xtime.tv_sec = \
			(signed)le32_to_cpu((raw_inode)->xtime); \
	else \
		(einode)->xtime.tv_sec = 0; \
	if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra)) \
		ext4_decode_extra_time(&(einode)->xtime, \
				       raw_inode->xtime ## _extra); \
	else \
		(einode)->xtime.tv_nsec = 0; \
} while (0)

For a time between 2038 and 2106, this looks like xtime.tv_sec is
negative when ext4_decode_extra_time gets called, so the '|=' operator
doesn't actually do anything. Shouldn't that be '+='?
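
A small userspace model of the two variants shows the effect. The helper
names are invented for this sketch, and the assumption that the extra field
would carry epoch bits of 1 for the 2038..2106 range is mine, not a
statement about what ext4 actually writes to disk.

```c
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

/* Mirrors the current code: sign-extend the low 32 bits, then OR. */
int64_t decode_with_or(uint32_t raw_sec, uint32_t extra)
{
	/* Out-of-range conversion wraps on the usual two's complement ABIs. */
	int64_t sec = (int32_t)raw_sec;

	sec |= (int64_t)(extra & EPOCH_MASK) << 32;
	return sec;
}

/* The suggested alternative: add the epoch bits instead. */
int64_t decode_with_add(uint32_t raw_sec, uint32_t extra)
{
	int64_t sec = (int32_t)raw_sec;

	sec += (int64_t)(extra & EPOCH_MASK) << 32;
	return sec;
}
```

For raw_sec = 0x80000000 the sign extension has already set bits 32 and
above, so decode_with_or() returns -2147483648 no matter what the epoch
bits are, while decode_with_add() with epoch bits of 1 returns 2147483648.
For values with the top bit clear the two variants agree.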

Arnd

2014-06-02 13:07:27

by Theodore Ts'o

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

Yes, there are some ongoing discussions about changing the post-2038
encoding of the timestamp in ext4, which is why this hasn't been fixed
yet. The main thing that's been missing is time for me to review the
patches, and a good way of writing regression tests that will work (or
at least not fail) on build environments with a 32-bit time_t and
32-bit-only capable versions of functions such as gmtime(3).

And given current discussions, I may want to think about some kind of
superblock flag to allow the use of a 32-bit unsigned encoding for
file systems using a 128-byte inode, with a way of setting that flag
after scanning the file system to make sure there are no times that
are previous to January 1, 1970. (Or more generally, allow any epoch
to be defined using a 64-bit time_t offset stored in the superblock...)

Cheers,

- Ted

2014-06-02 13:16:08

by Theodore Ts'o

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Mon, Jun 02, 2014 at 02:38:09PM +0200, Arnd Bergmann wrote:
>
> "For new inodes we always reserve enough space for the kernel's known
> extended fields, but for inodes created with an old kernel this might
> not have been the case. None of the extended inode fields is critical
> for correct filesystem operation."
>
> Do we have to worry about this for inodes that contain extended
> attributes and that get updated after 2038?

In practice, the extended timestamps were one of the first things added
to ext4, so the vast majority of ext4 file systems with inode sizes >
128 bytes will have room for the extended timestamps. There are some
legacy ext3 file systems with 256-byte inodes (enabled for fast
storage of SELinux xattrs) that in theory, could have been converted
to ext4 and had enough xattrs so that the extended timestamps couldn't
be added. That would be a vanishingly small use case, and in
practice, it's not likely to be the case for the embedded market.

I could imagine someone worrying about file systems originally
formatted using RHEL 4 post-2038 (perhaps running in a VM), but I
don't work for IBM any more, and hopefully even IBM would just tell
such customers that they need to suck it up, and do a
backup/reformat/restore pass.

Cheers,
- Ted

2014-06-02 14:56:04

by H. Peter Anvin

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time


> On Jun 2, 2014, at 4:57, "Theodore Ts'o" <[email protected]> wrote:
>
>> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
>>
>> I think you misunderstood what I suggested: the intent is to avoid
>> seeing things break in 2038 by making them break much earlier. We have
>> a solution for ext2 file systems, it's called ext4, and we just need
>> to ensure that everybody knows they have to migrate eventually.
>>
>> At some point before the mid-2030s, you should no longer be able to
>> build a kernel that has support for ext2 or any other module that will
>> run into bugs later....
>
> Even for ext4, it's not quite so simple as that. You only have
> support for times post 2038 if you are using an inode size > 128
> bytes. There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
>
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.
>
> And even if we're talking about flash and embedded devices, the good
> news is if you assume that 10 years is enough time for people to
> update their embedded OS builds, and that the vast majority of
> deployed devices will probably only be in service for 10-15 years, we
> do have enough time to make file system format changes, although
> admittedly we can't afford to dilly-dally.

I have a number of file systems older than any device they are sitting on. RAID allows individual disks to be swapped out, and when all disks have been swapped out, the file system can be extended online. The system doesn't even have to be taken offline in the process if it is possible to physically get to the drives with the system powered (e.g. hot plug bays), which is really damned nice.

2014-06-02 15:03:21

by Arnd Bergmann

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Monday 02 June 2014 09:07:00 Theodore Ts'o wrote:
> Yes, there are some ongoing discussions about changing the post-2038
> encoding of the timestamp in ext4, which is why this hasn't been fixed
> yet. The main thing that's been missing is time for me to review the
> patches, and a good way of writing regression tests that will work (or
> at least not fail) on build environments with a 32-bit time_t and
> 32-bit-only capable versions of functions such as gmtime(3).
>
> And given current discussions, I may want to think about some kind of
> superblock flag to allow the use of a 32-bit unsigned encoding for
> file systems using a 128-byte inode, with a way of setting that flag
> after scanning the file system to make sure there are no times that
> are previous to January 1, 1970. (Or more generally, allow any epoch
> to be defined using a 64-bit time_t offset stored in the superblock...)

FWIW, I've gone through the other file system implementations once
more. The most common pattern I've encountered is to have a read_inode
function with

	inode->i_mtime.tv_sec = le32_to_cpu(raw_inode->mtime);

which results in interpreting the time as 'signed' on 32-bit
kernels, but as 'unsigned' on 64-bit kernels. This could have been
done intentionally to extend the valid time range to 2106 on 64-bit
kernels, but it seems more likely that the code was written with
no thought given to 64-bit time_t at all. I see this pattern on
p9fs (old protocol only), afs, bfs, ceph, efs, freevxfs, hpfs, jffs2,
jfs, minix, nfsv2/v3 (this was clearly intentional and is
spelled out in the RFC), qnx4, qnx6, reiserfs, squashfs, sysv,
and ufs (protocol version 1 only).

The other behavior I see is to treat the on-disk 32-bit value
as signed on both 32-bit and 64-bit kernels:

	inode->i_mtime.tv_sec = (signed)le32_to_cpu(raw_inode->mtime);

this seems to be done intentionally in all cases, to maintain
compatibility between 32-bit and 64-bit kernels, but it's
relatively rare: exofs, ext2/3/4 (good old inodes) and xfs
are the only ones doing this.
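
The difference between the two patterns can be reproduced in userspace.
In this sketch time32_t and time64_t are stand-ins for time_t on a 32-bit
and a 64-bit kernel, and the function names are made up:

```c
#include <stdint.h>

/* Stand-ins for time_t on 32-bit and 64-bit kernels. */
typedef int32_t time32_t;
typedef int64_t time64_t;

/*
 * Pattern 1: assign the unsigned on-disk value directly.
 * The narrowing conversion wraps on the usual two's complement ABIs.
 */
time32_t read_plain_32(uint32_t raw) { return (time32_t)raw; }
time64_t read_plain_64(uint32_t raw) { return raw; }

/* Pattern 2: cast to signed first, then widen. */
time32_t read_signed_32(uint32_t raw) { return (int32_t)raw; }
time64_t read_signed_64(uint32_t raw) { return (int32_t)raw; }
```

For raw = 0x80000000, read_plain_32() yields -2147483648 but
read_plain_64() yields 2147483648, so the same disk image shows a
different date depending on the kernel. The two read_signed variants agree
on -2147483648.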

In case of ext2/3/4, the sign handling was introduced here:
http://www.spinics.net/lists/linux-ext4/msg01758.html

exofs and xfs seem to have done it like this for all of git
history.

Arnd

2014-06-02 15:05:27

by Chuck Lever

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time


On Jun 2, 2014, at 6:56 AM, Arnd Bergmann <[email protected]> wrote:

> On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote:
>>
>>> For actually running kernels beyond 2038, the best idea I've seen so
>>> far is to disallow all broken code at compile time. I don't see
>>> a choice but to audit the entire kernel for invalid uses on both
>>> 32 and 64 bit in the next few years. A lot of code will get changed
>>> in the process so we can actually keep running 32-bit kernels and
>>> file systems, but other code will likely go away:
>>>
>>> * any system calls that pass a time_t, timeval or timespec on
>>> 32-bit systems return -ENOSYS, to ensure all user land uses
>>> the replacements we will put into place
>>> * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
>>> from the kernel, and all code using it left out.
>>> * ext2 and ext3 file system code will have to be disabled, but that's
>>> fine since ext4 can mount old file systems.
>>
>> Syscalls and libs can be "fixed". Existing filesystem content might
>> not. So if you need to mount some old media in read-write mode after
>> 2038 and that happens to contain an ext2 or similarly limited filesystem
>> then it'd better just "work". Having the kernel refuse to modify the
>> filesystem would be unacceptable.
>
> I think you misunderstood what I suggested: the intent is to avoid
> seeing things break in 2038 by making them break much earlier. We have
> a solution for ext2 file systems, it's called ext4, and we just need
> to ensure that everybody knows they have to migrate eventually.
>
> At some point before the mid-2030s, you should no longer be able to
> build a kernel that has support for ext2 or any other module that will
> run into bugs later. Until then (rather sooner than later), I'd like
> to get to the point where you can choose whether to include those
> modules at build time or not, and then get everybody to turn off that
> option and fix the bugs they run into. You wouldn't need that for a
> 2014-generation long-term support distro (rhel 7, sles 12, debian 7,
> ubuntu 14.04, ...), but perhaps for the next generation, or the
> one after that.

I’m wondering what should be done about NFS. A solution for NFS should
match any scheme that is considered for local file systems, IMO.

NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
(See the definition of nfstime3 in RFC 1813).

NFSv4 uses a signed 64-bit value where zero represents midnight UTC
on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
the definition of nfstime4 in RFC 5661).

The NFSv4 protocol is probably not problematic, and NFSv3 should be out
of the picture by 2038. But if changes are planned for dealing _now_
with timestamp issues, compatibility with NFSv3 is a consideration.

It is already the case that, via NFSv3, the Linux NFS client transmits
timestamps earlier than 1970 as large positive numbers. Try this with
xfstests generic/258.

Maybe nfs3_proc_setattr() should recognize pre-epoch timestamps and
timestamps larger than can be represented in an unsigned 32-bit field
and return an immediate error to the requesting application (like EINVAL).

If the Linux NFS server encounters a local file with a timestamp that
cannot be represented via a u32, should it also return NFS3ERR_INVAL?

RFC 1813 does not provide guidance on the behavior nor does it suggest
a particular error status code. The Solaris 11 server appears to return
NFS3ERR_INVAL in this case.

An alternative would be to “cap” the timestamps transmitted via NFSv3 by
Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
timestamp is transmitted as UINT_MAX.
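
The three behaviors discussed here (today's truncation, an immediate
error, and capping) can be sketched as follows. These helpers are
hypothetical userspace illustrations and are not taken from the Linux NFS
client or server:

```c
#include <errno.h>
#include <stdint.h>

/* Current behavior as described above: a pre-1970 time wraps around. */
uint32_t nfs3_truncate_seconds(int64_t sec)
{
	return (uint32_t)sec;	/* -1 becomes 0xffffffff */
}

/* Capping: pre-epoch becomes 0, too-large becomes UINT32_MAX. */
uint32_t nfs3_cap_seconds(int64_t sec)
{
	if (sec < 0)
		return 0;
	if (sec > UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)sec;
}

/* Error return: 0 and *wire set on success, -EINVAL if unrepresentable. */
int nfs3_check_seconds(int64_t sec, uint32_t *wire)
{
	if (sec < 0 || sec > UINT32_MAX)
		return -EINVAL;
	*wire = (uint32_t)sec;
	return 0;
}
```

Capping keeps the request working but loses the distinction between
"exactly the limit" and "beyond the limit"; the error variant keeps that
distinction at the cost of failing the call.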

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


2014-06-02 15:31:47

by Theodore Ts'o

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Mon, Jun 02, 2014 at 11:04:23AM -0400, Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> An alternative would be to “cap” the timestamps transmitted via NFSv3 by
> Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
> timestamp is transmitted as UINT_MAX.


I wonder if it would make sense to try to promulgate via the Austin
group, and possibly the C standards committee the concept of a bit
pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
unknown", or "time indefinite" or "we couldn't encode the time".

We would then teach gmtime(3) and asctime(3) to print some appropriate
message, and we could teach programs like find (with the -mtime
option), make, tmpwatch, et al., that they can't make any presumption
about the comparability of any timestamp which has a value of
TIME_UNDEFINED.

It would be problematic for time(2) or gettimeofday(2) to return
TIME_UNDEFINED, since there are programs that care about time ticking
forward, but I could imagine a new interface which would be permitted
to return a flag indicating that we don't know the current time
(because the CMOS battery had run down, etc.) so instead we're going
to be counting the number of seconds since the system was booted.

- Ted

2014-06-02 17:16:02

by H. Peter Anvin

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
>
> I wonder if it would make sense to try to promulgate via the Austin
> group, and possibly the C standards committee the concept of a bit
> pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
> unknown", or "time indefinite" or "we couldn't encode the time".
>

(time_t)-1 already has this meaning for some calls (e.g. time(2)).
However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately
something similar applies to all possible bit patterns, certainly within
the range of an int.

> We would then teach gmtime(3) and asctime(3) to print some appropriate
> message, and we could teach programs like find (with the -mtime
> option), make, tmpwatch, et al., that they can't make any presumption
> about the comparability of any timestamp which has a value of
> TIME_UNDEFINED.
>
> It would be problematic for time(2) or gettimeofday(2) to return
> TIME_UNDEFINED, since there are programs that care about time ticking
> forward, but I could imagine a new interface which would be permitted
> to return a flag indicating that we don't know the current time
> (because the CMOS battery had run down, etc.) so instead we're going
> to be counting the number of seconds since the system was booted.

This assumes that we actually know that that is the case, which may be
an aggressive assumption.

-hpa

2014-06-02 18:52:29

by Arnd Bergmann

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Monday 02 June 2014 10:12:37 H. Peter Anvin wrote:
> On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
> >
> > I wonder if it would make sense to try to promulgate via the Austin
> > group, and possibly the C standards committee the concept of a bit
> > pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
> > unknown", or "time indefinite" or "we couldn't encode the time".
> >
>
> (time_t)-1 already has this meaning for some calls (e.g. time(2)).
> However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately
> something similar applies to all possible bit patterns, certainly within
> the range of an int.

Worse than Wed Dec 31 23:59:59 UTC 1969, on NFSv3 it also means
"Sun Feb 7 07:28:15 CET 2106", and that is much harder to distinguish
from a real future date.

If we had the choice, I'd go for something like 1, i.e.
"Thu Jan 1 01:00:01 CET 1970".

> > We would then teach gmtime(3) and asctime(3) to print some appropriate
> > message, and we could teach programs like find (with the -mtime
> > option), make, tmpwatch, et al., that they can't make any presumption
> > about the comparability of any timestamp which has a value of
> > TIME_UNDEFINED.
> >
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.

It's harder for time(2), but for the inode case, we can definitely
detect when the file system specific representation overflows
or underflows, which may be at a number of very different points
of time.

Arnd

2014-06-02 18:52:53

by Arnd Bergmann

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Monday 02 June 2014 11:04:23 Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
>
> NFSv4 uses a signed 64-bit value where zero represents midnight UTC
> on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
> the definition of nfstime4 in RFC 5661).
>
> The NFSv4 protocol is probably not problematic, and NFSv3 should be out
> of the picture by 2038. But if changes are planned for dealing _now_
> with timestamp issues, compatibility with NFSv3 is a consideration.
>
> It is already the case that, via NFSv3, the Linux NFS client transmits
> timestamps earlier than 1970 as large positive numbers. Try this with
> xfstests generic/258.

If I read the code correctly, a pre-1970 timestamp will be sent as
a large unsigned integer, but received as a post-2038 timestamp on
64-bit kernels, both in the nfs client and server code.

This behavior is clearly wrong, but it's the same bug that we have
in lots of other file systems, and it makes sense to have the
same fix everywhere, at least in the cases where we know what interpretation
we actually want. NFS has the luxury of having an actual specification
saying that the value is unsigned. For most of the legacy file systems,
we can only make a guess at how other OSs would interpret the same
numbers.
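
To illustrate the round trip described above, here is a small
userspace sketch. The function names are made up for illustration and
do not correspond to the actual NFS client or server code:

```c
#include <stdint.h>

/* Sender: truncate the seconds value into the unsigned 32-bit wire field. */
static uint32_t wire_encode(int64_t sec)
{
	return (uint32_t)sec;	/* -1 becomes 0xffffffff */
}

/* Receiver with a 64-bit time type: the wire value is zero-extended. */
static int64_t wire_decode_unsigned(uint32_t wire)
{
	return (int64_t)wire;	/* 0xffffffff becomes 4294967295 */
}

/* Receiver that treats the same 32 bits as a signed value. */
static int64_t wire_decode_signed(uint32_t wire)
{
	if (wire <= (uint32_t)INT32_MAX)
		return (int64_t)wire;
	return (int64_t)wire - 4294967296LL;	/* 0xffffffff becomes -1 */
}
```

A timestamp of -1 (one second before the epoch) is sent as 0xffffffff.
Read back as unsigned into a 64-bit type it is 4294967295, a date in
February 2106, while a signed 32-bit interpretation gives -1 again.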

Arnd

2014-06-02 18:58:27

by Roger Willcocks

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time


On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:

> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
>

nfstime3 could be extended by redefining the otherwise unused
nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
seconds field and an unsigned 30-bit nanoseconds field.

This could represent 1970 +/- 272 years.
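
(2^33 seconds divided by roughly 31.56 million seconds per year is
about 272 years.) A rough sketch of the packing, assuming two's
complement for the 34-bit seconds value. The struct and function names
are hypothetical and are not part of RFC 1813 or of any implementation:

```c
#include <stdint.h>

/* Same layout as nfstime3: two unsigned 32-bit words. */
struct nfstime3_wire {
	uint32_t seconds;
	uint32_t nseconds;
};

/* Pack sec in [-2^33, 2^33) and nsec in [0, 10^9) into the two words. */
static struct nfstime3_wire nfstime3_pack_ex(int64_t sec, uint32_t nsec)
{
	uint64_t s = (uint64_t)sec & 0x3ffffffffULL;	/* low 34 bits */
	struct nfstime3_wire w;

	w.seconds = (uint32_t)s;			/* seconds{31..0} */
	w.nseconds = (uint32_t)((s >> 32) << 30)	/* seconds{33,32} */
		   | (nsec & 0x3fffffffu);		/* 30-bit nanoseconds */
	return w;
}

/* Recover the signed 34-bit seconds value. */
static int64_t nfstime3_unpack_ex_sec(struct nfstime3_wire w)
{
	int64_t s = (int64_t)(((uint64_t)(w.nseconds >> 30) << 32) | w.seconds);

	if (s >= (1LL << 33))		/* bit 33 set: negative value */
		s -= (1LL << 34);
	return s;
}

/* Recover the nanoseconds, ignoring the two borrowed bits. */
static uint32_t nfstime3_unpack_ex_nsec(struct nfstime3_wire w)
{
	return w.nseconds & 0x3fffffffu;
}
```

For seconds values between 0 and 2^32 - 1 the two borrowed bits are
zero, so both words are identical to what is sent today.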

Servers could indicate they can understand the extended time format by
adding a new FSINFO capability - FSF3_CANSETTIME_EX.

Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
timestamps so old servers would be protected from new clients.

Old clients don't need to be protected from new servers because the
on-the-wire bit pattern for dates between 1970 and 2106 stays the same,
so they're no worse off than they were before.

Arguably the new server ought to clamp out-of-range timestamps before
sending them to old clients but that would need per-client state (and
nfs3 is stateless.)

--
Roger

2014-06-02 19:05:39

by Chuck Lever

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time


On Jun 2, 2014, at 2:58 PM, Roger Willcocks wrote:

>
> On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
>
>> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
>> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
>> (See the definition of nfstime3 in RFC 1813).
>>
>
> nfstime3 could be extended by redefining the otherwise unused
> nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
> seconds field and an unsigned 30-bit nanoseconds field.
>
> This could represent 1970 +/- 272 years.
>
> Servers could indicate they can understand the extended time format by
> adding a new FSINFO capability - FSF3_CANSETTIME_EX.
>
> Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
> timestamps so old servers would be protected from new clients.

You would have to get the IETF’s NFSv4 working group to sign off on
this change. Otherwise, Linux would be the only NFSv3 implementation
that supports the extension.

But I suspect the answer you’d get is “Use NFSv4.”

> Old clients don't need to be protected from new servers because the
> on-the-wire bit pattern for dates between 1970 and 2106 stays the same,
> so they're no worse off than they were before.
>
> Arguably the new server ought to clamp out-of-range timestamps before
> sending them to old clients but that would need per-client state (and
> nfs3 is stateless.)

There’s no reliable way in NFSv3 for clients and servers to identify
the software running on the peer.

Practically speaking, you should assume that the NFSv3 protocol is never
going to change.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


2014-06-02 19:11:15

by Arnd Bergmann

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Monday 02 June 2014 15:04:27 Chuck Lever wrote:
> On Jun 2, 2014, at 2:58 PM, Roger Willcocks wrote:
>
> >
> > On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
> >
> >> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> >> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> >> (See the definition of nfstime3 in RFC 1813).
> >>
> >
> > nfstime3 could be extended by redefining the otherwise unused
> > nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
> > seconds field and an unsigned 30-bit nanoseconds field.
> >
> > This could represent 1970 +/- 272 years.
> >
> > Servers could indicate they can understand the extended time format by
> > adding a new FSINFO capability - FSF3_CANSETTIME_EX.
> >
> > Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
> > timestamps so old servers would be protected from new clients.
>
> You would have to get the IETF’s NFSv4 working group to sign off on
> this change. Otherwise, Linux would be the only NFSv3 implementation
> that supports the extension.
>
> But I suspect the answer you’d get is “Use NFSv4.”

While I've never dealt with NFS standardization, I'd assume this is
a workable answer. The NFSv2 and NFSv3 definitions clearly specify a valid
range of times until 2106 using unsigned seconds, and that should really
give enough time to migrate to something better (not necessarily NFSv4).

Arnd

2014-06-02 22:30:42

by Theodore Ts'o

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Mon, Jun 02, 2014 at 10:12:37AM -0700, H. Peter Anvin wrote:
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.

We won't know if the RTC clock is wrong, true --- but the kernel will
know if (a) the hardware doesn't have an RTC clock at all, or if (b) the
RTC clock is ticking at some time that can't be encoded using the current
time_t type. So in that case, the fallback would be for the
kernel to tick starting with time_t == 0 when the system is initially
booted, and the "time indefinite flag" would be set.

Now assume that we have a new system call, gettimestampofday(2), which
returns a new timestamp structure which has a 64-bit ts_sec field, the
ts_nsec field (ala struct timespec), and a ts_flags field, where the
kernel could signal things like "time invalid", or "time can't be
encoded in the legacy time_t type", or "I'm not sure if the time is
correct" --- i.e., because the RTC battery isn't working.
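
As a rough sketch only, such a structure might look like the following.
Only the ts_sec, ts_nsec and ts_flags field names come from the
description above. The struct name, the flag names and the helper are
hypothetical, and no such interface exists:

```c
#include <stdint.h>

/* Hypothetical flag bits for ts_flags. */
#define TS_FLAG_INVALID		(1u << 0)	/* time invalid */
#define TS_FLAG_NO_LEGACY	(1u << 1)	/* can't be encoded in legacy time_t */
#define TS_FLAG_UNCERTAIN	(1u << 2)	/* e.g. RTC battery not working */

struct timestamp {
	int64_t  ts_sec;	/* 64-bit seconds */
	uint32_t ts_nsec;	/* nanoseconds, as in struct timespec */
	uint32_t ts_flags;	/* TS_FLAG_* bits */
};

/*
 * Build a timestamp, adding TS_FLAG_NO_LEGACY when the seconds value
 * does not fit a signed 32-bit time_t.
 */
static struct timestamp make_timestamp(int64_t sec, uint32_t nsec,
				       uint32_t flags)
{
	struct timestamp ts = { sec, nsec, flags };

	if (sec < INT32_MIN || sec > INT32_MAX)
		ts.ts_flags |= TS_FLAG_NO_LEGACY;
	return ts;
}
```

A caller that only understands a 32-bit time_t could check
TS_FLAG_NO_LEGACY instead of silently truncating ts_sec.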

Not all hardware might be able to support the last, of course, but if
the battery is low, or the system has been exposed to very low
temperatures (or large amounts of cosmic radiation, etc.) the RTC
time may just be plain wrong. No system is going to be perfect, but
it should be possible to make things better, at least for certain classes of
hardware.

And since we are already returning (time_t) -1 in some cases, we might
as well try to make things a bit more formal.

- Ted

2014-06-02 22:35:51

by H. Peter Anvin

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
>
> And since we are already returning (time_t) -1 in some cases, we might
> as well try to make things a bit more formal.
>

Are we? I am not aware of *Linux* actually using that.

-hpa

2014-06-02 23:32:40

by Theodore Ts'o

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
> >
> > And since we are already returning (time_t) -1 in some cases, we might
> > as well try to make things a bit more formal.
> >
>
> Are we? I am not aware of *Linux* actually using that.

Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
the Posix specification:

SYSCALL_DEFINE1(time, time_t __user *, tloc)
{
	time_t i = get_seconds();

	if (tloc) {
		if (put_user(i,tloc))
			return -EFAULT;
	}
	force_successful_syscall_return();
	return i;
}

Cheers,

- Ted

2014-06-02 23:36:38

by H. Peter Anvin

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On 06/02/2014 04:32 PM, Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
>> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
>>>
>>> And since we are already returning (time_t) -1 in some cases, we might
>>> as well try to make things a bit more formal.
>>>
>>
>> Are we? I am not aware of *Linux* actually using that.
>
> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
> the Posix specification:
>
> SYSCALL_DEFINE1(time, time_t __user *, tloc)
> {
> 	time_t i = get_seconds();
>
> 	if (tloc) {
> 		if (put_user(i,tloc))
> 			return -EFAULT;
> 	}
> 	force_successful_syscall_return();
> 	return i;
> }
>

OK, I guess I should have said... other than for -EFAULT.

I just don't know of anyone using time(2) with an argument other than NULL.

-hpa

2014-06-03 13:09:49

by Roger Willcocks

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time


On Mon, 2014-06-02 at 19:32 -0400, Theodore Ts'o wrote:

> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
> the Posix specification:
>
> SYSCALL_DEFINE1(time, time_t __user *, tloc)
> {
> 	time_t i = get_seconds();
>
> 	if (tloc) {
> 		if (put_user(i,tloc))
> 			return -EFAULT;
> 	}
> 	force_successful_syscall_return();
> 	return i;
> }

get_seconds() returns an unsigned long so there's potential for overflow
here.

--
Roger