LinuxLists.cc - Re: [RFC 11/32] xfs: convert to struct inode

2014-06-02 15:05:24

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Jun 2, 2014, at 6:56 AM, Arnd Bergmann <[email protected]> wrote:

> On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote:
>>
>>> For actually running kernels beyond 2038, the best idea I've seen so
>>> far is to disallow all broken code at compile time. I don't see
>>> a choice but to audit the entire kernel for invalid uses on both
>>> 32 and 64 bit in the next few years. A lot of code will get changed
>>> in the process so we can actually keep running 32-bit kernels and
>>> file systems, but other code will likely go away:
>>>
>>> * any system calls that pass a time_t, timeval or timespec on
>>> 32-bit systems return -ENOSYS, to ensure all user land uses
>>> the replacements we will put into place
>>> * The definition of 'time_t', 'timval' and 'timespec' can be hidden
>>> from the kernel, and all code using it left out.
>>> * ext2 and ext3 file system code will have to be disabled, but that's
>>> file since ext4 can mount old file systems.
>>
>> Syscalls and libs can be "fixed". Existing filesystem content might
>> not. So if you need to mount some old media in read-write mode after
>> 2038 and that happens to content an ext2 or similarly limited filesystem
>> then it'd better just "work". Having the kernel refuse to modify the
>> filesystem would be unacceptable.
>
> I think you misunderstood what I suggested: the intent is to avoid
> seeing things break in 2038 by making them break much earlier. We have
> a solution for ext2 file systems, it's called ext4, and we just need
> to ensure that everybody knows they have to migrate eventually.
>
> At some point before the mid 2030ies, you should no longer be able to
> build a kernel that has support for ext2 or any other module that will
> run into bugs later. Until then (rather sooner than later), I'd like
> to get to the point where you can choose whether to include those
> modules at build time or not, and then get everybody to turn off that
> option and fix the bugs they run into. You wouldn't need that for a
> 2014-generation long-term support disto (rhel 7, sles 12, debian 7,
> ubuntu 14.04, ...), but perhaps for the next generation, or the
> one after that.

I?m wondering what should be done about NFS. A solution for NFS should
match any scheme that is considered for local file systems, IMO.

NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
(See the definition of nfstime3 in RFC 1813).

NFSv4 uses a signed 64-bit value where zero represents midnight UTC
on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
the definition of nfstime4 in RFC 5661).

The NFSv4 protocol is probably not problematic, and NFSv3 should be out
of the picture by 2038. But if changes are planned for dealing _now_
with timestamp issues, compatibility with NFSv3 is a consideration.

It is already the case that, via NFSv3, the Linux NFS client transmits
timestamps earlier than 1970 as large positive numbers. Try this with
xfstests generic/258.

Maybe nfs3_proc_setattr() should recognize pre-epoch timestamps and
timestamps larger than can be represented in an unsigned 32-bit field
and return an immediate error to the requesting application (like EINVAL).

If the Linux NFS server encounters a local file with a timestamp that
cannot be represented via a u32, should it also return NFS3ERR_INVAL?

RFC 1813 does not provide guidance on the behavior nor does it suggest
a particular error status code. The Solaris 11 server appears to return
NFS3ERR_INVAL in this case.

An alternative would be to ?cap? the timestamps transmitted via NFSv3 by
Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
timestamp is transmitted as UINT_MAX.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2014-06-02 22:35:49

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
>
> And since we are already returning (time_t) -1 in some cases, we might
> as well try to make things a bit more formal.
>

Are we? I am not aware of *Linux* actually using that.

-hpa

2014-06-02 23:36:36

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On 06/02/2014 04:32 PM, Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
>> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
>>>
>>> And since we are already returning (time_t) -1 in some cases, we might
>>> as well try to make things a bit more formal.
>>>
>>
>> Are we? I am not aware of *Linux* actually using that.
>
> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
> the Posix specification:
>
> SYSCALL_DEFINE1(time, time_t __user *, tloc)
> {
> time_t i = get_seconds();
>
> if (tloc) {
> if (put_user(i,tloc))
> return -EFAULT;
> }
> force_successful_syscall_return();
> return i;
> }
>

OK, I guess I should have said... other than for -EFAULT.

I just don't know of anyone using time(2) with an argument other than NULL.

-hpa

2014-06-02 18:52:51

by Arnd Bergmann

[permalink] [raw]

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Monday 02 June 2014 11:04:23 Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
>
> NFSv4 uses a signed 64-bit value where zero represents midnight UTC
> on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
> the definition of nfstime4 in RFC 5661).
>
> The NFSv4 protocol is probably not problematic, and NFSv3 should be out
> of the picture by 2038. But if changes are planned for dealing _now_
> with timestamp issues, compatibility with NFSv3 is a consideration.
>
> It is already the case that, via NFSv3, the Linux NFS client transmits
> timestamps earlier than 1970 as large positive numbers. Try this with
> xfstests generic/258.

If I read the code correctly, a pre-1970 timestamp will be sent as
a large unsigned integer, but received as a post-2038 timestamp on
64-bit kernels, both in the nfs client and server code.

This behavior is clearly wrong, but it's the same bug that we have
in lots of other file systems, and it makes sense to have the
same fix everywhere, at lease the cases where we know what interpretation
we actually want. NFS has the luxury of having an actual specification
saying that the value is unsigned. For most of the legacy file systems,
we can only make a guess at how other OSs would interpret the same
numbers.

Arnd

2014-06-02 18:52:27

by Arnd Bergmann

[permalink] [raw]

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Monday 02 June 2014 10:12:37 H. Peter Anvin wrote:
> On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
> >
> > I wonder if it would make sense to try to promulgate via the Austin
> > group, and possibly the C standards committee the concept of a bit
> > pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
> > unknown", or "time indefinite" or "we couldn't encode the time".
> >
>
> (time_t)-1 already has this meaning for some calls (e.g. time(2)).
> However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately
> something similar applies to all possible bit patterns, certainly within
> the range of an int.

Worse than Wed Dec 31 23:59:59 UTC 1969, on NFSv3 it also means
"Sun Feb 7 07:28:15 CET 2106", and that is much harder to distinguish
from a real future date.

If we had the choice, I'd go for something like 1, i.e.
"Thu Jan 1 01:00:01 CET 1970".

> > We would then teach gmtime(3) and asctime(3) to print some appropriate
> > message, and we could teach programs like find (with the -mtime)
> > option, make, tmpwatch, et. al., that they can't make any presumption
> > about the comparibility of any timestamp which has a value of
> > TIME_UNDEFINIED.
> >
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.

It's harder for time(2), but for the inode case, we can definitely
detect when the file system specific representation overflows
or underflows, which may be be at a number of very different points
of time.

Arnd

2014-06-02 19:05:06

by Roger Willcocks

[permalink] [raw]

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:

> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
>

nfstime3 could be extended by redefining the otherwise unused
nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
seconds field and an unsigned 30-bit nanoseconds field.

This could represent 1970 +/- 272 years.

Servers could indicate they can understand the extended time format by
adding a new FSINFO capability - FSF3_CANSETTIME_EX.

Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
timestamps so old servers would be protected from new clients.

Old clients don't need to be protected from new servers because the
on-the-wire bit pattern for dates between 1970 and 2106 stays the same,
so they're no worse off than they were before.

Arguably the new server ought to clamp out-of-range timestamps before
sending them to old clients but that would need per-client state (and
nfs3 is stateless.)

--
Roger

2014-06-02 23:32:37

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
> >
> > And since we are already returning (time_t) -1 in some cases, we might
> > as well try to make things a bit more formal.
> >
>
> Are we? I am not aware of *Linux* actually using that.

Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
the Posix specification:

SYSCALL_DEFINE1(time, time_t __user *, tloc)
{
time_t i = get_seconds();

if (tloc) {
if (put_user(i,tloc))
return -EFAULT;
}
force_successful_syscall_return();
return i;
}

Cheers,

- Ted

2014-06-02 15:31:43

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Mon, Jun 02, 2014 at 11:04:23AM -0400, Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> An alternative would be to “cap” the timestamps transmitted via NFSv3 by
> Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
> timestamp is transmitted as UINT_MAX.

I wonder if it would make sense to try to promulgate via the Austin
group, and possibly the C standards committee the concept of a bit
pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
unknown", or "time indefinite" or "we couldn't encode the time".

We would then teach gmtime(3) and asctime(3) to print some appropriate
message, and we could teach programs like find (with the -mtime)
option, make, tmpwatch, et. al., that they can't make any presumption
about the comparibility of any timestamp which has a value of
TIME_UNDEFINIED.

It would be problematic for time(2) or gettimeofday(2) to return
TIME_UNDEFINED, since there are programs that care about time ticking
forward, but I could imagine a new interface which would be permitted
to return a flag indicating that we don't know the current time
(because the CMOS battery had run down, etc.) so instead we're going
to be counting the number of seconds since the system was booted.

- Ted

2014-06-03 13:09:46

by Roger Willcocks

[permalink] [raw]

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Mon, 2014-06-02 at 19:32 -0400, Theodore Ts'o wrote:

> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
> the Posix specification:
>
> SYSCALL_DEFINE1(time, time_t __user *, tloc)
> {
> time_t i = get_seconds();
>
> if (tloc) {
> if (put_user(i,tloc))
> return -EFAULT;
> }
> force_successful_syscall_return();
> return i;
> }

get_seconds() returns an unsigned long so there's potential for overflow
here.

--
Roger

2014-06-02 22:30:40

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Mon, Jun 02, 2014 at 10:12:37AM -0700, H. Peter Anvin wrote:
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.

We won't know if the RTC clock is wrong, true --- but the kernel will
know if (a) the hardware doesn't have RTC clock at all, or if (b) the
RTC clock is ticking some time that can't be encoded using the current
time_t type. So in that case, the fallback would be to be for the
kernel to tick starting with time_t == 0 when the system is initially
booted, and the "time indefinite flag" would be set.

Now assume that we have a new system call, gettimestampofday(2), which
returns a new timestamp structure which has a 64-bit ts_sec field, the
ts_nsec field (ala struct timespec), and a ts_flags field, where the
kernel could signal things like "time invalid", or "time can't be
encoded in the legacy time_t type", or "I'm not sure if the time is
correct" --- i.e., because the RTC battery isn't working.

Not all hardware might be able to support the last, of course, but if
the battery is low, or the system has been exposed to very low
temperatures (or large amounts of cosmic radiation, etc.) the RTC
time may just be plain wrong. No system is going to be perfect, but
it should be possible to make htings better, at for certain classes of
hardware.

And since we are already returning (time_t) -1 in some cases, we might
as well try to make things a bit more formal.

- Ted

2014-06-02 17:15:56

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
>
> I wonder if it would make sense to try to promulgate via the Austin
> group, and possibly the C standards committee the concept of a bit
> pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
> unknown", or "time indefinite" or "we couldn't encode the time".
>

(time_t)-1 already has this meaning for some calls (e.g. time(2)).
However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately
something similar applies to all possible bit patterns, certainly within
the range of an int.

> We would then teach gmtime(3) and asctime(3) to print some appropriate
> message, and we could teach programs like find (with the -mtime)
> option, make, tmpwatch, et. al., that they can't make any presumption
> about the comparibility of any timestamp which has a value of
> TIME_UNDEFINIED.
>
> It would be problematic for time(2) or gettimeofday(2) to return
> TIME_UNDEFINED, since there are programs that care about time ticking
> forward, but I could imagine a new interface which would be permitted
> to return a flag indicating that we don't know the current time
> (because the CMOS battery had run down, etc.) so instead we're going
> to be counting the number of seconds since the system was booted.

This assumes that we actually know that that is the case, which may be
an aggressive assumption.

-hpa

2014-06-02 19:11:13

by Arnd Bergmann

[permalink] [raw]

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Monday 02 June 2014 15:04:27 Chuck Lever wrote:
> On Jun 2, 2014, at 2:58 PM, Roger Willcocks <[email protected]> wrote:
>
> >
> > On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
> >
> >> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> >> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> >> (See the definition of nfstime3 in RFC 1813).
> >>
> >
> > nfstime3 could be extended by redefining the otherwise unused
> > nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
> > seconds field and an unsigned 30-bit nanoseconds field.
> >
> > This could represent 1970 +/- 272 years.
> >
> > Servers could indicate they can understand the extended time format by
> > adding a new FSINFO capability - FSF3_CANSETTIME_EX.
> >
> > Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
> > timestamps so old servers would be protected from new clients.
>
> You would have to get the IETF’s NFSv4 working group to sign off on
> this change. Otherwise, Linux would be the only NFSv3 implementation
> that supports the extension.
>
> But I suspect the answer you’d get is “Use NFSv4.”

While I've never dealt with an NFS standardization, I'd assume this is
a workable answer. The NFSv2 and NFSv3 definition clearly defines a valid
range of times until 2106 using unsigned seconds, and that should really
give enough time to migrate to something better (not necessarily NFSv4).

Arnd

2014-06-02 19:05:36

by Chuck Lever III

[permalink] [raw]

Subject: Re: [RFC 11/32] xfs: convert to struct inode_time

On Jun 2, 2014, at 2:58 PM, Roger Willcocks <[email protected]> wrote:

>
> On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
>
>> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
>> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
>> (See the definition of nfstime3 in RFC 1813).
>>
>
> nfstime3 could be extended by redefining the otherwise unused
> nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
> seconds field and an unsigned 30-bit nanoseconds field.
>
> This could represent 1970 +/- 272 years.
>
> Servers could indicate they can understand the extended time format by
> adding a new FSINFO capability - FSF3_CANSETTIME_EX.
>
> Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
> timestamps so old servers would be protected from new clients.

You would have to get the IETF?s NFSv4 working group to sign off on
this change. Otherwise, Linux would be the only NFSv3 implementation
that supports the extension.

But I suspect the answer you?d get is ?Use NFSv4.?

> Old clients don't need to be protected from new servers because the
> on-the-wire bit pattern for dates between 1970 and 2106 stays the same,
> so they're no worse off than they were before.
>
> Arguably the new server ought to clamp out-of-range timestamps before
> sending them to old clients but that would need per-client state (and
> nfs3 is stateless.)

There?s no reliable way in NFSv3 for clients and servers to identify
the software running on the peer.

Practically speaking, you should assume that the NFSv3 protocol is never
going to change.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com