2010-06-25 19:22:52

by Oleg Nesterov

[permalink] [raw]
Subject: Q: sys_futex() && timespec_valid()

Hello.

Another stupid question about the trivial problem I am going to ask,
just to report the authoritative answer back to bugzilla. The problem
is, personally I am not sure we should/can add the user-visible change
required by glibc maintainers, and I am in no position to suggest them
to fix the user-space code instead.


In short, glibc developers believe that sys_futex(ts) is buggy and
needs the fix to return -ETIMEDOUT instead of -EINVAL in case when
ts->tv_sec < 0 and the timeout is absolute.

Ignoring the possible cleanups/microoptimizations, something like this:

--- x/kernel/futex.c
+++ x/kernel/futex.c
@@ -2625,6 +2625,16 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
cmd == FUTEX_WAIT_REQUEUE_PI)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
+
+ // absolute timeout
+ if (cmd != FUTEX_WAIT) {
+ if (ts->tv_nsec >= NSEC_PER_SEC)
+ return -EINVAL;
+ if (ts->tv_sec < 0)
+ return -ETIMEDOUT;
+ }
+
+
if (!timespec_valid(&ts))
return -EINVAL;

------------------------------------------------------------------------

Otherwise, pthread_rwlock_timedwrlock(ts) hangs spinning in user-space
forever if ts->tv_sec < 0.

To clarify: this depends on libc version and arch.

This happens because pthread_rwlock_timedwrlock(rwlock, ts) on x86_64
roughly does:

for (;;) {
if (fast_path_succeeds(rwlock))
return 0;

if (ts->tv_nsec >= NSEC_PER_SEC)
return EINVAL;

errcode = sys_futex(FUTEX_WAIT_BITSET_PRIVATE, ts);
if (errcode == ETIMEDOUT)
return ETIMEDOUT;
}

and since the kernel return EINVAL due to !timespec_valid(ts), the
code above loops forever.

(btw, we have same problem with EFAULT, and this is considered as
a caller's problem).

IOW, pthread_rwlock_timedwrlock() assumes that in this case
sys_futex() can return nothing interesting except 0 or ETIMEDOUT.
I guess pthread_rwlock_timedwrlock() is not alone, but I didn't check.



So, the question: do you think we can change sys_futex() to make
glibc happy?

Or, do you think it is user-space who should check tv_sec < 0 if
it wants ETIMEDOUT with the negative timeout ?

Thanks,

Oleg.


2010-06-25 19:43:15

by Darren Hart

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

On 06/25/2010 12:20 PM, Oleg Nesterov wrote:
> Hello.
>

Hi Oleg,

> Another stupid question about the trivial problem I am going to ask,
> just to report the authoritative answer back to bugzilla. The problem
> is, personally I am not sure we should/can add the user-visible change
> required by glibc maintainers, and I am in no position to suggest them
> to fix the user-space code instead.
>
> In short, glibc developers believe that sys_futex(ts) is buggy and
> needs the fix to return -ETIMEDOUT instead of -EINVAL in case when
> ts->tv_sec< 0 and the timeout is absolute.
>

Just a question of semantics I guess. Seems reasonable to me to call a
negative timeout invalid. However, I certainly don't feel strongly
enough about it to fight for it. Glibc is the principle user of
sys_futex(). While there are certainly other users out there (Mathieu
Desnoyers' Userspace RCU comes to mind), I doubt any of them depend on
-EINVAL for negative timeouts to function properly.

Unless there is some good reason to object to breaking the API that I am
missing, I don't mind changing it to -ETIMEDOUT (although -EINVAL seems
more intuitive to me).

--
Darren "Little Fish" Hart

> Ignoring the possible cleanups/microoptimizations, something like this:
>
> --- x/kernel/futex.c
> +++ x/kernel/futex.c
> @@ -2625,6 +2625,16 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
> cmd == FUTEX_WAIT_REQUEUE_PI)) {
> if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
> return -EFAULT;
> +
> + // absolute timeout
> + if (cmd != FUTEX_WAIT) {
> + if (ts->tv_nsec>= NSEC_PER_SEC)
> + return -EINVAL;
> + if (ts->tv_sec< 0)
> + return -ETIMEDOUT;
> + }
> +
> +
> if (!timespec_valid(&ts))
> return -EINVAL;
>
> ------------------------------------------------------------------------
>
> Otherwise, pthread_rwlock_timedwrlock(ts) hangs spinning in user-space
> forever if ts->tv_sec< 0.
>
> To clarify: this depends on libc version and arch.
>
> This happens because pthread_rwlock_timedwrlock(rwlock, ts) on x86_64
> roughly does:
>
> for (;;) {
> if (fast_path_succeeds(rwlock))
> return 0;
>
> if (ts->tv_nsec>= NSEC_PER_SEC)
> return EINVAL;
>
> errcode = sys_futex(FUTEX_WAIT_BITSET_PRIVATE, ts);
> if (errcode == ETIMEDOUT)
> return ETIMEDOUT;
> }
>
> and since the kernel return EINVAL due to !timespec_valid(ts), the
> code above loops forever.
>
> (btw, we have same problem with EFAULT, and this is considered as
> a caller's problem).
>
> IOW, pthread_rwlock_timedwrlock() assumes that in this case
> sys_futex() can return nothing interesting except 0 or ETIMEDOUT.
> I guess pthread_rwlock_timedwrlock() is not alone, but I didn't check.
>
>
>
> So, the question: do you think we can change sys_futex() to make
> glibc happy?
>
> Or, do you think it is user-space who should check tv_sec< 0 if
> it wants ETIMEDOUT with the negative timeout ?
>
> Thanks,
>
> Oleg.
>


--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

2010-06-25 19:49:35

by Ulrich Drepper

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

----- "Darren Hart" <[email protected]> wrote:
> Unless there is some good reason to object to breaking the API that I
> am missing, I don't mind changing it to -ETIMEDOUT (although -EINVAL
> seems more intuitive to me).

It's only not intuitive because Oleg misrepresented or at least didn't describe the issue.

The kernel already catches invalid timespec values. Unfortunately the code used comes from the time when all timeouts where specified with relative values. In such situations negative tv_sec values were in fact invalid and rejected with EINVAL.

But for absolute timeouts tv_sec = -1 means a time before Epoch. This is not an invalid value, it just is one of many points in time which have passed and therefore the kernel has to respond with ETIMEDOUT.

This is no semantic change or anything like that. It pure and simply a bug fix. When Thomas worked on that come we simply missed updating the test for invalid timespec values.

The kernel code should be fixed to always check tv_nsec for < 0 and > 1000000000. But the tv_sec test for < 0 should be skipped if the timeout value is interpreted as an absolute time value.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

2010-06-25 19:56:57

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

* Darren Hart ([email protected]) wrote:
> On 06/25/2010 12:20 PM, Oleg Nesterov wrote:
>> Hello.
>>
>
> Hi Oleg,
>
>> Another stupid question about the trivial problem I am going to ask,
>> just to report the authoritative answer back to bugzilla. The problem
>> is, personally I am not sure we should/can add the user-visible change
>> required by glibc maintainers, and I am in no position to suggest them
>> to fix the user-space code instead.
>>
>> In short, glibc developers believe that sys_futex(ts) is buggy and
>> needs the fix to return -ETIMEDOUT instead of -EINVAL in case when
>> ts->tv_sec< 0 and the timeout is absolute.
>>
>
> Just a question of semantics I guess. Seems reasonable to me to call a
> negative timeout invalid. However, I certainly don't feel strongly
> enough about it to fight for it. Glibc is the principle user of
> sys_futex(). While there are certainly other users out there (Mathieu
> Desnoyers' Userspace RCU comes to mind), I doubt any of them depend on
> -EINVAL for negative timeouts to function properly.

Userspace RCU does not use futex timeouts (the parameter is always NULL). So
this change/fix won't have any effect as far as urcu is concerned.

Thanks,

Mathieu

>
> Unless there is some good reason to object to breaking the API that I am
> missing, I don't mind changing it to -ETIMEDOUT (although -EINVAL seems
> more intuitive to me).
>
> --
> Darren "Little Fish" Hart
>
>> Ignoring the possible cleanups/microoptimizations, something like this:
>>
>> --- x/kernel/futex.c
>> +++ x/kernel/futex.c
>> @@ -2625,6 +2625,16 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
>> cmd == FUTEX_WAIT_REQUEUE_PI)) {
>> if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
>> return -EFAULT;
>> +
>> + // absolute timeout
>> + if (cmd != FUTEX_WAIT) {
>> + if (ts->tv_nsec>= NSEC_PER_SEC)
>> + return -EINVAL;
>> + if (ts->tv_sec< 0)
>> + return -ETIMEDOUT;
>> + }
>> +
>> +
>> if (!timespec_valid(&ts))
>> return -EINVAL;
>>
>> ------------------------------------------------------------------------
>>
>> Otherwise, pthread_rwlock_timedwrlock(ts) hangs spinning in user-space
>> forever if ts->tv_sec< 0.
>>
>> To clarify: this depends on libc version and arch.
>>
>> This happens because pthread_rwlock_timedwrlock(rwlock, ts) on x86_64
>> roughly does:
>>
>> for (;;) {
>> if (fast_path_succeeds(rwlock))
>> return 0;
>>
>> if (ts->tv_nsec>= NSEC_PER_SEC)
>> return EINVAL;
>>
>> errcode = sys_futex(FUTEX_WAIT_BITSET_PRIVATE, ts);
>> if (errcode == ETIMEDOUT)
>> return ETIMEDOUT;
>> }
>>
>> and since the kernel return EINVAL due to !timespec_valid(ts), the
>> code above loops forever.
>>
>> (btw, we have same problem with EFAULT, and this is considered as
>> a caller's problem).
>>
>> IOW, pthread_rwlock_timedwrlock() assumes that in this case
>> sys_futex() can return nothing interesting except 0 or ETIMEDOUT.
>> I guess pthread_rwlock_timedwrlock() is not alone, but I didn't check.
>>
>>
>>
>> So, the question: do you think we can change sys_futex() to make
>> glibc happy?
>>
>> Or, do you think it is user-space who should check tv_sec< 0 if
>> it wants ETIMEDOUT with the negative timeout ?
>>
>> Thanks,
>>
>> Oleg.
>>
>
>
> --
> Darren Hart
> IBM Linux Technology Center
> Real-Time Linux Team

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-06-25 19:59:46

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

B1;2005;0cOleg,

On Fri, 25 Jun 2010, Oleg Nesterov wrote:

> Hello.
>
> Another stupid question about the trivial problem I am going to ask,
> just to report the authoritative answer back to bugzilla. The problem
> is, personally I am not sure we should/can add the user-visible change
> required by glibc maintainers, and I am in no position to suggest them
> to fix the user-space code instead.
>
>
> In short, glibc developers believe that sys_futex(ts) is buggy and
> needs the fix to return -ETIMEDOUT instead of -EINVAL in case when
> ts->tv_sec < 0 and the timeout is absolute.

Oh well. We followed the validity check for all other syscalls which
hand in [absolute] timespecs:

The rqtp argument specified a nanosecond value less than zero or
greater than or equal to 1000 million; or the TIMER_ABSTIME flag was
specified in flags and the rqtp argument is outside the range for the
clock specified by clock_id;

tv->sec < 0 is definitely an invalid value for both CLOCK_REALTIME and
CLOCK_MONOTONIC. And I consider any code assuming that it's sane as
buggy by definition.

I'm strictly against having different definitions of sanity for
different syscalls.

> Ignoring the possible cleanups/microoptimizations, something like this:
>
> --- x/kernel/futex.c
> +++ x/kernel/futex.c
> @@ -2625,6 +2625,16 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
> cmd == FUTEX_WAIT_REQUEUE_PI)) {
> if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
> return -EFAULT;
> +
> + // absolute timeout
> + if (cmd != FUTEX_WAIT) {
> + if (ts->tv_nsec >= NSEC_PER_SEC)
> + return -EINVAL;
> + if (ts->tv_sec < 0)
> + return -ETIMEDOUT;
> + }
> +
> +
> if (!timespec_valid(&ts))
> return -EINVAL;

Btw, you'd need that ugly check in the compat syscall as well.

> ------------------------------------------------------------------------
>
> Otherwise, pthread_rwlock_timedwrlock(ts) hangs spinning in user-space
> forever if ts->tv_sec < 0.
>
> To clarify: this depends on libc version and arch.

Ouch. So we have code in libc which makes different assumptions about
the syscall semantics ?

> This happens because pthread_rwlock_timedwrlock(rwlock, ts) on x86_64
> roughly does:
>
> for (;;) {
> if (fast_path_succeeds(rwlock))
> return 0;
>
> if (ts->tv_nsec >= NSEC_PER_SEC)
> return EINVAL;
>
> errcode = sys_futex(FUTEX_WAIT_BITSET_PRIVATE, ts);
> if (errcode == ETIMEDOUT)
> return ETIMEDOUT;
> }
>
> and since the kernel return EINVAL due to !timespec_valid(ts), the
> code above loops forever.
>
> (btw, we have same problem with EFAULT, and this is considered as
> a caller's problem).

Brilliant.

> IOW, pthread_rwlock_timedwrlock() assumes that in this case
> sys_futex() can return nothing interesting except 0 or ETIMEDOUT.
> I guess pthread_rwlock_timedwrlock() is not alone, but I didn't check.
>
> So, the question: do you think we can change sys_futex() to make
> glibc happy?

Do we really want to add crap to the kernel, just because some
lunatics have interesting assumptions about validation ?

Definitely NOT

> Or, do you think it is user-space who should check tv_sec < 0 if
> it wants ETIMEDOUT with the negative timeout ?

If user space folks consider tv_sec < 0 a value which is sane and
inside the valid range of CLOCK_MONO/REAL then I can't do much more
than shrug.

Thanks,

tglx

2010-06-25 20:05:12

by Ulrich Drepper

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

----- "Thomas Gleixner" <[email protected]> wrote:
> tv->sec < 0 is definitely an invalid value for both CLOCK_REALTIME
> and CLOCK_MONOTONIC.

CLOCK_MONOTONIC is different but it's wrong for CLOCK_REALTIME. Why would it be invalid? Because times before Epoch will not be used? By that logic you would have to declare all values before Linus' first running kernel as invalid. None of this makes sense.

The tv_sec in timespec is of type time_t and for absolute time values the same semantics as for naked time_t values applies. The absolute time is

epoch + tv_sec + tv_nsec / 1000000000

If tv_sec is negative these are values before epoch.

If there are other interfaces with absolute timeouts they certainly should be changed as well.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

2010-06-25 20:11:54

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

On Fri, 25 Jun 2010, Ulrich Drepper wrote:

Can you please use a mail client with proper line breaks ?

> ----- "Darren Hart" <[email protected]> wrote:
> > Unless there is some good reason to object to breaking the API that I
> > am missing, I don't mind changing it to -ETIMEDOUT (although -EINVAL
> > seems more intuitive to me).
>
> It's only not intuitive because Oleg misrepresented or at least
> didn't describe the issue.

> The kernel already catches invalid timespec values. Unfortunately
> the code used comes from the time when all timeouts where specified
> with relative values. In such situations negative tv_sec values
> were in fact invalid and rejected with EINVAL.

> But for absolute timeouts tv_sec = -1 means a time before Epoch.
> This is not an invalid value, it just is one of many points in time
> which have passed and therefore the kernel has to respond with
> ETIMEDOUT.

That's simply wrong.

... or the TIMER_ABSTIME flag was specified in flags and the rqtp
argument is outside the range for the clock specified by clock_id;

And I consider anything before the EPOCH or before the computer booted
outside of the range. Simply because that's outside the range which we
can read back from the clock, out of the range to which we can set the
clock.

And it's completely illogical to treat relative and abolute timeouts
different. If we'd accept that before the EPOCH or before the computer
started is valid for ABSTIME, then there is no freaking reason to
treat relative timeouts any different.

> This is no semantic change or anything like that. It pure and
> simply a bug fix. When Thomas worked on that come we simply missed
> updating the test for invalid timespec values.

No, that's how we treat every damned timespec in the
syscalls. clock_nanosleep(ABSTIME) has this behaviour forever and we
have this behaviour in sys_futex since we merged PI futex support way
before we added the BITSET stuff.

So just because you messed up your glibc implementation you want us to
fix glibc in the kernel based on some backwards arguments ?

> The kernel code should be fixed to always check tv_nsec for < 0 and
> > 1000000000. But the tv_sec test for < 0 should be skipped if the
> timeout value is interpreted as an absolute time value.

Definitely NOT!

tglx

2010-06-25 20:26:09

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

On Fri, 25 Jun 2010, Ulrich Drepper wrote:

> ----- "Thomas Gleixner" <[email protected]> wrote:
> > tv->sec < 0 is definitely an invalid value for both CLOCK_REALTIME
> > and CLOCK_MONOTONIC.

> CLOCK_MONOTONIC is different but it's wrong for CLOCK_REALTIME.

Why is CLOCK_MONOTONIC any different ? According to you argumentation
it's just _BEFORE_ we started the system.

> Why would it be invalid? Because times before Epoch will not be
> used? By that logic you would have to declare all values before
> Linus' first running kernel as invalid. None of this makes sense.

That's utter bullshit and you know that. Every system which does not
have an RTC starts with the EPOCH and the range check is correct in
the terms of the specification:

Outside the valid range of the clock_id.

The kernel treats the range valid which it can theoretically hand back
to user space:

EPOCH <= CLOCK_REALTIME <= Y2038_or_far_away

0 <= CLOCK_MONOTONIC <= far_away

> The tv_sec in timespec is of type time_t and for absolute time
> values the same semantics as for naked time_t values applies. The
> absolute time is
> epoch + tv_sec + tv_nsec / 1000000000
>
> If tv_sec is negative these are values before epoch.
>
> If there are other interfaces with absolute timeouts they certainly
> should be changed as well.

If we'd change that, then we'd need to allow negative relative
timeouts as well and not restrict it to CLOCK_REALTIME.

If we'd change the notion of "valid" then we do it in a consistent
way for everything not just for a corner case of some glibc wreckage.

Thanks,

tglx

2010-06-28 14:01:14

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

On 06/25, Ulrich Drepper wrote:
>
> ----- "Darren Hart" <[email protected]> wrote:
> > Unless there is some good reason to object to breaking the API that I
> > am missing, I don't mind changing it to -ETIMEDOUT (although -EINVAL
> > seems more intuitive to me).
>
> It's only not intuitive because Oleg misrepresented or at least didn't
> describe the issue.
> ...
> It pure and simply
> a bug fix.

Because personally I disagree that sys_futex()->timespec_valid() is buggy.

I repeated this many times during the previous discussion. I didn't even
try to judge if it is really right or not, because my opinion doesn't matter
at all here.

But it is unfair (imho) to state this code is buggy. The code is correct
even if it does not match your expectations, it works as expected/designed.
And, sys_futex() does this since 2006 iirc.


Honestly, it looks a bit strange to me that you blame the correct code,
and at the same time you ignore the test-case which hangs because the
kernel returns -EFAULT saying that this is the caller's problem.

Oleg.

2010-06-28 14:37:01

by Jakub Jelinek

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

On Mon, Jun 28, 2010 at 03:58:25PM +0200, Oleg Nesterov wrote:
> Honestly, it looks a bit strange to me that you blame the correct code,

Whether that is correct or not is what is being disputed.

> and at the same time you ignore the test-case which hangs because the
> kernel returns -EFAULT saying that this is the caller's problem.

The userspace code reads the abstime->tv_nsec value, so if it wouldn't
be valid address, the code would already segfault. And that's fine, POSIX
certainly allows that, reporting EFAULT isn't required. Well, it doesn't
read abstime->tv_sec in the assembly version, so if you try hard, you can
avoid the segfault, yet get EFAULT from futex syscall by putting abstime
8 bytes before start of some page with previous page not mmapped.

Jakub

2010-06-28 15:04:30

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

On 06/28, Jakub Jelinek wrote:
>
> On Mon, Jun 28, 2010 at 03:58:25PM +0200, Oleg Nesterov wrote:
> > Honestly, it looks a bit strange to me that you blame the correct code,
>
> Whether that is correct or not is what is being disputed.

OK. I only argued with the "buggy" term. Once again, the code works
as expected.

> > and at the same time you ignore the test-case which hangs because the
> > kernel returns -EFAULT saying that this is the caller's problem.
>
> The userspace code reads the abstime->tv_nsec value, so if it wouldn't
> be valid address, the code would already segfault. And that's fine, POSIX
> certainly allows that, reporting EFAULT isn't required. Well, it doesn't
> read abstime->tv_sec in the assembly version, so if you try hard, you can
> avoid the segfault, yet get EFAULT from futex syscall by putting abstime
> 8 bytes before start of some page with previous page not mmapped.

And this is exactly what I did to prove that (in my opinion) libc needs
fixes anyway, even if we change the kernel to treat tv_sec < 0 specially.

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/mman.h>

pthread_rwlock_t rwlock;

static struct timespec *make_efault_ts(void)
{
int page_size = sysconf(_SC_PAGESIZE);
void *ptr = mmap(0, 2 * page_size, PROT_READ|PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
munmap(ptr, page_size);
return ptr + page_size - sizeof(long);
}

static void *thread_func(void *arg)
{
int ret = pthread_rwlock_timedwrlock(&rwlock, make_efault_ts());
printf("lock: ret=%d %m\n", ret);
return NULL;
}

int main(int argv, char *argc[])
{
pthread_t tid;

pthread_rwlock_init(&rwlock, NULL);
pthread_rwlock_wrlock(&rwlock);

pthread_create(&tid, NULL, thread_func, NULL);
pthread_join(tid, NULL);

return 0;
}

It may hang or segfault on your machine, this depends on libc version.
It hangs on the testing machine which also suffers from the reported
timespec_valid() issue. I did this test-case looking at
"objdump -d /lib64/libpthread.so".

To me, this looks like a bug in libc, but I won't insist.

Oleg.

2010-06-28 15:16:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

On Fri, Jun 25, 2010 at 1:04 PM, Ulrich Drepper <[email protected]> wrote:
>
> The tv_sec in timespec is of type time_t and for absolute time values the same
> semantics as for naked time_t values applies. ?The absolute time is
>
> ?epoch + tv_sec + tv_nsec / 1000000000
>
> If tv_sec is negative these are values before epoch.

That's a totally bogus argument.

Ulrich - you're wrong. Go away.

According to that same argument, negative tv_nsec values (or values
above a billion) would also be ok, since that expression would also
give some random value for time. So clearly, the fact that the
expression gives some random value does _not_ in any way mean that we
should accept that random value. Your "argument" isn't an argument at
all, it's just worthless random noise.

The fact is, a negative tv_sec could indeed mean "before the epoch",
but it could equally well (and way more likely) mean "totally buggy
application" or "overflow of a very large positive value".

In the end, it's quite simple: the kernel doesn't accept invalid
timevals. And negative tv_secs are invalid. It's that simple. If
somebody gives the kernel a timeout from before the epoch, that
somebody is being a total idiot. We know it's not a valid absolute
timeout, since there's no way somebody is "waiting" for something that
happened in the sixties.

Yeah, yeah, maybe you're waiting for flower power and free sex. Good
for you. But if you are, don't ask the Linux kernel to wait with you.
Ok?

Linus

2010-06-28 15:30:31

by Andreas Schwab

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

Linus Torvalds <[email protected]> writes:

> We know it's not a valid absolute timeout, since there's no way
> somebody is "waiting" for something that happened in the sixties.

Should it reject timestamps from the seventies?

Andreas.

--
Andreas Schwab, [email protected]
GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84 5EC7 45C6 250E 6F00 984E
"And now for something completely different."

2010-06-28 15:35:09

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

On Mon, 28 Jun 2010, Andreas Schwab wrote:

> Linus Torvalds <[email protected]> writes:
>
> > We know it's not a valid absolute timeout, since there's no way
> > somebody is "waiting" for something that happened in the sixties.
>
> Should it reject timestamps from the seventies?

No, because that's in the _VALID_ range of CLOCK_REALTIME and tv_sec
is >= 0. So we are nice to the post flower power folks.

Thanks,

tglx

2010-06-28 16:05:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: Q: sys_futex() && timespec_valid()

On Mon, Jun 28, 2010 at 8:29 AM, Andreas Schwab <[email protected]> wrote:
> Linus Torvalds <[email protected]> writes:
>
>> We know it's not a valid absolute timeout, since there's no way
>> somebody is "waiting" for something that happened in the sixties.
>
> Should it reject timestamps from the seventies?

Conceptually, we could certainly say "any timeouts from before the
boot of the machine are obviously bogus". It would be stupid and
complicated, and it would be a total pain for anybody who wants to do
migration or anything like that, so we shouldn't do it, but I could
imagine some system that rejected such timeouts as "crazy".

At the same time, for us, there's simply no _reason_ to do that. A
positive time_t value is well-defined. In contrast, a negative tv_sec
value is inherently suspect. Traditionally, you couldn't even know if
time_t was a signed quantity to begin with! And on 32-bit machines, a
negative time_t is quite often the result of overflow (no, you don't
have to get to 2038 to see it - you can get overflows from simply
doing large relative timeouts etc).

So there is no _reason_ to reject timestamps from the seventies. Why
would we care about a specific date? In contrast, negative timestamps
make no sense for absolute timeouts, and they are inherently much less
trustworthy. So it's not about 1970 per se, it's more an issue about
the fundamental representation of time.

In other words, think of negative tv_sec values as a "hmm, I wonder
what the thing is thinking - let's reject it, because I don't want to
guess what the heck is wrong with this guy".

And it's also what we've apparently been doing for a long time, so
changing it had better had some serious reason.

There are certainly cases where negative tv_sec makes sense. Dates on
files, things like that. But why would it make sense to have a
negative tv_sec for an absolute timeout? (Side note: for a _relative_
timeout a negative value makes much more sense: it happens quite
naturally when you end up subtracting two times from each other - but
Ulrich seems to explicitly want negative time for the case where it
makes _less_ sense)

Linus