2007-11-07 23:12:48

by David Brown

[permalink] [raw]
Subject: compat_sys_times() bogus until jiffies >= 0.

compat_sys_times() has bogus return until jiffies is >= 0. I discovered
this running LTP within 5 minutes of booting.

The return result

return compat_jiffies_to_clock_t(jiffies);

will return '-1' to user space and set the negated clock_t value to errno.

I'm not sure what the correct fix for this is. I can come up with a patch
if anyone has ideas on how to fix it.

At minimum, perhaps it should return a sane errno value.

Thanks,
David Brown


2007-11-07 23:29:17

by Andrew Morton

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

> On Wed, 7 Nov 2007 14:47:22 -0800 David Brown <[email protected]> wrote:
> compat_sys_times() has bogus return until jiffies is >= 0. I discovered
> this running LTP within 5 minutes of booting.
>
> The return result
>
> return compat_jiffies_to_clock_t(jiffies);
>
> will return '-1' to user space and set the negated clock_t value to errno.
>
> I'm not sure what the correct fix for this is. I can come up with a patch
> if anyone has ideas on how to fix it.
>
> At minimum, perhaps it should return a sane errno value.

RETURN VALUE
times() returns the number of clock ticks that have elapsed since an
arbitrary point in the past. For Linux 2.4 and earlier this point is
the moment the system was booted. Since Linux 2.6, this point is
(2^32/HZ) - 300 (i.e., about 429 million) seconds before system boot
time. The return value may overflow the possible range of type
clock_t. On error, (clock_t) -1 is returned, and errno is set appro-
priately.


Perhaps this is a bug in glibc: it is interpreting the times() return value
in the same way as other syscalls.

It would have been sensible for us to add INITIAL_JIFFIES to the value
instead of exposing this kernel-only detail to the world, although the
problem will of course reoccur once jiffies hits 0x80000000. Unfortunately
we've even gone and enshrined this bogon in the manpage.

Proposed fix:

- return compat_jiffies_to_clock_t(jiffies);
+ return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) &
+ 0x7fffffff);

?

2007-11-08 00:19:57

by Andrew Morton

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

> On Wed, 7 Nov 2007 15:28:33 -0800 Andrew Morton <[email protected]> wrote:
> > On Wed, 7 Nov 2007 14:47:22 -0800 David Brown <[email protected]> wrote:
> > compat_sys_times() has bogus return until jiffies is >= 0. I discovered
> > this running LTP within 5 minutes of booting.
> >
> > The return result
> >
> > return compat_jiffies_to_clock_t(jiffies);
> >
> > will return '-1' to user space and set the negated clock_t value to errno.
> >
> > I'm not sure what the correct fix for this is. I can come up with a patch
> > if anyone has ideas on how to fix it.
> >
> > At minimum, perhaps it should return a sane errno value.
>
> RETURN VALUE
> times() returns the number of clock ticks that have elapsed since an
> arbitrary point in the past. For Linux 2.4 and earlier this point is
> the moment the system was booted. Since Linux 2.6, this point is
> (2^32/HZ) - 300 (i.e., about 429 million) seconds before system boot
> time. The return value may overflow the possible range of type
> clock_t. On error, (clock_t) -1 is returned, and errno is set appro-
> priately.
>
>
> Perhaps this is a bug in glibc: it is interpreting the times() return value
> in the same way as other syscalls.
>
> It would have been sensible for us to add INITIAL_JIFFIES to the value
> instead of exposing this kernel-only detail to the world, although the
> problem will of course reoccur once jiffies hits 0x80000000. Unfortunately
> we've even gone and enshrined this bogon in the manpage.
>
> Proposed fix:
>
> - return compat_jiffies_to_clock_t(jiffies);
> + return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) &
> + 0x7fffffff);
>
> ?

Like this?

It gets messy.


From: Andrew Morton <[email protected]>

David Brown points out that compat_sys_times() (and sys_times()) can return
arbitrary 32-bit (or 64-bit values). If these happen to be negative (jiffy
wrap, or before INITIAL_JIFFIES) then libc will interpret this as an error and
will return -1 to the libc user and will set errno.

The manpage for times(2) says:

times() returns the number of clock ticks that have elapsed since an
arbitrary point in the past. For Linux 2.4 and earlier this point is
the moment the system was booted. Since Linux 2.6, this point is
(2^32/HZ) - 300 (i.e., about 429 million) seconds before system boot
time. The return value may overflow the possible range of type
clock_t. On error, (clock_t) -1 is returned, and errno is set appro-
priately.

We can fix this by masking the return value down to a 31-bit (63-bit) value.

Also, let's correct for INTIAL_JIFFIES - this isn't a detail which should be
exposed to userspace.

Unfortunately this change can break userspace. If a program was (correctly)
doing:

unsigned long start = times(...);
...
unsigned long end = times(...);
unsigned long delta = end - start;

then `delta' can be grossly wrong if we wrapped in the interval. Instead
userspace will need to mask `delta' by 0x7fffffff (0x7fffffffffffffff) to get
the correct number.

But userspace was already busted in the presence of wraparound, due to glibc's
convert-to-negative-one behaviour.

Given all this stuff, the return value from sys_times() doesn't seem a
particularly useful or reliable kernel interface.

Cc: David Brown <[email protected]>
Cc: Ulrich Drepper <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

kernel/compat.c | 3 ++-
kernel/sys.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff -puN kernel/sys.c~a kernel/sys.c
--- a/kernel/sys.c~a
+++ a/kernel/sys.c
@@ -897,7 +897,8 @@ asmlinkage long sys_times(struct tms __u
if (copy_to_user(tbuf, &tmp, sizeof(struct tms)))
return -EFAULT;
}
- return (long) jiffies_64_to_clock_t(get_jiffies_64());
+ return jiffies_64_to_clock_t((get_jiffies_64() + INITIAL_JIFFIES) &
+ LONG_MAX);
}

/*
diff -puN kernel/compat.c~a kernel/compat.c
--- a/kernel/compat.c~a
+++ a/kernel/compat.c
@@ -162,7 +162,8 @@ asmlinkage long compat_sys_times(struct
if (copy_to_user(tbuf, &tmp, sizeof(tmp)))
return -EFAULT;
}
- return compat_jiffies_to_clock_t(jiffies);
+ return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) &
+ LONG_MAX);
}

/*
_

2007-11-08 00:50:31

by David Miller

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

From: Andrew Morton <[email protected]>
Date: Wed, 7 Nov 2007 15:28:33 -0800

> Perhaps this is a bug in glibc: it is interpreting the times() return value
> in the same way as other syscalls.

The problem is more likely that we are failing to
invoke force_successful_syscall_return() here.

Otherwise the syscall return path interprets negative
values as errors, and sets the cpu condition codes.

And that is what userspace is actually checking for
to determine if there is an error or not.

2007-11-08 00:54:50

by Andreas Schwab

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

Andrew Morton <[email protected]> writes:

> diff -puN kernel/compat.c~a kernel/compat.c
> --- a/kernel/compat.c~a
> +++ a/kernel/compat.c
> @@ -162,7 +162,8 @@ asmlinkage long compat_sys_times(struct
> if (copy_to_user(tbuf, &tmp, sizeof(tmp)))
> return -EFAULT;
> }
> - return compat_jiffies_to_clock_t(jiffies);
> + return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) &
> + LONG_MAX);

Are you sure you want LONG_MAX here, not 0x7fffffff?

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2007-11-08 01:14:52

by Andrew Morton

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

> On Wed, 07 Nov 2007 16:50:22 -0800 (PST) David Miller <[email protected]> wrote:
> From: Andrew Morton <[email protected]>
> Date: Wed, 7 Nov 2007 15:28:33 -0800
>
> > Perhaps this is a bug in glibc: it is interpreting the times() return value
> > in the same way as other syscalls.
>
> The problem is more likely that we are failing to
> invoke force_successful_syscall_return() here.
>
> Otherwise the syscall return path interprets negative
> values as errors, and sets the cpu condition codes.
>
> And that is what userspace is actually checking for
> to determine if there is an error or not.

hm, I'd forgotten about that.

It seems to be a no-op on lots of architectures?

2007-11-08 01:18:43

by Andrew Morton

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

> On Thu, 08 Nov 2007 01:54:40 +0100 Andreas Schwab <[email protected]> wrote:
> Andrew Morton <[email protected]> writes:
>
> > diff -puN kernel/compat.c~a kernel/compat.c
> > --- a/kernel/compat.c~a
> > +++ a/kernel/compat.c
> > @@ -162,7 +162,8 @@ asmlinkage long compat_sys_times(struct
> > if (copy_to_user(tbuf, &tmp, sizeof(tmp)))
> > return -EFAULT;
> > }
> > - return compat_jiffies_to_clock_t(jiffies);
> > + return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) &
> > + LONG_MAX);
>
> Are you sure you want LONG_MAX here, not 0x7fffffff?
>

I'm not sure of anything - I'm just trolling ;)

That's 0x7fffffffffffffff for architectures which implement this function.
I think that lines up correctly with jiffies and the return value from
compat_sys_times().

Perhaps formally it should be USERSPACE_CLOCK_T_MAX, but we don't have that.

2007-11-08 01:54:18

by Paul Mackerras

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

Andrew Morton writes:

> Given all this stuff, the return value from sys_times() doesn't seem a
> particularly useful or reliable kernel interface.

I think the best thing would be to ignore any error from copy_to_user
and always return the number of clock ticks. We should call
force_successful_syscall_return, and glibc on x86 should be taught not
to interpret negative values as an error.

POSIX doesn't require us to return an EFAULT error if the buf argument
is bogus. If userspace does supply a bogus buf pointer, then either
it will dereference it itself and get a segfault, or it won't
dereference it, in which case it obviously didn't care about the
values we tried to put there.

If we try to return an error under some circumstances, then there is
at least one 32-bit value for the number of ticks that will cause
confusion. We can either change that value (or values) to some other
value, which seems pretty bogus, or we can just decide not to return
any errors. The latter seems to me to have no significant downside
and to be the simplest solution to the problem.

Paul.

2007-11-08 02:09:28

by David Miller

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

From: Paul Mackerras <[email protected]>
Date: Thu, 8 Nov 2007 12:53:57 +1100

> Andrew Morton writes:
>
> > Given all this stuff, the return value from sys_times() doesn't seem a
> > particularly useful or reliable kernel interface.
>
> I think the best thing would be to ignore any error from copy_to_user
> and always return the number of clock ticks. We should call
> force_successful_syscall_return, and glibc on x86 should be taught not
> to interpret negative values as an error.
>
> POSIX doesn't require us to return an EFAULT error if the buf argument
> is bogus. If userspace does supply a bogus buf pointer, then either
> it will dereference it itself and get a segfault, or it won't
> dereference it, in which case it obviously didn't care about the
> values we tried to put there.
>
> If we try to return an error under some circumstances, then there is
> at least one 32-bit value for the number of ticks that will cause
> confusion. We can either change that value (or values) to some other
> value, which seems pretty bogus, or we can just decide not to return
> any errors. The latter seems to me to have no significant downside
> and to be the simplest solution to the problem.

I agree with this analysis.

The Linux man page for times() explicitly lists (clock_t) -1 as a
return value meaning error.

So even if we did make some effort to return errors "properly" (via
force_successful_syscall_return() et al.) userspace would still be
screwed because (clock_t) -1 would be interpreted as an error.

Actually I think this basically proves we cannot return (clock_t) -1
ever because all existing userland (I'm not talking about inside
glibc, I'm talking about inside of applications) will see this as an
error.

User applications have no other way to check for error.

This API is definitely very poorly designed, no matter which way we
"fix" this some case will remain broken.

2007-11-08 03:08:19

by Andrew Morton

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

> On Thu, 8 Nov 2007 12:53:57 +1100 Paul Mackerras <[email protected]> wrote:
> Andrew Morton writes:
>
> > Given all this stuff, the return value from sys_times() doesn't seem a
> > particularly useful or reliable kernel interface.
>
> I think the best thing would be to ignore any error from copy_to_user
> and always return the number of clock ticks. We should call
> force_successful_syscall_return, and glibc on x86 should be taught not
> to interpret negative values as an error.

Changing glibc might be hard ;)

> POSIX doesn't require us to return an EFAULT error if the buf argument
> is bogus. If userspace does supply a bogus buf pointer, then either
> it will dereference it itself and get a segfault, or it won't
> dereference it, in which case it obviously didn't care about the
> values we tried to put there.
>
> If we try to return an error under some circumstances, then there is
> at least one 32-bit value for the number of ticks that will cause
> confusion. We can either change that value (or values) to some other
> value, which seems pretty bogus, or we can just decide not to return
> any errors. The latter seems to me to have no significant downside
> and to be the simplest solution to the problem.

"the latter" is what my protopatch does isn't it? It wraps at 0x7fffffff.
It appears that glibc treats all of 0x80000000-0xffffffff as an error.

2007-11-08 03:14:08

by David Miller

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

From: Andrew Morton <[email protected]>
Date: Wed, 7 Nov 2007 19:07:14 -0800

> It appears that glibc treats all of 0x80000000-0xffffffff as an
> error.

glibc treats it as an error if the system call returns with
the carry condition code set. At least that's how I've
understood it to work and at a minimum this is how it works
on sparc, ppc, ia64, mips, etc.

The error indication is being created by the system call return path
in the kernel. It tests for values between -512 and 0, and marks
those as errors unless force_successful_syscall() has been called.

I can't see where x86 is doing this though, so perhaps for x86
glibc does make the negative value check. But I doubt it is
checking the range 0x80000000-0xffffffff, otherwise mmap() would
be busted.

2007-11-08 04:59:28

by Paul Mackerras

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

Andrew Morton writes:

> "the latter" is what my protopatch does isn't it? It wraps at 0x7fffffff.
> It appears that glibc treats all of 0x80000000-0xffffffff as an error.

Not on powerpc. On powerpc the error indication is carried separately
in a condition register bit. So a force_successful_syscall_return()
call will make glibc automatically do the right thing without any
glibc changes on powerpc.

Wrapping at 0x7fffffff will cause programs to see large negative
deltas between successive calls when the wrap occurs. I can see that
giving userspace fits. :)

Paul.

2007-11-08 05:16:25

by Paul Mackerras

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

David Miller writes:

> I can't see where x86 is doing this though, so perhaps for x86
> glibc does make the negative value check. But I doubt it is
> checking the range 0x80000000-0xffffffff, otherwise mmap() would
> be busted.

At least for the INTERNAL_SYSCALL macro in glibc, the error check is:

#define INTERNAL_SYSCALL_ERROR_P(val, err) \
((unsigned int) (val) >= 0xfffff001u)

in sysdeps/unix/sysv/linux/i386/sysdep.h. Similarly the PSEUDO macro
in that file does a cmpl $-4095,%eax to test for error. (There is also
a PSEUDO_NOERRNO which doesn't test for error.)

So the convention on (32-bit) x86 is that -4095 .. -1 are error
values, and other values are successful return values.

Paul.

2007-11-08 05:20:44

by Andrew Morton

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

> On Thu, 8 Nov 2007 15:59:12 +1100 Paul Mackerras <[email protected]> wrote:
> Andrew Morton writes:
>
> > "the latter" is what my protopatch does isn't it? It wraps at 0x7fffffff.
> > It appears that glibc treats all of 0x80000000-0xffffffff as an error.
>
> Not on powerpc. On powerpc the error indication is carried separately
> in a condition register bit. So a force_successful_syscall_return()
> call will make glibc automatically do the right thing without any
> glibc changes on powerpc.

OK

> Wrapping at 0x7fffffff will cause programs to see large negative
> deltas between successive calls when the wrap occurs. I can see that
> giving userspace fits. :)
>

Yup. But userspace will already have a fit if either the start or end time
advanced into the glibc-thought-that-was-an-error range.

2007-11-08 05:36:24

by Paul Mackerras

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

Andrew Morton writes:

> Yup. But userspace will already have a fit if either the start or end time
> advanced into the glibc-thought-that-was-an-error range.

Not nearly as much of a fit. The effect on x86 is that values between
-4095 and -1 are reported as -1, so the end-start difference will be
out by less than 41 seconds. That's not nearly as dramatic as a
difference of 21 million seconds (over 16 years). :)

I really think that wrapping at 0x7fffffff makes the situation worse,
not better.

Paul.

2007-11-08 06:00:39

by David Brown

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

On Wed, Nov 07, 2007 at 03:28:33PM -0800, Andrew Morton wrote:
>> On Wed, 7 Nov 2007 14:47:22 -0800 David Brown <[email protected]> wrote:

>> will return '-1' to user space and set the negated clock_t value to errno.
>>
>> At minimum, perhaps it should return a sane errno value.
>
>RETURN VALUE
> times() returns the number of clock ticks that have elapsed since an
> arbitrary point in the past. For Linux 2.4 and earlier this point is
> the moment the system was booted. Since Linux 2.6, this point is
> (2^32/HZ) - 300 (i.e., about 429 million) seconds before system boot
> time. The return value may overflow the possible range of type
> clock_t. On error, (clock_t) -1 is returned, and errno is set appro-
> priately.

The strange -1 behavior is enshrined in history. I think a better answer
is to tell people to use getrusage() if they want a return result without
this problem.

Adding INITIAL_JIFFIES will fix the case where an embedded system is booted
up to run a test and then shut down, and the mask, although it causes
discontinuities periodically at least moves them away from the early boot.

INITIAL_JIFFIES was a good idea, but it is probably best to keep it inside
of the kernel.

David Brown

2007-11-08 06:12:37

by Andrew Morton

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

> On Thu, 8 Nov 2007 16:36:08 +1100 Paul Mackerras <[email protected]> wrote:
> Andrew Morton writes:
>
> > Yup. But userspace will already have a fit if either the start or end time
> > advanced into the glibc-thought-that-was-an-error range.
>
> Not nearly as much of a fit. The effect on x86 is that values between
> -4095 and -1 are reported as -1, so the end-start difference will be
> out by less than 41 seconds. That's not nearly as dramatic as a
> difference of 21 million seconds (over 16 years). :)
>
> I really think that wrapping at 0x7fffffff makes the situation worse,
> not better.
>

Sure.

So we need to do what you say: never return an error from sys_times() and
change glibc to not perform error-interpretation on sys_times() return
values and recommend that people bypass libc and go direct to the syscall
so they'll work correctly on older glibc. Lovely.

I wonder what happens with things like F_GETOWN, shmat() and lseek(/dev/mem)
on x86 (things which use force_successful_syscall_return()). According
to the comment in include/linux/ptrace.h, glibc should be special-casing
these.

2007-11-08 06:22:58

by David Miller

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

From: Paul Mackerras <[email protected]>
Date: Thu, 8 Nov 2007 15:59:12 +1100

> Not on powerpc. On powerpc the error indication is carried separately
> in a condition register bit. So a force_successful_syscall_return()
> call will make glibc automatically do the right thing without any
> glibc changes on powerpc.

It still won't fix the problem.

When the return value is (clock_t) -1, all the
force_successful_syscall_return() calls and glibc condition
codes checks in the world are not going to fix the application
code which checks for error using -1.

2007-11-08 06:24:18

by David Miller

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

From: Paul Mackerras <[email protected]>
Date: Thu, 8 Nov 2007 16:15:51 +1100

> David Miller writes:
>
> > I can't see where x86 is doing this though, so perhaps for x86
> > glibc does make the negative value check. But I doubt it is
> > checking the range 0x80000000-0xffffffff, otherwise mmap() would
> > be busted.
>
> At least for the INTERNAL_SYSCALL macro in glibc, the error check is:
>
> #define INTERNAL_SYSCALL_ERROR_P(val, err) \
> ((unsigned int) (val) >= 0xfffff001u)
>
> in sysdeps/unix/sysv/linux/i386/sysdep.h. Similarly the PSEUDO macro
> in that file does a cmpl $-4095,%eax to test for error. (There is also
> a PSEUDO_NOERRNO which doesn't test for error.)
>
> So the convention on (32-bit) x86 is that -4095 .. -1 are error
> values, and other values are successful return values.

Thanks for figuring that out.

Really there is no way to fix sys_times() return values
universally. Each proposed solution either doesn't fix
the problem, or adds a new failure mode.

2007-11-08 06:25:38

by David Miller

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

From: Andrew Morton <[email protected]>
Date: Wed, 7 Nov 2007 21:20:05 -0800

> Yup. But userspace will already have a fit if either the start or end time
> advanced into the glibc-thought-that-was-an-error range.

On x86 only. We could use force_successful_syscall_return()
to make sure the condition codes get set correctly on
other platforms.

But even in that case we'd still be broken when the return
value is exactly -1 and that's what the application is going
to compare against to test for errors.

2007-11-08 07:09:59

by Andrew Morton

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

> On Wed, 07 Nov 2007 22:25:30 -0800 (PST) David Miller <[email protected]> wrote:
> From: Andrew Morton <[email protected]>
> Date: Wed, 7 Nov 2007 21:20:05 -0800
>
> > Yup. But userspace will already have a fit if either the start or end time
> > advanced into the glibc-thought-that-was-an-error range.
>
> On x86 only. We could use force_successful_syscall_return()
> to make sure the condition codes get set correctly on
> other platforms.
>
> But even in that case we'd still be broken when the return
> value is exactly -1 and that's what the application is going
> to compare against to test for errors.

I don't think that's a big problem? This syscall can (oddly) return any
32-bit (64-bit) number and a smart application developer (after saying wtf)
would realise that he just can't check for errors and have correctly
working code.

Then again, if he was smart he just wouldn't use times(2)'s return value
for anything. But what is the alternative? I don't think there is one,
apart from much saner things like gettimeofday().

2007-11-08 07:14:50

by David Miller

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

From: Andrew Morton <[email protected]>
Date: Wed, 7 Nov 2007 23:09:16 -0800

> I don't think that's a big problem? This syscall can (oddly) return any
> 32-bit (64-bit) number and a smart application developer (after saying wtf)
> would realise that he just can't check for errors and have correctly
> working code.
>
> Then again, if he was smart he just wouldn't use times(2)'s return value
> for anything. But what is the alternative? I don't think there is one,
> apart from much saner things like gettimeofday().

You and I would say "wtf", but the manual states what it does:

On error, (clock_t) -1 is returned, and errno is set appro-
priately.

And I think this (obviously bogus) convention is something we
are really stuck with.

Another awful aspect of this is that glibc is going to overwrite
'errno' for this return value range. That will likely cause more
application misbehavior than some of the other side effects we've been
discussing.

In short we have two problems:

1) glibc thinks -4096 < x < 0 is an error, and will write this
value into errno and return -1 to the application

2) the manual states that -1 means error

2007-11-08 08:53:57

by Paul Mackerras

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

David Miller writes:

> On x86 only. We could use force_successful_syscall_return()
> to make sure the condition codes get set correctly on
> other platforms.
>
> But even in that case we'd still be broken when the return
> value is exactly -1 and that's what the application is going
> to compare against to test for errors.

We could special-case that and turn it into 0. That would introduce a
0.01 second blip, which would be better than a 41 second window for
bad behaviour like we have at the moment.

It's also possible that many applications already don't check for
errors. For example, glibc deliberately doesn't check for errors when
it calls __times in the clock() implementation. There is a comment in
sysdeps/unix/sysv/linux/clock.c that says this:

/* We don't check for errors here. The only error the kernel
returns is EFAULT if the value cannot be written to the struct we
pass a pointer to. Otherwise the kernel returns an `unsigned
long' value which is the number of jiffies since system start.
But this number can be negative (when read as `long') when the
system is up for some time. Ignoring errors should therefore
have no negative impacts but solve the problem. */
__times (&buf);

Paul.

2007-11-08 10:20:30

by Andreas Schwab

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

David Miller <[email protected]> writes:

> I agree with this analysis.
>
> The Linux man page for times() explicitly lists (clock_t) -1 as a
> return value meaning error.
>
> So even if we did make some effort to return errors "properly" (via
> force_successful_syscall_return() et al.) userspace would still be
> screwed because (clock_t) -1 would be interpreted as an error.
>
> Actually I think this basically proves we cannot return (clock_t) -1
> ever because all existing userland (I'm not talking about inside
> glibc, I'm talking about inside of applications) will see this as an
> error.
>
> User applications have no other way to check for error.
>
> This API is definitely very poorly designed, no matter which way we
> "fix" this some case will remain broken.

A possible remedy is to return the ticks since process start time, which
delays the wrap around much further. POSIX only demands consistency
within the same process.

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2007-11-08 14:43:50

by Chris Friesen

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

Andreas Schwab wrote:

> A possible remedy is to return the ticks since process start time, which
> delays the wrap around much further. POSIX only demands consistency
> within the same process.

This would be an interesting solution.

The man page for linux states that the return code is time since system
boot, so that could realistically be expected to correlate between
different processes.

Could we get away with changing the man page and breaking any apps
relying on this previously-documented behaviour?



Chris

2007-11-08 19:26:17

by Denys Vlasenko

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

On Thursday 08 November 2007 02:09, David Miller wrote:
> > I think the best thing would be to ignore any error from copy_to_user
> > and always return the number of clock ticks. We should call
> > force_successful_syscall_return, and glibc on x86 should be taught not
> > to interpret negative values as an error.
> >
> > POSIX doesn't require us to return an EFAULT error if the buf argument
> > is bogus. If userspace does supply a bogus buf pointer, then either
> > it will dereference it itself and get a segfault, or it won't
> > dereference it, in which case it obviously didn't care about the
> > values we tried to put there.
> >
> > If we try to return an error under some circumstances, then there is
> > at least one 32-bit value for the number of ticks that will cause
> > confusion. We can either change that value (or values) to some other
> > value, which seems pretty bogus, or we can just decide not to return
> > any errors. The latter seems to me to have no significant downside
> > and to be the simplest solution to the problem.
>
> I agree with this analysis.
>
> The Linux man page for times() explicitly lists (clock_t) -1 as a
> return value meaning error.
>
> So even if we did make some effort to return errors "properly" (via
> force_successful_syscall_return() et al.) userspace would still be
> screwed because (clock_t) -1 would be interpreted as an error.
>
> Actually I think this basically proves we cannot return (clock_t) -1
> ever because all existing userland (I'm not talking about inside
> glibc, I'm talking about inside of applications) will see this as an
> error.

What error? I'd argue it's perfectly sane for application to
assume that times() never fails.

struct tms t;
clock_t start = times(&t);
...
clock_t end = times(&t);
clock_t delta = end - start;

The only error form kernel POV is that passed pointer can be
invalid. But from application POV in the above example it
cannot be true and

if (start == -1)
error("error in times!");

would be and exercise in wasting CPU cycles, producing dead code
and feeding one's paranoia.

> User applications have no other way to check for error.

And in all realistic scenarios it doesn't need to.

In this particular case, it makes sense to ignore standards and
never return an error. If user indeed passed invalid pointer,
just don't store anything there, but still return valid value.
--
vda

2007-11-08 19:27:34

by Denys Vlasenko

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

On Thursday 08 November 2007 03:07, Andrew Morton wrote:
> > On Thu, 8 Nov 2007 12:53:57 +1100 Paul Mackerras <[email protected]> wrote:
> > Andrew Morton writes:
> >
> > > Given all this stuff, the return value from sys_times() doesn't seem a
> > > particularly useful or reliable kernel interface.
> >
> > I think the best thing would be to ignore any error from copy_to_user
> > and always return the number of clock ticks. We should call
> > force_successful_syscall_return, and glibc on x86 should be taught not
> > to interpret negative values as an error.
>
> Changing glibc might be hard ;)
>
> > POSIX doesn't require us to return an EFAULT error if the buf argument
> > is bogus. If userspace does supply a bogus buf pointer, then either
> > it will dereference it itself and get a segfault, or it won't
> > dereference it, in which case it obviously didn't care about the
> > values we tried to put there.
> >
> > If we try to return an error under some circumstances, then there is
> > at least one 32-bit value for the number of ticks that will cause
> > confusion. We can either change that value (or values) to some other
> > value, which seems pretty bogus, or we can just decide not to return
> > any errors. The latter seems to me to have no significant downside
> > and to be the simplest solution to the problem.
>
> "the latter" is what my protopatch does isn't it? It wraps at 0x7fffffff.
> It appears that glibc treats all of 0x80000000-0xffffffff as an error.

The best solution is to change the kernel to never return an error
and to change glibc to never treat return as an error.
--
vda

2007-11-09 18:21:16

by Ulrich Drepper

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Friesen wrote:
>> A possible remedy is to return the ticks since process start time, which
>> delays the wrap around much further. POSIX only demands consistency
>> within the same process.
>
> This would be an interesting solution.
>
> The man page for linux states that the return code is time since system
> boot, so that could realistically be expected to correlate between
> different processes.

The Linux man page is documenting existing functionality on top of what
the standard requires. Programmers should ever only require what the
standard guarantees.

I am perfectly willing to support a solution where the time is measured
from process startup time. The only code using times() I found is
cross-platform and most likely does not depend on the value returned is
usable in isolation (only in a difference).

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFHNKTZ2ijCOnn/RHQRAv2wAJsHOnWRrbE2N2Z4R35bsU1+BIZEGQCguaxL
zY9f4XEhJnAoNF5jFxm76qI=
=0nsU
-----END PGP SIGNATURE-----

2007-12-20 11:37:12

by Michael Kerrisk

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

David, Andrew, Paul,

A late coda to this thread, but I'll just note some changes I'm making to
the man page (which I'd like you to review -- please see below), and note a
few other points.

Andrew, you asked about what happens for x86 with the -1 to -4095 return
for other syscalls. At least two other syscalls suffer the same problem.
>From the fcntl(2) man page:

BUGS
A limitation of the Linux system call conventions on some
architectures (notably x86) means that if a (negative)
process group ID to be returned by F_GETOWN falls in the
range -1 to -4095, then the return value is wrongly
interpreted by glibc as an error in the system call; that
is, the return value of fcntl() will be -1, and errno
will contain the (positive) process group ID.

Some testing just now shows me that lseek() on /dev/mem suffers similar
problems when seeking to bytes 0xfffff001 through to 0xffffffff.

Ulrich Drepper wrote:
> Chris Friesen wrote:
>>> A possible remedy is to return the ticks since process start time, which
>>> delays the wrap around much further. POSIX only demands consistency
>>> within the same process.
>> This would be an interesting solution.
>
>> The man page for linux states that the return code is time since system
>> boot, so that could realistically be expected to correlate between
>> different processes.
>
> The Linux man page is documenting existing functionality on top of what
> the standard requires. Programmers should ever only require what the
> standard guarantees.
>
> I am perfectly willing to support a solution where the time is measured
>>from process startup time. The only code using times() I found is
> cross-platform and most likely does not depend on the value returned is
> usable in isolation (only in a difference).

Did I miss anything? Is anyone actually working on a solution along these
lines?

In the meantime, for man-pages-2.74, I've reworked the description of the
return value:

RETURN VALUE
times() returns the number of clock ticks that have
elapsed since an arbitrary point in the past. The return
value may overflow the possible range of type clock_t.
On error, (clock_t) -1 is returned, and errno is set
appropriately.

I moved the Linux specific details of the return value to NOTES, and added
a warning about relying on those details:

NOTES
...

On Linux, the "arbitrary point in the past" from which
the return value of times() is measured has varied across
kernel versions. On Linux 2.4 and earlier this point is
the moment the system was booted. Since Linux 2.6, this
point is (2^32/HZ) - 300 (i.e., about 429 million) sec-
onds before system boot time. This variability across
kernel versions (and across Unix implementations), com-
bined with the fact that the returned value may overflow
the range of clock_t, means that a portable application
would be wise to avoid using this value. To measure
changes in elapsed time, use gettimeofday(2) instead.

Under BUGS I added:

BUGS
A limitation of the Linux system call conventions on some
architectures (notably x86) means that on Linux 2.6 there
is a small time window (41 seconds) soon after boot when
times(2) can return -1, falsely indicating that an error
occurred. The same problem can occur when the return
value wraps passed the maximum value that can be stored
in clockid_t.

Look okay to you folks?

Cheers,

Michael
--
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
Want to report a man-pages bug? Look here:
http://www.kernel.org/doc/man-pages/reporting_bugs.html

2007-12-20 11:51:18

by David Miller

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

From: Michael Kerrisk <[email protected]>
Date: Thu, 20 Dec 2007 12:36:52 +0100

> Some testing just now shows me that lseek() on /dev/mem suffers similar
> problems when seeking to bytes 0xfffff001 through to 0xffffffff.

Only on x86 platforms. Sparc, IA64, MIPS, powerpc, etc. all get this
case right.

Yes it's another unfortunate side effect of how error status is
indicated for x86 system calls.

I would suggest, that we put something in place to fix this
in the long term:

1) Start setting the condition codes properly to indicate
error in the system call return path on x86 like other
platforms do now. Make sure that it tips off on
force_successful_syscall_return(), as needed.

2) Come up with a transition plan for glibc et al. to take
advantage of this.

It actually sounds like the kind of problem that could be
solved well using the VDSO page. :-)

2007-12-22 00:42:19

by Andi Kleen

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

David Miller <[email protected]> writes:

> Only on x86 platforms. Sparc, IA64, MIPS, powerpc, etc. all get this
> case right.

Do they for 32bit kernels or for 64bit userland?

> Yes it's another unfortunate side effect of how error status is
> indicated for x86 system calls.

Maybe I'm dense, but doesn't all the kernel code pass it the
same way as the x86 syscall code? For your proposal you
would need a separate error bit coming out of the sys_* to
handle this case. Basically rewrite all code that ever returns
errors in the kernel. Or do I miss something?

Or are you talking about solving it only for 32bit compat emulation
on 64bit kernels? There it would be possible, but only doing
it there and not on native 32bit systems wouldn't seem clean to me.

At least on x86-64 the compat code's (near) only goal in live is to be
as compatible to 32it as possible not better.

-Andi

2007-12-22 01:41:33

by David Miller

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

From: Andi Kleen <[email protected]>
Date: Sat, 22 Dec 2007 01:42:02 +0100

> David Miller <[email protected]> writes:
>
> > Only on x86 platforms. Sparc, IA64, MIPS, powerpc, etc. all get this
> > case right.
>
> Do they for 32bit kernels or for 64bit userland?

Both. This is not a compat issue.

> > Yes it's another unfortunate side effect of how error status is
> > indicated for x86 system calls.
>
> Maybe I'm dense, but doesn't all the kernel code pass it the
> same way as the x86 syscall code? For your proposal you
> would need a separate error bit coming out of the sys_* to
> handle this case. Basically rewrite all code that ever returns
> errors in the kernel. Or do I miss something?

I'm suggesting that you set the condition codes based upon whether
there is an error or not. That is the critical thing x86 doesn't do
that all the other platforms do.

x86 relies on interpretation of purely the integer returned from the
system call to userspace, and that means a certain chunk of the return
value space can never represent valid values.

If you use the condition codes to signal "the return value is an
error" you don't have these problems.

2007-12-22 01:45:55

by David Miller

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

From: David Miller <[email protected]>
Date: Fri, 21 Dec 2007 17:41:24 -0800 (PST)

> I'm suggesting that you set the condition codes based upon whether
> there is an error or not. That is the critical thing x86 doesn't do
> that all the other platforms do.

And if you still don't get it, I'm saying that x86, in the syscall
trap return path, should set the conditon codes based upon whether the
system call is really signalling an error or not.

And to handle potentially ambiguous cases we, for a long time, have
the force_successful_syscall_return() arch hook. System call
implementations use this when the return values they give could be
mis-construed as error values.

And if you'll notice x86 makes no attempt to implement that hook,
because it currently can't. That's what needs to be fixed.

2007-12-22 01:48:34

by Andi Kleen

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

> I'm suggesting that you set the condition codes based upon whether
> there is an error or not.

And the only way the syscall code could find out if there is an error is by
checking err < 0 && err >= -4096 like glibc (except for the compat
syscall on 64bit kernel case)

Or rewrite all code that returns errors to system calls to pass
a separate flag too.

> That is the critical thing x86 doesn't do
> that all the other platforms do.

It doesn't do it because it's useless without a kernel rewrite.

I frankly doubt it really works on Sparc :-) Maybe it could work
there on a hypothetical rewritten kernel, but not today.

-Andi

2007-12-22 01:52:24

by Andi Kleen

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

> And to handle potentially ambiguous cases we, for a long time, have
> the force_successful_syscall_return() arch hook.

Ah I see what you mean now.

Thanks for the clarification.

Ok that could be in theory made to work yes. The migration would
probably be ugly though (how would glibc figure out if the kernel
does that or not?)

-Andi

2007-12-22 04:36:50

by David Miller

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

From: Andi Kleen <[email protected]>
Date: Sat, 22 Dec 2007 02:53:11 +0100

> > And to handle potentially ambiguous cases we, for a long time, have
> > the force_successful_syscall_return() arch hook.
>
> Ah I see what you mean now.
>
> Thanks for the clarification.

Thanks for continuing to insist it's "impossible" :-)

2007-12-22 12:46:24

by Andi Kleen

[permalink] [raw]
Subject: Re: compat_sys_times() bogus until jiffies >= 0.

On Fri, Dec 21, 2007 at 08:36:40PM -0800, David Miller wrote:
> From: Andi Kleen <[email protected]>
> Date: Sat, 22 Dec 2007 02:53:11 +0100
>
> > > And to handle potentially ambiguous cases we, for a long time, have
> > > the force_successful_syscall_return() arch hook.
> >
> > Ah I see what you mean now.
> >
> > Thanks for the clarification.
>
> Thanks for continuing to insist it's "impossible" :-)

It's still hard -- e.g. i'm not sure your condition flag setting would
be even possible for i386 SYSEXIT which does not restore EFLAGS
from memory and has some other constraints too. And there is no
free other register to use for this either on i386 nor x86-64.

Ok you could always disable SYSEXIT on force_successfull_return(),
but then e.g. all lseek()s would use the slow path which might not
be a good idea.

-Andi