2002-07-19 09:49:47

by Paul Eggert

Subject: [PATCH] 'select' failure or signal should not update timeout

[This follows up a thread in the emacs-pretesters mailing list about a
problem with Emacs, 'select', SA_RESTART, and interrupts.]

> Date: Wed, 17 Jul 2002 08:30:57 -0600 (MDT)
> From: Richard Stallman <[email protected]>
>
> Is it true today that restarting a `select' call after a signal (with
> SA_RESTART) alters the contents of the user's timeout parameter?

I looked into this a bit more, and it turns out to be a problem in the
Linux kernel. POSIX 1003.1-2001
<http://www.opengroup.org/onlinepubs/007904975/functions/select.html>
says that 'select' may modify its timeout argument only "upon
successful completion". However, the Linux kernel sometimes modifies
the timeout argument even when 'select' fails or is interrupted.

Here is a program that illustrates the problem. At the end of this
message I enclose a proposed patch to the Linux kernel to fix the
problem.

/* Conformance test for POSIX 1003.1-2001's requirement that select()
   must not update its timeout argument unless it succeeds. */

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

static void
moan (char const *string)
{
  perror (string);
  exit (1);
}

struct timeval timeout;
struct timeval timeout_when_in_handler;
struct sigaction act;

/* Record what the timeout looks like while select() is interrupted. */
void
handle_sigalrm (int sig)
{
  timeout_when_in_handler = timeout;
}

enum { TIMEOUT_SECONDS = 5 };

int
main (int argc, char **argv)
{
  act.sa_handler = handle_sigalrm;
  act.sa_flags = SA_RESTART;
  if (sigaction (SIGALRM, &act, 0) != 0)
    moan ("sigaction");
  timeout.tv_sec = TIMEOUT_SECONDS;
  timeout_when_in_handler = timeout;
  alarm (TIMEOUT_SECONDS / 2);
  if (select (0, 0, 0, 0, &timeout) != 0)
    {
      if (timeout.tv_sec != TIMEOUT_SECONDS)
        {
          perror ("select");
          fprintf (stderr, "select failed, but timeout was updated to %ld.%.6ld seconds\n",
                   (long) timeout.tv_sec, (long) timeout.tv_usec);
        }
    }

  if (timeout_when_in_handler.tv_sec != TIMEOUT_SECONDS)
    fprintf (stderr, "timeout was updated to %ld.%.6ld seconds while signal handler was active\n",
             (long) timeout_when_in_handler.tv_sec,
             (long) timeout_when_in_handler.tv_usec);

  return 0;
}


Here is a proposed patch to Linux kernel 2.5.26. The patch also
applies to Linux 2.4.18, though you have to ignore the patches to
files that do not exist in 2.4.18. I haven't tested this patch, but
it's pretty straightforward.

diff -prU6 2.5.26/arch/ia64/ia32/sys_ia32.c 2.5.26-select/arch/ia64/ia32/sys_ia32.c
--- 2.5.26/arch/ia64/ia32/sys_ia32.c Tue Jul 16 16:49:35 2002
+++ 2.5.26-select/arch/ia64/ia32/sys_ia32.c Fri Jul 19 02:17:08 2002
@@ -1058,32 +1058,32 @@ sys32_select (int n, fd_set *inp, fd_set
zero_fd_set(n, fds.res_in);
zero_fd_set(n, fds.res_out);
zero_fd_set(n, fds.res_ex);

ret = do_select(n, &fds, &timeout);

+ if (ret < 0)
+ goto out;
+ if (!ret) {
+ ret = -ERESTARTNOHAND;
+ if (signal_pending(current))
+ goto out;
+ ret = 0;
+ }
+
if (tvp32 && !(current->personality & STICKY_TIMEOUTS)) {
time_t sec = 0, usec = 0;
if (timeout) {
sec = timeout / HZ;
usec = timeout % HZ;
usec *= (1000000/HZ);
}
if (put_user(sec, &tvp32->tv_sec) || put_user(usec, &tvp32->tv_usec)) {
ret = -EFAULT;
goto out;
}
- }
-
- if (ret < 0)
- goto out;
- if (!ret) {
- ret = -ERESTARTNOHAND;
- if (signal_pending(current))
- goto out;
- ret = 0;
}

set_fd_set(n, inp, fds.res_in);
set_fd_set(n, outp, fds.res_out);
set_fd_set(n, exp, fds.res_ex);

diff -prU6 2.5.26/arch/mips64/kernel/linux32.c 2.5.26-select/arch/mips64/kernel/linux32.c
--- 2.5.26/arch/mips64/kernel/linux32.c Tue Jul 16 16:49:25 2002
+++ 2.5.26-select/arch/mips64/kernel/linux32.c Fri Jul 19 02:18:19 2002
@@ -1170,30 +1170,30 @@ asmlinkage int sys32_select(int n, u32 *
zero_fd_set(n, fds.res_in);
zero_fd_set(n, fds.res_out);
zero_fd_set(n, fds.res_ex);

ret = do_select(n, &fds, &timeout);

+ if (ret < 0)
+ goto out;
+ if (!ret) {
+ ret = -ERESTARTNOHAND;
+ if (signal_pending(current))
+ goto out;
+ ret = 0;
+ }
+
if (tvp && !(current->personality & STICKY_TIMEOUTS)) {
time_t sec = 0, usec = 0;
if (timeout) {
sec = timeout / HZ;
usec = timeout % HZ;
usec *= (1000000/HZ);
}
put_user(sec, &tvp->tv_sec);
put_user(usec, &tvp->tv_usec);
- }
-
- if (ret < 0)
- goto out;
- if (!ret) {
- ret = -ERESTARTNOHAND;
- if (signal_pending(current))
- goto out;
- ret = 0;
}

set_fd_set32(nn, inp, fds.res_in);
set_fd_set32(nn, outp, fds.res_out);
set_fd_set32(nn, exp, fds.res_ex);

diff -prU6 2.5.26/arch/ppc64/kernel/sys_ppc32.c 2.5.26-select/arch/ppc64/kernel/sys_ppc32.c
--- 2.5.26/arch/ppc64/kernel/sys_ppc32.c Tue Jul 16 16:49:36 2002
+++ 2.5.26-select/arch/ppc64/kernel/sys_ppc32.c Fri Jul 19 02:17:58 2002
@@ -762,30 +762,30 @@ asmlinkage long sys32_select(int n, u32
zero_fd_set(n, fds.res_in);
zero_fd_set(n, fds.res_out);
zero_fd_set(n, fds.res_ex);

ret = do_select(n, &fds, &timeout);

+ if (ret < 0)
+ goto out;
+ if (!ret) {
+ ret = -ERESTARTNOHAND;
+ if (signal_pending(current))
+ goto out;
+ ret = 0;
+ }
+
if (tvp && !(current->personality & STICKY_TIMEOUTS)) {
time_t sec = 0, usec = 0;
if (timeout) {
sec = timeout / HZ;
usec = timeout % HZ;
usec *= (1000000/HZ);
}
put_user(sec, &tvp->tv_sec);
put_user(usec, &tvp->tv_usec);
- }
-
- if (ret < 0)
- goto out;
- if (!ret) {
- ret = -ERESTARTNOHAND;
- if (signal_pending(current))
- goto out;
- ret = 0;
}

set_fd_set32(nn, inp, fds.res_in);
set_fd_set32(nn, outp, fds.res_out);
set_fd_set32(nn, exp, fds.res_ex);

diff -prU6 2.5.26/arch/s390x/kernel/linux32.c 2.5.26-select/arch/s390x/kernel/linux32.c
--- 2.5.26/arch/s390x/kernel/linux32.c Tue Jul 16 16:49:27 2002
+++ 2.5.26-select/arch/s390x/kernel/linux32.c Fri Jul 19 02:18:27 2002
@@ -1420,30 +1420,30 @@ asmlinkage int sys32_select(int n, u32 *
zero_fd_set(n, fds.res_in);
zero_fd_set(n, fds.res_out);
zero_fd_set(n, fds.res_ex);

ret = do_select(n, &fds, &timeout);

+ if (ret < 0)
+ goto out;
+ if (!ret) {
+ ret = -ERESTARTNOHAND;
+ if (signal_pending(current))
+ goto out;
+ ret = 0;
+ }
+
if (tvp && !(current->personality & STICKY_TIMEOUTS)) {
int sec = 0, usec = 0;
if (timeout) {
sec = timeout / HZ;
usec = timeout % HZ;
usec *= (1000000/HZ);
}
put_user(sec, &tvp->tv_sec);
put_user(usec, &tvp->tv_usec);
- }
-
- if (ret < 0)
- goto out;
- if (!ret) {
- ret = -ERESTARTNOHAND;
- if (signal_pending(current))
- goto out;
- ret = 0;
}

set_fd_set32(nn, inp, fds.res_in);
set_fd_set32(nn, outp, fds.res_out);
set_fd_set32(nn, exp, fds.res_ex);

diff -prU6 2.5.26/arch/sparc64/kernel/sys_sparc32.c 2.5.26-select/arch/sparc64/kernel/sys_sparc32.c
--- 2.5.26/arch/sparc64/kernel/sys_sparc32.c Tue Jul 16 16:49:35 2002
+++ 2.5.26-select/arch/sparc64/kernel/sys_sparc32.c Fri Jul 19 02:16:59 2002
@@ -1387,30 +1387,30 @@ asmlinkage int sys32_select(int n, u32 *
zero_fd_set(n, fds.res_in);
zero_fd_set(n, fds.res_out);
zero_fd_set(n, fds.res_ex);

ret = do_select(n, &fds, &timeout);

+ if (ret < 0)
+ goto out;
+ if (!ret) {
+ ret = -ERESTARTNOHAND;
+ if (signal_pending(current))
+ goto out;
+ ret = 0;
+ }
+
if (tvp && !(current->personality & STICKY_TIMEOUTS)) {
time_t sec = 0, usec = 0;
if (timeout) {
sec = timeout / HZ;
usec = timeout % HZ;
usec *= (1000000/HZ);
}
put_user(sec, &tvp->tv_sec);
put_user(usec, &tvp->tv_usec);
- }
-
- if (ret < 0)
- goto out;
- if (!ret) {
- ret = -ERESTARTNOHAND;
- if (signal_pending(current))
- goto out;
- ret = 0;
}

set_fd_set32(nn, inp, fds.res_in);
set_fd_set32(nn, outp, fds.res_out);
set_fd_set32(nn, exp, fds.res_ex);

diff -prU6 2.5.26/arch/x86_64/ia32/sys_ia32.c 2.5.26-select/arch/x86_64/ia32/sys_ia32.c
--- 2.5.26/arch/x86_64/ia32/sys_ia32.c Tue Jul 16 16:49:32 2002
+++ 2.5.26-select/arch/x86_64/ia32/sys_ia32.c Fri Jul 19 02:18:07 2002
@@ -878,30 +878,30 @@ sys32_select(int n, fd_set *inp, fd_set
zero_fd_set(n, fds.res_in);
zero_fd_set(n, fds.res_out);
zero_fd_set(n, fds.res_ex);

ret = do_select(n, &fds, &timeout);

+ if (ret < 0)
+ goto out;
+ if (!ret) {
+ ret = -ERESTARTNOHAND;
+ if (signal_pending(current))
+ goto out;
+ ret = 0;
+ }
+
if (tvp32 && !(current->personality & STICKY_TIMEOUTS)) {
time_t sec = 0, usec = 0;
if (timeout) {
sec = timeout / HZ;
usec = timeout % HZ;
usec *= (1000000/HZ);
}
put_user(sec, (int *)&tvp32->tv_sec);
put_user(usec, (int *)&tvp32->tv_usec);
- }
-
- if (ret < 0)
- goto out;
- if (!ret) {
- ret = -ERESTARTNOHAND;
- if (signal_pending(current))
- goto out;
- ret = 0;
}

set_fd_set(n, inp, fds.res_in);
set_fd_set(n, outp, fds.res_out);
set_fd_set(n, exp, fds.res_ex);

diff -prU6 2.5.26/fs/select.c 2.5.26-select/fs/select.c
--- 2.5.26/fs/select.c Tue Jul 16 16:49:24 2002
+++ 2.5.26-select/fs/select.c Fri Jul 19 02:18:41 2002
@@ -316,30 +316,30 @@ sys_select(int n, fd_set *inp, fd_set *o
zero_fd_set(n, fds.res_in);
zero_fd_set(n, fds.res_out);
zero_fd_set(n, fds.res_ex);

ret = do_select(n, &fds, &timeout);

+ if (ret < 0)
+ goto out;
+ if (!ret) {
+ ret = -ERESTARTNOHAND;
+ if (signal_pending(current))
+ goto out;
+ ret = 0;
+ }
+
if (tvp && !(current->personality & STICKY_TIMEOUTS)) {
time_t sec = 0, usec = 0;
if (timeout) {
sec = timeout / HZ;
usec = timeout % HZ;
usec *= (1000000/HZ);
}
put_user(sec, &tvp->tv_sec);
put_user(usec, &tvp->tv_usec);
- }
-
- if (ret < 0)
- goto out;
- if (!ret) {
- ret = -ERESTARTNOHAND;
- if (signal_pending(current))
- goto out;
- ret = 0;
}

set_fd_set(n, inp, fds.res_in);
set_fd_set(n, outp, fds.res_out);
set_fd_set(n, exp, fds.res_ex);


2002-07-20 00:35:34

by Alan Cox

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

> <http://www.opengroup.org/onlinepubs/007904975/functions/select.html>
> says that 'select' may modify its timeout argument only "upon
> successful completion". However, the Linux kernel sometimes modifies
> the timeout argument even when 'select' fails or is interrupted.

This is extremely useful behaviour. POSIX is broken here. Fix it in the
C library or somewhere it doesn't harm the clueful.

You should raise this with the standards committee instead.

2002-07-20 03:53:36

by Dan Kegel

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

Alan Cox wrote:
> > <http://www.opengroup.org/onlinepubs/007904975/functions/select.html>
> > says that 'select' may modify its timeout argument only "upon
> > successful completion". However, the Linux kernel sometimes modifies
> > the timeout argument even when 'select' fails or is interrupted.
>
> This is extremely useful behaviour. POSIX is broken here.

I tried to make use of this behavior back in 2.2 days, I think,
and ran into trouble. The time remaining wasn't quite right, I seem
to recall, making this nifty feature less useful. I've since
given up on it.

> Fix it in the C library or somewhere it doesn't harm the clueful

Can you give an example of a clueful package that makes
use of this feature and would be harmed if select() suddenly
became posix-compliant?

- Dan

2002-07-20 05:54:46

by Linus Torvalds

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

In article <[email protected]>,
Alan Cox <[email protected]> wrote:
>> <http://www.opengroup.org/onlinepubs/007904975/functions/select.html>
>> says that 'select' may modify its timeout argument only "upon
>> successful completion". However, the Linux kernel sometimes modifies
>> the timeout argument even when 'select' fails or is interrupted.
>
>This is extremely useful behaviour. POSIX is broken here. Fix it in the
>C library or somewhere it doesn't harm the clueful

Personally, I've gotten to the point where I think that the select()
time is broken.

The thing is, nobody should really ever use timeouts, because the notion
of "I want to sleep X seconds" is simply not _useful_ if the process
also just got delayed by a page-out event as it said so. What does "X
seconds" mean at that point? It's ambiguous - and the kernel will (quite
naturally) just always assume that it is "X seconds from when the kernel
got notified".

A _useful_ interface would be to say "I want to sleep to at most time X"
or "to at least time X". Those are unambiguous things to say, and are
not open to interpretation.

The "I want to sleep until at least time X" (or "at most time X") also
has the added advantage that it is inherently re-startable - restarting
the sleep has _no_ rounding issues, and again no ambiguity.

Note that select() is definitely not the only offender here. Other
system calls like "nanosleep()" have the exact same problem - what do
you do if you get interrupted by a signal and need to restart?

The Linux behaviour of modifying the timeout is a half-assed try for
restartability, but the problem is that (a) nobody else does that or
expects it to happen, despite the man-pages originally claiming that
they were supposed to and (b) it inherently has rounding problems and
other ambiguities - making it even less useful.
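
The rounding problem in (b) can be illustrated with a small sketch. It
assumes a tick rate of 100 per second and a conversion that rounds the
sub-second part up to whole ticks; ASSUMED_HZ, timeval_to_ticks and
ticks_to_timeval are illustrative names, not kernel interfaces.

```c
#include <assert.h>
#include <sys/time.h>

/* Assumed tick rate, for illustration only; the real value depends on
   the kernel build and the architecture.  */
enum { ASSUMED_HZ = 100 };
enum { USEC_PER_TICK = 1000000 / ASSUMED_HZ };

/* Convert a relative timeout to ticks, rounding the sub-second part up
   so that the sleep is never shorter than requested.  */
static long
timeval_to_ticks (struct timeval tv)
{
  assert (tv.tv_sec >= 0 && tv.tv_usec >= 0 && tv.tv_usec < 1000000);
  long ticks = (tv.tv_usec + USEC_PER_TICK - 1) / USEC_PER_TICK;
  return ticks + (long) tv.tv_sec * ASSUMED_HZ;
}

/* Convert a tick count back to seconds and microseconds.  */
static struct timeval
ticks_to_timeval (long ticks)
{
  struct timeval tv;
  tv.tv_sec = ticks / ASSUMED_HZ;
  tv.tv_usec = (ticks % ASSUMED_HZ) * USEC_PER_TICK;
  return tv;
}
```

Under these assumptions a request of 1.234567 seconds becomes 124 ticks
and reads back as 1.240000 seconds, so a timeout that has passed through
the kernel no longer matches what the caller asked for.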

Oh, well.

I suspect almost nobody actually uses the Linux timeout feature because
of the nonportability issues, making the whole mess even less tasty.

Linus

2002-07-21 03:31:53

by Peter T. Breuer

Subject: Re: [PATCH] 'select' failure or signal should not update timeout


Dan writes:
> Alan Cox wrote:
> > > <http://www.opengroup.org/onlinepubs/007904975/functions/select.html>
> > > says that 'select' may modify its timeout argument only "upon
> > > successful completion". However, the Linux kernel sometimes modifies
> > > the timeout argument even when 'select' fails or is interrupted.
> >
> > This is extremely useful behaviour. POSIX is broken here.
>
> I tried to make use of this behavior back in 2.2 days, I think,
> and ran into trouble. The time remaining wasn't quite right, I seem
> to recall, making this nifty feature less useful. I've since
> given up on it.
>
> > Fix it in the C library or somewhere it doesn't harm the clueful
>
> Can you give an example of a clueful package that makes
> use of this feature and would be harmed if select() suddenly
> became posix-compliant?

Daemons that I've written for linux-specific tasks all
use the select timeout in order to wait for an event for a fixed
amount of time, across possible interrupts.

That is to say, they watch the errno after return from select, and if
it was EINTR, then they reenter the select without further ado.
Since the timeout has changed to reflect the time remaining, that's
quite right.
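
A minimal sketch of that retry loop, assuming only a read set is being
watched. The name select_read_retry_linux and its parameters are
hypothetical, and the loop is only correct on kernels that leave the
remaining time in the timeout when select() is interrupted.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>

/* Wait for one of the descriptors in *WATCH (may be null) to become
   readable, for up to *TIMEOUT, re-entering select() after each signal.
   This relies on the Linux-specific behaviour of leaving the remaining
   time in *TIMEOUT on EINTR; on systems that leave *TIMEOUT untouched,
   every retry would start a fresh full-length wait.  Returns what
   select() returns and stores the ready descriptors in *READY.  */
static int
select_read_retry_linux (int nfds, const fd_set *watch, fd_set *ready,
                         struct timeval *timeout)
{
  assert (!watch || ready);
  for (;;)
    {
      fd_set *arg = 0;
      if (watch)
        {
          *ready = *watch;      /* select() overwrites its set argument */
          arg = ready;
        }
      int ret = select (nfds, arg, 0, 0, timeout);
      if (ret >= 0 || errno != EINTR)
        return ret;
      /* EINTR: go round again with whatever is left in *timeout.  */
    }
}
```

Passing a zero descriptor count and null sets turns the call into a
plain interruptible sleep for the given timeout.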

This is typical of daemons doing tcp/ip. I guess one answer would be to
make tcp timeout more configurable.


Peter

2002-07-21 15:44:48

by Eric W. Biederman

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

[email protected] (Linus Torvalds) writes:

> In article <[email protected]>,
> Alan Cox <[email protected]> wrote:
> >> <http://www.opengroup.org/onlinepubs/007904975/functions/select.html>
> >> says that 'select' may modify its timeout argument only "upon
> >> successful completion". However, the Linux kernel sometimes modifies
> >> the timeout argument even when 'select' fails or is interrupted.
> >
> >This is extremely useful behaviour. POSIX is broken here. Fix it in the
> >C library or somewhere it doesn't harm the clueful
>
> Personally, I've gotten to the point where I think that the select()
> time is broken.
>
> The thing is, nobody should really ever use timeouts, because the notion
> of "I want to sleep X seconds" is simply not _useful_ if the process
> also just got delayed by a page-out event as it said so. What does "X
> seconds" mean at that point? It's ambiguous - and the kernel will (quite
> naturally) just always assume that it is "X seconds from when the kernel
> got notified".
>
> A _useful_ interface would be to say "I want to sleep to at most time X"
> or "to at least time X". Those are unambiguous things to say, and are
> not open to interpretation.

Sleeping until at most time X is only useful if the kernel can actually
make a guarantee like that. If you are doing hard real time, fine; otherwise
that doesn't work too well.

> The "I want to sleep until at least time X" (or "at most time X") also
> has the added advantage that it is inherently re-startable - restarting
> the sleep has _no_ rounding issues, and again no ambiguity.
>
> Note that select() is definitely not the only offender here. Other
> system calls like "nanosleep()" have the exact same problem - what do
> you do if you get interrupted by a signal and need to restart?
>
> The Linux behaviour of modifying the timeout is a half-assed try for
> restartability, but the problem is that (a) nobody else does that or
> expects it to happen, despite the man-pages originally claiming that
> they were supposed to and (b) it inherently has rounding problems and
> other ambiguities - making it even less useful.
>
> Oh, well.
>
> I suspect almost nobody actually uses the Linux timeout feature because
> of the nonportability issues, making the whole mess even less tasty.

Actually I have had occasion in dosemu to not use the timeout features
because it did not do a good job of attempting to sleep for X seconds.
There can be a lot of time from when the kernel updates the timeout
value, and when the system call is restarted.

The desired semantics in this case were I want to sleep until time X,
and I want to wake up as soon afterwards as is reasonable. Calling
gettimeofday before restarting the system call resulted in a much
better approximation of the desired result.

Eric

2002-07-21 15:58:19

by Christoph Rohland

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

Hi Linus,

On Sat, 20 Jul 2002, Linus Torvalds wrote:
> The thing is, nobody should really ever use timeouts, because the
> notion of "I want to sleep X seconds" is simply not _useful_ if the
> process also just got delayed by a page-out event as it said so.
> What does "X seconds" mean at that point? It's ambiguous - and the
> kernel will (quite naturally) just always assume that it is "X
> seconds from when the kernel got notified".
>
> A _useful_ interface would be to say "I want to sleep to at most
> time X" or "to at least time X". Those are unambiguous things to
> say, and are not open to interpretation.

Yes, so everybody really using select assumes it's _at least_ X
seconds... So where's the problem? I always know it's at least in a
multiprocess environment. (At least as long as I do not want to fiddle
with scheduling and priorities)

> The Linux behaviour of modifying the timeout is a half-assed try for
> restartability, but the problem is that (a) nobody else does that or
> expects it to happen, despite the man-pages originally claiming that
> they were supposed to and (b) it inherently has rounding problems
> and other ambiguities - making it even less useful.

Yes, and probably select is one of the calls you most of the time use
because of portability. So IMHO a linuxism isn't worth the effort.

Greetings
Christoph


2002-07-21 16:24:49

by Ingo Molnar

Subject: Re: [PATCH] 'select' failure or signal should not update timeout


On Sat, 20 Jul 2002, Linus Torvalds wrote:

> The thing is, nobody should really ever use timeouts, because the notion
> of "I want to sleep X seconds" is simply not _useful_ if the process
> also just got delayed by a page-out event as it said so. What does "X
> seconds" mean at that point? It's ambiguous - and the kernel will (quite
> naturally) just always assume that it is "X seconds from when the kernel
> got notified".
>
> A _useful_ interface would be to say "I want to sleep to at most time X"
> or "to at least time X". Those are unambiguous things to say, and are
> not open to interpretation.

on the other hand, the application itself cannot even know what exact
absolute time it is, in any unambiguous form - what if right after the
gettimeofday() it got scheduled away and swapped out for many seconds?

so the notion of 'sleep until absolute time X' just brings the 'time
uncertainty' down one more level, it doesn't eliminate it.

the rounding issue is valid when an unlimited number of restarts are
allowed - N x relative timeouts are numerically inaccurate. But there is
no fundamental difference (only performance difference): correct timeouts
can be achieved even if the kernel interface only supports relative
timeouts: the application has to save the absolute target time and has to
recalculate the relative timeout based on the target date and current
date. (which involves multiple calls to gettimeofday(), so it's additional
overhead.)
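
a sketch of that approach, assuming a read set only and a deadline
expressed in gettimeofday() terms; the names time_left and
select_read_until are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>

/* Store in *LEFT the time from *NOW until *DEADLINE, clamped at zero.  */
static void
time_left (const struct timeval *deadline, const struct timeval *now,
           struct timeval *left)
{
  left->tv_sec = deadline->tv_sec - now->tv_sec;
  left->tv_usec = deadline->tv_usec - now->tv_usec;
  if (left->tv_usec < 0)
    {
      left->tv_usec += 1000000;
      left->tv_sec -= 1;
    }
  if (left->tv_sec < 0)
    left->tv_sec = left->tv_usec = 0;
}

/* Like select() on a read set, but with an absolute *DEADLINE.  After
   each EINTR the relative timeout is recomputed from the clock, so it
   does not matter whether the kernel updated it.  */
static int
select_read_until (int nfds, const fd_set *watch, fd_set *ready,
                   const struct timeval *deadline)
{
  assert (deadline != 0 && (!watch || ready));
  for (;;)
    {
      struct timeval now, left;
      fd_set *arg = 0;
      if (gettimeofday (&now, 0) != 0)
        return -1;
      time_left (deadline, &now, &left);
      if (watch)
        {
          *ready = *watch;      /* select() overwrites its set argument */
          arg = ready;
        }
      int ret = select (nfds, arg, 0, 0, &left);
      if (ret >= 0 || errno != EINTR)
        return ret;
    }
}
```

the cost is one gettimeofday() call per attempt, which is the additional
overhead mentioned above.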

Ingo

2002-07-21 16:40:00

by Linus Torvalds

Subject: Re: [PATCH] 'select' failure or signal should not update timeout



On 21 Jul 2002, Christoph Rohland wrote:
>
> Yes, so everybody really using select assumes it's _at least_ X
> seconds... So where's the problem?

Have you tried to _do_ this? I doubt you have, since you think it works
well already.

The fact is, that if you're doing soft-realtime, you end up having to call
gettimeofday() a lot more than you should. Your timeouts are fundamentally
"real time" (ie they are _not_ of the type "I should show the next frame
in 0.0333 seconds" but they are really "I showed frame N at time X, so I
need to show frame N+1 at time X+0.0333").

The fact that select() and friends work with offsets rather than real
time, and are not restartable, means that you end up having to do two
gettimeofday() calls per select in these situations.

In contrast, if you could just rely on absolute time in select(), you
would be re-startable _and_ you'd not have to do the extra "what time is
it now, so that I know what timeout I need to use for the next thing"?
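
The "frame N+1 at time X+0.0333" arithmetic can be sketched as a small
helper. The name frame_deadline and the microsecond frame period are
assumptions made for illustration; a period of 33333 microseconds
approximates the 0.0333 seconds used above.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Absolute time at which frame number FRAME is due, given the time
   START at which frame 0 was shown and a frame period in microseconds.
   Each deadline derives from START rather than from the previous
   wakeup, so lateness in showing one frame does not accumulate.  */
static struct timeval
frame_deadline (struct timeval start, int64_t frame, int64_t period_usec)
{
  assert (frame >= 0 && period_usec > 0);
  int64_t usec = (int64_t) start.tv_usec + frame * period_usec;
  struct timeval due;
  due.tv_sec = start.tv_sec + (time_t) (usec / 1000000);
  due.tv_usec = (suseconds_t) (usec % 1000000);
  return due;
}
```

With relative timeouts, each such deadline still has to be turned back
into an offset from the current time before it can be handed to
select().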

> Yes, and probably select is one of the calls you most of the time use
> because of portability. So IMHO a linuxism isn't worth the effort.

The fact is, the linuxism exists, and breaking it is worse than not
breaking it.

The number of users is probably small, but they do exist.

Linus

2002-07-21 17:48:53

by dean gaudet

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

On Sun, 21 Jul 2002, Linus Torvalds wrote:

> The fact is, the linuxism exists, and breaking it is worse than not
> breaking it.

fortunately, glibc uses poll() rather than select() these days (so that it
avoids bugs with programs with huge numbers of fds). so that ancient code
in the libc5 resolver (see res_send) which still relies on this linuxism
is dying out :)

-dean

2002-07-21 20:11:48

by Richard M. Stallman

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

> This is extremely useful behaviour. POSIX is broken here. Fix it in the
> C library or somewhere it doesn't harm the clueful

Why is it useful? For signal handlers to see how much waiting time is
left?

2002-07-22 06:48:25

by Christoph Rohland

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

Hi Linus,

On Sun, 21 Jul 2002, Linus Torvalds wrote:
>> Yes, so everybody really using select assumes it's _at least_ X
>> seconds... So where's the problem?
>
> Have you tried to _do_ this? I doubt you have, since you think it
> works well already.

Well enough for me and my customers :-)

> The fact is, that if you're doing soft-realtime, you end up having
> to call gettimeofday() a lot more than you should.

OK, I do not do (soft-)realtime. For non-realtime needs the current
scheme with relative timeouts is easier to use since you do not need
to call gettimeofday at all.

Greetings
Christoph


2002-07-22 03:57:33

by Edgar Toernig

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

Linus Torvalds wrote:
>
> In contrast, if you could just rely on absolute time in select(), you
> would be re-startable _and_ you'd not have to do the extra "what time is
> it now, so that I know what timeout I need to use for the next thing"?

I agree. Absolute times are nicer. Just one note: to make that work
you need a sane time source! gettimeofday jumps back and forth. You
want a getuptime (or similar) that gives a constantly, monotonically growing
value not adjustable from userspace (and preferably the same for all
processes).
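
On systems that provide the POSIX clock_nanosleep() function and the
CLOCK_MONOTONIC clock, an absolute, restartable sleep against such a
time source might look like the sketch below. Neither interface is
discussed in this thread, so their availability is an assumption, and
sleep_until_monotonic is a hypothetical name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleep until the absolute time *DEADLINE on CLOCK_MONOTONIC, a clock
   that cannot be set from user space.  Because the deadline is
   absolute, restarting after a signal needs no recalculation.  Returns
   0 on success or an error number, as clock_nanosleep() does.  */
static int
sleep_until_monotonic (const struct timespec *deadline)
{
  assert (deadline != 0);
  int err;
  do
    err = clock_nanosleep (CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, 0);
  while (err == EINTR);
  return err;
}
```

The deadline would normally be built by reading the same clock with
clock_gettime() and adding the desired interval.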

Ciao, ET.

2002-07-24 13:42:40

by Jamie Lokier

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

Eric W. Biederman wrote:
> [email protected] (Linus Torvalds) writes:
> > A _useful_ interface would be to say "I want to sleep to at most time X"
> > or "to at least time X". Those are unambiguous things to say, and are
> > not open to interpretation.
>
> Sleeping until at most time X is only useful if the kernel can actually
> make a guarantee like that. If you are doing hard real time, fine; otherwise
> that doesn't work too well.

Oh, that would definitely be useful even if it's only a "soft"
guarantee. Especially with recent HZ changes.

Typical soft real-time code looks a bit like this pseudo-code (excuse
the bugs :-):

void wait_until_time (const struct timeval * until)
{
        struct timeval now, timeout;
        while (1) {
                gettimeofday (&now, 0);
                timeout.tv_sec = until->tv_sec - now.tv_sec;
                timeout.tv_usec = until->tv_usec - now.tv_usec;
                if (timeout.tv_usec < 0) {
                        timeout.tv_usec += 1000000;
                        timeout.tv_sec -= 1;
                }
                if (timeout.tv_sec < 0)
                        break; /* Finished! */
                timeout.tv_usec -= SCHEDULER_GRANULARITY;
                if (timeout.tv_usec < 0) {
                        timeout.tv_usec += 1000000;
                        timeout.tv_sec -= 1;
                }
                /* Sleep only if more than one scheduler granule remains;
                   otherwise loop again, i.e. busy wait. */
                if (timeout.tv_sec >= 0) {
                        select (0, 0, 0, 0, &timeout);
                }
        }
}

Note that SCHEDULER_GRANULARITY is an architecture-specific and
OS-specific constant that has to be determined somehow.

The select() call in the above code is one that would, ideally, be "wait
until at most TIME" even if that is limited by the granularity of
scheduler timeouts. The scheduler may not be able to _guarantee_ to
schedule the process before TIME (fair enough, that's why we call it
soft real-time), but at least the tick calculations etc. in the kernel
would be rounded down, rather than up.

-- Jamie

2002-07-24 18:45:18

by Linus Torvalds

Subject: Re: [PATCH] 'select' failure or signal should not update timeout


On Wed, 24 Jul 2002, Jamie Lokier wrote:
>
> Typical soft real-time code looks a bit like this pseudo-code (excuse
> the bugs :-):

Yup, looks familiar.

The thing is, we cannot change existing select semantics, and the question
is whether what most soft-realtime wants is actually select, or whether
people really want a "waittimeofday()".

Like your example, the only uses I've had personally (DVD playback) have
really had an empty select, so it wasn't really select itself that was
horribly important.

Linus

2002-07-24 19:04:14

by Chris Friesen

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

Linus Torvalds wrote:

> The thing is, we cannot change existing select semantics, and the question
> is whether what most soft-realtime wants is actually select, or whether
> people really want a "waittimeofday()".

Actually, I'd like a
waitonmonotonicallyincreasingnonadjustablehighres64bittime().

Chris

--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]

2002-07-24 23:28:09

by Jamie Lokier

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

Linus Torvalds wrote:
> Like your example, the only uses I've had personally (DVD playback) have
> really had an empty select, so it wasn't really select itself that was
> horribly important.

All the real examples I've encountered are waiting on file descriptors
too -- and occasionally also signals.

-- Jamie

2002-07-25 07:18:02

by Rusty Russell

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

On Wed, 24 Jul 2002 11:48:10 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

> The thing is, we cannot change existing select semantics, and the
> question is whether what most soft-realtime wants is actually select, or
> whether people really want a "waittimeofday()".

NOT waittimeofday. You need a *new* measure which can't be set forwards
or back if you want this to be sane. pthreads has absolute timeouts (eg.
pthread_cond_timedwait), but they suck IRL for this reason.

Of course, it doesn't need any correlation with absolute time, it could be a
"microseconds since boot" kind of thing.

Rusty.
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy

2002-07-25 16:44:11

by Eric W. Biederman

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

Linus Torvalds <[email protected]> writes:

> On Wed, 24 Jul 2002, Jamie Lokier wrote:
> >
> > Typical soft real-time code looks a bit like this pseudo-code (excuse
> > the bugs :-):
>
> Yup, looks familiar.
>
> The thing is, we cannot change existing select semantics, and the question
> is whether what most soft-realtime wants is actually select, or whether
> people really want a "waittimeofday()".
>
> Like your example, the only uses I've had personally (DVD playback) have
> really had an empty select, so it wasn't really select itself that was
> horribly important.

Barring minor quibbles, waittimeofday is essentially what we have
today. Fixing up the interface to take an absolute time, from an
absolute timer, cleans up some races but doesn't attack the fundamental
problem.

There are two fundamental problems with the current interface. The
timer granularity is much too large, and we don't know the granularity
that user space cares about.

The posted wait_until_time implementation had one very interesting
aspect: the timer granularity of the kernel (HZ) was known to the
application, and it very deliberately rounded the sleep interval down
based on the kernel timer granularity, so it could busy wait all by
itself for the rest. The problem is that user space applications can
only get what they want through busy waiting. But they can tell when
they have gotten what they want, because the gettimeofday resolution is
much better than the kernel timer resolution.

There are two states a unix box can be in. With cpu load_average < 1,
multiple processes run per scheduler quantum. They run for short
periods of time and then go back to sleep. Latency is very good
because the other processes get out of the way, and a sleeping process
can count on running at the next timer tick. With cpu load_average > 1,
one or several processes run for their full cpu quantum. Latency is
bad, and opportunistically the timer resolution does not get better.

When we have a cpu load average < 1, it is trivial to increase the
timer granularity to something resembling the gettimeofday resolution
simply by internally doing gettimeofday when schedule is called, and
adding those processes that have just become runnable to the run
queue. To get the most out of this the idle task would need to busy
wait looking for timer events, when we have an event scheduled before
the next timer tick.

The goal is twofold: to remove the need for user space applications to
busy wait, so the system can sometimes get something done while another
process is waiting, and to increase the internal kernel timer
granularity to the point where user space doesn't care anymore. The
only time we would sleep past the desired time is when the kernel
decides there is some higher priority task to run.

On the timer queue I would use either microseconds (the resolution of
struct timeval), or the natural resolution of the timer, instead of
something artificial like HZ. This allows faster machines under the
same load as slower machines to become more precise, with the same
code. Polling for timers only in schedule and the idle task means
that when the load is low the kernel offers more precision than when
the load is high. The frequency of timer interrupts would have to go up
at some point to handle some loads, but just keeping the load light
would allow the program to work as expected even without a kernel
recompile.
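
As a rough user-space model of the idea above, and not kernel code, a
timer queue can be kept in microseconds and polled whenever the
scheduler happens to run. The names timer_queue, timer_queue_add and
timer_queue_poll, and the fixed MAX_TIMERS limit, are assumptions made
for this sketch only.

```c
#include <stddef.h>
#include <stdint.h>

enum { MAX_TIMERS = 64 };	/* arbitrary limit for the sketch */

/* Hypothetical timer queue: expiry times in microseconds, kept sorted
   with the soonest expiry first.  */
struct timer_queue
{
  uint64_t expiry_us[MAX_TIMERS];
  size_t count;
};

/* Add a timer.  Return 0 on success, -1 if the queue is full.  */
int
timer_queue_add (struct timer_queue *q, uint64_t expiry_us)
{
  size_t i;
  if (q->count == MAX_TIMERS)
    return -1;
  /* Shift later timers up one slot so the array stays sorted.  */
  for (i = q->count; i > 0 && q->expiry_us[i - 1] > expiry_us; i--)
    q->expiry_us[i] = q->expiry_us[i - 1];
  q->expiry_us[i] = expiry_us;
  q->count++;
  return 0;
}

/* Meant to be called from the scheduler and from the idle loop, with
   NOW_US taken from a gettimeofday-quality clock.  Remove every timer
   due at or before NOW_US and return how many fired.  */
size_t
timer_queue_poll (struct timer_queue *q, uint64_t now_us)
{
  size_t fired = 0, i;
  while (fired < q->count && q->expiry_us[fired] <= now_us)
    fired++;
  for (i = fired; i < q->count; i++)
    q->expiry_us[i - fired] = q->expiry_us[i];
  q->count -= fired;
  return fired;
}
```

The more often timer_queue_poll is called, the closer each timer fires
to its requested time, which is why a lightly loaded box would get
better precision without any change to the timer interrupt frequency.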

Having user space know HZ and use it for their internal calculations
is dangerous because HZ will change as time goes by. But having user
space specify its desired timer resolution to the kernel will allow
the kernel to round the time to a place where it can efficiently
handle the timer, and still meet the user space deadline.
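
One way to picture that rounding, as a sketch under stated assumptions
rather than anything an existing kernel does: user space passes a
deadline together with the slack it can tolerate, and the kernel picks
a regular tick boundary if one falls inside that window. The function
name round_wakeup_us, the microsecond units and the tick_us parameter
are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: choose a wakeup time, in microseconds, for a
   caller that must run no later than DEADLINE_US but accepts running
   up to SLACK_US earlier.  TICK_US is the period of the regular timer
   interrupt and must be nonzero.  */
uint64_t
round_wakeup_us (uint64_t deadline_us, uint64_t slack_us, uint64_t tick_us)
{
  /* Latest tick boundary at or before the deadline.  */
  uint64_t tick_boundary = deadline_us - deadline_us % tick_us;
  uint64_t earliest_us = deadline_us > slack_us ? deadline_us - slack_us : 0;

  /* If a regular tick lands inside [earliest, deadline], use it, since
     it needs no extra interrupt.  Otherwise keep the exact deadline
     and pay for a precise timer.  */
  return tick_boundary >= earliest_us ? tick_boundary : deadline_us;
}
```

With a 1000 microsecond tick, a deadline of 10500 with 1000 of slack
rounds to the tick at 10000, while the same deadline with only 100 of
slack stays at 10500.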

The most interesting use I have seen is a high performance local area
data transfer utility, that would do short sleeps in between sending
packets to avoid pushing the switch to the point where it would drop
packets. But it was perfectly fine if a new packet came in before it
was done waiting for the old packet to go out the wire.

Eric

2002-07-25 17:12:39

by Jamie Lokier

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

Eric W. Biederman wrote:
> When we have a cpu load average < 1, it is trivial to increase the
> timer granularity to something resembling the gettimeofday resolution
> simply by internally doing gettimeofday when schedule is called, and
> adding those processes that have just become runnable to the run
> queue. To get the most out of this the idle task would need to busy
> wait looking for timer events, when we have an event scheduled before
> the next timer tick.

Unfortunately, this does not help "soft real-time" tasks like the
hypothetical video game with a compile running in the background. That
needs to preempt lower priority tasks somehow.

Ideally, because they don't use much CPU but do want to run on time, it
should be possible to run those programs using non-real-time priority,
and they would run on time simply because they always have a high
dynamic priority.

To be fair, although 100Hz timer resolution wasn't good enough even for
a simple "snake" video game with no other load (the eye detects the time
variance as an apparent velocity variance), 1000Hz is probably fine.

> The goal is twofold: to remove the need for user space applications to
> busy wait, so the system can sometimes get something done while another
> process is waiting, and to increase the internal kernel timer
> granularity to the point where user space doesn't care anymore. With
> that, the only time we sleep past the desired time is when the kernel
> decides there is some higher priority task to run.

What will happen if the timer granularity remains at 1000Hz when
loadavg > 1 is that time-sensitive interactive apps will still busy wait
for the remainder of a tick. _But_, if we can define select() or similar
semantics to mean, as Linus suggested, "wait until at most TIME", then
it becomes possible to avoid the busy wait at low loads (paradoxically).

> The most interesting use I have seen is a high performance local area
> data transfer utility, that would do short sleeps in between sending
> packets to avoid pushing the switch to the point where it would drop
> packets. But it was perfectly fine if a new packet came in before it
> was done waiting for the old packet to go out the wire.

That's the sort of thing I work on :) The resolution required of a
packet shaper is measured in 10s of microseconds, though, so I just
accept that user space must busy wait _all_ the time the link isn't idle.
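
For illustration only, a user-space busy wait of this kind can be
sketched with gettimeofday. The names busy_wait_us and elapsed_us are
made up for this example, and a real shaper would differ in detail.
Because gettimeofday follows the wall clock, the loop would misbehave
if the system time were stepped while it spins.

```c
#include <sys/time.h>

/* Microseconds elapsed from *START to *END.  Intended for short
   intervals only.  */
static long
elapsed_us (struct timeval const *start, struct timeval const *end)
{
  return (end->tv_sec - start->tv_sec) * 1000000L
	 + (end->tv_usec - start->tv_usec);
}

/* Spin until at least USEC microseconds have passed, burning CPU the
   whole time.  Return the number of microseconds actually waited.  */
long
busy_wait_us (long usec)
{
  struct timeval start, now;
  gettimeofday (&start, 0);
  do
    gettimeofday (&now, 0);
  while (elapsed_us (&start, &now) < usec);
  return elapsed_us (&start, &now);
}
```

A sender could call busy_wait_us (30) between packets to space them
roughly 30 microseconds apart, at the cost of keeping one CPU fully
busy while the link is in use.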

-- Jamie

2002-07-25 18:29:02

by George Anzinger

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

Rusty Russell wrote:
>
> On Wed, 24 Jul 2002 11:48:10 -0700 (PDT)
> Linus Torvalds <[email protected]> wrote:
>
> > The thing is, we cannot change existing select semantics, and the
> > question is whether what most soft-realtime wants is actually select, or
> > whether people really want a "waittimeofday()".
>
> NOT waittimeofday. You need a *new* measure which can't be set forwards
> or back if you want this to be sane. pthreads has absolute timeouts (eg.
> pthread_cond_timedwait), but they suck IRL for this reason.
>
> Of course, it doesn't need any correlation with absolute time; it could
> be a "microseconds since boot" kind of thing.
>
The POSIX clocks & timers API defines CLOCK_MONOTONIC for
this sort of thing (CLOCK_MONOTONIC can not be set). It
also defines an API for clock_nanosleep() that CAN use an
absolute time which is supposed to follow any clock setting
that is done. Combine the two and you have a fixed time
definition.
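
A minimal sketch of that combination, assuming a system that provides
CLOCK_MONOTONIC and clock_nanosleep: compute an absolute deadline on
the monotonic clock, then sleep until it with TIMER_ABSTIME. The helper
names deadline_after_ns and sleep_until_monotonic are invented for this
example.

```c
/* Request POSIX.1-2001 declarations unless the build already chose a
   feature level.  */
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>
#include <time.h>

/* Set *DEADLINE to the current CLOCK_MONOTONIC time plus NSEC
   nanoseconds (NSEC must not be negative).  Return 0 on success,
   -1 on failure.  */
int
deadline_after_ns (struct timespec *deadline, long nsec)
{
  if (clock_gettime (CLOCK_MONOTONIC, deadline) != 0)
    return -1;
  deadline->tv_sec += nsec / 1000000000L;
  deadline->tv_nsec += nsec % 1000000000L;
  if (deadline->tv_nsec >= 1000000000L)
    {
      deadline->tv_sec++;
      deadline->tv_nsec -= 1000000000L;
    }
  return 0;
}

/* Sleep until CLOCK_MONOTONIC reaches *DEADLINE.  Because the deadline
   is absolute, restarting after a signal needs no recomputation of the
   remaining time.  Return 0 on success, else an errno value.  */
int
sleep_until_monotonic (struct timespec const *deadline)
{
  int err;
  do
    err = clock_nanosleep (CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, 0);
  while (err == EINTR);
  return err;
}
```

Note that clock_nanosleep reports failure through its return value
rather than through errno, which is why the loop tests the result
directly.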

AND, guess what, the high-res-timers patch does all this and
more.
--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-07-28 05:36:55

by David Schwartz

Subject: Re: [PATCH] 'select' failure or signal should not update timeout


>NOT waittimeofday. You need a *new* measure which can't be set forwards
>or back if you want this to be sane. pthreads has absolute timeouts (eg.
>pthread_cond_timedwait), but they suck IRL for this reason.

>Rusty.

The usual way to deal with this is to have a 'clock watcher' thread. If the
system time jumps any significant amount, you signal all condition variables.
You're not guaranteed any particular latency anyway.
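
As a hedged sketch of how such a watcher might detect a jump: sample
the wall clock and a non-settable clock together, and compare how far
each advanced between samples. Using CLOCK_MONOTONIC as the reference
is an assumption of this example, as are the names clock_sample,
take_clock_sample and clock_jump_seconds.

```c
/* Request POSIX.1-2001 declarations unless the build already chose a
   feature level.  */
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200112L
#endif
#include <time.h>

/* One reading of both clocks, taken back to back.  */
struct clock_sample
{
  struct timespec realtime;
  struct timespec monotonic;
};

/* Seconds from *FROM to *TO.  */
static double
timespec_diff (struct timespec const *from, struct timespec const *to)
{
  return (double) (to->tv_sec - from->tv_sec)
	 + (to->tv_nsec - from->tv_nsec) / 1e9;
}

/* Fill *S with the current clock readings.  Return 0 on success,
   -1 on failure.  */
int
take_clock_sample (struct clock_sample *s)
{
  if (clock_gettime (CLOCK_REALTIME, &s->realtime) != 0)
    return -1;
  return clock_gettime (CLOCK_MONOTONIC, &s->monotonic) != 0 ? -1 : 0;
}

/* How much further the wall clock moved than the monotonic clock
   between PREV and CUR.  Positive means the system time was set
   forward, negative means it was set back.  */
double
clock_jump_seconds (struct clock_sample const *prev,
		    struct clock_sample const *cur)
{
  return timespec_diff (&prev->realtime, &cur->realtime)
	 - timespec_diff (&prev->monotonic, &cur->monotonic);
}
```

A watcher thread could take a sample every second or so and call
pthread_cond_broadcast on the relevant condition variables whenever the
magnitude of clock_jump_seconds exceeds a threshold of its choosing.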

I don't think a DVD playback skipping when the system time is changed by a
large amount is unacceptable. However, the use of some sort of linear
timebase is much more convenient for many things.

DS


2002-07-28 10:30:07

by George Spelvin

Subject: Re: [PATCH] 'select' failure or signal should not update timeout

Chris Friesen asked for:
> waitonmonotonicallyincreasingnonadjustablehighres64bittime()

Well, take the POSIX clock_gettime() interface and add clock_waittime().
Oh, wait.. they already did it. clock_nanosleep().

The POSIX folks realized that people want a variety of timers, and
so the functions take a clockid_t first argument, which is just an enum.
They defined two values, but leave the field open to others:
- CLOCK_MONOTONIC, which is what you want. Unspecified epoch
(possibly boot time), and never gets adjusted
- CLOCK_REALTIME, which is the classic time() UTC time.

Extensions define CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID.
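
To make the clockid_t convention concrete, here is a small sketch that
reads any clock by ID and reports its resolution. It assumes the system
implements clock_gettime and clock_getres for the clocks passed in.
The helper name print_clock is invented for this example.

```c
/* Request POSIX.1-2001 declarations unless the build already chose a
   feature level.  */
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200112L
#endif
#include <stdio.h>
#include <time.h>

/* Print the current value and the resolution of the clock ID, labelled
   with NAME.  Return 0 on success, -1 if the clock cannot be read.  */
int
print_clock (char const *name, clockid_t id)
{
  struct timespec now, res;
  if (clock_gettime (id, &now) != 0 || clock_getres (id, &res) != 0)
    return -1;
  printf ("%-16s %ld.%09ld (resolution %ld.%09ld s)\n", name,
	  (long) now.tv_sec, now.tv_nsec, (long) res.tv_sec, res.tv_nsec);
  return 0;
}
```

Calling print_clock ("CLOCK_MONOTONIC", CLOCK_MONOTONIC) and
print_clock ("CLOCK_REALTIME", CLOCK_REALTIME) shows the two standard
clocks side by side. Any additional clock an implementation defines
would go through the same two calls with a different ID.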

The clock weenies are welcome to add CLOCK_TAI, CLOCK_GPS, CLOCK_UTS
(see Markus Kuhn's suggestion), CLOCK_UTC (with some "better" leap-second
handling), CLOCK_FREQADJUST (uses frequency but not phase adjustments),
CLOCK_NOSTEP (frequency and phase adjustments, but doesn't step),
and anything else you like.

Astronomers might add CLOCK_UT1, CLOCK_UT0, CLOCK_SIDEREAL, CLOCK_TDB,
CLOCK_TDT, CLOCK_TCG, CLOCK_TCB, and maybe a few things I haven't thought
of. The interface doesn't require that all of these be implemented in
the kernel, of course.