2005-03-20 12:25:11

by Jan Engelhardt

Subject: Short sleep precision

Hello,


I have found that FreeBSD has a very good precision of small sleeps --
what's holding Linux back from doing the same? Using the code snippet below,
FBSD yields between 2 and 80 us on the average while Linux is at
"constantly" ~100 (with HZ=1000) and ~1000 (HZ=100).


Jan Engelhardt
--

#include <sys/time.h>
#include <stdio.h>
#include <time.h>
#define MICROSECOND 1000000

/* Average wall-clock cost, in us, of rd calls to nanosleep() for ad ns. */
static unsigned long calc_ovcorr(unsigned long ad, int rd) {
    struct timespec s = {.tv_sec = 0, .tv_nsec = ad};
    struct timeval start, stop;
    unsigned long av = 0;
    int count = rd;

    while(count--) {
        gettimeofday(&start, NULL);
        nanosleep(&s, NULL);
        gettimeofday(&stop, NULL);
        /* elapsed time of this nanosleep() call, in microseconds */
        av += MICROSECOND * (stop.tv_sec - start.tv_sec) +
              stop.tv_usec - start.tv_usec;
    }

    av /= rd;
    fprintf(stderr, " %lu us\n", av);
    return av;
}

int main(void) {
    calc_ovcorr(0, 100);
    return 0;
}

//eof


2005-03-20 12:43:27

by Jesper Juhl

Subject: Re: Short sleep precision

On Sun, 20 Mar 2005, Jan Engelhardt wrote:

> Hello,
>
>
> I have found that FreeBSD has a very good precision of small sleeps --
> what's holding Linux back from doing the same? Using the code snippet below,
> FBSD yields between 2 and 80 us on the average while Linux is at
> "constantly" ~100 (with HZ=1000) and ~1000 (HZ=100).
>
Running your program here I see even worse values than that on 2.6.11-mm4.
It's also interesting that over a lot of continuous runs the reported
values drop steadily and eventually settle around ~1100, but if I
insert a "sleep 1" between runs, I see a steady ~1000 reported.
This is all with HZ=1000.


juhl@dragon:~/download/kernel/linux-2.6.11-mm4$ for i in `seq 1 40`; do ./a.out ; done
1414 us
1434 us
1423 us
1424 us
1433 us
1423 us
1420 us
1439 us
1434 us
1433 us
1463 us
1462 us
1450 us
1431 us
1403 us
1391 us
1376 us
1364 us
1362 us
1344 us
1353 us
1353 us
1334 us
1323 us
1313 us
1314 us
1293 us
1293 us
1273 us
1274 us
1271 us
1264 us
1264 us
1244 us
1244 us
1214 us
1214 us
1214 us
1183 us
1183 us

juhl@dragon:~/download/kernel/linux-2.6.11-mm4$ for i in `seq 1 10`; do (./a.out ; sleep 1) ; done
1113 us
997 us
997 us
998 us
997 us
997 us
996 us
997 us
996 us
996 us


2005-03-20 13:04:24

by Andrew Morton

Subject: Re: Short sleep precision

Jan Engelhardt wrote:
>
> I have found that FreeBSD has a very good precision of small sleeps --

Linux nanosleep() used to have a busywait loop for sleeps less than two
milliseconds. 2.4.x still does.

We thought it was stupid and took it out.

> what's holding Linux back from doing the same? Using the code snippet below,
> FBSD yields between 2 and 80 us on the average while Linux is at
> "constantly" ~100 (with HZ=1000) and ~1000 (HZ=100).
>

You can spin on the gettimeofday() result in userspace.

2005-03-20 13:35:23

by Jan Engelhardt

Subject: Re: Short sleep precision

>Running your program here I see even worse values than that on 2.6.11-mm4
>and it's also interesting to see that for a lot of continuous runs the
>values reported drop steadily and eventually settle around ~1100, but if I
>insert a sleep 1 between runs, then I see a steady ~1000 reported.
>This is all with HZ = 1000

That may be related to the new mix of USER_HZ. I can't really tell;
I have just observed that there is such an effect.

>Linux nanosleep() used to have a busywait loop for sleeps less than two
>milliseconds. 2.4.x still does.

Yes, I know. Linus threw it out during 2.5 because it was deemed too buggy.
That did not affect me, because that busy-waiting is only possible under
!SCHED_OTHER, which I cannot use: the tiny delays are needed in an ordinary
user-runnable userspace app, and I don't want to make it suid and so on.

(I've developed an "overhead correction" for accurate(*) realtime replay of
logfiles. (*) with respect to the total runtime. Works well.)

>You can spin on the gettimeofday() result in userspace.

How can I use it, and how does it help me? The gettimeofday() calls in the
example program are only there to measure the total time of nanosleep().
Sometimes nanosleep() completes within the same tick; most of the time (95%),
another task is scheduled before it returns. I am calling nanosleep()
repeatedly to find the _average_ time for a 0 us nanosleep(), usually
100/1000 us.



Jan Engelhardt
--

2005-03-21 16:35:39

by Chris Friesen

Subject: Re: Short sleep precision

Jan Engelhardt wrote:

>>You can spin on the gettimeofday() result in userspace.
>
>
> How can I use it?

Something like:

struct timeval curtime, expiry;

gettimeofday(&expiry, 0);
add_usecs(&expiry, time_to_sleep);      /* expiry = now + time_to_sleep */
do {
        gettimeofday(&curtime, 0);
} while (time_before(&curtime, &expiry));


Of course, if someone changes the system time on you you're screwed....
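
A runnable take on the loop above, as a minimal sketch only: it swaps
gettimeofday() for clock_gettime(CLOCK_MONOTONIC), which does not jump when
someone sets the system time. The helper names now_ns() and spin_sleep_us()
are made up for illustration and do not come from this thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC reading in nanoseconds (hypothetical helper). */
static int64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Busy-wait for at least `usecs` microseconds (hypothetical helper).
 * Returns the time actually spent spinning, in nanoseconds.
 * This burns a full CPU for the whole delay.
 */
static int64_t spin_sleep_us(int64_t usecs)
{
    int64_t start = now_ns();
    int64_t expiry = start + usecs * 1000;
    int64_t now;

    do {
        now = now_ns();
    } while (now < expiry);

    return now - start;
}
```

Calling spin_sleep_us(50) should return shortly after 50 us have passed, but
the task can still be preempted while spinning, so the return value is the
only reliable indication of how long the wait really took.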

Chris

2005-03-22 02:51:04

by Robert White

Subject: RE: Short sleep precision

Actually look at linux/Documentation/rtc.txt for a "reasonably portable" way to get
very small quanta with fair regularity.

Since the original poster wanted it to be user accessible, and since it is a
contended/exclusive device, he may want to make a broker daemon or something.

Since nanosleep() doesn't constrain the maximum time of the sleep, you get the
same performance by setting the timer "a good way" and then spinning on it from
user space, or blocking, or whatever. Working out the right thing to do for
non-power-of-2 times is pretty trivial too.
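
The arithmetic for non-power-of-2 times might look roughly like the sketch
below. It assumes what rtc.txt describes: periodic interrupt rates are powers
of two from 2 Hz to 8192 Hz, set with the RTC_IRQP_SET ioctl and enabled with
RTC_PIE_ON. The sketch only does the calculation and never opens /dev/rtc;
both function names are invented for illustration.

```c
#include <assert.h>

/*
 * Slowest power-of-two rate in [2, 8192] Hz whose tick period is no longer
 * than period_us. Returns 8192 if even the fastest rate is too slow.
 */
static unsigned rtc_rate_for_period(unsigned long period_us)
{
    unsigned hz = 2;

    /* tick period (1e6 / hz us) is still longer than wanted: go faster */
    while (hz < 8192 && (unsigned long long)hz * period_us < 1000000ULL)
        hz *= 2;
    return hz;
}

/*
 * Number of periodic interrupts to wait for at rate hz so that at least
 * period_us microseconds pass (rounded up, never less than one tick).
 */
static unsigned long rtc_ticks_for_period(unsigned long period_us, unsigned hz)
{
    unsigned long long ticks;

    assert(hz >= 2 && hz <= 8192);
    ticks = ((unsigned long long)period_us * hz + 999999ULL) / 1000000ULL;
    return ticks ? (unsigned long)ticks : 1UL;
}
```

For a 3000 us delay, running the RTC at 8192 Hz and waiting for 25 interrupts
gives roughly 3052 us, which is much closer than anything a single
power-of-two period can offer.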

Not perfect, but trying to be all things to all people, and all that... 8-)


Rob White,
Casabyte, Inc.


2005-03-22 07:46:22

by Jan Engelhardt

Subject: Re: Short sleep precision

Hello,

>> > You can spin on the gettimeofday() result in userspace.
>> How can I use it?
> Something like:
>
> gettimeofday(&expiry,0);
> add_usecs(&expiry, time_to_sleep);
> do {
> gettimeofday(&curtime,0);
> } while (time_before(&curtime, &expiry));

That looks like a lot of CPU consumption, which I would like to avoid
because time_to_sleep is nondeterministic in my case.


Jan Engelhardt
--

2005-03-22 15:22:08

by Jan Engelhardt

Subject: Re: Short sleep precision

>> That looks like a lot of CPU consumption, which I would like to avoid
>> because time_to_sleep is nondeterministic in my case.
>
> If you want to delay for less than a tick, you pretty much need to busy-wait.
> There's no way to set a timer for intervals less than a tick in the regular
> kernel.[...]

I guess I'll stay with my current approach -- count the time we actually
slept, and sleep less the next time. Thanks for your time and thoughts!
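
For readers wondering what "count the time we actually slept, and sleep less
the next time" could look like, here is a minimal sketch. It is not the actual
overhead-correction code mentioned earlier in the thread; struct sleep_corr and
all function names are assumptions made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <time.h>

struct sleep_corr {
    long debt_us;   /* total overslept so far: actual minus wanted */
};

/* How long to ask nanosleep() for, given the time already overslept. */
static long corr_next_request(const struct sleep_corr *c, long wanted_us)
{
    long req = wanted_us - c->debt_us;

    return req > 0 ? req : 0;
}

/* Book the outcome of one delay: wanted vs. what really elapsed. */
static void corr_record(struct sleep_corr *c, long wanted_us, long actual_us)
{
    c->debt_us += actual_us - wanted_us;
}

/* Monotonic clock in microseconds. */
static long long mono_us(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

/* Delay by wanted_us on average; skips the sleep while debt is outstanding. */
static void corr_sleep_us(struct sleep_corr *c, long wanted_us)
{
    long req = corr_next_request(c, wanted_us);
    long long start = mono_us();

    if (req > 0) {
        struct timespec ts = {
            .tv_sec = req / 1000000L,
            .tv_nsec = (req % 1000000L) * 1000L,
        };
        nanosleep(&ts, NULL);
    }
    corr_record(c, wanted_us, (long)(mono_us() - start));
}
```

With 300 us wanted per step and a kernel that rounds every sleep up to about
1000 us, the first call oversleeps by 700 us, the next two calls return
without sleeping, and the total runtime stays close to the sum of the wanted
delays. Individual delays are still off; only the total is corrected.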


Jan Engelhardt
--