Syscall stubs must be declared to return "long", to ensure things work
properly on all 64-bit platforms (see earlier discussion on this
topic). Patch below.
On a related note: as far as I can see, timer_t is declared as "int"
on all platforms (both by kernel and glibc). Yet if my reading of the
kernel code is right, it's supposed to be "long" (and allegedly some
standard claims that timer_t should be the "widest" integer on a
platform). But then again, I'm not familiar with the POSIX timer
interface myself, so perhaps I'm completely off base here.
--david
diff -Nru a/kernel/posix-timers.c b/kernel/posix-timers.c
--- a/kernel/posix-timers.c Thu Mar 6 14:59:46 2003
+++ b/kernel/posix-timers.c Thu Mar 6 14:59:46 2003
@@ -423,7 +423,7 @@
/* Create a POSIX.1b interval timer. */
-asmlinkage int
+asmlinkage long
sys_timer_create(clockid_t which_clock,
struct sigevent *timer_event_spec, timer_t * created_timer_id)
{
@@ -663,7 +663,7 @@
}
}
/* Get the time remaining on a POSIX.1b interval timer. */
-asmlinkage int
+asmlinkage long
sys_timer_gettime(timer_t timer_id, struct itimerspec *setting)
{
struct k_itimer *timr;
@@ -695,7 +695,7 @@
*/
-asmlinkage int
+asmlinkage long
sys_timer_getoverrun(timer_t timer_id)
{
struct k_itimer *timr;
@@ -848,7 +848,7 @@
}
/* Set a POSIX.1b interval timer */
-asmlinkage int
+asmlinkage long
sys_timer_settime(timer_t timer_id, int flags,
const struct itimerspec *new_setting,
struct itimerspec *old_setting)
@@ -922,7 +922,7 @@
}
/* Delete a POSIX.1b interval timer. */
-asmlinkage int
+asmlinkage long
sys_timer_delete(timer_t timer_id)
{
struct k_itimer *timer;
@@ -1054,7 +1054,7 @@
return -EINVAL;
}
-asmlinkage int
+asmlinkage long
sys_clock_settime(clockid_t which_clock, const struct timespec *tp)
{
struct timespec new_tp;
@@ -1069,7 +1069,7 @@
new_tp.tv_nsec /= NSEC_PER_USEC;
return do_sys_settimeofday((struct timeval *) &new_tp, NULL);
}
-asmlinkage int
+asmlinkage long
sys_clock_gettime(clockid_t which_clock, struct timespec *tp)
{
struct timespec rtn_tp;
@@ -1088,7 +1088,7 @@
return error;
}
-asmlinkage int
+asmlinkage long
sys_clock_getres(clockid_t which_clock, struct timespec *tp)
{
struct timespec rtn_tp;
David Mosberger wrote:
> Syscall stubs must be declared to return "long", to ensure things work
> properly on all 64-bit platforms (see earlier discussion on this
> topic). Patch below.
>
> On a related note: as far as I can see, timer_t is declared as "int"
> on all platforms (both by kernel and glibc). Yet if my reading of the
> kernel code is right, it's supposed to be "long" (and allegedly some
> standard claims that timer_t should be the "widest" integer on a
> platform). But then again, I'm not familiar with the POSIX timer
> interface myself, so perhaps I'm completely off base here.
Leaving the standard aside, I think there is a bit of a problem in the
idr code (.../lib/idr.c) which manages the id allocation. Seems we
are returning "long" from functions declared as int. If I remember
the code correctly this will work, but it does eliminate the sequence
number that should be in the high 8 bits of the id. This assumes that
you never allocate more than 2,147,483,647 timers at once :) I will
look at this and send in a patch. I think we should return what ever
timer_t is, so we should run that to ground first.
I suspect we should also have a look at all the structures with a view
to alignment issues or is this not a problem? I.e. is this struct ok:
struct {
long a;
int b;
long c;
}
-g
>
> --david
>
> diff -Nru a/kernel/posix-timers.c b/kernel/posix-timers.c
> --- a/kernel/posix-timers.c Thu Mar 6 14:59:46 2003
> +++ b/kernel/posix-timers.c Thu Mar 6 14:59:46 2003
> @@ -423,7 +423,7 @@
>
> /* Create a POSIX.1b interval timer. */
>
> -asmlinkage int
> +asmlinkage long
> sys_timer_create(clockid_t which_clock,
> struct sigevent *timer_event_spec, timer_t * created_timer_id)
> {
> @@ -663,7 +663,7 @@
> }
> }
> /* Get the time remaining on a POSIX.1b interval timer. */
> -asmlinkage int
> +asmlinkage long
> sys_timer_gettime(timer_t timer_id, struct itimerspec *setting)
> {
> struct k_itimer *timr;
> @@ -695,7 +695,7 @@
>
> */
>
> -asmlinkage int
> +asmlinkage long
> sys_timer_getoverrun(timer_t timer_id)
> {
> struct k_itimer *timr;
> @@ -848,7 +848,7 @@
> }
>
> /* Set a POSIX.1b interval timer */
> -asmlinkage int
> +asmlinkage long
> sys_timer_settime(timer_t timer_id, int flags,
> const struct itimerspec *new_setting,
> struct itimerspec *old_setting)
> @@ -922,7 +922,7 @@
> }
>
> /* Delete a POSIX.1b interval timer. */
> -asmlinkage int
> +asmlinkage long
> sys_timer_delete(timer_t timer_id)
> {
> struct k_itimer *timer;
> @@ -1054,7 +1054,7 @@
> return -EINVAL;
> }
>
> -asmlinkage int
> +asmlinkage long
> sys_clock_settime(clockid_t which_clock, const struct timespec *tp)
> {
> struct timespec new_tp;
> @@ -1069,7 +1069,7 @@
> new_tp.tv_nsec /= NSEC_PER_USEC;
> return do_sys_settimeofday((struct timeval *) &new_tp, NULL);
> }
> -asmlinkage int
> +asmlinkage long
> sys_clock_gettime(clockid_t which_clock, struct timespec *tp)
> {
> struct timespec rtn_tp;
> @@ -1088,7 +1088,7 @@
> return error;
>
> }
> -asmlinkage int
> +asmlinkage long
> sys_clock_getres(clockid_t which_clock, struct timespec *tp)
> {
> struct timespec rtn_tp;
>
>
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
David Mosberger wrote:
> On a related note: as far as I can see, timer_t is declared as "int"
> on all platforms (both by kernel and glibc). Yet if my reading of the
> kernel code is right, it's supposed to be "long" (and allegedly some
> standard claims that timer_t should be the "widest" integer on a
> platform).
There is no such claim, don't spread misinformation.
timer_t is just an ID. No specifics of the type are defined in the
standard. 'int' is as fine as 'long' if it is OK for the implementation.
- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
iD8DBQE+Z+Sh2ijCOnn/RHQRApRKAKCPikdEfOcciOJpineUOYjFQ78IPwCgte4x
2CxYAdahZoQ4TjEr6imBZ3U=
=UsgY
-----END PGP SIGNATURE-----
>>>>> On Thu, 06 Mar 2003 15:53:50 -0800, george anzinger <[email protected]> said:
George> I think there is a bit of a problem in the idr code
George> (.../lib/idr.c) which manages the id allocation. Seems we
George> are returning "long" from functions declared as int. If I
George> remember the code correctly this will work, but it does
George> eliminate the sequence number that should be in the high 8
George> bits of the id.
Yes. We have had some reports of problems with POSIX timers and I
suspect this might be the reason (though I don't know what the exact
code-base was that the person reporting the problem was using).
George> This assumes that you never allocate more than 2,147,483,647
George> timers at once :) I will look at this and send in a patch.
George> I think we should return what ever timer_t is, so we should
George> run that to ground first.
Yes, that would be better. According to Uli, a 32-bit timer_t is fine
as far as the standards are concerned. That's good.
George> I suspect we should also have a look at all the structures
George> with a view to alignment issues or is this not a problem?
George> I.e. is this struct ok:
George> struct { long a; int b; long c; }
Such code may be OK correctnesswise, but to avoid wasting space, it's
clearly better to list larger members first.
--david
David Mosberger wrote:
>>>>>>On Thu, 06 Mar 2003 15:53:50 -0800, george anzinger <[email protected]> said:
>
>
> George> I think there is a bit of a problem in the idr code
> George> (.../lib/idr.c) which manages the id allocation. Seems we
> George> are returning "long" from functions declared as int. If I
> George> remember the code correctly this will work, but it does
> George> eliminate the sequence number that should be in the high 8
> George> bits of the id.
>
> Yes. We have had some reports of problems with POSIX timers and I
> suspect this might be the reason (though I don't know what the exact
> code-base was that the person reporting the problem was using).
>
> George> This assumes that you never allocate more than 2,147,483,647
> George> timers at once :) I will look at this and send in a patch.
> George> I think we should return what ever timer_t is, so we should
> George> run that to ground first.
>
> Yes, that would be better. According to Uli, a 32-bit timer_t is fine
> as far as the standards are concerned. That's good.
>
> George> I suspect we should also have a look at all the structures
> George> with a view to alignment issues or is this not a problem?
> George> I.e. is this struct ok:
>
> George> struct { long a; int b; long c; }
>
> Such code may be OK correctnesswise, but to avoid wasting space, it's
> clearly better to list larger members first.
Ok, I will fix all the above and shoot you a patch. I assume you can
test it on a 64-bit platform. Right?
-g
>
> --david
>
>
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
>>>>> On Thu, 06 Mar 2003 17:39:16 -0800, george anzinger <[email protected]> said:
George> Ok, I will fix all the above and shoot you a patch. I
George> assume you can test it on a 64-bit platform. Right?
Sure, except I don't have a test-program. ;-)
--david
diff -urP -I '\$Id:.*Exp \$' -X /usr/src/patch.exclude linux-2.5.64-kb/include/linux/idr.h linux/include/linux/idr.h
--- linux-2.5.64-kb/include/linux/idr.h 2003-03-05 15:09:48.000000000 -0800
+++ linux/include/linux/idr.h 2003-03-06 18:04:26.000000000 -0800
@@ -25,9 +25,12 @@
#define IDR_MASK ((1 << IDR_BITS)-1)
-/* Leave the possibility of an incomplete final layer */
-#define MAX_LEVEL (BITS_PER_LONG - RESERVED_ID_BITS + IDR_BITS - 1) / IDR_BITS
-#define MAX_ID_SHIFT (BITS_PER_LONG - RESERVED_ID_BITS)
+/* Leave the possibility of an incomplete final layer
+ * Note we will return a 32-bit int, not a long, thus the
+ * 32 below
+*/
+#define MAX_LEVEL (32 - RESERVED_ID_BITS + IDR_BITS - 1) / IDR_BITS
+#define MAX_ID_SHIFT (32 - RESERVED_ID_BITS)
#define MAX_ID_BIT (1L << MAX_ID_SHIFT)
#define MAX_ID_MASK (MAX_ID_BIT - 1)
@@ -36,15 +39,15 @@
struct idr_layer {
unsigned long bitmap; // A zero bit means "space here"
- int count; // When zero, we can release it
struct idr_layer *ary[1<<IDR_BITS];
+ int count; // When zero, we can release it
};
struct idr {
struct idr_layer *top;
- int layers;
- long count;
struct idr_layer *id_free;
+ long count;
+ int layers;
int id_free_cnt;
spinlock_t lock;
};
Binary files linux-2.5.64-kb/lib/gen_crc32table and linux/lib/gen_crc32table differ
diff -urP -I '\$Id:.*Exp \$' -X /usr/src/patch.exclude linux-2.5.64-kb/lib/idr.c linux/lib/idr.c
--- linux-2.5.64-kb/lib/idr.c 2003-03-05 15:09:50.000000000 -0800
+++ linux/lib/idr.c 2003-03-06 18:04:29.000000000 -0800
@@ -150,7 +150,7 @@
static inline int sub_alloc(struct idr *idp, int shift, void *ptr)
{
- long n, v = 0;
+ int n, v = 0;
struct idr_layer *p;
struct idr_layer **pa[MAX_LEVEL];
struct idr_layer ***paa = &pa[0];
@@ -211,7 +211,7 @@
int idr_get_new(struct idr *idp, void *ptr)
{
- long v;
+ int v;
if (idp->id_free_cnt < idp->layers + 1)
return (-1);
Binary files linux-2.5.64-kb/scripts/docproc and linux/scripts/docproc differ
Binary files linux-2.5.64-kb/scripts/fixdep and linux/scripts/fixdep differ
Binary files linux-2.5.64-kb/scripts/kallsyms and linux/scripts/kallsyms differ
Binary files linux-2.5.64-kb/scripts/mk_elfconfig and linux/scripts/mk_elfconfig differ
Binary files linux-2.5.64-kb/scripts/modpost and linux/scripts/modpost differ
Binary files linux-2.5.64-kb/usr/gen_init_cpio and linux/usr/gen_init_cpio differ
Binary files linux-2.5.64-kb/usr/initramfs_data.cpio.gz and linux/usr/initramfs_data.cpio.gz differ
--- high-res-timers.diff/lib/syscall_timer.c 2003-02-25 11:00:30.000000000 +0100
+++ high-res-timers/lib/syscall_timer.c 2003-03-07 10:48:34.000000000 +0100
@@ -11,69 +11,6 @@
#ifndef __set_errno
#define __set_errno(val) (errno = (val))
#endif
-#if defined(__ia64__)
-
-//#include <unistd.h>
-
-
-int timer_create(clockid_t which_clock,
- struct sigevent *timer_event_spec,
- timer_t *created_timer_id)
-{
- return syscall(__NR_timer_create, which_clock, timer_event_spec, created_timer_id);
-}
-
-int timer_gettime(timer_t timer_id, struct itimerspec *setting)
-{
- return syscall(__NR_timer_gettime, timer_id, setting);
-}
-
-int timer_settime(timer_t timer_id,
- int flags,
- const struct itimerspec *new_setting,
- struct itimerspec *old_setting)
-{
- return syscall(__NR_timer_settime, timer_id, flags, new_setting, old_setting);
-}
-
-int timer_getoverrun(timer_t timer_id)
-{
- return syscall(__NR_timer_getoverrun, timer_id);
-}
-
-int timer_delete(timer_t timer_id)
-{
- return syscall(__NR_timer_delete, timer_id);
-}
-
-int clock_gettime(clockid_t which_clock, struct timespec *ts)
-{
- return syscall(__NR_clock_gettime, which_clock, ts);
-}
-
-int clock_settime(clockid_t which_clock,
- const struct timespec *setting)
-{
- return syscall(__NR_clock_settime, which_clock, setting);
-}
-
-int clock_getres(clockid_t which_clock,
- struct timespec *resolution)
-{
- return syscall(__NR_clock_getres, which_clock, resolution);
-}
-
-int clock_nanosleep(clockid_t which_clock,
- int flags,
- const struct timespec *new_setting,
- struct timespec *old_setting)
-{
- return syscall(__NR_clock_nanosleep, which_clock, flags, new_setting, old_setting);
-}
-
-
-
-#else /*! __ia64__ */
#define __NR___timer_create __NR_timer_create
#define __NR___timer_gettime __NR_timer_gettime
@@ -161,4 +98,4 @@
int, flags,
const struct timespec *,rqtp,
struct timespec *,rmtp)
-#endif /*ia64*/
+
george anzinger wrote:
>
> By the way, I am seeing some reports from the clock_nanosleep test
> about sleeping too long or too short. The too long appears to be just
> not being able to preempt what ever else is running. The too short
> (on the x86) is, I believe, due to the fact that more that 1/HZ is
> clocked on the wall clock each jiffie.
>
> Try this:
>
> time sleep 60
>
> On the x86 it reports less than 60, NOT good.
>
I've run the test programs and they pass everything well (with my
patchs) excepted the nanosleeps which seems to be finnished a bit too
early. My system test is a 2.5.64 patched on a 4xItaniumII.
My main question is to know if it's a problem even if the difference
between the wakeup time and the requested time is smaller than the
resolution of the clock, 976562ns ? I mean, at the resolution of the
clock we could consider we woke up right at the good time, couldn't we?
In addition time sleep 60 always gave me time over 1 minute, I guess
it's a good point.
Here is a part of the log of 'do_test':
Testing behavor with clock seting...
Retarding the clock
Clock did not seem to move
was: 1046969027s 703359000ns
requested: 1046969023s 703359000ns
now: 1046969022s 467072000ns
diff is -1.236286998sec
Cool clock_nanosleeptest.c,379:clock_nanosleep(clock, TIMER_ABSTIME,
&ts, NULL)
Testing signal behavor...
handler1 entered: signal 31
expected clock_nanosleeptest.c,227:clock_nanosleep(clock, 0, &ts, &rs):
Interrupted system call
Time remaining is 0s 989257306ns
clock_nanosleeptest.c,245:slept too short!
requested: 275s 207032000ns
now: 275s 207030632ns
diff is -0.000001368sec
Testing undelivered signal behavor...
Cool clock_nanosleeptest.c,267:clock_nanosleep(clock, 0, &ts, &rs)
clock_nanosleeptest.c,283:slept too short!
requested: 275s 223633000ns
now: 275s 223632698ns
diff is -0.000000302sec
--Eric
Eric Piel wrote:
> george anzinger wrote:
>
>>By the way, I am seeing some reports from the clock_nanosleep test
>>about sleeping too long or too short. The too long appears to be just
>>not being able to preempt what ever else is running. The too short
>>(on the x86) is, I believe, due to the fact that more that 1/HZ is
>>clocked on the wall clock each jiffie.
>>
>>Try this:
>>
>>time sleep 60
>>
>>On the x86 it reports less than 60, NOT good.
>>
>
> I've run the test programs and they pass everything well (with my
> patchs) excepted the nanosleeps which seems to be finnished a bit too
> early. My system test is a 2.5.64 patched on a 4xItaniumII.
>
> My main question is to know if it's a problem even if the difference
> between the wakeup time and the requested time is smaller than the
> resolution of the clock, 976562ns ? I mean, at the resolution of the
> clock we could consider we woke up right at the good time, couldn't we?
>
> In addition time sleep 60 always gave me time over 1 minute, I guess
> it's a good point.
>
> Here is a part of the log of 'do_test':
>
> Testing behavor with clock seting...
> Retarding the clock
> Clock did not seem to move
> was: 1046969027s 703359000ns
> requested: 1046969023s 703359000ns
> now: 1046969022s 467072000ns
> diff is -1.236286998sec
> Cool clock_nanosleeptest.c,379:clock_nanosleep(clock, TIMER_ABSTIME,
> &ts, NULL)
Is it possible that a "clock_was_set()" call was missed? I.e in the
set_timeofday code?
>
> Testing signal behavor...
> handler1 entered: signal 31
> expected clock_nanosleeptest.c,227:clock_nanosleep(clock, 0, &ts, &rs):
> Interrupted system call
> Time remaining is 0s 989257306ns
> clock_nanosleeptest.c,245:slept too short!
> requested: 275s 207032000ns
> now: 275s 207030632ns
> diff is -0.000001368sec
>
> Testing undelivered signal behavor...
> Cool clock_nanosleeptest.c,267:clock_nanosleep(clock, 0, &ts, &rs)
> clock_nanosleeptest.c,283:slept too short!
> requested: 275s 223633000ns
> now: 275s 223632698ns
> diff is -0.000000302sec
>
>
> --Eric
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
Eric Piel wrote:
> george anzinger wrote:
>
>>By the way, I am seeing some reports from the clock_nanosleep test
>>about sleeping too long or too short. The too long appears to be just
>>not being able to preempt what ever else is running. The too short
>>(on the x86) is, I believe, due to the fact that more that 1/HZ is
>>clocked on the wall clock each jiffie.
>>
>>Try this:
>>
>>time sleep 60
>>
>>On the x86 it reports less than 60, NOT good.
>>
>
> I've run the test programs and they pass everything well (with my
> patchs) excepted the nanosleeps which seems to be finished a bit too
> early. My system test is a 2.5.64 patched on a 4xItaniumII.
>
> My main question is to know if it's a problem even if the difference
> between the wakeup time and the requested time is smaller than the
> resolution of the clock, 976562ns ? I mean, at the resolution of the
> clock we could consider we woke up right at the good time, couldn't we?
>
> In addition time sleep 60 always gave me time over 1 minute, I guess
> it's a good point.
This test is rather crude because any latencies in the exec and wait
calls are also included in the "time". Larger sleep times cause the
impact of these latencies to be less important.
The problem is caused by the update wall clock routine adding less
than 1/HZ for each jiffie. In the x86 case this was done to more
closely follow the actual time that the PIT gives with the count that
results from 1/HZ. IMHO, the result is NOT standard conforming and
another way of doing things should be found. What actually results
here is that CLOCK_MONOTONIC and CLOCK_REALTIME do not pass time at
the same rate.
The clock update value is computed by TICK_NSEC() located in
include/linux/timex.h and gives other than 1/HZ based on the LATCH and
CLOCK_TICK_RATE not yielding an exact integer value.
All this said, I don't have a good solution. The best thing would be
if we could actually generate the 1/HZ tick with 1 to 2 ppm accuracy.
If this is not possible, short of going to a new way of time keeping
(which is what the high-res-timers do) I don't see a solution. By
"good" I mean one that keeps CLOCK_MONOTONIC and CLOCK_REALTIME in
lock step (excepting clock setting which only CLOCK_REALTIME does).
Note that I am NOT excepting NTP adjustments. In other words, a good
solution would keep CLOCK_MONOTONIC as the standard clock and just
have a offset to get to CLOCK_REALTIME. This offset is what clock
setting would change.
This would, I am afraid, also introduce another problem, i.e. that the
run_timer_list() must occur very shortly after jiffies is advanced.
This requires the generation of tick interrupts to by synced to the
clock. What I am saying is that it is not enough to calculate jiffies
in a way that takes care of a higher resolution (i.e. a sub_jiffie
part) but one must also conspire to make the run_timer_list() occur
when the sub_jiffie is "small".
>
> Here is a part of the log of 'do_test':
>
> Testing behavor with clock seting...
> Retarding the clock
> Clock did not seem to move
> was: 1046969027s 703359000ns
> requested: 1046969023s 703359000ns
> now: 1046969022s 467072000ns
> diff is -1.236286998sec
> Cool clock_nanosleeptest.c,379:clock_nanosleep(clock, TIMER_ABSTIME,
> &ts, NULL)
This is BAD! The code only retards the clock by 1 sec so it appears
that nothing happened here. Oh, I know, it is saying that clock_set
failed. You must run as root to make this work. Otherwise, well I
don't know what could be happening.
>
> Testing signal behavor...
> handler1 entered: signal 31
> expected clock_nanosleeptest.c,227:clock_nanosleep(clock, 0, &ts, &rs):
> Interrupted system call
> Time remaining is 0s 989257306ns
> clock_nanosleeptest.c,245:slept too short!
> requested: 275s 207032000ns
> now: 275s 207030632ns
> diff is -0.000001368sec
This is likely caused by what I discussed above.
>
> Testing undelivered signal behavor...
> Cool clock_nanosleeptest.c,267:clock_nanosleep(clock, 0, &ts, &rs)
> clock_nanosleeptest.c,283:slept too short!
> requested: 275s 223633000ns
> now: 275s 223632698ns
> diff is -0.000000302sec
>
>
> --Eric
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml