2002-10-23 08:33:47

by Jim Houston

Subject: [PATCH] alternate Posix timer patch


Hi Everyone,

This is the second version of my spin on the Posix timers. I started
with George Anzinger's patch but I have made major changes.

I have been using George's version of the patch and would be glad to
see it included in the 2.5 tree. On the other hand, since we don't
know what might appeal to Linus, it makes sense to give him a choice.

I sent out the first version of this last Friday and had useful
comments from Andi Kleen. I have addressed some of these, but mostly
I have just been getting it to work. It now passes most of the
tests included in George's timers support package.

Of particular interest is a race (which Andi pointed out) between
saving a task_struct pointer, using this pointer to send signals,
and the process exiting. George, please look at my changes in
sys_timer_create and exit_itimers.

Here is a summary of my changes:

- A new queue just for POSIX timers and code to
handle expiring timers. This supports high resolution
without having to change the existing jiffy-based timers.

I implemented this priority queue as a sorted list
with an rbtree to index the list. It is deterministic
and fast.

- Change to use the slab allocator. This removes
the CONFIG option for the maximum number of timers.

- A new id allocator/lookup mechanism based on a
radix tree. It includes a bitmap to summarize the portion
of the tree which is in use. The current POSIX timers
patch reuses ids immediately; this allocator avoids that
(see the usage sketch just before the patch).

- I keep the timers in seconds and nanoseconds.
I'm hoping that the system timekeeping will sort
itself out and the POSIX timers can just be a consumer.
POSIX timers need two clocks - the time since boot and
the wall clock time.
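
To show how these pieces fit together from userspace, here is a
minimal sketch that exercises the new syscalls. It assumes wrappers
with the standard POSIX prototypes (timer_create() and friends);
glibc does not provide them yet, so small syscall() stubs would be
needed:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static timer_t tid;
static volatile sig_atomic_t ticks;

static void tick_handler(int sig, siginfo_t *si, void *ctx)
{
	ticks++;		/* keep the handler async-signal safe */
}

int main(void)
{
	struct sigaction sa;
	struct sigevent ev;
	struct itimerspec its;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = tick_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGRTMIN, &sa, NULL);

	memset(&ev, 0, sizeof(ev));
	ev.sigev_notify = SIGEV_SIGNAL;	/* the default delivery mode */
	ev.sigev_signo = SIGRTMIN;
	if (timer_create(CLOCK_REALTIME, &ev, &tid))
		return 1;

	/* first expiry in 1.5s, then every 10ms */
	memset(&its, 0, sizeof(its));
	its.it_value.tv_sec = 1;
	its.it_value.tv_nsec = 500000000;
	its.it_interval.tv_nsec = 10000000;
	timer_settime(tid, 0, &its, NULL);

	for (;;) {
		pause();
		/* timer_getoverrun() reports expiries missed while the
		   previous signal was still queued */
		printf("ticks=%d overrun=%d\n", (int)ticks,
			timer_getoverrun(tid));
	}
}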

I'm currently working on nanosleep. I'm trying to come up with an
alternative to the call to do_signal. At the moment my patch may
return from nanosleep early if it receives a debug signal.
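
Until that hook exists, a caller can defend against the early return
by sleeping to an absolute deadline and retrying. A sketch, again
assuming a wrapper with the standard prototype (which returns the
error number rather than setting errno):

#include <errno.h>
#include <time.h>

/* Sleep until an absolute CLOCK_REALTIME deadline, retrying when a
 * signal (e.g. the debug-signal case above) wakes us early. */
static int sleep_until(const struct timespec *deadline)
{
	int err;

	do {
		err = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
				      deadline, NULL);
	} while (err == EINTR);
	return err;
}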

This patch should work with linux-2.5.44.

Jim Houston - Concurrent Computer Corp.
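
For reference, the id allocator added below (include/linux/id2ptr.h,
kernel/id2ptr.c) is driven as in this sketch. The interfaces are the
real ones from the patch; my_ids and the helper names are made up for
illustration:

#include <linux/id2ptr.h>

/* Kernel-context sketch of the id2ptr interfaces. */
static struct id my_ids;

void setup(void)
{
	id2ptr_init(&my_ids, 1000);	/* don't reuse ids until at least
					   1000 have been issued */
}

int register_obj(void *obj)
{
	return id2ptr_new(&my_ids, obj);	/* > 0 on success, 0 on failure */
}

void *find_obj(int id)
{
	return id2ptr_lookup(&my_ids, id);	/* NULL for a stale or bad id */
}

void unregister_obj(int id)
{
	id2ptr_remove(&my_ids, id);	/* free the id */
}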

diff -X /usr1/jhouston/dontdiff -urN linux.orig/arch/i386/kernel/entry.S linux.mytimers/arch/i386/kernel/entry.S
--- linux.orig/arch/i386/kernel/entry.S Wed Oct 23 00:54:19 2002
+++ linux.mytimers/arch/i386/kernel/entry.S Wed Oct 23 01:17:51 2002
@@ -737,6 +737,15 @@
.long sys_free_hugepages
.long sys_exit_group
.long sys_lookup_dcookie
+ .long sys_timer_create
+ .long sys_timer_settime /* 255 */
+ .long sys_timer_gettime
+ .long sys_timer_getoverrun
+ .long sys_timer_delete
+ .long sys_clock_settime
+ .long sys_clock_gettime /* 260 */
+ .long sys_clock_getres
+ .long sys_clock_nanosleep

.rept NR_syscalls-(.-sys_call_table)/4
.long sys_ni_syscall
diff -X /usr1/jhouston/dontdiff -urN linux.orig/arch/i386/kernel/time.c linux.mytimers/arch/i386/kernel/time.c
--- linux.orig/arch/i386/kernel/time.c Wed Oct 23 00:54:19 2002
+++ linux.mytimers/arch/i386/kernel/time.c Wed Oct 23 01:17:51 2002
@@ -131,6 +131,7 @@
time_maxerror = NTP_PHASE_LIMIT;
time_esterror = NTP_PHASE_LIMIT;
write_unlock_irq(&xtime_lock);
+ clock_was_set();
}

/*
diff -X /usr1/jhouston/dontdiff -urN linux.orig/fs/exec.c linux.mytimers/fs/exec.c
--- linux.orig/fs/exec.c Wed Oct 23 00:54:21 2002
+++ linux.mytimers/fs/exec.c Wed Oct 23 01:37:27 2002
@@ -756,6 +756,7 @@

flush_signal_handlers(current);
flush_old_files(current->files);
+ exit_itimers(current, 0);

return 0;

diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/asm-generic/siginfo.h linux.mytimers/include/asm-generic/siginfo.h
--- linux.orig/include/asm-generic/siginfo.h Wed Oct 23 00:54:24 2002
+++ linux.mytimers/include/asm-generic/siginfo.h Wed Oct 23 01:17:51 2002
@@ -43,8 +43,9 @@

/* POSIX.1b timers */
struct {
- unsigned int _timer1;
- unsigned int _timer2;
+ timer_t _tid; /* timer id */
+ int _overrun; /* overrun count */
+ sigval_t _sigval; /* same as below */
} _timer;

/* POSIX.1b signals */
@@ -86,8 +87,8 @@
*/
#define si_pid _sifields._kill._pid
#define si_uid _sifields._kill._uid
-#define si_timer1 _sifields._timer._timer1
-#define si_timer2 _sifields._timer._timer2
+#define si_tid _sifields._timer._tid
+#define si_overrun _sifields._timer._overrun
#define si_status _sifields._sigchld._status
#define si_utime _sifields._sigchld._utime
#define si_stime _sifields._sigchld._stime
@@ -221,6 +222,7 @@
#define SIGEV_SIGNAL 0 /* notify via signal */
#define SIGEV_NONE 1 /* other notification: meaningless */
#define SIGEV_THREAD 2 /* deliver via thread creation */
+#define SIGEV_THREAD_ID 4 /* deliver to thread */

#define SIGEV_MAX_SIZE 64
#ifndef SIGEV_PAD_SIZE
@@ -235,6 +237,7 @@
int sigev_notify;
union {
int _pad[SIGEV_PAD_SIZE];
+ int _tid;

struct {
void (*_function)(sigval_t);
@@ -247,6 +250,7 @@

#define sigev_notify_function _sigev_un._sigev_thread._function
#define sigev_notify_attributes _sigev_un._sigev_thread._attribute
+#define sigev_notify_thread_id _sigev_un._tid

#ifdef __KERNEL__

diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/asm-i386/posix_types.h linux.mytimers/include/asm-i386/posix_types.h
--- linux.orig/include/asm-i386/posix_types.h Tue Jan 18 01:22:52 2000
+++ linux.mytimers/include/asm-i386/posix_types.h Wed Oct 23 01:17:51 2002
@@ -22,6 +22,8 @@
typedef long __kernel_time_t;
typedef long __kernel_suseconds_t;
typedef long __kernel_clock_t;
+typedef int __kernel_timer_t;
+typedef int __kernel_clockid_t;
typedef int __kernel_daddr_t;
typedef char * __kernel_caddr_t;
typedef unsigned short __kernel_uid16_t;
diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/asm-i386/signal.h linux.mytimers/include/asm-i386/signal.h
--- linux.orig/include/asm-i386/signal.h Wed Oct 23 00:50:41 2002
+++ linux.mytimers/include/asm-i386/signal.h Wed Oct 23 01:17:51 2002
@@ -219,6 +219,73 @@

struct pt_regs;
extern int FASTCALL(do_signal(struct pt_regs *regs, sigset_t *oldset));
+/*
+ * These macros are used by nanosleep() and clock_nanosleep().
+ * The issue is that these functions need the *regs pointer which is
+ * passed in different ways by the differing archs.
+
+ * Below we do things in two differing ways. In the long run we would
+ * like to see nano_sleep() go away (glibc should call clock_nanosleep
+ * much as we do). When that happens and the nano_sleep() system
+ * call entry is retired, there will no longer be any real need for
+ * sys_nanosleep() so the FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP macro
+ * could be undefined, resulting in not needing to stack all the
+ * parms over again, i.e. better (faster AND smaller) code.
+
+ * And while we're at it, there needs to be a way to set the return code
+ * on the way to do_signal(). It (i.e. do_signal()) saves the regs on
+ * the caller's stack to call the user handler and then the return is
+ * done using those registers. This means that the error code MUST be
+ * set in the register PRIOR to calling do_signal(). See our answer
+ * below...thanks to Jim Houston <[email protected]>
+ */
+#define FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
+
+
+#ifdef FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
+extern long do_clock_nanosleep(struct pt_regs *regs,
+ clockid_t which_clock,
+ int flags,
+ const struct timespec *rqtp,
+ struct timespec *rmtp);
+
+#define NANOSLEEP_ENTRY(a) \
+ asmlinkage long sys_nanosleep( struct timespec* rqtp, \
+ struct timespec * rmtp) \
+{ struct pt_regs *regs = (struct pt_regs *)&rqtp; \
+ return do_clock_nanosleep(regs, CLOCK_REALTIME, 0, rqtp, rmtp); \
+}
+
+#define CLOCK_NANOSLEEP_ENTRY(a) asmlinkage long sys_clock_nanosleep( \
+ clockid_t which_clock, \
+ int flags, \
+ const struct timespec *rqtp, \
+ struct timespec *rmtp) \
+{ struct pt_regs *regs = (struct pt_regs *)&which_clock; \
+ return do_clock_nanosleep(regs, which_clock, flags, rqtp, rmtp); \
+} \
+long do_clock_nanosleep(struct pt_regs *regs, \
+ clockid_t which_clock, \
+ int flags, \
+ const struct timespec *rqtp, \
+ struct timespec *rmtp) \
+{ a
+
+#else
+#define NANOSLEEP_ENTRY(a) \
+ asmlinkage long sys_nanosleep( struct timespec* rqtp, \
+ struct timespec * rmtp) \
+{ struct pt_regs *regs = (struct pt_regs *)&rqtp; \
+ a
+#define CLOCK_NANOSLEEP_ENTRY(a) asmlinkage long sys_clock_nanosleep( \
+ clockid_t which_clock, \
+ int flags, \
+ const struct timespec *rqtp, \
+ struct timespec *rmtp) \
+{ struct pt_regs *regs = (struct pt_regs *)&which_clock; \
+ a
+#endif
+#define _do_signal() (regs->eax = -EINTR, do_signal(regs, NULL))

#endif /* __KERNEL__ */

diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/asm-i386/unistd.h linux.mytimers/include/asm-i386/unistd.h
--- linux.orig/include/asm-i386/unistd.h Wed Oct 23 00:54:21 2002
+++ linux.mytimers/include/asm-i386/unistd.h Wed Oct 23 01:17:51 2002
@@ -258,6 +258,15 @@
#define __NR_free_hugepages 251
#define __NR_exit_group 252
#define __NR_lookup_dcookie 253
+#define __NR_timer_create 254
+#define __NR_timer_settime (__NR_timer_create+1)
+#define __NR_timer_gettime (__NR_timer_create+2)
+#define __NR_timer_getoverrun (__NR_timer_create+3)
+#define __NR_timer_delete (__NR_timer_create+4)
+#define __NR_clock_settime (__NR_timer_create+5)
+#define __NR_clock_gettime (__NR_timer_create+6)
+#define __NR_clock_getres (__NR_timer_create+7)
+#define __NR_clock_nanosleep (__NR_timer_create+8)


/* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */
diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/id2ptr.h linux.mytimers/include/linux/id2ptr.h
--- linux.orig/include/linux/id2ptr.h Wed Dec 31 19:00:00 1969
+++ linux.mytimers/include/linux/id2ptr.h Wed Oct 23 01:25:23 2002
@@ -0,0 +1,47 @@
+/*
+ * include/linux/id2ptr.h
+ *
+ * 2002-10-18 written by Jim Houston [email protected]
+ * Copyright (C) 2002 by Concurrent Computer Corporation
+ * Distributed under the GNU GPL license version 2.
+ *
+ * Small id to pointer translation service avoiding fixed sized
+ * tables.
+ */
+
+#define ID_BITS 5
+#define ID_MASK ((1 << ID_BITS)-1)
+#define ID_FULL ((1 << (1 << ID_BITS))-1)
+
+/* Number of id_layer structs to leave in free list */
+#define ID_FREE_MAX 6
+
+struct id_layer {
+ unsigned int bitmap;
+ struct id_layer *ary[1<<ID_BITS];
+};
+
+struct id {
+ int layers;
+ int last;
+ int count;
+ int min_wrap;
+ struct id_layer *top;
+};
+
+void *id2ptr_lookup(struct id *idp, int id);
+int id2ptr_new(struct id *idp, void *ptr);
+void id2ptr_remove(struct id *idp, int id);
+void id2ptr_init(struct id *idp, int min_wrap);
+
+
+static inline void update_bitmap(struct id_layer *p, int bit)
+{
+ if (p->ary[bit] && p->ary[bit]->bitmap == 0xffffffff)
+ p->bitmap |= 1<<bit;
+ else
+ p->bitmap &= ~(1<<bit);
+}
+
+extern kmem_cache_t *id_layer_cache;
+
diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/init_task.h linux.mytimers/include/linux/init_task.h
--- linux.orig/include/linux/init_task.h Wed Oct 23 00:54:03 2002
+++ linux.mytimers/include/linux/init_task.h Wed Oct 23 01:17:51 2002
@@ -93,6 +93,7 @@
.sig = &init_signals, \
.pending = { NULL, &tsk.pending.head, {{0}}}, \
.blocked = {{0}}, \
+ .posix_timers = LIST_HEAD_INIT(tsk.posix_timers), \
.alloc_lock = SPIN_LOCK_UNLOCKED, \
.switch_lock = SPIN_LOCK_UNLOCKED, \
.journal_info = NULL, \
diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/posix-timers.h linux.mytimers/include/linux/posix-timers.h
--- linux.orig/include/linux/posix-timers.h Wed Dec 31 19:00:00 1969
+++ linux.mytimers/include/linux/posix-timers.h Wed Oct 23 01:25:02 2002
@@ -0,0 +1,81 @@
+/*
+ * include/linux/posix-timers.h
+ *
+ * 2002-10-22 written by Jim Houston [email protected]
+ * Copyright (C) 2002 by Concurrent Computer Corporation
+ * Distributed under the GNU GPL license version 2.
+ *
+ */
+
+#ifndef _linux_POSIX_TIMERS_H
+#define _linux_POSIX_TIMERS_H
+
+/* This should be in posix-timers.h - but this is easier now. */
+
+enum timer_type {
+ TIMER,
+ NANOSLEEP
+};
+
+struct k_itimer {
+ struct list_head it_pq_list; /* fields for timer priority queue. */
+ struct rb_node it_pq_node;
+ struct timer_pq *it_pq; /* pointer to the queue. */
+
+ struct list_head it_task_list; /* list for exit_itimers */
+ spinlock_t it_lock;
+ clockid_t it_clock; /* which timer type */
+ timer_t it_id; /* timer id */
+ int it_overrun; /* overrun on pending signal */
+ int it_overrun_last; /* overrun on last delivered signal */
+ int it_overrun_deferred; /* overrun on pending timer interrupt */
+ int it_sigev_notify; /* notify word of sigevent struct */
+ int it_sigev_signo; /* signo word of sigevent struct */
+ sigval_t it_sigev_value; /* value word of sigevent struct */
+ struct task_struct *it_process; /* process to send signal to */
+ struct itimerspec it_v; /* expiry time & interval */
+ enum timer_type it_type;
+};
+
+/*
+ * The priority queue is a sorted doubly linked list ordered by
+ * expiry time. A rbtree is used as an index in to this list
+ * so that inserts are O(log2(n)).
+ */
+
+struct timer_pq {
+ struct list_head head;
+ struct rb_root rb_root;
+};
+
+#define TIMER_PQ_INIT(name) { \
+ .rb_root = RB_ROOT, \
+ .head = LIST_HEAD_INIT(name.head), \
+}
+
+
+#if 0
+#include <linux/posix-timers.h>
+#endif
+
+struct k_clock {
+ struct timer_pq pq;
+ int res; /* in nano seconds */
+ int ( *clock_set)(struct timespec *tp);
+ int ( *clock_get)(struct timespec *tp);
+ int ( *nsleep)( int flags,
+ struct timespec*new_setting,
+ struct itimerspec *old_setting);
+ int ( *timer_set)(struct k_itimer *timr, int flags,
+ struct itimerspec *new_setting,
+ struct itimerspec *old_setting);
+ int ( *timer_del)(struct k_itimer *timr);
+ void ( *timer_get)(struct k_itimer *timr,
+ struct itimerspec *cur_setting);
+};
+
+int do_posix_clock_monotonic_gettime(struct timespec *tp);
+int do_posix_clock_monotonic_settime(struct timespec *tp);
+asmlinkage int sys_timer_delete(timer_t timer_id);
+
+#endif
diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/sched.h linux.mytimers/include/linux/sched.h
--- linux.orig/include/linux/sched.h Wed Oct 23 00:54:28 2002
+++ linux.mytimers/include/linux/sched.h Wed Oct 23 01:31:41 2002
@@ -29,6 +29,7 @@
#include <linux/compiler.h>
#include <linux/completion.h>
#include <linux/pid.h>
+#include <linux/posix-timers.h>

struct exec_domain;

@@ -333,6 +334,8 @@
unsigned long it_real_value, it_prof_value, it_virt_value;
unsigned long it_real_incr, it_prof_incr, it_virt_incr;
struct timer_list real_timer;
+ struct list_head posix_timers; /* POSIX.1b Interval Timers */
+ struct k_itimer nanosleep_tmr;
unsigned long utime, stime, cutime, cstime;
unsigned long start_time;
long per_cpu_utime[NR_CPUS], per_cpu_stime[NR_CPUS];
@@ -637,6 +640,7 @@

extern void exit_mm(struct task_struct *);
extern void exit_files(struct task_struct *);
+extern void exit_itimers(struct task_struct *, int);
extern void exit_sighand(struct task_struct *);
extern void __exit_sighand(struct task_struct *);

diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/signal.h linux.mytimers/include/linux/signal.h
--- linux.orig/include/linux/signal.h Wed Oct 23 00:53:01 2002
+++ linux.mytimers/include/linux/signal.h Wed Oct 23 01:17:51 2002
@@ -224,6 +224,36 @@
struct pt_regs;
extern int get_signal_to_deliver(siginfo_t *info, struct pt_regs *regs);
#endif
+/*
+ * We would like the asm/signal.h code to define these so that the using
+ * function can call do_signal(). In lieu of that, we define a generic
+ * version that pretends that do_signal() was called and delivered a signal.
+ * To see how this is used, see nano_sleep() in timer.c and the i386 version
+ * in asm-i386/signal.h.
+ */
+#ifndef PT_REGS_ENTRY
+#define PT_REGS_ENTRY(type,name,p1_type,p1, p2_type,p2) \
+type name(p1_type p1,p2_type p2)\
+{
+#endif
+#ifndef _do_signal
+#define _do_signal() 1
+#endif
+#ifndef NANOSLEEP_ENTRY
+#define NANOSLEEP_ENTRY(a) asmlinkage long sys_nanosleep( struct timespec* rqtp, \
+ struct timespec * rmtp) \
+{ a
+#endif
+#ifndef CLOCK_NANOSLEEP_ENTRY
+#define CLOCK_NANOSLEEP_ENTRY(a) asmlinkage long sys_clock_nanosleep( \
+ clockid_t which_clock, \
+ int flags, \
+ const struct timespec *rqtp, \
+ struct timespec *rmtp) \
+{ a
+
+#endif
+

#endif /* __KERNEL__ */

diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/sys.h linux.mytimers/include/linux/sys.h
--- linux.orig/include/linux/sys.h Sun Dec 10 23:56:37 1995
+++ linux.mytimers/include/linux/sys.h Wed Oct 23 01:17:51 2002
@@ -4,7 +4,7 @@
/*
* system call entry points ... but not all are defined
*/
-#define NR_syscalls 256
+#define NR_syscalls 275

/*
* These are system calls that will be removed at some time
diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/time.h linux.mytimers/include/linux/time.h
--- linux.orig/include/linux/time.h Wed Oct 23 00:53:34 2002
+++ linux.mytimers/include/linux/time.h Wed Oct 23 01:17:51 2002
@@ -38,6 +38,19 @@
*/
#define MAX_JIFFY_OFFSET ((~0UL >> 1)-1)

+/* Parameters used to convert the timespec values */
+#ifndef USEC_PER_SEC
+#define USEC_PER_SEC (1000000L)
+#endif
+
+#ifndef NSEC_PER_SEC
+#define NSEC_PER_SEC (1000000000L)
+#endif
+
+#ifndef NSEC_PER_USEC
+#define NSEC_PER_USEC (1000L)
+#endif
+
static __inline__ unsigned long
timespec_to_jiffies(struct timespec *value)
{
@@ -124,6 +137,8 @@
#ifdef __KERNEL__
extern void do_gettimeofday(struct timeval *tv);
extern void do_settimeofday(struct timeval *tv);
+extern int do_sys_settimeofday(struct timeval *tv, struct timezone *tz);
+extern void clock_was_set(void); // call when ever the clock is set
#endif

#define FD_SETSIZE __FD_SETSIZE
@@ -149,5 +164,25 @@
struct timeval it_interval; /* timer interval */
struct timeval it_value; /* current value */
};
+
+
+/*
+ * The IDs of the various system clocks (for POSIX.1b interval timers).
+ */
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 1
+#define CLOCK_PROCESS_CPUTIME_ID 2
+#define CLOCK_THREAD_CPUTIME_ID 3
+#define CLOCK_REALTIME_HR 4
+#define CLOCK_MONOTONIC_HR 5
+
+#define MAX_CLOCKS 6
+
+/*
+ * The various flags for setting POSIX.1b interval timers.
+ */
+
+#define TIMER_ABSTIME 0x01
+

#endif
diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/types.h linux.mytimers/include/linux/types.h
--- linux.orig/include/linux/types.h Wed Oct 23 00:54:17 2002
+++ linux.mytimers/include/linux/types.h Wed Oct 23 01:17:51 2002
@@ -23,6 +23,8 @@
typedef __kernel_daddr_t daddr_t;
typedef __kernel_key_t key_t;
typedef __kernel_suseconds_t suseconds_t;
+typedef __kernel_timer_t timer_t;
+typedef __kernel_clockid_t clockid_t;

#ifdef __KERNEL__
typedef __kernel_uid32_t uid_t;
diff -X /usr1/jhouston/dontdiff -urN linux.orig/init/Config.help linux.mytimers/init/Config.help
--- linux.orig/init/Config.help Wed Oct 23 00:50:42 2002
+++ linux.mytimers/init/Config.help Wed Oct 23 01:17:51 2002
@@ -115,3 +115,11 @@
replacement for kerneld.) Say Y here and read about configuring it
in <file:Documentation/kmod.txt>.

+Maximum number of POSIX timers
+CONFIG_MAX_POSIX_TIMERS
+ This option allows you to configure the system wide maximum number of
+ POSIX timers. Timers are allocated as needed so the only memory
+ overhead this adds is about 4 bytes for every 50 or so timers to keep
+ track of each block of timers. The system quietly rounds this number
+ up to fill out a timer allocation block. It is ok to have several
+ thousand timers as needed by your applications.
diff -X /usr1/jhouston/dontdiff -urN linux.orig/init/Config.in linux.mytimers/init/Config.in
--- linux.orig/init/Config.in Wed Oct 23 00:50:45 2002
+++ linux.mytimers/init/Config.in Wed Oct 23 01:17:51 2002
@@ -9,6 +9,7 @@
bool 'System V IPC' CONFIG_SYSVIPC
bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
bool 'Sysctl support' CONFIG_SYSCTL
+int 'System wide maximum number of POSIX timers' CONFIG_MAX_POSIX_TIMERS 3000
endmenu

mainmenu_option next_comment
diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/Makefile linux.mytimers/kernel/Makefile
--- linux.orig/kernel/Makefile Wed Oct 23 00:54:21 2002
+++ linux.mytimers/kernel/Makefile Wed Oct 23 01:24:01 2002
@@ -10,7 +10,7 @@
module.o exit.o itimer.o time.o softirq.o resource.o \
sysctl.o capability.o ptrace.o timer.o user.o \
signal.o sys.o kmod.o workqueue.o futex.o platform.o pid.o \
- rcupdate.o
+ rcupdate.o posix-timers.o id2ptr.o

obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
obj-$(CONFIG_SMP) += cpu.o
diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/exit.c linux.mytimers/kernel/exit.c
--- linux.orig/kernel/exit.c Wed Oct 23 00:54:21 2002
+++ linux.mytimers/kernel/exit.c Wed Oct 23 01:22:00 2002
@@ -647,6 +647,7 @@
__exit_files(tsk);
__exit_fs(tsk);
exit_namespace(tsk);
+ exit_itimers(tsk, 1);
exit_thread();

if (current->leader)
diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/fork.c linux.mytimers/kernel/fork.c
--- linux.orig/kernel/fork.c Wed Oct 23 00:54:17 2002
+++ linux.mytimers/kernel/fork.c Wed Oct 23 01:17:51 2002
@@ -783,6 +783,7 @@
goto bad_fork_cleanup_files;
if (copy_sighand(clone_flags, p))
goto bad_fork_cleanup_fs;
+ INIT_LIST_HEAD(&p->posix_timers);
if (copy_mm(clone_flags, p))
goto bad_fork_cleanup_sighand;
if (copy_namespace(clone_flags, p))
diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/id2ptr.c linux.mytimers/kernel/id2ptr.c
--- linux.orig/kernel/id2ptr.c Wed Dec 31 19:00:00 1969
+++ linux.mytimers/kernel/id2ptr.c Wed Oct 23 01:23:24 2002
@@ -0,0 +1,223 @@
+/*
+ * linux/kernel/id2ptr.c
+ *
+ * 2002-10-18 written by Jim Houston [email protected]
+ * Copyright (C) 2002 by Concurrent Computer Corporation
+ * Distributed under the GNU GPL license version 2.
+ *
+ * Small id to pointer translation service.
+ *
+ * It uses a radix tree like structure as a sparse array indexed
+ * by the id to obtain the pointer. A bit map is included in each
+ * level of the tree which identifies portions of the tree which
+ * are completely full. This makes the process of allocating a
+ * new id quick.
+ */
+
+
+#include <linux/slab.h>
+#include <linux/id2ptr.h>
+#include <linux/init.h>
+#include <linux/string.h>
+
+static kmem_cache_t *id_layer_cache;
+spinlock_t id_lock = SPIN_LOCK_UNLOCKED;
+
+/*
+ * Since we can't allocate memory with spinlock held and dropping the
+ * lock to allocate gets ugly keep a free list which will satisfy the
+ * worst case allocation.
+ */
+
+struct id_layer *id_free;
+int id_free_cnt;
+
+static inline struct id_layer *alloc_layer(void)
+{
+ struct id_layer *p;
+
+ if (!(p = id_free))
+ BUG();
+ id_free = p->ary[0];
+ id_free_cnt--;
+ p->ary[0] = 0;
+ return(p);
+}
+
+static inline void free_layer(struct id_layer *p)
+{
+ p->ary[0] = id_free;
+ id_free = p;
+ id_free_cnt++;
+}
+
+/*
+ * Lookup the kernel pointer associated with a user supplied
+ * id value.
+ */
+void *id2ptr_lookup(struct id *idp, int id)
+{
+ int n;
+ struct id_layer *p;
+
+ if (id <= 0)
+ return(NULL);
+ id--;
+ spin_lock_irq(&id_lock);
+ n = idp->layers * ID_BITS;
+ p = idp->top;
+ if (id >= (1 << n)) {
+ spin_unlock_irq(&id_lock);
+ return(NULL);
+ }
+
+ while (n > 0 && p) {
+ n -= ID_BITS;
+ p = p->ary[(id >> n) & ID_MASK];
+ }
+ spin_unlock_irq(&id_lock);
+ return((void *)p);
+}
+
+static int sub_alloc(struct id_layer *p, int shift, int id, void *ptr)
+{
+ int n = (id >> shift) & ID_MASK;
+ int bitmap = p->bitmap;
+ int id_base = id & ~((1 << (shift+ID_BITS))-1);
+ int v;
+
+ for ( ; n <= ID_MASK; n++, id = id_base + (n << shift)) {
+ if (bitmap & (1 << n))
+ continue;
+ if (shift == 0) {
+ p->ary[n] = (struct id_layer *)ptr;
+ p->bitmap |= 1<<n;
+ return(id);
+ }
+ if (!p->ary[n])
+ p->ary[n] = alloc_layer();
+ if ((v = sub_alloc(p->ary[n], shift-ID_BITS, id, ptr))) {
+ update_bitmap(p, n);
+ return(v);
+ }
+ }
+ return(0);
+}
+
+/*
+ * Allocate a new id associate the value ptr with this new id.
+ */
+int id2ptr_new(struct id *idp, void *ptr)
+{
+ int n, last, id, v;
+ struct id_layer *new;
+
+ spin_lock_irq(&id_lock);
+ n = idp->layers * ID_BITS;
+ last = idp->last;
+ while (id_free_cnt < n+1) {
+ spin_unlock_irq(&id_lock);
+ new = kmem_cache_alloc(id_layer_cache, GFP_KERNEL);
+ memset(new, 0, sizeof(struct id_layer));
+ spin_lock_irq(&id_lock);
+ free_layer(new);
+ }
+ /*
+ * Add a new layer if the array is full or the last id
+ * was at the limit and we don't want to wrap.
+ */
+ if ((last == ((1 << n)-1) && last < idp->min_wrap) ||
+ idp->count == (1 << n)) {
+ ++idp->layers;
+ n += ID_BITS;
+ new = alloc_layer();
+ new->ary[0] = idp->top;
+ idp->top = new;
+ update_bitmap(new, 0);
+ }
+ if (last >= ((1 << n)-1))
+ last = 0;
+
+ /*
+ * Search for a free id starting after last id allocated.
+ * If that fails wrap back to start.
+ */
+ id = last+1;
+ if (!(v = sub_alloc(idp->top, n-ID_BITS, id, ptr)))
+ v = sub_alloc(idp->top, n-ID_BITS, 1, ptr);
+ idp->last = v;
+ idp->count++;
+ spin_unlock_irq(&id_lock);
+ return(v+1);
+}
+
+
+static int sub_remove(struct id_layer *p, int shift, int id)
+{
+ int n = (id >> shift) & ID_MASK;
+ int i, bitmap, rv;
+
+ rv = 0;
+ bitmap = p->bitmap & ~(1<<n);
+ p->bitmap = bitmap;
+ if (shift == 0) {
+ p->ary[n] = NULL;
+ rv = !bitmap;
+ } else {
+ if (sub_remove(p->ary[n], shift-ID_BITS, id)) {
+ free_layer(p->ary[n]);
+ p->ary[n] = 0;
+ for (i = 0; i < (1 << ID_BITS); i++)
+ if (p->ary[i])
+ break;
+ if (i == (1 << ID_BITS))
+ rv = 1;
+ }
+ }
+ return(rv);
+}
+
+/*
+ * Remove (free) an id value and break the association with
+ * the kernel pointer.
+ */
+void id2ptr_remove(struct id *idp, int id)
+{
+ struct id_layer *p;
+
+ if (id <= 0)
+ return;
+ id--;
+ spin_lock_irq(&id_lock);
+ sub_remove(idp->top, (idp->layers-1)*ID_BITS, id);
+ idp->count--;
+ if (id_free_cnt >= ID_FREE_MAX) {
+
+ p = alloc_layer();
+ spin_unlock_irq(&id_lock);
+ kmem_cache_free(id_layer_cache, p);
+ return;
+ }
+ spin_unlock_irq(&id_lock);
+}
+
+void init_id_cache(void)
+{
+ if (!id_layer_cache)
+ id_layer_cache = kmem_cache_create("id_layer_cache",
+ sizeof(struct id_layer), 0, 0, 0, 0);
+}
+
+void id2ptr_init(struct id *idp, int min_wrap)
+{
+ init_id_cache();
+ idp->count = 1;
+ idp->last = 0;
+ idp->layers = 1;
+ idp->top = kmem_cache_alloc(id_layer_cache, GFP_KERNEL);
+ memset(idp->top, 0, sizeof(struct id_layer));
+ idp->top->bitmap = 0;
+ idp->min_wrap = min_wrap;
+}
+
+__initcall(init_id_cache);
diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/posix-timers.c linux.mytimers/kernel/posix-timers.c
--- linux.orig/kernel/posix-timers.c Wed Dec 31 19:00:00 1969
+++ linux.mytimers/kernel/posix-timers.c Wed Oct 23 01:56:45 2002
@@ -0,0 +1,1109 @@
+/*
+ * linux/kernel/posix_timers.c
+ *
+ *
+ * 2002-10-15 Posix Clocks & timers by George Anzinger
+ * Copyright (C) 2002 by MontaVista Software.
+ *
+ * 2002-10-18 changes by Jim Houston [email protected]
+ * Copyright (C) 2002 by Concurrent Computer Corp.
+ *
+ * - Add a separate queue for posix timers. It's a
+ * priority queue implemented as a sorted doubly
+ * linked list & an rbtree as an index into the list.
+ * - Use a slab cache to allocate the timer structures.
+ * - Allocate timer ids using my new id allocator.
+ * This avoids the immediate reuse of timer ids.
+ * - Uses seconds and nano-seconds rather than
+ * jiffies and sub_jiffies.
+ *
+ * This is an experimental change. I'm sending it out to
+ * the mailing list in the hope that it will stimulate
+ * discussion.
+ */
+
+/* These are all the functions necessary to implement
+ * POSIX clocks & timers
+ */
+
+#include <linux/mm.h>
+#include <linux/smp_lock.h>
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+#include <linux/time.h>
+
+#include <asm/uaccess.h>
+#include <asm/semaphore.h>
+#include <linux/list.h>
+#include <linux/init.h>
+#include <linux/nmi.h>
+#include <linux/compiler.h>
+#include <linux/id2ptr.h>
+#include <linux/rbtree.h>
+#include <linux/posix-timers.h>
+
+
+#ifndef div_long_long_rem
+#include <asm/div64.h>
+
+#define div_long_long_rem(dividend,divisor,remainder) ({ \
+ u64 result = dividend; \
+ *remainder = do_div(result,divisor); \
+ result; })
+
+#endif /* ifndef div_long_long_rem */
+
+
+/*
+ * Let's keep our timers in a slab cache :-)
+ */
+static kmem_cache_t *posix_timers_cache;
+struct id posix_timers_id;
+
+/*
+ * This lock protects the timer queues; it is held for the
+ * duration of the timer expiry process.
+ */
+spinlock_t posix_timers_lock = SPIN_LOCK_UNLOCKED;
+
+/*
+ * Kluge until I can wire into the timer interrupt.
+ */
+int poll_timer_running;
+void run_posix_timers(unsigned long dummy);
+static struct timer_list poll_posix_timers = {
+ .function = &run_posix_timers,
+};
+
+struct k_clock clock_realtime = {
+ .pq = TIMER_PQ_INIT(clock_realtime.pq),
+ .res = NSEC_PER_SEC/HZ,
+};
+
+struct k_clock clock_monotonic = {
+ .pq = TIMER_PQ_INIT(clock_monotonic.pq),
+ .res= NSEC_PER_SEC/HZ,
+ .clock_get = do_posix_clock_monotonic_gettime,
+ .clock_set = do_posix_clock_monotonic_settime
+};
+
+/*
+ * Insert a timer into a priority queue. This is a sorted
+ * list of timers. A rbtree is used to index the list.
+ */
+
+static int timer_insert_nolock(struct timer_pq *pq, struct k_itimer *t)
+{
+ struct rb_node ** p = &pq->rb_root.rb_node;
+ struct rb_node * parent = NULL;
+ struct k_itimer *cur;
+ struct list_head *prev;
+ prev = &pq->head;
+
+ if (t->it_pq)
+ BUG();
+ t->it_pq = pq;
+ while (*p) {
+ parent = *p;
+ cur = rb_entry(parent, struct k_itimer , it_pq_node);
+
+ /*
+ * We allow non-unique entries. This works
+ * but there might be opportunity to do something
+ * clever.
+ */
+ if (t->it_v.it_value.tv_sec < cur->it_v.it_value.tv_sec ||
+ (t->it_v.it_value.tv_sec == cur->it_v.it_value.tv_sec &&
+ t->it_v.it_value.tv_nsec < cur->it_v.it_value.tv_nsec))
+ p = &(*p)->rb_left;
+ else {
+ prev = &cur->it_pq_list;
+ p = &(*p)->rb_right;
+ }
+ }
+ /* link into rbtree. */
+ rb_link_node(&t->it_pq_node, parent, p);
+ rb_insert_color(&t->it_pq_node, &pq->rb_root);
+ /* link it into the list */
+ list_add(&t->it_pq_list, prev);
+ /*
+ * We need to setup a timer interrupt if the new timer is
+ * at the head of the queue.
+ */
+ return(pq->head.next == &t->it_pq_list);
+}
+
+static inline void timer_remove_nolock(struct k_itimer *t)
+{
+ struct timer_pq *pq;
+
+ if (!(pq = t->it_pq))
+ return;
+ rb_erase(&t->it_pq_node, &pq->rb_root);
+ list_del(&t->it_pq_list);
+ t->it_pq = 0;
+}
+
+static void timer_remove(struct k_itimer *t)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&posix_timers_lock, flags);
+ timer_remove_nolock(t);
+ spin_unlock_irqrestore(&posix_timers_lock, flags);
+}
+
+
+static int timer_insert(struct timer_pq *pq, struct k_itimer *t)
+{
+ unsigned long flags;
+ int rv;
+
+ spin_lock_irqsave(&posix_timers_lock, flags);
+ rv = timer_insert_nolock(pq, t);
+ spin_unlock_irqrestore(&posix_timers_lock, flags);
+ if (!poll_timer_running) {
+ poll_timer_running = 1;
+ poll_posix_timers.expires = jiffies + 1;
+ add_timer(&poll_posix_timers);
+ }
+ return(rv);
+}
+
+/*
+ * If we are late delivering a periodic timer we may
+ * have missed several expiries. We want to calculate the
+ * number we have missed, both as the overrun count and
+ * so that we can pick the next expiry.
+ *
+ * You really need this if you schedule a high frequency timer
+ * and then make a big change to the current time.
+ */
+
+int handle_overrun(struct k_itimer *t, struct timespec dt)
+{
+ int ovr;
+#if 0
+ long long ldt, in;
+ long sec, nsec;
+
+ in = (long long)t->it_v.it_interval.tv_sec*1000000000 +
+ t->it_v.it_interval.tv_nsec;
+ ldt = (long long)dt.tv_sec * 1000000000 + dt.tv_nsec;
+ ovr = ldt/in + 1;
+ ldt = (long long)t->it_v.it_interval.tv_nsec * ovr;
+ nsec = ldt % 1000000000;
+ sec = ldt / 1000000000;
+ sec += ovr * t->it_v.it_interval.tv_sec;
+ nsec += t->it_v.it_value.tv_nsec;
+ sec += t->it_v.it_value.tv_sec;
+ if (nsec > 1000000000) {
+ sec++;
+ nsec -= 1000000000;
+ }
+ t->it_v.it_value.tv_sec = sec;
+ t->it_v.it_value.tv_nsec = nsec;
+#else
+ /* Temporary hack */
+ ovr = 0;
+ while (dt.tv_sec > t->it_v.it_interval.tv_sec ||
+ (dt.tv_sec == t->it_v.it_interval.tv_sec &&
+ dt.tv_nsec > t->it_v.it_interval.tv_nsec)) {
+ dt.tv_sec -= t->it_v.it_interval.tv_sec;
+ dt.tv_nsec -= t->it_v.it_interval.tv_nsec;
+ if (dt.tv_nsec < 0) {
+ dt.tv_sec--;
+ dt.tv_nsec += 1000000000;
+ }
+ t->it_v.it_value.tv_sec += t->it_v.it_interval.tv_sec;
+ t->it_v.it_value.tv_nsec += t->it_v.it_interval.tv_nsec;
+ if (t->it_v.it_value.tv_nsec >= 1000000000) {
+ t->it_v.it_value.tv_sec++;
+ t->it_v.it_value.tv_nsec -= 1000000000;
+ }
+ ovr++;
+ }
+#endif
+ return(ovr);
+}
+
+int sending_signal_failed;
+
+/*
+ * Yes I calculate an overrun but don't deliver it. I need to
+ * play with this code.
+ */
+static void timer_notify_task(struct k_itimer *timr, int ovr)
+{
+ struct siginfo info;
+ int ret;
+
+ if (! (timr->it_sigev_notify & SIGEV_NONE)) {
+ memset(&info, 0, sizeof(info));
+ /* Send signal to the process that owns this timer. */
+ info.si_signo = timr->it_sigev_signo;
+ info.si_errno = 0;
+ info.si_code = SI_TIMER;
+ info.si_tid = timr->it_id;
+ info.si_value = timr->it_sigev_value;
+ info.si_overrun = timr->it_overrun_deferred;
+ ret = send_sig_info(info.si_signo, &info, timr->it_process);
+ switch (ret) {
+ case 0: /* all's well new signal queued */
+ timr->it_overrun_last = timr->it_overrun;
+ timr->it_overrun = timr->it_overrun_deferred;
+ break;
+ case 1: /* signal from this timer was already in the queue */
+ timr->it_overrun += timr->it_overrun_deferred + 1;
+ break;
+ default:
+ sending_signal_failed++;
+ break;
+ }
+ }
+}
+
+void do_expiry(struct k_itimer *t, int ovr)
+{
+ switch (t->it_type) {
+ case TIMER:
+ timer_notify_task(t, ovr);
+ return;
+ case NANOSLEEP:
+ wake_up_process(t->it_process);
+ return;
+ }
+}
+
+/*
+ * Check if the timer at the head of the priority queue has
+ * expired and handle the expiry. Return time in nsec till
+ * the next expiry. We only really care about expiries
+ * before the next clock tick so we use a 32 bit int here.
+ */
+
+static int check_expiry(struct timer_pq *pq, struct timespec *tv)
+{
+ struct k_itimer *t;
+ struct timespec dt;
+ int ovr;
+ long sec, nsec;
+ unsigned long flags;
+
+ ovr = 1;
+ spin_lock_irqsave(&posix_timers_lock, flags);
+ while (!list_empty(&pq->head)) {
+ t = list_entry(pq->head.next, struct k_itimer, it_pq_list);
+ dt.tv_sec = tv->tv_sec - t->it_v.it_value.tv_sec;
+ dt.tv_nsec = tv->tv_nsec - t->it_v.it_value.tv_nsec;
+ if (dt.tv_sec < 0 || (dt.tv_sec == 0 && dt.tv_nsec < 0)) {
+ /*
+ * It has not expired yet. Return nano-seconds
+ * remaining if it's less than a second.
+ */
+ if (dt.tv_sec < -1)
+ nsec = -1;
+ else
+ nsec = dt.tv_sec ? 1000000000-dt.tv_nsec :
+ -dt.tv_nsec;
+ spin_unlock_irqrestore(&posix_timers_lock, flags);
+ return(nsec);
+ }
+ /*
+ * It's expired. If this is a periodic timer we need to
+ * set up for the next expiry. We also check for overrun
+ * here. If the timer has already missed an expiry we want to
+ * deliver the overrun information and get back on schedule.
+ */
+ if (dt.tv_nsec < 0) {
+ dt.tv_sec--;
+ dt.tv_nsec += 1000000000;
+ }
+ timer_remove_nolock(t);
+ if (t->it_v.it_interval.tv_sec || t->it_v.it_interval.tv_nsec) {
+ if (dt.tv_sec > t->it_v.it_interval.tv_sec ||
+ (dt.tv_sec == t->it_v.it_interval.tv_sec &&
+ dt.tv_nsec > t->it_v.it_interval.tv_nsec)) {
+ ovr = handle_overrun(t, dt);
+ } else {
+ nsec = t->it_v.it_value.tv_nsec +
+ t->it_v.it_interval.tv_nsec;
+ sec = t->it_v.it_value.tv_sec +
+ t->it_v.it_interval.tv_sec;
+ if (nsec > 1000000000) {
+ nsec -= 1000000000;
+ sec++;
+ }
+ t->it_v.it_value.tv_sec = sec;
+ t->it_v.it_value.tv_nsec = nsec;
+ }
+ /*
+ * It might make sense to leave the timer queue and
+ * avoid the remove/insert for timers which stay
+ * at the front of the queue.
+ */
+ timer_insert_nolock(pq, t);
+ }
+ do_expiry(t, ovr);
+ }
+ spin_unlock_irqrestore(&posix_timers_lock, flags);
+ return(-1);
+}
+
+/*
+ * kluge? We should know the offset between clock_realtime and
+ * clock_monotonic so we don't need to get the time twice.
+ */
+
+void run_posix_timers(unsigned long dummy)
+{
+ struct timespec now;
+ int ns, ret;
+
+ ns = 0x7fffffff;
+ do_posix_clock_monotonic_gettime(&now);
+ ret = check_expiry(&clock_monotonic.pq, &now);
+ if (ret > 0 && ret < ns)
+ ns = ret;
+
+ do_gettimeofday((struct timeval*)&now);
+ now.tv_nsec *= NSEC_PER_USEC;
+ ret = check_expiry(&clock_realtime.pq, &now);
+ if (ret > 0 && ret < ns)
+ ns = ret;
+ poll_posix_timers.expires = jiffies + 1;
+ add_timer(&poll_posix_timers);
+}
+
+
+extern rwlock_t xtime_lock;
+
+/*
+ * CLOCKs: The POSIX standard calls for a couple of clocks and allows us
+ * to implement others. This structure defines the various
+ * clocks and allows the possibility of adding others. We
+ * provide an interface to add clocks to the table and expect
+ * the "arch" code to add at least one clock that is high
+ * resolution. Here we define the standard CLOCK_REALTIME as a
+ * 1/HZ resolution clock.
+
+ * CPUTIME & THREAD_CPUTIME: We are not, at this time, defining these
+ * two clocks (and the other process related clocks (Std
+ * 1003.1d-1999). The way these should be supported, we think,
+ * is to use large negative numbers for the two clocks that are
+ * pinned to the executing process and to use -pid for clocks
+ * pinned to particular pids. Calls which supported these clock
+ * ids would split early in the function.
+
+ * RESOLUTION: Clock resolution is used to round up timer and interval
+ * times, NOT to report clock times, which are reported with as
+ * much resolution as the system can muster. In some cases this
+ * resolution may depend on the underlying clock hardware and
+ * may not be quantifiable until run time, and only then if the
+ * necessary code is written. The standard says we should say
+ * something about this issue in the documentation...
+
+ * FUNCTIONS: The CLOCKs structure defines possible functions to handle
+ * various clock functions. For clocks that use the standard
+ * system timer code these entries should be NULL. This will
+ * allow dispatch without the overhead of indirect function
+ * calls. CLOCKS that depend on other sources (e.g. WWV or GPS)
+ * must supply functions here, even if the function just returns
+ * ENOSYS. The standard POSIX timer management code assumes the
+ * following: 1.) The k_itimer struct (sched.h) is used for the
+ * timer. 2.) The list, it_lock, it_clock, it_id and it_process
+ * fields are not modified by timer code.
+ *
+ * Permissions: It is assumed that the clock_settime() function defined
+ * for each clock will take care of permission checks. Some
+ * clocks may be settable by any user (i.e. local process
+ * clocks), others not. Currently the only settable clock we
+ * have is CLOCK_REALTIME and its high res counterpart, both of
+ * which we beg off on and pass to do_sys_settimeofday().
+ */
+
+struct k_clock *posix_clocks[MAX_CLOCKS];
+
+#define if_clock_do(clock_fun, alt_fun,parms) (! clock_fun)? alt_fun parms :\
+ clock_fun parms
+
+#define p_timer_get( clock,a,b) if_clock_do((clock)->timer_get, \
+ do_timer_gettime, \
+ (a,b))
+
+#define p_nsleep( clock,a,b,c) if_clock_do((clock)->nsleep, \
+ do_nsleep, \
+ (a,b,c))
+
+#define p_timer_del( clock,a) if_clock_do((clock)->timer_del, \
+ do_timer_delete, \
+ (a))
+
+void register_posix_clock(int clock_id, struct k_clock * new_clock);
+
+static int do_posix_gettime(struct k_clock *clock, struct timespec *tp);
+
+
+void register_posix_clock(int clock_id,struct k_clock * new_clock)
+{
+ if ((unsigned)clock_id >= MAX_CLOCKS) {
+ printk("POSIX clock register failed for clock_id %d\n",clock_id);
+ return;
+ }
+ posix_clocks[clock_id] = new_clock;
+}
+
+static __init int init_posix_timers(void)
+{
+ posix_timers_cache = kmem_cache_create("posix_timers_cache",
+ sizeof(struct k_itimer), 0, 0, 0, 0);
+ id2ptr_init(&posix_timers_id, 1000);
+
+ register_posix_clock(CLOCK_REALTIME,&clock_realtime);
+ register_posix_clock(CLOCK_MONOTONIC,&clock_monotonic);
+ return 0;
+}
+
+__initcall(init_posix_timers);
+
+/*
+ * For some reason mips/mips64 define the SIGEV constants plus 128.
+ * Here we define a mask to get rid of the common bits. The
+ * optimizer should make this costless to all but mips.
+ */
+#if (ARCH == mips) || (ARCH == mips64)
+#define MIPS_SIGEV ~(SIGEV_NONE & \
+ SIGEV_SIGNAL & \
+ SIGEV_THREAD & \
+ SIGEV_THREAD_ID)
+#else
+#define MIPS_SIGEV (int)-1
+#endif
+
+static struct task_struct * good_sigevent(sigevent_t *event)
+{
+ struct task_struct * rtn = current;
+
+ if (event->sigev_notify & SIGEV_THREAD_ID & MIPS_SIGEV ) {
+ if ( !(rtn = find_task_by_pid(event->sigev_notify_thread_id)) ||
+ rtn->tgid != current->tgid){
+ return NULL;
+ }
+ }
+ if (event->sigev_notify & SIGEV_SIGNAL & MIPS_SIGEV) {
+ if ((unsigned)(event->sigev_signo > SIGRTMAX))
+ return NULL;
+ }
+ if (event->sigev_notify & ~(SIGEV_SIGNAL | SIGEV_THREAD_ID )) {
+ return NULL;
+ }
+ return rtn;
+}
+
+
+
+static struct k_itimer * alloc_posix_timer(void)
+{
+ struct k_itimer *tmr;
+ tmr = kmem_cache_alloc(posix_timers_cache, GFP_KERNEL);
+ memset(tmr, 0, sizeof(struct k_itimer));
+ return(tmr);
+}
+
+static void release_posix_timer(struct k_itimer *tmr)
+{
+ if (tmr->it_id > 0)
+ id2ptr_remove(&posix_timers_id, tmr->it_id);
+ kmem_cache_free(posix_timers_cache, tmr);
+}
+
+/* Create a POSIX.1b interval timer. */
+
+asmlinkage int
+sys_timer_create(clockid_t which_clock, struct sigevent *timer_event_spec,
+ timer_t *created_timer_id)
+{
+ int error = 0;
+ struct k_itimer *new_timer = NULL;
+ int new_timer_id;
+ struct task_struct * process = 0;
+ sigevent_t event;
+
+ if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
+ return -EINVAL;
+
+ new_timer = alloc_posix_timer();
+ if (new_timer == NULL) return -EAGAIN;
+
+ new_timer_id = (timer_t)id2ptr_new(&posix_timers_id,
+ (void *)new_timer);
+ if (!new_timer_id) {
+ error = -EAGAIN;
+ goto out;
+ }
+ new_timer->it_id = new_timer_id;
+
+ if (copy_to_user(created_timer_id, &new_timer_id,
+ sizeof(new_timer_id))) {
+ error = -EFAULT;
+ goto out;
+ }
+ spin_lock_init(&new_timer->it_lock);
+ if (timer_event_spec) {
+ if (copy_from_user(&event, timer_event_spec, sizeof(event))) {
+ error = -EFAULT;
+ goto out;
+ }
+ read_lock(&tasklist_lock);
+ if ((process = good_sigevent(&event))) {
+ /*
+ * We may be setting up this timer for another
+ * thread. It may be exiting. To catch this
+ * case we clear posix_timers.next in
+ * exit_itimers.
+ */
+ spin_lock(&process->alloc_lock);
+ if (process->posix_timers.next) {
+ list_add(&new_timer->it_task_list,
+ &process->posix_timers);
+ spin_unlock(&process->alloc_lock);
+ } else {
+ spin_unlock(&process->alloc_lock);
+ process = 0;
+ }
+ }
+ read_unlock(&tasklist_lock);
+ if (!process) {
+ error = -EINVAL;
+ goto out;
+ }
+ new_timer->it_sigev_notify = event.sigev_notify;
+ new_timer->it_sigev_signo = event.sigev_signo;
+ new_timer->it_sigev_value = event.sigev_value;
+ } else {
+ new_timer->it_sigev_notify = SIGEV_SIGNAL;
+ new_timer->it_sigev_signo = SIGALRM;
+ new_timer->it_sigev_value.sival_int = new_timer->it_id;
+ process = current;
+ spin_lock(&current->alloc_lock);
+ list_add(&new_timer->it_task_list, &current->posix_timers);
+ spin_unlock(&current->alloc_lock);
+ }
+ new_timer->it_clock = which_clock;
+ new_timer->it_overrun = 0;
+ new_timer->it_process = process;
+
+ out:
+ if (error)
+ release_posix_timer(new_timer);
+ return error;
+}
+
+
+/*
+ * Delete a timer owned by the process; used by exit and exec.
+ */
+void itimer_delete(struct k_itimer *timer)
+{
+ if (sys_timer_delete(timer->it_id)){
+ BUG();
+ }
+}
+
+/*
+ * This is called from both exec and exit to shut down the
+ * timers.
+ */
+
+inline void exit_itimers(struct task_struct *tsk, int exit)
+{
+ struct k_itimer *tmr;
+
+ if (!tsk->posix_timers.next)
+ BUG();
+ if (tsk->nanosleep_tmr.it_pq)
+ timer_remove(&tsk->nanosleep_tmr);
+ spin_lock(&tsk->alloc_lock);
+ while (tsk->posix_timers.next != &tsk->posix_timers){
+ spin_unlock(&tsk->alloc_lock);
+ tmr = list_entry(tsk->posix_timers.next,struct k_itimer,
+ it_task_list);
+ itimer_delete(tmr);
+ spin_lock(&tsk->alloc_lock);
+ }
+ /*
+ * sys_timer_create has the option to create a timer
+ * for another thread. There is the risk that as the timer
+ * is being created, the thread that was supposed to handle
+ * the signal is exiting. We use the posix_timers.next field
+ * as a flag so we can close this race.
+ */
+ if (exit)
+ tsk->posix_timers.next = 0;
+ spin_unlock(&tsk->alloc_lock);
+}
+
+/* good_timespec
+ *
+ * This function checks the elements of a timespec structure.
+ *
+ * Arguments:
+ * ts : Pointer to the timespec structure to check
+ *
+ * Return value:
+ * If a NULL pointer was passed in, or the tv_nsec field was less than 0 or
+ * greater than or equal to NSEC_PER_SEC, or the tv_sec field was less than 0, this
+ * function returns 0. Otherwise it returns 1.
+ */
+
+static int good_timespec(const struct timespec *ts)
+{
+ if ((ts == NULL) ||
+ (ts->tv_sec < 0) ||
+ ((unsigned)ts->tv_nsec >= NSEC_PER_SEC))
+ return 0;
+ return 1;
+}
+
+static inline void unlock_timer(struct k_itimer *timr)
+{
+ spin_unlock_irq(&timr->it_lock);
+}
+
+static struct k_itimer* lock_timer( timer_t timer_id)
+{
+ struct k_itimer *timr;
+
+ timr = (struct k_itimer *)id2ptr_lookup(&posix_timers_id,
+ (int)timer_id);
+ if (timr)
+ spin_lock_irq(&timr->it_lock);
+ return(timr);
+}
+
+/*
+ * Get the time remaining on a POSIX.1b interval timer.
+ * This function is ALWAYS called with spin_lock_irq on the timer, thus
+ * it must not mess with irq.
+ */
+void inline do_timer_gettime(struct k_itimer *timr,
+ struct itimerspec *cur_setting)
+{
+ struct timespec ts;
+
+ do_posix_gettime(posix_clocks[timr->it_clock], &ts);
+ ts.tv_sec = timr->it_v.it_value.tv_sec - ts.tv_sec;
+ ts.tv_nsec = timr->it_v.it_value.tv_nsec - ts.tv_nsec;
+ if (ts.tv_nsec < 0) {
+ ts.tv_nsec += 1000000000;
+ ts.tv_sec--;
+ }
+ if (ts.tv_sec < 0)
+ ts.tv_sec = ts.tv_nsec = 0;
+ cur_setting->it_value = ts;
+ cur_setting->it_interval = timr->it_v.it_interval;
+}
+
+/* Get the time remaining on a POSIX.1b interval timer. */
+asmlinkage int sys_timer_gettime(timer_t timer_id, struct itimerspec *setting)
+{
+ struct k_itimer *timr;
+ struct itimerspec cur_setting;
+
+ timr = lock_timer(timer_id);
+ if (!timr) return -EINVAL;
+
+ p_timer_get(posix_clocks[timr->it_clock],timr, &cur_setting);
+
+ unlock_timer(timr);
+
+ if (copy_to_user(setting, &cur_setting, sizeof(cur_setting)))
+ return -EFAULT;
+
+ return 0;
+}
+/*
+ * Get the number of overruns of a POSIX.1b interval timer
+ * This is a bit messy as we don't easily know where he is in the delivery
+ * of possible multiple signals. We are to give him the overrun on the
+ * last delivery. If we have another pending, we want to make sure we
+ * use the last and not the current. If there is not another pending
+ * then he is current and gets the current overrun. We search both the
+ * shared and local queue.
+ */
+
+asmlinkage int sys_timer_getoverrun(timer_t timer_id)
+{
+ struct k_itimer *timr;
+ int overrun, i;
+ struct sigqueue *q;
+ struct sigpending *sig_queue;
+ struct task_struct * t;
+
+ timr = lock_timer( timer_id);
+ if (!timr) return -EINVAL;
+
+ t = timr->it_process;
+ overrun = timr->it_overrun;
+ spin_lock_irq(&t->sig->siglock);
+ for (sig_queue = &t->sig->shared_pending, i = 2; i;
+ sig_queue = &t->pending, i--){
+ for (q = sig_queue->head; q; q = q->next) {
+ if ((q->info.si_code == SI_TIMER) &&
+ (q->info.si_tid == timr->it_id)) {
+
+ overrun = timr->it_overrun_last;
+ goto out;
+ }
+ }
+ }
+ out:
+ spin_unlock_irq(&t->sig->siglock);
+
+ unlock_timer(timr);
+
+ return overrun;
+}
+
+/*
+ * If it is relative time, we need to add the current time to it to
+ * get the proper expiry time.
+ */
+static int adjust_rel_time(struct k_clock *clock,struct timespec *tp)
+{
+ struct timespec now;
+
+
+ do_posix_gettime(clock,&now);
+ tp->tv_sec += now.tv_sec;
+ tp->tv_nsec += now.tv_nsec;
+
+ /*
+ * Normalize...
+ */
+ if (( tp->tv_nsec - NSEC_PER_SEC) >= 0){
+ tp->tv_nsec -= NSEC_PER_SEC;
+ tp->tv_sec++;
+ }
+ return 0;
+}
+
+/* Set a POSIX.1b interval timer. */
+/* timr->it_lock is taken. */
+static inline int do_timer_settime(struct k_itimer *timr, int flags,
+ struct itimerspec *new_setting,
+ struct itimerspec *old_setting)
+{
+ struct k_clock * clock = posix_clocks[timr->it_clock];
+
+ timer_remove(timr);
+ if (old_setting) {
+ do_timer_gettime(timr, old_setting);
+ }
+
+
+ /* switch off the timer when it_value is zero */
+ if ((new_setting->it_value.tv_sec == 0) &&
+ (new_setting->it_value.tv_nsec == 0)) {
+ timr->it_v = *new_setting;
+ return 0;
+ }
+
+ if (!(flags & TIMER_ABSTIME))
+ adjust_rel_time(clock, &new_setting->it_value);
+
+ timr->it_v = *new_setting;
+ timr->it_overrun_deferred =
+ timr->it_overrun_last =
+ timr->it_overrun = 0;
+ timer_insert(&clock->pq, timr);
+ return 0;
+}
+
+
+
+/* Set a POSIX.1b interval timer */
+asmlinkage int sys_timer_settime(timer_t timer_id, int flags,
+ const struct itimerspec *new_setting,
+ struct itimerspec *old_setting)
+{
+ struct k_itimer *timr;
+ struct itimerspec new_spec, old_spec;
+ int error = 0;
+ struct itimerspec *rtn = old_setting ? &old_spec : NULL;
+
+
+ if (new_setting == NULL) {
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&new_spec, new_setting, sizeof(new_spec))) {
+ return -EFAULT;
+ }
+
+ if ((!good_timespec(&new_spec.it_interval)) ||
+ (!good_timespec(&new_spec.it_value))) {
+ return -EINVAL;
+ }
+
+ timr = lock_timer( timer_id);
+ if (!timr)
+ return -EINVAL;
+
+ if (! posix_clocks[timr->it_clock]->timer_set) {
+ error = do_timer_settime(timr, flags, &new_spec, rtn );
+ }else{
+ error = posix_clocks[timr->it_clock]->timer_set(timr,
+ flags,
+ &new_spec,
+ rtn );
+ }
+ unlock_timer(timr);
+
+ if (old_setting && ! error) {
+ if (copy_to_user(old_setting, &old_spec, sizeof(old_spec))) {
+ error = -EFAULT;
+ }
+ }
+
+ return error;
+}
+
+static inline int do_timer_delete(struct k_itimer *timer)
+{
+ timer_remove(timer);
+ return 0;
+}
+
+/* Delete a POSIX.1b interval timer. */
+asmlinkage int sys_timer_delete(timer_t timer_id)
+{
+ struct k_itimer *timer;
+
+ timer = lock_timer( timer_id);
+ if (!timer)
+ return -EINVAL;
+
+ p_timer_del(posix_clocks[timer->it_clock],timer);
+
+ spin_lock(&timer->it_process->alloc_lock);
+ list_del(&timer->it_task_list);
+ spin_unlock(&timer->it_process->alloc_lock);
+
+ /*
+ * This keeps any tasks waiting on the spin lock from thinking
+ * they got something (see the lock code above).
+ */
+ timer->it_process = NULL;
+ unlock_timer(timer);
+ release_posix_timer(timer);
+ return 0;
+}
+/*
+ * And now for the "clock" calls
+ * These functions are called both from timer functions (with the timer
+ * spin_lock_irq() held) and from clock calls with no locking. They must
+ * use the save flags versions of locks.
+ */
+static int do_posix_gettime(struct k_clock *clock, struct timespec *tp)
+{
+
+ if (clock->clock_get){
+ return clock->clock_get(tp);
+ }
+
+ do_gettimeofday((struct timeval*)tp);
+ tp->tv_nsec *= NSEC_PER_USEC;
+ return 0;
+}
+
+/*
+ * We do ticks here to avoid the irq lock (they take sooo long).
+ * Note also that the while loop assures that the sub_jiff_offset
+ * will be less than a jiffie, thus no need to normalize the result.
+ * Well, not really, if called with ints off :(
+ */
+
+int do_posix_clock_monotonic_gettime(struct timespec *tp)
+{
+ long sub_sec;
+ u64 jiffies_64_f;
+
+#if (BITS_PER_LONG > 32)
+
+ jiffies_64_f = jiffies_64;
+
+#elif defined(CONFIG_SMP)
+
+ /* Tricks don't work here, must take the lock. Remember, called
+ * above from both timer and clock system calls => save flags.
+ */
+ {
+ unsigned long flags;
+ read_lock_irqsave(&xtime_lock, flags);
+ jiffies_64_f = jiffies_64;
+
+
+ read_unlock_irqrestore(&xtime_lock, flags);
+ }
+#elif ! defined(CONFIG_SMP) && (BITS_PER_LONG < 64)
+ unsigned long jiffies_f;
+ do {
+ jiffies_f = jiffies;
+ barrier();
+ jiffies_64_f = jiffies_64;
+ } while (unlikely(jiffies_f != jiffies));
+
+
+#endif
+ tp->tv_sec = div_long_long_rem(jiffies_64_f,HZ,&sub_sec);
+
+ tp->tv_nsec = sub_sec * (NSEC_PER_SEC / HZ);
+ return 0;
+}
+
+int do_posix_clock_monotonic_settime(struct timespec *tp)
+{
+ return -EINVAL;
+}
+
+asmlinkage int sys_clock_settime(clockid_t which_clock,const struct timespec *tp)
+{
+ struct timespec new_tp;
+
+ if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
+ return -EINVAL;
+ if (copy_from_user(&new_tp, tp, sizeof(*tp)))
+ return -EFAULT;
+ if ( posix_clocks[which_clock]->clock_set){
+ return posix_clocks[which_clock]->clock_set(&new_tp);
+ }
+ new_tp.tv_nsec /= NSEC_PER_USEC;
+ return do_sys_settimeofday((struct timeval*)&new_tp,NULL);
+}
+asmlinkage int sys_clock_gettime(clockid_t which_clock, struct timespec *tp)
+{
+ struct timespec rtn_tp;
+ int error = 0;
+
+ if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
+ return -EINVAL;
+
+ error = do_posix_gettime(posix_clocks[which_clock],&rtn_tp);
+
+ if ( ! error) {
+ if (copy_to_user(tp, &rtn_tp, sizeof(rtn_tp))) {
+ error = -EFAULT;
+ }
+ }
+ return error;
+
+}
+asmlinkage int sys_clock_getres(clockid_t which_clock, struct timespec *tp)
+{
+ struct timespec rtn_tp;
+
+ if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
+ return -EINVAL;
+
+ rtn_tp.tv_sec = 0;
+ rtn_tp.tv_nsec = posix_clocks[which_clock]->res;
+ if ( tp){
+ if (copy_to_user(tp, &rtn_tp, sizeof(rtn_tp))) {
+ return -EFAULT;
+ }
+ }
+ return 0;
+
+}
+
+#if 0
+// This #if 0 is to keep the pretty printer/formatter happy so the
+// indents below will be correct.
+
+// The NANOSLEEP_ENTRY macro is defined in asm/signal.h and
+// is structured to allow code as well as entry definitions, so that when
+// we get control back here the entry parameters will be available as expected.
+// Some systems may find these parameters in other ways than as entry parms,
+// for example, struct pt_regs *regs is defined in i386 as the address of the
+// first parameter, whereas other archs pass it as one of the parameters.
+
+asmlinkage long sys_clock_nanosleep(void)
+{
+#endif
+ CLOCK_NANOSLEEP_ENTRY( struct timespec ts;
+ struct k_itimer *t;
+ struct k_clock * clock;
+ int active;)
+
+ //asmlinkage int sys_clock_nanosleep(clockid_t which_clock,
+ // int flags,
+ // const struct timespec *rqtp,
+ // struct timespec *rmtp)
+ //{
+
+ if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
+ return -EINVAL;
+ /*
+ * See discussion below about waking up early.
+ */
+ clock = posix_clocks[which_clock];
+ t = &current->nanosleep_tmr;
+ if (t->it_pq)
+ timer_remove(t);
+
+ if(copy_from_user(&t->it_v.it_value, rqtp, sizeof(struct timespec)))
+ return -EFAULT;
+
+ if ((t->it_v.it_value.tv_nsec < 0) ||
+ (t->it_v.it_value.tv_nsec >= NSEC_PER_SEC) ||
+ (t->it_v.it_value.tv_sec < 0))
+ return -EINVAL;
+
+ if (!(flags & TIMER_ABSTIME))
+ adjust_rel_time(clock, &t->it_v.it_value);
+ /*
+ * These fields don't need to be set up each time. This
+ * should be in the INIT_TASK() and forgotten.
+ */
+ t->it_v.it_interval.tv_sec = 0;
+ t->it_v.it_interval.tv_nsec = 0;
+ t->it_type = NANOSLEEP;
+ t->it_process = current;
+
+ current->state = TASK_INTERRUPTIBLE;
+ timer_insert(&clock->pq, t);
+ schedule();
+ /*
+ * We're not supposed to leave early. The problem is
+ * being woken by signals that are not delivered to
+ * the user. Typically this means debug related
+ * signals.
+ *
+ * My plan is to leave the timer running and have a
+ * small hook in do_signal which will complete the
+ * nanosleep. For now we just return early in clear
+ * violation of the Posix spec.
+ */
+ active = (t->it_pq != 0);
+ if (!(flags & TIMER_ABSTIME) && active && rmtp ) {
+ do_posix_gettime(clock, &ts);
+ ts.tv_sec = t->it_v.it_value.tv_sec - ts.tv_sec;
+ ts.tv_nsec = t->it_v.it_value.tv_nsec - ts.tv_nsec;
+ if (ts.tv_nsec < 0) {
+ ts.tv_nsec += 1000000000;
+ ts.tv_sec--;
+ }
+ if (ts.tv_sec < 0)
+ ts.tv_sec = ts.tv_nsec = 0;
+ if (copy_to_user(rmtp, &ts, sizeof(struct timespec)))
+ return -EFAULT;
+ }
+ if (active)
+ return -EINTR;
+ return 0;
+}
+
+void clock_was_set(void)
+{
+}
diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/signal.c linux.mytimers/kernel/signal.c
--- linux.orig/kernel/signal.c Wed Oct 23 00:54:30 2002
+++ linux.mytimers/kernel/signal.c Wed Oct 23 01:17:51 2002
@@ -424,8 +424,6 @@
if (!collect_signal(sig, pending, info))
sig = 0;

- /* XXX: Once POSIX.1b timers are in, if si_code == SI_TIMER,
- we need to xchg out the timer overrun values. */
}
recalc_sigpending();

@@ -692,6 +690,7 @@
specific_send_sig_info(int sig, struct siginfo *info, struct task_struct *t, int shared)
{
int ret;
+ struct sigpending *sig_queue;

if (!irqs_disabled())
BUG();
@@ -725,20 +724,43 @@
if (ignored_signal(sig, t))
goto out;

+ sig_queue = shared ? &t->sig->shared_pending : &t->pending;
+
#define LEGACY_QUEUE(sigptr, sig) \
(((sig) < SIGRTMIN) && sigismember(&(sigptr)->signal, (sig)))
-
+ /*
+ * Support queueing exactly one non-rt signal, so that we
+ * can get more detailed information about the cause of
+ * the signal.
+ */
+ if (LEGACY_QUEUE(sig_queue, sig))
+ goto out;
+ /*
+ * In case of a POSIX timer generated signal you must check
+ * if a signal from this timer is already in the queue.
+ * If that is true, the overrun count will be increased in
+ * itimer.c:posix_timer_fn().
+ */
+
+ if (((unsigned long)info > 1) && (info->si_code == SI_TIMER)) {
+ struct sigqueue *q;
+ for (q = sig_queue->head; q; q = q->next) {
+ if ((q->info.si_code == SI_TIMER) &&
+ (q->info.si_tid == info->si_tid)) {
+ q->info.si_overrun += info->si_overrun + 1;
+ /*
+ * this special ret value (1) is recognized
+ * only by posix_timer_fn() in itimer.c
+ */
+ ret = 1;
+ goto out;
+ }
+ }
+ }
if (!shared) {
- /* Support queueing exactly one non-rt signal, so that we
- can get more detailed information about the cause of
- the signal. */
- if (LEGACY_QUEUE(&t->pending, sig))
- goto out;

ret = deliver_signal(sig, info, t);
} else {
- if (LEGACY_QUEUE(&t->sig->shared_pending, sig))
- goto out;
ret = send_signal(sig, info, &t->sig->shared_pending);
}
out:
@@ -1418,8 +1440,9 @@
err |= __put_user(from->si_uid, &to->si_uid);
break;
case __SI_TIMER:
- err |= __put_user(from->si_timer1, &to->si_timer1);
- err |= __put_user(from->si_timer2, &to->si_timer2);
+ err |= __put_user(from->si_tid, &to->si_tid);
+ err |= __put_user(from->si_overrun, &to->si_overrun);
+ err |= __put_user(from->si_ptr, &to->si_ptr);
break;
case __SI_POLL:
err |= __put_user(from->si_band, &to->si_band);
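
The coalescing rule in the signal.c hunk above, reduced to a standalone
sketch with stand-in types (these are not the kernel's real structures):

/* Stand-in types; the kernel uses struct sigqueue/sigpending. */
struct q_entry {
	struct q_entry *next;
	int si_code, si_tid, si_overrun;
};

#define SI_TIMER_CODE (-2)	/* stand-in for SI_TIMER */

/* Return 1 if an already-queued signal from this timer absorbed the
 * new one (overrun bumped), 0 if the caller should queue it. */
static int coalesce_timer_signal(struct q_entry *head, int tid, int overrun)
{
	struct q_entry *q;

	for (q = head; q; q = q->next) {
		if (q->si_code == SI_TIMER_CODE && q->si_tid == tid) {
			q->si_overrun += overrun + 1;
			return 1;
		}
	}
	return 0;
}
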
diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/timer.c linux.mytimers/kernel/timer.c
--- linux.orig/kernel/timer.c Wed Oct 23 00:54:21 2002
+++ linux.mytimers/kernel/timer.c Wed Oct 23 01:17:51 2002
@@ -47,12 +47,11 @@
struct list_head vec[TVR_SIZE];
} tvec_root_t;

-typedef struct timer_list timer_t;

struct tvec_t_base_s {
spinlock_t lock;
unsigned long timer_jiffies;
- timer_t *running_timer;
+ struct timer_list *running_timer;
tvec_root_t tv1;
tvec_t tv2;
tvec_t tv3;
@@ -67,7 +66,7 @@
/* Fake initialization needed to avoid compiler breakage */
static DEFINE_PER_CPU(struct tasklet_struct, timer_tasklet) = { NULL };

-static inline void internal_add_timer(tvec_base_t *base, timer_t *timer)
+static inline void internal_add_timer(tvec_base_t *base, struct timer_list *timer)
{
unsigned long expires = timer->expires;
unsigned long idx = expires - base->timer_jiffies;
@@ -119,7 +118,7 @@
* Timers with an ->expired field in the past will be executed in the next
* timer tick. It's illegal to add an already pending timer.
*/
-void add_timer(timer_t *timer)
+void add_timer(struct timer_list *timer)
{
int cpu = get_cpu();
tvec_base_t *base = tvec_bases + cpu;
@@ -153,7 +152,7 @@
* (ie. mod_timer() of an inactive timer returns 0, mod_timer() of an
* active timer returns 1.)
*/
-int mod_timer(timer_t *timer, unsigned long expires)
+int mod_timer(struct timer_list *timer, unsigned long expires)
{
tvec_base_t *old_base, *new_base;
unsigned long flags;
@@ -226,7 +225,7 @@
* (ie. del_timer() of an inactive timer returns 0, del_timer() of an
* active timer returns 1.)
*/
-int del_timer(timer_t *timer)
+int del_timer(struct timer_list *timer)
{
unsigned long flags;
tvec_base_t *base;
@@ -263,7 +262,7 @@
*
* The function returns whether it has deactivated a pending timer or not.
*/
-int del_timer_sync(timer_t *timer)
+int del_timer_sync(struct timer_list *timer)
{
tvec_base_t *base = tvec_bases;
int i, ret = 0;
@@ -302,9 +301,9 @@
* detach them individually, just clear the list afterwards.
*/
while (curr != head) {
- timer_t *tmp;
+ struct timer_list *tmp;

- tmp = list_entry(curr, timer_t, entry);
+ tmp = list_entry(curr, struct timer_list, entry);
if (tmp->base != base)
BUG();
next = curr->next;
@@ -343,9 +342,9 @@
if (curr != head) {
void (*fn)(unsigned long);
unsigned long data;
- timer_t *timer;
+ struct timer_list *timer;

- timer = list_entry(curr, timer_t, entry);
+ timer = list_entry(curr, struct timer_list, entry);
fn = timer->function;
data = timer->data;

@@ -448,6 +447,7 @@
if (xtime.tv_sec % 86400 == 0) {
xtime.tv_sec--;
time_state = TIME_OOP;
+ clock_was_set();
printk(KERN_NOTICE "Clock: inserting leap second 23:59:60 UTC\n");
}
break;
@@ -456,6 +456,7 @@
if ((xtime.tv_sec + 1) % 86400 == 0) {
xtime.tv_sec++;
time_state = TIME_WAIT;
+ clock_was_set();
printk(KERN_NOTICE "Clock: deleting leap second 23:59:59 UTC\n");
}
break;
@@ -912,7 +913,7 @@
*/
signed long schedule_timeout(signed long timeout)
{
- timer_t timer;
+ struct timer_list timer;
unsigned long expire;

switch (timeout)
@@ -968,10 +969,32 @@
return current->pid;
}

-asmlinkage long sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp)
+#if 0
+// This #if 0 is to keep the pretty printer/formatter happy so the indents
+// will be correct below.
+// The NANOSLEEP_ENTRY macro is defined in asm/signal.h and
+// is structured to allow code as well as entry definitions, so that when
+// we get control back here the entry parameters will be available as expected.
+// Some systems may find these parameters in other ways than as entry
+// parameters; for example, struct pt_regs *regs is defined on i386 as the
+// address of the first parameter, whereas other archs pass it as one of
+// the parameters.
+asmlinkage long sys_nanosleep(void)
{
- struct timespec t;
- unsigned long expire;
+#endif
+ NANOSLEEP_ENTRY( struct timespec t;
+ unsigned long expire;)
+
+#ifndef FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
+ // The following code expects rqtp, rmtp to be available
+ // as a result of the above macro. Also any regs needed
+	// for the _do_signal() macro should be set up here.
+
+ //asmlinkage long sys_nanosleep(struct timespec *rqtp,
+ // struct timespec *rmtp)
+ // {
+ // struct timespec t;
+ // unsigned long expire;
+

if(copy_from_user(&t, rqtp, sizeof(struct timespec)))
return -EFAULT;
@@ -994,6 +1017,7 @@
}
return 0;
}
+#endif // ! FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP

/*
* sys_sysinfo - fill in sysinfo struct


2002-10-23 18:36:13

by George Anzinger

[permalink] [raw]
Subject: Re: [PATCH] alternate Posix timer patch

Jim Houston wrote:
>
> Hi Everyone,
>
> This is the second version of my spin on the Posix timers. I started
> with George Anzinger's patch but I have made major changes.
>
> I have been using George's version of the patch and would be glad to
> see it included into the 2.5 tree. On the other hand since we don't
> know what might appeal to Linus it makes sense to give him a choice.
>
> I sent out the first version of this last friday and had useful
> coments from Andi Kleen. I have addressed some of these but mostly
> I have just been getting it to work. It now passes most of the
> tests that are included in George's timers support package.
>
> Of particular interest is a race (that Andi pointed out) between
> saving a task_struct pointer, using this pointer to send signals
> and the process exiting. George please look at my changes in
> sys_timer_create and exit_itimer.

Yes, I have looked and agree with your changes. They will
be in the next version, hopefully today.

I have also looked at the timer index stuff and made a few
changes.  If I get it working today, I will include it
also. My changes mostly revolved around not caring about
reusing a timer id. Would you care to comment on why you
think reuse is bad?

Without this feature the code is much simpler and does not
keep around dead trees.

-g
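
For reference, the hazard that immediate id reuse invites, as a
hypothetical user-space interleaving (sketch only; both timer_create()
calls are assumed to land in the same process):

#include <signal.h>
#include <time.h>

static void stale_id_hazard(void)
{
	timer_t id, other;
	struct itimerspec its = { {0, 0}, {1, 0} };

	timer_create(CLOCK_REALTIME, NULL, &id);    /* say the kernel hands out id 3 */
	timer_delete(id);                           /* id 3 returns to the free pool */
	timer_create(CLOCK_REALTIME, NULL, &other); /* immediate reuse: id 3 again */

	/* The stale handle now silently re-arms "other" instead of
	 * failing with -EINVAL. */
	timer_settime(id, 0, &its, NULL);
}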


>
> Here is a summary of my changes:
>
> - A new queue just for Posix timers and code to
> handle expiring timers. This supports high resolution
> without having to change the existing jiffie based timers.
>
> I implemented this priority queue as a sort list
> with a rbtree to index the list. It is deterministic
> and fast.
>
> - Change to use the slab allocator. This removes
> the CONFIG option for the maximum number of timers.
>
> - A new id allocator/lookup mechanism based on a
> radix tree. It includes a bitmap to summarize the portion
> of the tree which is in use. Currently the Posix
> timers patch reuses the id immediately.
>
> - I keep the timers in seconds and nano-seconds.
> I'm hoping that the system time keeping will sort
> itself out and the Posix timers can just be a consumer.
> Posix timers need two clocks - the time since boot and
> the wall clock time.
>
> I'm currently working on nanosleep. I'm trying to come up with an
> alternative for the call to do_signal. At the moment my patch may
> return from nanosleep early if it receives a debug signal.
>
> This patch should work with linux- 2.5.44.
>
> Jim Houston - Concurrent Computer Corp.
>
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/arch/i386/kernel/entry.S linux.mytimers/arch/i386/kernel/entry.S
> --- linux.orig/arch/i386/kernel/entry.S Wed Oct 23 00:54:19 2002
> +++ linux.mytimers/arch/i386/kernel/entry.S Wed Oct 23 01:17:51 2002
> @@ -737,6 +737,15 @@
> .long sys_free_hugepages
> .long sys_exit_group
> .long sys_lookup_dcookie
> + .long sys_timer_create
> + .long sys_timer_settime /* 255 */
> + .long sys_timer_gettime
> + .long sys_timer_getoverrun
> + .long sys_timer_delete
> + .long sys_clock_settime
> + .long sys_clock_gettime /* 260 */
> + .long sys_clock_getres
> + .long sys_clock_nanosleep
>
> .rept NR_syscalls-(.-sys_call_table)/4
> .long sys_ni_syscall
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/arch/i386/kernel/time.c linux.mytimers/arch/i386/kernel/time.c
> --- linux.orig/arch/i386/kernel/time.c Wed Oct 23 00:54:19 2002
> +++ linux.mytimers/arch/i386/kernel/time.c Wed Oct 23 01:17:51 2002
> @@ -131,6 +131,7 @@
> time_maxerror = NTP_PHASE_LIMIT;
> time_esterror = NTP_PHASE_LIMIT;
> write_unlock_irq(&xtime_lock);
> + clock_was_set();
> }
>
> /*
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/fs/exec.c linux.mytimers/fs/exec.c
> --- linux.orig/fs/exec.c Wed Oct 23 00:54:21 2002
> +++ linux.mytimers/fs/exec.c Wed Oct 23 01:37:27 2002
> @@ -756,6 +756,7 @@
>
> flush_signal_handlers(current);
> flush_old_files(current->files);
> + exit_itimers(current, 0);
>
> return 0;
>
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/asm-generic/siginfo.h linux.mytimers/include/asm-generic/siginfo.h
> --- linux.orig/include/asm-generic/siginfo.h Wed Oct 23 00:54:24 2002
> +++ linux.mytimers/include/asm-generic/siginfo.h Wed Oct 23 01:17:51 2002
> @@ -43,8 +43,9 @@
>
> /* POSIX.1b timers */
> struct {
> - unsigned int _timer1;
> - unsigned int _timer2;
> + timer_t _tid; /* timer id */
> + int _overrun; /* overrun count */
> + sigval_t _sigval; /* same as below */
> } _timer;
>
> /* POSIX.1b signals */
> @@ -86,8 +87,8 @@
> */
> #define si_pid _sifields._kill._pid
> #define si_uid _sifields._kill._uid
> -#define si_timer1 _sifields._timer._timer1
> -#define si_timer2 _sifields._timer._timer2
> +#define si_tid _sifields._timer._tid
> +#define si_overrun _sifields._timer._overrun
> #define si_status _sifields._sigchld._status
> #define si_utime _sifields._sigchld._utime
> #define si_stime _sifields._sigchld._stime
> @@ -221,6 +222,7 @@
> #define SIGEV_SIGNAL 0 /* notify via signal */
> #define SIGEV_NONE 1 /* other notification: meaningless */
> #define SIGEV_THREAD 2 /* deliver via thread creation */
> +#define SIGEV_THREAD_ID 4 /* deliver to thread */
>
> #define SIGEV_MAX_SIZE 64
> #ifndef SIGEV_PAD_SIZE
> @@ -235,6 +237,7 @@
> int sigev_notify;
> union {
> int _pad[SIGEV_PAD_SIZE];
> + int _tid;
>
> struct {
> void (*_function)(sigval_t);
> @@ -247,6 +250,7 @@
>
> #define sigev_notify_function _sigev_un._sigev_thread._function
> #define sigev_notify_attributes _sigev_un._sigev_thread._attribute
> +#define sigev_notify_thread_id _sigev_un._tid
>
> #ifdef __KERNEL__
>
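
A sketch of how a user handler would consume the new _timer fields
(assumes a timer created with SIGEV_SIGNAL and the si_tid/si_overrun
names from the hunk above; printf is illustration only, it is not
async-signal-safe):

#include <signal.h>
#include <stdio.h>
#include <string.h>

static void timer_handler(int sig, siginfo_t *info, void *ctx)
{
	if (info->si_code == SI_TIMER)
		printf("timer %d fired, overrun %d\n",
		       (int)info->si_tid, info->si_overrun);
}

static void install_handler(int signo)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = timer_handler;
	sa.sa_flags = SA_SIGINFO;	/* request the extended siginfo */
	sigaction(signo, &sa, NULL);
}
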
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/asm-i386/posix_types.h linux.mytimers/include/asm-i386/posix_types.h
> --- linux.orig/include/asm-i386/posix_types.h Tue Jan 18 01:22:52 2000
> +++ linux.mytimers/include/asm-i386/posix_types.h Wed Oct 23 01:17:51 2002
> @@ -22,6 +22,8 @@
> typedef long __kernel_time_t;
> typedef long __kernel_suseconds_t;
> typedef long __kernel_clock_t;
> +typedef int __kernel_timer_t;
> +typedef int __kernel_clockid_t;
> typedef int __kernel_daddr_t;
> typedef char * __kernel_caddr_t;
> typedef unsigned short __kernel_uid16_t;
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/asm-i386/signal.h linux.mytimers/include/asm-i386/signal.h
> --- linux.orig/include/asm-i386/signal.h Wed Oct 23 00:50:41 2002
> +++ linux.mytimers/include/asm-i386/signal.h Wed Oct 23 01:17:51 2002
> @@ -219,6 +219,73 @@
>
> struct pt_regs;
> extern int FASTCALL(do_signal(struct pt_regs *regs, sigset_t *oldset));
> +/*
> + * These macros are used by nanosleep() and clock_nanosleep().
> + * The issue is that these functions need the *regs pointer which is
> + * passed in different ways by the differing archs.
> +
> + * Below we do things in two differing ways. In the long run we would
> + * like to see nano_sleep() go away (glibc should call clock_nanosleep
> + * much as we do). When that happens and the nano_sleep() system
> + * call entry is retired, there will no longer be any real need for
> + * sys_nanosleep() so the FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP macro
> + * could be undefined, resulting in not needing to stack all the
> + * parms over again, i.e. better (faster AND smaller) code.
> +
> + * And while we're at it, there needs to be a way to set the return code
> + * on the way to do_signal(). It (i.e. do_signal()) saves the regs on
> + * the callers stack to call the user handler and then the return is
> + * done using those registers. This means that the error code MUST be
> + * set in the register PRIOR to calling do_signal(). See our answer
> + * below...thanks to Jim Houston <[email protected]>
> + */
> +#define FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
> +
> +
> +#ifdef FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
> +extern long do_clock_nanosleep(struct pt_regs *regs,
> + clockid_t which_clock,
> + int flags,
> + const struct timespec *rqtp,
> + struct timespec *rmtp);
> +
> +#define NANOSLEEP_ENTRY(a) \
> + asmlinkage long sys_nanosleep( struct timespec* rqtp, \
> + struct timespec * rmtp) \
> +{ struct pt_regs *regs = (struct pt_regs *)&rqtp; \
> + return do_clock_nanosleep(regs, CLOCK_REALTIME, 0, rqtp, rmtp); \
> +}
> +
> +#define CLOCK_NANOSLEEP_ENTRY(a) asmlinkage long sys_clock_nanosleep( \
> + clockid_t which_clock, \
> + int flags, \
> + const struct timespec *rqtp, \
> + struct timespec *rmtp) \
> +{ struct pt_regs *regs = (struct pt_regs *)&which_clock; \
> + return do_clock_nanosleep(regs, which_clock, flags, rqtp, rmtp); \
> +} \
> +long do_clock_nanosleep(struct pt_regs *regs, \
> + clockid_t which_clock, \
> + int flags, \
> + const struct timespec *rqtp, \
> + struct timespec *rmtp) \
> +{ a
> +
> +#else
> +#define NANOSLEEP_ENTRY(a) \
> + asmlinkage long sys_nanosleep( struct timespec* rqtp, \
> + struct timespec * rmtp) \
> +{ struct pt_regs *regs = (struct pt_regs *)&rqtp; \
> + a
> +#define CLOCK_NANOSLEEP_ENTRY(a) asmlinkage long sys_clock_nanosleep( \
> + clockid_t which_clock, \
> + int flags, \
> + const struct timespec *rqtp, \
> + struct timespec *rmtp) \
> +{ struct pt_regs *regs = (struct pt_regs *)&which_clock; \
> + a
> +#endif
> +#define _do_signal() (regs->eax = -EINTR, do_signal(regs, NULL))
>
> #endif /* __KERNEL__ */
>
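
The trick the macros above lean on, in isolation: with i386 asmlinkage
all arguments live on the stack, so the address of the first parameter
doubles as the address of the saved register frame. The type below is
a fake stand-in; the real code uses struct pt_regs:

struct fake_regs {
	long eax;		/* stand-in for the saved register frame */
	/* ... */
};

long example_syscall(long arg0, long arg1)
{
	/* On i386, &arg0 is where the saved registers begin. */
	struct fake_regs *regs = (struct fake_regs *)&arg0;

	regs->eax = -4;		/* patch the return value (-EINTR) in place */
	return regs->eax;
}
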
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/asm-i386/unistd.h linux.mytimers/include/asm-i386/unistd.h
> --- linux.orig/include/asm-i386/unistd.h Wed Oct 23 00:54:21 2002
> +++ linux.mytimers/include/asm-i386/unistd.h Wed Oct 23 01:17:51 2002
> @@ -258,6 +258,15 @@
> #define __NR_free_hugepages 251
> #define __NR_exit_group 252
> #define __NR_lookup_dcookie 253
> +#define __NR_timer_create 254
> +#define __NR_timer_settime (__NR_timer_create+1)
> +#define __NR_timer_gettime (__NR_timer_create+2)
> +#define __NR_timer_getoverrun (__NR_timer_create+3)
> +#define __NR_timer_delete (__NR_timer_create+4)
> +#define __NR_clock_settime (__NR_timer_create+5)
> +#define __NR_clock_gettime (__NR_timer_create+6)
> +#define __NR_clock_getres (__NR_timer_create+7)
> +#define __NR_clock_nanosleep (__NR_timer_create+8)
>
>
> /* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/id2ptr.h linux.mytimers/include/linux/id2ptr.h
> --- linux.orig/include/linux/id2ptr.h Wed Dec 31 19:00:00 1969
> +++ linux.mytimers/include/linux/id2ptr.h Wed Oct 23 01:25:23 2002
> @@ -0,0 +1,47 @@
> +/*
> + * include/linux/id2ptr.h
> + *
> + * 2002-10-18 written by Jim Houston [email protected]
> + * Copyright (C) 2002 by Concurrent Computer Corporation
> + * Distributed under the GNU GPL license version 2.
> + *
> + * Small id to pointer translation service avoiding fixed sized
> + * tables.
> + */
> +
> +#define ID_BITS 5
> +#define ID_MASK ((1 << ID_BITS)-1)
> +#define ID_FULL ((1 << (1 << ID_BITS))-1)
> +
> +/* Number of id_layer structs to leave in free list */
> +#define ID_FREE_MAX 6
> +
> +struct id_layer {
> + unsigned int bitmap;
> + struct id_layer *ary[1<<ID_BITS];
> +};
> +
> +struct id {
> + int layers;
> + int last;
> + int count;
> + int min_wrap;
> + struct id_layer *top;
> +};
> +
> +void *id2ptr_lookup(struct id *idp, int id);
> +int id2ptr_new(struct id *idp, void *ptr);
> +void id2ptr_remove(struct id *idp, int id);
> +void id2ptr_init(struct id *idp, int min_wrap);
> +
> +
> +static inline void update_bitmap(struct id_layer *p, int bit)
> +{
> + if (p->ary[bit] && p->ary[bit]->bitmap == 0xffffffff)
> + p->bitmap |= 1<<bit;
> + else
> + p->bitmap &= ~(1<<bit);
> +}
> +
> +extern kmem_cache_t *id_layer_cache;
> +
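
A sketch of how this interface would be driven from kernel code (error
handling omitted; the object pointer is hypothetical):

static struct id demo_ids;

static void id2ptr_demo(void *object)
{
	int id;

	id2ptr_init(&demo_ids, 1000);		/* don't wrap ids below 1000 */
	id = id2ptr_new(&demo_ids, object);	/* map a fresh id to object */
	if (id2ptr_lookup(&demo_ids, id) == object)
		id2ptr_remove(&demo_ids, id);	/* break the mapping */
}
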
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/init_task.h linux.mytimers/include/linux/init_task.h
> --- linux.orig/include/linux/init_task.h Wed Oct 23 00:54:03 2002
> +++ linux.mytimers/include/linux/init_task.h Wed Oct 23 01:17:51 2002
> @@ -93,6 +93,7 @@
> .sig = &init_signals, \
> .pending = { NULL, &tsk.pending.head, {{0}}}, \
> .blocked = {{0}}, \
> + .posix_timers = LIST_HEAD_INIT(tsk.posix_timers), \
> .alloc_lock = SPIN_LOCK_UNLOCKED, \
> .switch_lock = SPIN_LOCK_UNLOCKED, \
> .journal_info = NULL, \
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/posix-timers.h linux.mytimers/include/linux/posix-timers.h
> --- linux.orig/include/linux/posix-timers.h Wed Dec 31 19:00:00 1969
> +++ linux.mytimers/include/linux/posix-timers.h Wed Oct 23 01:25:02 2002
> @@ -0,0 +1,81 @@
> +/*
> + * include/linux/posix-timers.h
> + *
> + * 2002-10-22 written by Jim Houston [email protected]
> + * Copyright (C) 2002 by Concurrent Computer Corporation
> + * Distributed under the GNU GPL license version 2.
> + *
> + */
> +
> +#ifndef _linux_POSIX_TIMERS_H
> +#define _linux_POSIX_TIMERS_H
> +
> +/* This should be in posix-timers.h - but this is easier now. */
> +
> +enum timer_type {
> + TIMER,
> + NANOSLEEP
> +};
> +
> +struct k_itimer {
> + struct list_head it_pq_list; /* fields for timer priority queue. */
> + struct rb_node it_pq_node;
> + struct timer_pq *it_pq; /* pointer to the queue. */
> +
> + struct list_head it_task_list; /* list for exit_itimers */
> + spinlock_t it_lock;
> + clockid_t it_clock; /* which timer type */
> + timer_t it_id; /* timer id */
> + int it_overrun; /* overrun on pending signal */
> + int it_overrun_last; /* overrun on last delivered signal */
> + int it_overrun_deferred; /* overrun on pending timer interrupt */
> + int it_sigev_notify; /* notify word of sigevent struct */
> + int it_sigev_signo; /* signo word of sigevent struct */
> + sigval_t it_sigev_value; /* value word of sigevent struct */
> + struct task_struct *it_process; /* process to send signal to */
> + struct itimerspec it_v; /* expiry time & interval */
> + enum timer_type it_type;
> +};
> +
> +/*
> + * The priority queue is a sorted doubly linked list ordered by
> + * expiry time. A rbtree is used as an index in to this list
> + * so that inserts are O(log2(n)).
> + */
> +
> +struct timer_pq {
> + struct list_head head;
> + struct rb_root rb_root;
> +};
> +
> +#define TIMER_PQ_INIT(name) { \
> + .rb_root = RB_ROOT, \
> + .head = LIST_HEAD_INIT(name.head), \
> +}
> +
> +
> +#if 0
> +#include <linux/posix-timers.h>
> +#endif
> +
> +struct k_clock {
> + struct timer_pq pq;
> + int res; /* in nano seconds */
> + int ( *clock_set)(struct timespec *tp);
> + int ( *clock_get)(struct timespec *tp);
> + int ( *nsleep)( int flags,
> + struct timespec*new_setting,
> + struct itimerspec *old_setting);
> + int ( *timer_set)(struct k_itimer *timr, int flags,
> + struct itimerspec *new_setting,
> + struct itimerspec *old_setting);
> + int ( *timer_del)(struct k_itimer *timr);
> + void ( *timer_get)(struct k_itimer *timr,
> + struct itimerspec *cur_setting);
> +};
> +
> +int do_posix_clock_monotonic_gettime(struct timespec *tp);
> +int do_posix_clock_monotonic_settime(struct timespec *tp);
> +asmlinkage int sys_timer_delete(timer_t timer_id);
> +
> +#endif
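
The queue discipline in two lines: the expiry-sorted list gives O(1)
access to the soonest timer, and the rbtree gives O(log n) inserts. A
minimal peek helper under those assumptions (kernel list.h context
assumed):

static struct k_itimer *pq_peek_soonest(struct timer_pq *pq)
{
	if (list_empty(&pq->head))
		return NULL;
	/* head.next is the earliest expiry by construction */
	return list_entry(pq->head.next, struct k_itimer, it_pq_list);
}
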
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/sched.h linux.mytimers/include/linux/sched.h
> --- linux.orig/include/linux/sched.h Wed Oct 23 00:54:28 2002
> +++ linux.mytimers/include/linux/sched.h Wed Oct 23 01:31:41 2002
> @@ -29,6 +29,7 @@
> #include <linux/compiler.h>
> #include <linux/completion.h>
> #include <linux/pid.h>
> +#include <linux/posix-timers.h>
>
> struct exec_domain;
>
> @@ -333,6 +334,8 @@
> unsigned long it_real_value, it_prof_value, it_virt_value;
> unsigned long it_real_incr, it_prof_incr, it_virt_incr;
> struct timer_list real_timer;
> + struct list_head posix_timers; /* POSIX.1b Interval Timers */
> + struct k_itimer nanosleep_tmr;
> unsigned long utime, stime, cutime, cstime;
> unsigned long start_time;
> long per_cpu_utime[NR_CPUS], per_cpu_stime[NR_CPUS];
> @@ -637,6 +640,7 @@
>
> extern void exit_mm(struct task_struct *);
> extern void exit_files(struct task_struct *);
> +extern void exit_itimers(struct task_struct *, int);
> extern void exit_sighand(struct task_struct *);
> extern void __exit_sighand(struct task_struct *);
>
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/signal.h linux.mytimers/include/linux/signal.h
> --- linux.orig/include/linux/signal.h Wed Oct 23 00:53:01 2002
> +++ linux.mytimers/include/linux/signal.h Wed Oct 23 01:17:51 2002
> @@ -224,6 +224,36 @@
> struct pt_regs;
> extern int get_signal_to_deliver(siginfo_t *info, struct pt_regs *regs);
> #endif
> +/*
> + * We would like the asm/signal.h code to define these so that the using
> + * function can call do_signal(). In lieu of that, we define a generic
> + * version that pretends that do_signal() was called and delivered a signal.
> + * To see how this is used, see nano_sleep() in timer.c and the i386 version
> + * in asm_i386/signal.h.
> + */
> +#ifndef PT_REGS_ENTRY
> +#define PT_REGS_ENTRY(type,name,p1_type,p1, p2_type,p2) \
> +type name(p1_type p1,p2_type p2)\
> +{
> +#endif
> +#ifndef _do_signal
> +#define _do_signal() 1
> +#endif
> +#ifndef NANOSLEEP_ENTRY
> +#define NANOSLEEP_ENTRY(a) asmlinkage long sys_nanosleep( struct timespec* rqtp, \
> + struct timespec * rmtp) \
> +{ a
> +#endif
> +#ifndef CLOCK_NANOSLEEP_ENTRY
> +#define CLOCK_NANOSLEEP_ENTRY(a) asmlinkage long sys_clock_nanosleep( \
> + clockid_t which_clock, \
> + int flags, \
> + const struct timespec *rqtp, \
> + struct timespec *rmtp) \
> +{ a
> +
> +#endif
> +
>
> #endif /* __KERNEL__ */
>
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/sys.h linux.mytimers/include/linux/sys.h
> --- linux.orig/include/linux/sys.h Sun Dec 10 23:56:37 1995
> +++ linux.mytimers/include/linux/sys.h Wed Oct 23 01:17:51 2002
> @@ -4,7 +4,7 @@
> /*
> * system call entry points ... but not all are defined
> */
> -#define NR_syscalls 256
> +#define NR_syscalls 275
>
> /*
> * These are system calls that will be removed at some time
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/time.h linux.mytimers/include/linux/time.h
> --- linux.orig/include/linux/time.h Wed Oct 23 00:53:34 2002
> +++ linux.mytimers/include/linux/time.h Wed Oct 23 01:17:51 2002
> @@ -38,6 +38,19 @@
> */
> #define MAX_JIFFY_OFFSET ((~0UL >> 1)-1)
>
> +/* Parameters used to convert the timespec values */
> +#ifndef USEC_PER_SEC
> +#define USEC_PER_SEC (1000000L)
> +#endif
> +
> +#ifndef NSEC_PER_SEC
> +#define NSEC_PER_SEC (1000000000L)
> +#endif
> +
> +#ifndef NSEC_PER_USEC
> +#define NSEC_PER_USEC (1000L)
> +#endif
> +
> static __inline__ unsigned long
> timespec_to_jiffies(struct timespec *value)
> {
> @@ -124,6 +137,8 @@
> #ifdef __KERNEL__
> extern void do_gettimeofday(struct timeval *tv);
> extern void do_settimeofday(struct timeval *tv);
> +extern int do_sys_settimeofday(struct timeval *tv, struct timezone *tz);
> +extern void clock_was_set(void); // call when ever the clock is set
> #endif
>
> #define FD_SETSIZE __FD_SETSIZE
> @@ -149,5 +164,25 @@
> struct timeval it_interval; /* timer interval */
> struct timeval it_value; /* current value */
> };
> +
> +
> +/*
> + * The IDs of the various system clocks (for POSIX.1b interval timers).
> + */
> +#define CLOCK_REALTIME 0
> +#define CLOCK_MONOTONIC 1
> +#define CLOCK_PROCESS_CPUTIME_ID 2
> +#define CLOCK_THREAD_CPUTIME_ID 3
> +#define CLOCK_REALTIME_HR 4
> +#define CLOCK_MONOTONIC_HR 5
> +
> +#define MAX_CLOCKS 6
> +
> +/*
> + * The various flags for setting POSIX.1b interval timers.
> + */
> +
> +#define TIMER_ABSTIME 0x01
> +
>
> #endif
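
User-side sketch of TIMER_ABSTIME against these clock ids: read
CLOCK_REALTIME, add a delta, and arm for the absolute instant (assumes
the syscalls wired up earlier in this patch):

#include <time.h>

static void arm_two_seconds_out(timer_t id)
{
	struct itimerspec its = { {0, 0}, {0, 0} };	/* one-shot */

	clock_gettime(CLOCK_REALTIME, &its.it_value);
	its.it_value.tv_sec += 2;			/* absolute expiry */
	timer_settime(id, TIMER_ABSTIME, &its, NULL);
}
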
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/types.h linux.mytimers/include/linux/types.h
> --- linux.orig/include/linux/types.h Wed Oct 23 00:54:17 2002
> +++ linux.mytimers/include/linux/types.h Wed Oct 23 01:17:51 2002
> @@ -23,6 +23,8 @@
> typedef __kernel_daddr_t daddr_t;
> typedef __kernel_key_t key_t;
> typedef __kernel_suseconds_t suseconds_t;
> +typedef __kernel_timer_t timer_t;
> +typedef __kernel_clockid_t clockid_t;
>
> #ifdef __KERNEL__
> typedef __kernel_uid32_t uid_t;
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/init/Config.help linux.mytimers/init/Config.help
> --- linux.orig/init/Config.help Wed Oct 23 00:50:42 2002
> +++ linux.mytimers/init/Config.help Wed Oct 23 01:17:51 2002
> @@ -115,3 +115,11 @@
> replacement for kerneld.) Say Y here and read about configuring it
> in <file:Documentation/kmod.txt>.
>
> +Maximum number of POSIX timers
> +CONFIG_MAX_POSIX_TIMERS
> + This option allows you to configure the system wide maximum number of
> + POSIX timers. Timers are allocated as needed so the only memory
> + overhead this adds is about 4 bytes for every 50 or so timers to keep
> + track of each block of timers. The system quietly rounds this number
> + up to fill out a timer allocation block. It is ok to have several
> + thousand timers as needed by your applications.
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/init/Config.in linux.mytimers/init/Config.in
> --- linux.orig/init/Config.in Wed Oct 23 00:50:45 2002
> +++ linux.mytimers/init/Config.in Wed Oct 23 01:17:51 2002
> @@ -9,6 +9,7 @@
> bool 'System V IPC' CONFIG_SYSVIPC
> bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
> bool 'Sysctl support' CONFIG_SYSCTL
> +int 'System wide maximum number of POSIX timers' CONFIG_MAX_POSIX_TIMERS 3000
> endmenu
>
> mainmenu_option next_comment
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/Makefile linux.mytimers/kernel/Makefile
> --- linux.orig/kernel/Makefile Wed Oct 23 00:54:21 2002
> +++ linux.mytimers/kernel/Makefile Wed Oct 23 01:24:01 2002
> @@ -10,7 +10,7 @@
> module.o exit.o itimer.o time.o softirq.o resource.o \
> sysctl.o capability.o ptrace.o timer.o user.o \
> signal.o sys.o kmod.o workqueue.o futex.o platform.o pid.o \
> - rcupdate.o
> + rcupdate.o posix-timers.o id2ptr.o
>
> obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
> obj-$(CONFIG_SMP) += cpu.o
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/exit.c linux.mytimers/kernel/exit.c
> --- linux.orig/kernel/exit.c Wed Oct 23 00:54:21 2002
> +++ linux.mytimers/kernel/exit.c Wed Oct 23 01:22:00 2002
> @@ -647,6 +647,7 @@
> __exit_files(tsk);
> __exit_fs(tsk);
> exit_namespace(tsk);
> + exit_itimers(tsk, 1);
> exit_thread();
>
> if (current->leader)
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/fork.c linux.mytimers/kernel/fork.c
> --- linux.orig/kernel/fork.c Wed Oct 23 00:54:17 2002
> +++ linux.mytimers/kernel/fork.c Wed Oct 23 01:17:51 2002
> @@ -783,6 +783,7 @@
> goto bad_fork_cleanup_files;
> if (copy_sighand(clone_flags, p))
> goto bad_fork_cleanup_fs;
> + INIT_LIST_HEAD(&p->posix_timers);
> if (copy_mm(clone_flags, p))
> goto bad_fork_cleanup_sighand;
> if (copy_namespace(clone_flags, p))
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/id2ptr.c linux.mytimers/kernel/id2ptr.c
> --- linux.orig/kernel/id2ptr.c Wed Dec 31 19:00:00 1969
> +++ linux.mytimers/kernel/id2ptr.c Wed Oct 23 01:23:24 2002
> @@ -0,0 +1,223 @@
> +/*
> + * linux/kernel/id2ptr.c
> + *
> + * 2002-10-18 written by Jim Houston [email protected]
> + * Copyright (C) 2002 by Concurrent Computer Corporation
> + * Distributed under the GNU GPL license version 2.
> + *
> + * Small id to pointer translation service.
> + *
> + * It uses a radix tree like structure as a sparse array indexed
> + * by the id to obtain the pointer. A bit map is included in each
> + * level of the tree which identifies portions of the tree which
> + * are completely full. This makes the process of allocating a
> + * new id quick.
> + */
> +
> +
> +#include <linux/slab.h>
> +#include <linux/id2ptr.h>
> +#include <linux/init.h>
> +#include <linux/string.h>
> +
> +static kmem_cache_t *id_layer_cache;
> +spinlock_t id_lock = SPIN_LOCK_UNLOCKED;
> +
> +/*
> + * Since we can't allocate memory with spinlock held and dropping the
> + * lock to allocate gets ugly, we keep a free list which will satisfy the
> + * worst case allocation.
> + */
> +
> +struct id_layer *id_free;
> +int id_free_cnt;
> +
> +static inline struct id_layer *alloc_layer(void)
> +{
> + struct id_layer *p;
> +
> + if (!(p = id_free))
> + BUG();
> + id_free = p->ary[0];
> + id_free_cnt--;
> + p->ary[0] = 0;
> + return(p);
> +}
> +
> +static inline void free_layer(struct id_layer *p)
> +{
> + p->ary[0] = id_free;
> + id_free = p;
> + id_free_cnt++;
> +}
> +
> +/*
> + * Lookup the kernel pointer associated with a user supplied
> + * id value.
> + */
> +void *id2ptr_lookup(struct id *idp, int id)
> +{
> + int n;
> + struct id_layer *p;
> +
> + if (id <= 0)
> + return(NULL);
> + id--;
> + spin_lock_irq(&id_lock);
> + n = idp->layers * ID_BITS;
> + p = idp->top;
> + if (id >= (1 << n)) {
> + spin_unlock_irq(&id_lock);
> + return(NULL);
> + }
> +
> + while (n > 0 && p) {
> + n -= ID_BITS;
> + p = p->ary[(id >> n) & ID_MASK];
> + }
> + spin_unlock_irq(&id_lock);
> + return((void *)p);
> +}
> +
> +static int sub_alloc(struct id_layer *p, int shift, int id, void *ptr)
> +{
> + int n = (id >> shift) & ID_MASK;
> + int bitmap = p->bitmap;
> + int id_base = id & ~((1 << (shift+ID_BITS))-1);
> + int v;
> +
> + for ( ; n <= ID_MASK; n++, id = id_base + (n << shift)) {
> + if (bitmap & (1 << n))
> + continue;
> + if (shift == 0) {
> + p->ary[n] = (struct id_layer *)ptr;
> + p->bitmap |= 1<<n;
> + return(id);
> + }
> + if (!p->ary[n])
> + p->ary[n] = alloc_layer();
> + if ((v = sub_alloc(p->ary[n], shift-ID_BITS, id, ptr))) {
> + update_bitmap(p, n);
> + return(v);
> + }
> + }
> + return(0);
> +}
> +
> +/*
> + * Allocate a new id and associate the value ptr with it.
> + */
> +int id2ptr_new(struct id *idp, void *ptr)
> +{
> + int n, last, id, v;
> + struct id_layer *new;
> +
> + spin_lock_irq(&id_lock);
> + n = idp->layers * ID_BITS;
> + last = idp->last;
> + while (id_free_cnt < n+1) {
> + spin_unlock_irq(&id_lock);
> + new = kmem_cache_alloc(id_layer_cache, GFP_KERNEL);
> + memset(new, 0, sizeof(struct id_layer));
> + spin_lock_irq(&id_lock);
> + free_layer(new);
> + }
> + /*
> + * Add a new layer if the array is full or the last id
> + * was at the limit and we don't want to wrap.
> + */
> + if ((last == ((1 << n)-1) && last < idp->min_wrap) ||
> + idp->count == (1 << n)) {
> + ++idp->layers;
> + n += ID_BITS;
> + new = alloc_layer();
> + new->ary[0] = idp->top;
> + idp->top = new;
> + update_bitmap(new, 0);
> + }
> + if (last >= ((1 << n)-1))
> + last = 0;
> +
> + /*
> + * Search for a free id starting after last id allocated.
> + * If that fails wrap back to start.
> + */
> + id = last+1;
> + if (!(v = sub_alloc(idp->top, n-ID_BITS, id, ptr)))
> + v = sub_alloc(idp->top, n-ID_BITS, 1, ptr);
> + idp->last = v;
> + idp->count++;
> + spin_unlock_irq(&id_lock);
> + return(v+1);
> +}
> +
> +
> +static int sub_remove(struct id_layer *p, int shift, int id)
> +{
> + int n = (id >> shift) & ID_MASK;
> + int i, bitmap, rv;
> +
> + rv = 0;
> + bitmap = p->bitmap & ~(1<<n);
> + p->bitmap = bitmap;
> + if (shift == 0) {
> + p->ary[n] = NULL;
> + rv = !bitmap;
> + } else {
> + if (sub_remove(p->ary[n], shift-ID_BITS, id)) {
> + free_layer(p->ary[n]);
> + p->ary[n] = 0;
> + for (i = 0; i < (1 << ID_BITS); i++)
> + if (p->ary[i])
> + break;
> + if (i == (1 << ID_BITS))
> + rv = 1;
> + }
> + }
> + return(rv);
> +}
> +
> +/*
> + * Remove (free) an id value and break the association with
> + * the kernel pointer.
> + */
> +void id2ptr_remove(struct id *idp, int id)
> +{
> + struct id_layer *p;
> +
> + if (id <= 0)
> + return;
> + id--;
> + spin_lock_irq(&id_lock);
> + sub_remove(idp->top, (idp->layers-1)*ID_BITS, id);
> + idp->count--;
> + if (id_free_cnt >= ID_FREE_MAX) {
> +
> + p = alloc_layer();
> + spin_unlock_irq(&id_lock);
> + kmem_cache_free(id_layer_cache, p);
> + return;
> + }
> + spin_unlock_irq(&id_lock);
> +}
> +
> +void init_id_cache(void)
> +{
> + if (!id_layer_cache)
> + id_layer_cache = kmem_cache_create("id_layer_cache",
> + sizeof(struct id_layer), 0, 0, 0, 0);
> +}
> +
> +void id2ptr_init(struct id *idp, int min_wrap)
> +{
> + init_id_cache();
> + idp->count = 1;
> + idp->last = 0;
> + idp->layers = 1;
> + idp->top = kmem_cache_alloc(id_layer_cache, GFP_KERNEL);
> + memset(idp->top, 0, sizeof(struct id_layer));
> + idp->top->bitmap = 0;
> + idp->min_wrap = min_wrap;
> +}
> +
> +__initcall(init_id_cache);
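
Back-of-envelope for the ID_BITS = 5 layout above: each id_layer fans
out 32 ways and its bitmap marks full subtrees, so a k-layer tree
covers 32^k ids and an allocation walks at most k bitmaps. As a
sketch:

#define DEMO_ID_BITS 5	/* mirrors ID_BITS above */

/* 1 layer -> 32 ids, 2 -> 1024, 3 -> 32768, ... */
static unsigned long id_capacity(int layers)
{
	return 1UL << (DEMO_ID_BITS * layers);
}
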
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/posix-timers.c linux.mytimers/kernel/posix-timers.c
> --- linux.orig/kernel/posix-timers.c Wed Dec 31 19:00:00 1969
> +++ linux.mytimers/kernel/posix-timers.c Wed Oct 23 01:56:45 2002
> @@ -0,0 +1,1109 @@
> +/*
> + * linux/kernel/posix-timers.c
> + *
> + *
> + * 2002-10-15 Posix Clocks & timers by George Anzinger
> + * Copyright (C) 2002 by MontaVista Software.
> + *
> + * 2002-10-18 changes by Jim Houston [email protected]
> + * Copyright (C) 2002 by Concurrent Computer Corp.
> + *
> + * - Add a separate queue for posix timers. It's a
> + * priority queue implemented as a sorted doubly
> + * linked list & a rbtree as an index into the list.
> + * - Use a slab cache to allocate the timer structures.
> + * - Allocate timer ids using my new id allocator.
> + * This avoids the immediate reuse of timer ids.
> + * - Uses seconds and nano-seconds rather than
> + * jiffies and sub_jiffies.
> + *
> + * This is an experimental change. I'm sending it out to
> + * the mailing list in the hope that it will stimulate
> + * discussion.
> + */
> +
> +/* These are all the functions necessary to implement
> + * POSIX clocks & timers
> + */
> +
> +#include <linux/mm.h>
> +#include <linux/smp_lock.h>
> +#include <linux/interrupt.h>
> +#include <linux/slab.h>
> +#include <linux/time.h>
> +
> +#include <asm/uaccess.h>
> +#include <asm/semaphore.h>
> +#include <linux/list.h>
> +#include <linux/init.h>
> +#include <linux/nmi.h>
> +#include <linux/compiler.h>
> +#include <linux/id2ptr.h>
> +#include <linux/rbtree.h>
> +#include <linux/posix-timers.h>
> +
> +
> +#ifndef div_long_long_rem
> +#include <asm/div64.h>
> +
> +#define div_long_long_rem(dividend,divisor,remainder) ({ \
> + u64 result = dividend; \
> + *remainder = do_div(result,divisor); \
> + result; })
> +
> +#endif /* ifndef div_long_long_rem */
> +
> +
> +/*
> + * Let's keep our timers in a slab cache :-)
> + */
> +static kmem_cache_t *posix_timers_cache;
> +struct id posix_timers_id;
> +
> +/*
> + * This lock protects the timer queues; it is held for the
> + * duration of the timer expiry process.
> + */
> +spinlock_t posix_timers_lock = SPIN_LOCK_UNLOCKED;
> +
> +/*
> + * Kluge until I can wire into the timer interrupt.
> + */
> +int poll_timer_running;
> +void run_posix_timers(unsigned long dummy);
> +static struct timer_list poll_posix_timers = {
> + .function = &run_posix_timers,
> +};
> +
> +struct k_clock clock_realtime = {
> + .pq = TIMER_PQ_INIT(clock_realtime.pq),
> + .res = NSEC_PER_SEC/HZ,
> +};
> +
> +struct k_clock clock_monotonic = {
> + .pq = TIMER_PQ_INIT(clock_monotonic.pq),
> + .res= NSEC_PER_SEC/HZ,
> + .clock_get = do_posix_clock_monotonic_gettime,
> + .clock_set = do_posix_clock_monotonic_settime
> +};
> +
> +/*
> + * Insert a timer into a priority queue. This is a sorted
> + * list of timers. A rbtree is used to index the list.
> + */
> +
> +static int timer_insert_nolock(struct timer_pq *pq, struct k_itimer *t)
> +{
> + struct rb_node ** p = &pq->rb_root.rb_node;
> + struct rb_node * parent = NULL;
> + struct k_itimer *cur;
> + struct list_head *prev;
> + prev = &pq->head;
> +
> + if (t->it_pq)
> + BUG();
> + t->it_pq = pq;
> + while (*p) {
> + parent = *p;
> + cur = rb_entry(parent, struct k_itimer , it_pq_node);
> +
> + /*
> +		 * We allow non-unique entries. This works
> + * but there might be opportunity to do something
> + * clever.
> + */
> + if (t->it_v.it_value.tv_sec < cur->it_v.it_value.tv_sec ||
> + (t->it_v.it_value.tv_sec == cur->it_v.it_value.tv_sec &&
> + t->it_v.it_value.tv_nsec < cur->it_v.it_value.tv_nsec))
> + p = &(*p)->rb_left;
> + else {
> + prev = &cur->it_pq_list;
> + p = &(*p)->rb_right;
> + }
> + }
> + /* link into rbtree. */
> + rb_link_node(&t->it_pq_node, parent, p);
> + rb_insert_color(&t->it_pq_node, &pq->rb_root);
> + /* link it into the list */
> + list_add(&t->it_pq_list, prev);
> + /*
> +	 * We need to set up a timer interrupt if the new timer is
> + * at the head of the queue.
> + */
> + return(pq->head.next == &t->it_pq_list);
> +}
> +
> +static inline void timer_remove_nolock(struct k_itimer *t)
> +{
> + struct timer_pq *pq;
> +
> + if (!(pq = t->it_pq))
> + return;
> + rb_erase(&t->it_pq_node, &pq->rb_root);
> + list_del(&t->it_pq_list);
> + t->it_pq = 0;
> +}
> +
> +static void timer_remove(struct k_itimer *t)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&posix_timers_lock, flags);
> + timer_remove_nolock(t);
> + spin_unlock_irqrestore(&posix_timers_lock, flags);
> +}
> +
> +
> +static int timer_insert(struct timer_pq *pq, struct k_itimer *t)
> +{
> + unsigned long flags;
> + int rv;
> +
> + spin_lock_irqsave(&posix_timers_lock, flags);
> + rv = timer_insert_nolock(pq, t);
> + spin_unlock_irqrestore(&posix_timers_lock, flags);
> + if (!poll_timer_running) {
> + poll_timer_running = 1;
> + poll_posix_timers.expires = jiffies + 1;
> + add_timer(&poll_posix_timers);
> + }
> + return(rv);
> +}
> +
> +/*
> + * If we are late delivering a periodic timer we may
> + * have missed several expiries. We want to calculate the
> + * number we have missed, both for the overrun count and
> + * so that we can pick the next expiry.
> + *
> + * You really need this if you schedule a high frequency timer
> + * and then make a big change to the current time.
> + */
> +
> +int handle_overrun(struct k_itimer *t, struct timespec dt)
> +{
> + int ovr;
> +#if 0
> + long long ldt, in;
> + long sec, nsec;
> +
> + in = (long long)t->it_v.it_interval.tv_sec*1000000000 +
> + t->it_v.it_interval.tv_nsec;
> + ldt = (long long)dt.tv_sec * 1000000000 + dt.tv_nsec;
> + ovr = ldt/in + 1;
> + ldt = (long long)t->it_v.it_interval.tv_nsec * ovr;
> + nsec = ldt % 1000000000;
> + sec = ldt / 1000000000;
> + sec += ovr * t->it_v.it_interval.tv_sec;
> + nsec += t->it_v.it_value.tv_nsec;
> + sec += t->it_v.it_value.tv_sec;
> + if (nsec > 1000000000) {
> + sec++;
> + nsec -= 1000000000;
> + }
> + t->it_v.it_value.tv_sec = sec;
> + t->it_v.it_value.tv_nsec = nsec;
> +#else
> + /* Temporary hack */
> + ovr = 0;
> + while (dt.tv_sec > t->it_v.it_interval.tv_sec ||
> + (dt.tv_sec == t->it_v.it_interval.tv_sec &&
> + dt.tv_nsec > t->it_v.it_interval.tv_nsec)) {
> + dt.tv_sec -= t->it_v.it_interval.tv_sec;
> + dt.tv_nsec -= t->it_v.it_interval.tv_nsec;
> + if (dt.tv_nsec < 0) {
> + dt.tv_sec--;
> + dt.tv_nsec += 1000000000;
> + }
> + t->it_v.it_value.tv_sec += t->it_v.it_interval.tv_sec;
> + t->it_v.it_value.tv_nsec += t->it_v.it_interval.tv_nsec;
> + if (t->it_v.it_value.tv_nsec >= 1000000000) {
> + t->it_v.it_value.tv_sec++;
> + t->it_v.it_value.tv_nsec -= 1000000000;
> + }
> + ovr++;
> + }
> +#endif
> + return(ovr);
> +}
> +
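
One way the disabled division path in handle_overrun() could go: a
single 64-bit divide instead of the loop (sketch only; assumes a
non-zero interval, and the overflow/rounding policy would need review):

static int overrun_by_division(struct timespec late, struct timespec interval)
{
	long long dt = (long long)late.tv_sec * 1000000000 + late.tv_nsec;
	long long in = (long long)interval.tv_sec * 1000000000
			+ interval.tv_nsec;

	return (int)(dt / in) + 1;	/* whole periods missed, plus one */
}
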
> +int sending_signal_failed;
> +
> +/*
> + * Yes I calculate an overrun but don't deliver it. I need to
> + * play with this code.
> + */
> +static void timer_notify_task(struct k_itimer *timr, int ovr)
> +{
> + struct siginfo info;
> + int ret;
> +
> + if (! (timr->it_sigev_notify & SIGEV_NONE)) {
> + memset(&info, 0, sizeof(info));
> + /* Send signal to the process that owns this timer. */
> + info.si_signo = timr->it_sigev_signo;
> + info.si_errno = 0;
> + info.si_code = SI_TIMER;
> + info.si_tid = timr->it_id;
> + info.si_value = timr->it_sigev_value;
> + info.si_overrun = timr->it_overrun_deferred;
> + ret = send_sig_info(info.si_signo, &info, timr->it_process);
> + switch (ret) {
> + case 0: /* all's well new signal queued */
> + timr->it_overrun_last = timr->it_overrun;
> + timr->it_overrun = timr->it_overrun_deferred;
> + break;
> + case 1: /* signal from this timer was already in the queue */
> + timr->it_overrun += timr->it_overrun_deferred + 1;
> + break;
> + default:
> + sending_signal_failed++;
> + break;
> + }
> + }
> +}
> +
> +void do_expiry(struct k_itimer *t, int ovr)
> +{
> + switch (t->it_type) {
> + case TIMER:
> + timer_notify_task(t, ovr);
> + return;
> + case NANOSLEEP:
> + wake_up_process(t->it_process);
> + return;
> + }
> +}
> +
> +/*
> + * Check if the timer at the head of the priority queue has
> + * expired and handle the expiry. Return time in nsec till
> + * the next expiry. We only really care about expiries
> + * before the next clock tick so we use a 32 bit int here.
> + */
> +
> +static int check_expiry(struct timer_pq *pq, struct timespec *tv)
> +{
> + struct k_itimer *t;
> + struct timespec dt;
> + int ovr;
> + long sec, nsec;
> + unsigned long flags;
> +
> + ovr = 1;
> + spin_lock_irqsave(&posix_timers_lock, flags);
> + while (!list_empty(&pq->head)) {
> + t = list_entry(pq->head.next, struct k_itimer, it_pq_list);
> + dt.tv_sec = tv->tv_sec - t->it_v.it_value.tv_sec;
> + dt.tv_nsec = tv->tv_nsec - t->it_v.it_value.tv_nsec;
> + if (dt.tv_sec < 0 || (dt.tv_sec == 0 && dt.tv_nsec < 0)) {
> + /*
> +			 * It has not expired yet. Return nano-seconds
> +			 * remaining if it's less than a second.
> + */
> + if (dt.tv_sec < -1)
> + nsec = -1;
> + else
> + nsec = dt.tv_sec ? 1000000000-dt.tv_nsec :
> + -dt.tv_nsec;
> + spin_unlock_irqrestore(&posix_timers_lock, flags);
> + return(nsec);
> + }
> + /*
> +		 * It's expired. If this is a periodic timer we need to
> +		 * set up for the next expiry. We also check for overrun
> +		 * here. If the timer has already missed an expiry we want
> +		 * to deliver the overrun information and get back on schedule.
> + */
> + if (dt.tv_nsec < 0) {
> + dt.tv_sec--;
> + dt.tv_nsec += 1000000000;
> + }
> + timer_remove_nolock(t);
> + if (t->it_v.it_interval.tv_sec || t->it_v.it_interval.tv_nsec) {
> + if (dt.tv_sec > t->it_v.it_interval.tv_sec ||
> + (dt.tv_sec == t->it_v.it_interval.tv_sec &&
> + dt.tv_nsec > t->it_v.it_interval.tv_nsec)) {
> + ovr = handle_overrun(t, dt);
> + } else {
> + nsec = t->it_v.it_value.tv_nsec +
> + t->it_v.it_interval.tv_nsec;
> + sec = t->it_v.it_value.tv_sec +
> + t->it_v.it_interval.tv_sec;
> +				if (nsec >= 1000000000) {
> + nsec -= 1000000000;
> + sec++;
> + }
> + t->it_v.it_value.tv_sec = sec;
> + t->it_v.it_value.tv_nsec = nsec;
> + }
> + /*
> + * It might make sense to leave the timer in the queue and
> + * avoid the remove/insert for timers which stay
> + * at the front of the queue.
> + */
> + timer_insert_nolock(pq, t);
> + }
> + do_expiry(t, ovr);
> + }
> + spin_unlock_irqrestore(&posix_timers_lock, flags);
> + return(-1);
> +}
> +
> +/*
> + * kluge? We should know the offset between clock_realtime and
> + * clock_monotonic so we don't need to get the time twice.
> + */
> +
> +void run_posix_timers(unsigned long dummy)
> +{
> + struct timespec now;
> + int ns, ret;
> +
> + ns = 0x7fffffff;
> + do_posix_clock_monotonic_gettime(&now);
> + ret = check_expiry(&clock_monotonic.pq, &now);
> + if (ret > 0 && ret < ns)
> + ns = ret;
> +
> + do_gettimeofday((struct timeval*)&now);
> + now.tv_nsec *= NSEC_PER_USEC;
> + ret = check_expiry(&clock_realtime.pq, &now);
> + if (ret > 0 && ret < ns)
> + ns = ret;
> + poll_posix_timers.expires = jiffies + 1;
> + add_timer(&poll_posix_timers);
> +}
> +
> +
> +extern rwlock_t xtime_lock;
> +
> +/*
> + * CLOCKs: The POSIX standard calls for a couple of clocks and allows us
> + * to implement others. This structure defines the various
> + * clocks and allows the possibility of adding others. We
> + * provide an interface to add clocks to the table and expect
> + * the "arch" code to add at least one clock that is high
> + * resolution. Here we define the standard CLOCK_REALTIME as a
> + * 1/HZ resolution clock.
> +
> + * CPUTIME & THREAD_CPUTIME: We are not, at this time, defining these
> + * two clocks (or the other process-related clocks of Std
> + * 1003.1d-1999). The way these should be supported, we think,
> + * is to use large negative numbers for the two clocks that are
> + * pinned to the executing process and to use -pid for clocks
> + * pinned to particular pids. Calls which supported these clock
> + * ids would split early in the function.
> +
> + * RESOLUTION: Clock resolution is used to round up timer and interval
> + * times, NOT to report clock times, which are reported with as
> + * much resolution as the system can muster. In some cases this
> + * resolution may depend on the underlying clock hardware and
> + * may not be quantifiable until run time, and even then only if the
> + * necessary code is written. The standard says we should say
> + * something about this issue in the documentation...
> +
> + * FUNCTIONS: The CLOCKs structure defines possible functions to handle
> + * various clock functions. For clocks that use the standard
> + * system timer code these entries should be NULL. This will
> + * allow dispatch without the overhead of indirect function
> + * calls. CLOCKS that depend on other sources (e.g. WWV or GPS)
> + * must supply functions here, even if the function just returns
> + * ENOSYS. The standard POSIX timer management code assumes the
> + * following: 1.) The k_itimer struct (sched.h) is used for the
> + * timer. 2.) The list, it_lock, it_clock, it_id and it_process
> + * fields are not modified by timer code.
> + *
> + * Permissions: It is assumed that the clock_settime() function defined
> + * for each clock will take care of permission checks. Some
> + * clocks may be set able by any user (i.e. local process
> + * clocks) others not. Currently the only set able clock we
> + * have is CLOCK_REALTIME and its high res counter part, both of
> + * which we beg off on and pass to do_sys_settimeofday().
> + */
> +
> +struct k_clock *posix_clocks[MAX_CLOCKS];
> +
> +#define if_clock_do(clock_fun, alt_fun,parms) (! clock_fun)? alt_fun parms :\
> + clock_fun parms
> +
> +#define p_timer_get( clock,a,b) if_clock_do((clock)->timer_get, \
> + do_timer_gettime, \
> + (a,b))
> +
> +#define p_nsleep( clock,a,b,c) if_clock_do((clock)->nsleep, \
> + do_nsleep, \
> + (a,b,c))
> +
> +#define p_timer_del( clock,a) if_clock_do((clock)->timer_del, \
> + do_timer_delete, \
> + (a))
> +
> +void register_posix_clock(int clock_id, struct k_clock * new_clock);
> +
> +static int do_posix_gettime(struct k_clock *clock, struct timespec *tp);
> +
> +
> +void register_posix_clock(int clock_id,struct k_clock * new_clock)
> +{
> + if ((unsigned)clock_id >= MAX_CLOCKS) {
> + printk("POSIX clock register failed for clock_id %d\n",clock_id);
> + return;
> + }
> + posix_clocks[clock_id] = new_clock;
> +}
> +
> +static __init int init_posix_timers(void)
> +{
> + posix_timers_cache = kmem_cache_create("posix_timers_cache",
> + sizeof(struct k_itimer), 0, 0, 0, 0);
> + id2ptr_init(&posix_timers_id, 1000);
> +
> + register_posix_clock(CLOCK_REALTIME,&clock_realtime);
> + register_posix_clock(CLOCK_MONOTONIC,&clock_monotonic);
> + return 0;
> +}
> +
> +__initcall(init_posix_timers);
> +
> +/*
> + * For some reason mips/mips64 define the SIGEV constants plus 128.
> + * Here we define a mask to get rid of the common bits. The
> + * optimizer should make this costless to all but mips.
> + */
> +#if (ARCH == mips) || (ARCH == mips64)
> +#define MIPS_SIGEV ~(SIGEV_NONE & \
> + SIGEV_SIGNAL & \
> + SIGEV_THREAD & \
> + SIGEV_THREAD_ID)
> +#else
> +#define MIPS_SIGEV (int)-1
> +#endif
> +
> +static struct task_struct * good_sigevent(sigevent_t *event)
> +{
> + struct task_struct * rtn = current;
> +
> + if (event->sigev_notify & SIGEV_THREAD_ID & MIPS_SIGEV ) {
> + if ( !(rtn = find_task_by_pid(event->sigev_notify_thread_id)) ||
> + rtn->tgid != current->tgid){
> + return NULL;
> + }
> + }
> + if (event->sigev_notify & SIGEV_SIGNAL & MIPS_SIGEV) {
> +		if ((unsigned)event->sigev_signo > SIGRTMAX)
> + return NULL;
> + }
> + if (event->sigev_notify & ~(SIGEV_SIGNAL | SIGEV_THREAD_ID )) {
> + return NULL;
> + }
> + return rtn;
> +}
> +
> +
> +
> +static struct k_itimer * alloc_posix_timer(void)
> +{
> + struct k_itimer *tmr;
> +	tmr = kmem_cache_alloc(posix_timers_cache, GFP_KERNEL);
> +	if (tmr)
> +		memset(tmr, 0, sizeof(struct k_itimer));
> + return(tmr);
> +}
> +
> +static void release_posix_timer(struct k_itimer *tmr)
> +{
> + if (tmr->it_id > 0)
> + id2ptr_remove(&posix_timers_id, tmr->it_id);
> + kmem_cache_free(posix_timers_cache, tmr);
> +}
> +
> +/* Create a POSIX.1b interval timer. */
> +
> +asmlinkage int
> +sys_timer_create(clockid_t which_clock, struct sigevent *timer_event_spec,
> + timer_t *created_timer_id)
> +{
> + int error = 0;
> + struct k_itimer *new_timer = NULL;
> + int new_timer_id;
> + struct task_struct * process = 0;
> + sigevent_t event;
> +
> + if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
> + return -EINVAL;
> +
> + new_timer = alloc_posix_timer();
> + if (new_timer == NULL) return -EAGAIN;
> +
> + new_timer_id = (timer_t)id2ptr_new(&posix_timers_id,
> + (void *)new_timer);
> + if (!new_timer_id) {
> + error = -EAGAIN;
> + goto out;
> + }
> + new_timer->it_id = new_timer_id;
> +
> + if (copy_to_user(created_timer_id, &new_timer_id,
> + sizeof(new_timer_id))) {
> + error = -EFAULT;
> + goto out;
> + }
> + spin_lock_init(&new_timer->it_lock);
> + if (timer_event_spec) {
> + if (copy_from_user(&event, timer_event_spec, sizeof(event))) {
> + error = -EFAULT;
> + goto out;
> + }
> + read_lock(&tasklist_lock);
> + if ((process = good_sigevent(&event))) {
> + /*
> + * We may be setting up this timer for another
> +			 * thread. It may be exiting. To catch this
> +			 * case we clear posix_timers.next in
> + * exit_itimers.
> + */
> + spin_lock(&process->alloc_lock);
> + if (process->posix_timers.next) {
> + list_add(&new_timer->it_task_list,
> + &process->posix_timers);
> + spin_unlock(&process->alloc_lock);
> + } else {
> + spin_unlock(&process->alloc_lock);
> + process = 0;
> + }
> + }
> + read_unlock(&tasklist_lock);
> + if (!process) {
> + error = -EINVAL;
> + goto out;
> + }
> + new_timer->it_sigev_notify = event.sigev_notify;
> + new_timer->it_sigev_signo = event.sigev_signo;
> + new_timer->it_sigev_value = event.sigev_value;
> + } else {
> + new_timer->it_sigev_notify = SIGEV_SIGNAL;
> + new_timer->it_sigev_signo = SIGALRM;
> + new_timer->it_sigev_value.sival_int = new_timer->it_id;
> + process = current;
> + spin_lock(&current->alloc_lock);
> + list_add(&new_timer->it_task_list, &current->posix_timers);
> + spin_unlock(&current->alloc_lock);
> + }
> + new_timer->it_clock = which_clock;
> + new_timer->it_overrun = 0;
> + new_timer->it_process = process;
> +
> + out:
> + if (error)
> + release_posix_timer(new_timer);
> + return error;
> +}
> +
> +
> +/*
> + * Delete a timer owned by the process; used by exit and exec.
> + */
> +void itimer_delete(struct k_itimer *timer)
> +{
> + if (sys_timer_delete(timer->it_id)){
> + BUG();
> + }
> +}
> +
> +/*
> + * This is called from both exec and exit to shut down the
> + * timers.
> + */
> +
> +inline void exit_itimers(struct task_struct *tsk, int exit)
> +{
> + struct k_itimer *tmr;
> +
> + if (!tsk->posix_timers.next)
> + BUG();
> + if (tsk->nanosleep_tmr.it_pq)
> + timer_remove(&tsk->nanosleep_tmr);
> + spin_lock(&tsk->alloc_lock);
> + while (tsk->posix_timers.next != &tsk->posix_timers){
> + spin_unlock(&tsk->alloc_lock);
> + tmr = list_entry(tsk->posix_timers.next,struct k_itimer,
> + it_task_list);
> + itimer_delete(tmr);
> + spin_lock(&tsk->alloc_lock);
> + }
> + /*
> + * sys_timer_create has the option to create a timer
> + * for another thread. There is the risk that as the timer
> + * is being created that the thread that was supposed to handle
> + * the signal is exiting. We use the posix_timers.next field
> + * as a flag so we can close this race.
> +	 */
> + if (exit)
> + tsk->posix_timers.next = 0;
> + spin_unlock(&tsk->alloc_lock);
> +}
> +
> +/* good_timespec
> + *
> + * This function checks the elements of a timespec structure.
> + *
> + * Arguments:
> + * ts : Pointer to the timespec structure to check
> + *
> + * Return value:
> + * If a NULL pointer was passed in, or the tv_nsec field was less than 0 or
> + * not less than NSEC_PER_SEC, or the tv_sec field was less than 0, this
> + * function returns 0. Otherwise it returns 1.
> + */
> +
> +static int good_timespec(const struct timespec *ts)
> +{
> + if ((ts == NULL) ||
> + (ts->tv_sec < 0) ||
> + ((unsigned)ts->tv_nsec >= NSEC_PER_SEC))
> + return 0;
> + return 1;
> +}
> +
> +static inline void unlock_timer(struct k_itimer *timr)
> +{
> + spin_unlock_irq(&timr->it_lock);
> +}
> +
> +static struct k_itimer* lock_timer( timer_t timer_id)
> +{
> + struct k_itimer *timr;
> +
> + timr = (struct k_itimer *)id2ptr_lookup(&posix_timers_id,
> + (int)timer_id);
> + if (timr)
> + spin_lock_irq(&timr->it_lock);
> + return(timr);
> +}
> +
> +/*
> + * Get the time remaining on a POSIX.1b interval timer.
> + * This function is ALWAYS called with spin_lock_irq on the timer, thus
> + * it must not mess with irq.
> + */
> +void inline do_timer_gettime(struct k_itimer *timr,
> + struct itimerspec *cur_setting)
> +{
> + struct timespec ts;
> +
> + do_posix_gettime(posix_clocks[timr->it_clock], &ts);
> + ts.tv_sec = timr->it_v.it_value.tv_sec - ts.tv_sec;
> + ts.tv_nsec = timr->it_v.it_value.tv_nsec - ts.tv_nsec;
> + if (ts.tv_nsec < 0) {
> + ts.tv_nsec += 1000000000;
> + ts.tv_sec--;
> + }
> + if (ts.tv_sec < 0)
> + ts.tv_sec = ts.tv_nsec = 0;
> + cur_setting->it_value = ts;
> + cur_setting->it_interval = timr->it_v.it_interval;
> +}
> +
> +/* Get the time remaining on a POSIX.1b interval timer. */
> +asmlinkage int sys_timer_gettime(timer_t timer_id, struct itimerspec *setting)
> +{
> + struct k_itimer *timr;
> + struct itimerspec cur_setting;
> +
> + timr = lock_timer(timer_id);
> + if (!timr) return -EINVAL;
> +
> + p_timer_get(posix_clocks[timr->it_clock],timr, &cur_setting);
> +
> + unlock_timer(timr);
> +
> + if (copy_to_user(setting, &cur_setting, sizeof(cur_setting)))
> + return -EFAULT;
> +
> + return 0;
> +}
> +/*
> + * Get the number of overruns of a POSIX.1b interval timer
> + * This is a bit messy as we don't easily know where he is in the delivery
> + * of possible multiple signals. We are to give him the overrun on the
> + * last delivery. If we have another pending, we want to make sure we
> + * use the last and not the current. If there is not another pending
> + * then he is current and gets the current overrun. We search both the
> + * shared and local queue.
> + */
> +
> +asmlinkage int sys_timer_getoverrun(timer_t timer_id)
> +{
> + struct k_itimer *timr;
> + int overrun, i;
> + struct sigqueue *q;
> + struct sigpending *sig_queue;
> + struct task_struct * t;
> +
> + timr = lock_timer( timer_id);
> + if (!timr) return -EINVAL;
> +
> + t = timr->it_process;
> + overrun = timr->it_overrun;
> + spin_lock_irq(&t->sig->siglock);
> + for (sig_queue = &t->sig->shared_pending, i = 2; i;
> + sig_queue = &t->pending, i--){
> + for (q = sig_queue->head; q; q = q->next) {
> + if ((q->info.si_code == SI_TIMER) &&
> + (q->info.si_tid == timr->it_id)) {
> +
> + overrun = timr->it_overrun_last;
> + goto out;
> + }
> + }
> + }
> + out:
> + spin_unlock_irq(&t->sig->siglock);
> +
> + unlock_timer(timr);
> +
> + return overrun;
> +}
> +
> +/*
> + * If it is relative time, we need to add the current time to it to
> + * get the proper expiry time.
> + */
> +static int adjust_rel_time(struct k_clock *clock, struct timespec *tp)
> +{
> + struct timespec now;
> +
> + do_posix_gettime(clock, &now);
> + tp->tv_sec += now.tv_sec;
> + tp->tv_nsec += now.tv_nsec;
> +
> + /*
> + * Normalize. Both inputs are already normalized, so the sum is
> + * below 2 * NSEC_PER_SEC and one subtraction is enough.
> + */
> + if (tp->tv_nsec >= NSEC_PER_SEC) {
> + tp->tv_nsec -= NSEC_PER_SEC;
> + tp->tv_sec++;
> + }
> + return 0;
> +}
> +
> +/* Set a POSIX.1b interval timer. */
> +/* timr->it_lock is taken. */
> +static inline int do_timer_settime(struct k_itimer *timr, int flags,
> + struct itimerspec *new_setting,
> + struct itimerspec *old_setting)
> +{
> + struct k_clock *clock = posix_clocks[timr->it_clock];
> +
> + timer_remove(timr);
> + if (old_setting)
> + do_timer_gettime(timr, old_setting);
> +
> + /* switch off the timer when it_value is zero */
> + if ((new_setting->it_value.tv_sec == 0) &&
> + (new_setting->it_value.tv_nsec == 0)) {
> + timr->it_v = *new_setting;
> + return 0;
> + }
> +
> + if (!(flags & TIMER_ABSTIME))
> + adjust_rel_time(clock, &new_setting->it_value);
> +
> + timr->it_v = *new_setting;
> + timr->it_overrun_deferred =
> + timr->it_overrun_last =
> + timr->it_overrun = 0;
> + timer_insert(&clock->pq, timr);
> + return 0;
> +}
> +
> +/* Set a POSIX.1b interval timer */
> +asmlinkage int sys_timer_settime(timer_t timer_id, int flags,
> + const struct itimerspec *new_setting,
> + struct itimerspec *old_setting)
> +{
> + struct k_itimer *timr;
> + struct itimerspec new_spec, old_spec;
> + int error = 0;
> + struct itimerspec *rtn = old_setting ? &old_spec : NULL;
> +
> + if (new_setting == NULL)
> + return -EINVAL;
> +
> + if (copy_from_user(&new_spec, new_setting, sizeof(new_spec)))
> + return -EFAULT;
> +
> + if (!good_timespec(&new_spec.it_interval) ||
> + !good_timespec(&new_spec.it_value))
> + return -EINVAL;
> +
> + timr = lock_timer(timer_id);
> + if (!timr)
> + return -EINVAL;
> +
> + if (!posix_clocks[timr->it_clock]->timer_set) {
> + error = do_timer_settime(timr, flags, &new_spec, rtn);
> + } else {
> + error = posix_clocks[timr->it_clock]->timer_set(timr, flags,
> + &new_spec, rtn);
> + }
> + unlock_timer(timr);
> +
> + if (old_setting && !error) {
> + if (copy_to_user(old_setting, &old_spec, sizeof(old_spec)))
> + error = -EFAULT;
> + }
> +
> + return error;
> +}
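
Putting the calls together, a minimal sketch of creating and arming a
one-second periodic timer from user space. Error handling is elided, and
SIGEV_SIGNAL delivery plus the sival_ptr convention are assumptions of the
example, not requirements of the patch:

        #include <signal.h>
        #include <string.h>
        #include <time.h>

        /* Create and arm a 1s periodic CLOCK_REALTIME timer. */
        int arm_periodic_timer(timer_t *out)
        {
                struct sigevent sev;
                struct itimerspec its;

                memset(&sev, 0, sizeof(sev));
                sev.sigev_notify = SIGEV_SIGNAL;
                sev.sigev_signo = SIGRTMIN;
                sev.sigev_value.sival_ptr = out; /* handed back in siginfo */
                if (timer_create(CLOCK_REALTIME, &sev, out) < 0)
                        return -1;

                memset(&its, 0, sizeof(its));
                its.it_value.tv_sec = 1;    /* first expiry after 1s */
                its.it_interval.tv_sec = 1; /* then every second */
                return timer_settime(*out, 0, &its, NULL);
        }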
> +
> +static inline int do_timer_delete(struct k_itimer *timer)
> +{
> + timer_remove(timer);
> + return 0;
> +}
> +
> +/* Delete a POSIX.1b interval timer. */
> +asmlinkage int sys_timer_delete(timer_t timer_id)
> +{
> + struct k_itimer *timer;
> +
> + timer = lock_timer(timer_id);
> + if (!timer)
> + return -EINVAL;
> +
> + p_timer_del(posix_clocks[timer->it_clock], timer);
> +
> + spin_lock(&timer->it_process->alloc_lock);
> + list_del(&timer->it_task_list);
> + spin_unlock(&timer->it_process->alloc_lock);
> +
> + /*
> + * This keeps any tasks waiting on the spin lock from thinking
> + * they got something (see the lock code above).
> + */
> + timer->it_process = NULL;
> + unlock_timer(timer);
> + release_posix_timer(timer);
> + return 0;
> +}
> +
> +/*
> + * And now for the "clock" calls.
> + * These functions are called both from timer functions (with the timer's
> + * spin_lock_irq() held) and from clock calls with no locking. They must
> + * use the save-flags versions of the locks.
> + */
> +static int do_posix_gettime(struct k_clock *clock, struct timespec *tp)
> +{
> + if (clock->clock_get)
> + return clock->clock_get(tp);
> +
> + /* Fill tv_sec/tv_usec in place, then scale the usec field to nsec. */
> + do_gettimeofday((struct timeval *)tp);
> + tp->tv_nsec *= NSEC_PER_USEC;
> + return 0;
> +}
> +
> +/*
> + * We work in ticks here to avoid taking the irq lock (it is expensive).
> + * On 32-bit UP the do/while loop below simply re-reads jiffies_64 until
> + * no timer tick intervenes, so the 64-bit value is read consistently
> + * without the lock.
> + */
> +
> +int do_posix_clock_monotonic_gettime(struct timespec *tp)
> +{
> + long sub_sec;
> + u64 jiffies_64_f;
> +
> +#if (BITS_PER_LONG > 32)
> +
> + jiffies_64_f = jiffies_64;
> +
> +#elif defined(CONFIG_SMP)
> +
> + /* Tricks don't work here, must take the lock. Remember, called
> + * above from both timer and clock system calls => save flags.
> + */
> + {
> + unsigned long flags;
> + read_lock_irqsave(&xtime_lock, flags);
> + jiffies_64_f = jiffies_64;
> + read_unlock_irqrestore(&xtime_lock, flags);
> + }
> +#elif !defined(CONFIG_SMP) && (BITS_PER_LONG < 64)
> + unsigned long jiffies_f;
> + do {
> + jiffies_f = jiffies;
> + barrier();
> + jiffies_64_f = jiffies_64;
> + } while (unlikely(jiffies_f != jiffies));
> +
> +#endif
> + tp->tv_sec = div_long_long_rem(jiffies_64_f, HZ, &sub_sec);
> +
> + tp->tv_nsec = sub_sec * (NSEC_PER_SEC / HZ);
> + return 0;
> +}
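
The 32-bit UP branch is the interesting one: it reads the 64-bit jiffies
value without a lock by retrying until no tick lands mid-read. The same
pattern in isolation, with hypothetical names (a sketch, not kernel code):

        /* Lock-free consistent read of a 64-bit counter on a 32-bit CPU,
         * assuming only the tick interrupt updates it. Re-read until the
         * low word is unchanged, i.e. no update raced with our read. */
        static u64 read_counter64(volatile unsigned long *lo_word,
                                  volatile u64 *counter)
        {
                unsigned long before;
                u64 value;

                do {
                        before = *lo_word;
                        barrier();      /* force the reads below to redo */
                        value = *counter;
                } while (unlikely(before != *lo_word));

                return value;
        }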
> +
> +int do_posix_clock_monotonic_settime(struct timespec *tp)
> +{
> + return -EINVAL;
> +}
> +
> +asmlinkage int sys_clock_settime(clockid_t which_clock, const struct timespec *tp)
> +{
> + struct timespec new_tp;
> +
> + if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
> + return -EINVAL;
> + if (copy_from_user(&new_tp, tp, sizeof(*tp)))
> + return -EFAULT;
> + if (posix_clocks[which_clock]->clock_set)
> + return posix_clocks[which_clock]->clock_set(&new_tp);
> +
> + new_tp.tv_nsec /= NSEC_PER_USEC;
> + return do_sys_settimeofday((struct timeval *)&new_tp, NULL);
> +}
> +
> +asmlinkage int sys_clock_gettime(clockid_t which_clock, struct timespec *tp)
> +{
> + struct timespec rtn_tp;
> + int error = 0;
> +
> + if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
> + return -EINVAL;
> +
> + error = do_posix_gettime(posix_clocks[which_clock], &rtn_tp);
> +
> + if (!error) {
> + if (copy_to_user(tp, &rtn_tp, sizeof(rtn_tp)))
> + error = -EFAULT;
> + }
> + return error;
> +}
> +
> +asmlinkage int sys_clock_getres(clockid_t which_clock, struct timespec *tp)
> +{
> + struct timespec rtn_tp;
> +
> + if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
> + return -EINVAL;
> +
> + rtn_tp.tv_sec = 0;
> + rtn_tp.tv_nsec = posix_clocks[which_clock]->res;
> + if (tp) {
> + if (copy_to_user(tp, &rtn_tp, sizeof(rtn_tp)))
> + return -EFAULT;
> + }
> + return 0;
> +}
> +
> +#if 0
> +// This #if 0 is to keep the pretty printer/formatter happy so the indents
> +// below will be correct.
> +
> +// The CLOCK_NANOSLEEP_ENTRY macro is defined in asm/signal.h and
> +// is structured to allow code as well as entry definitions, so that when
> +// we get control back here the entry parameters will be available as
> +// expected. Some systems may find these parameters in other ways than as
> +// entry parms; for example, struct pt_regs *regs is defined in i386 as
> +// the address of the first parameter, whereas other archs pass it as one
> +// of the parameters.
> +
> +asmlinkage long sys_clock_nanosleep(void)
> +{
> +#endif
> + CLOCK_NANOSLEEP_ENTRY( struct timespec ts;
> + struct k_itimer *t;
> + struct k_clock * clock;
> + int active;)
> +
> + //asmlinkage int sys_clock_nanosleep(clockid_t which_clock,
> + // int flags,
> + // const struct timespec *rqtp,
> + // struct timespec *rmtp)
> + //{
> +
> + if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
> + return -EINVAL;
> + /*
> + * See discussion below about waking up early.
> + */
> + clock = posix_clocks[which_clock];
> + t = &current->nanosleep_tmr;
> + if (t->it_pq)
> + timer_remove(t);
> +
> + if (copy_from_user(&t->it_v.it_value, rqtp, sizeof(struct timespec)))
> + return -EFAULT;
> +
> + if ((t->it_v.it_value.tv_nsec < 0) ||
> + (t->it_v.it_value.tv_nsec >= NSEC_PER_SEC) ||
> + (t->it_v.it_value.tv_sec < 0))
> + return -EINVAL;
> +
> + if (!(flags & TIMER_ABSTIME))
> + adjust_rel_time(clock, &t->it_v.it_value);
> + /*
> + * These fields don't need to be set up each time. This
> + * should be done in INIT_TASK() and forgotten.
> + */
> + t->it_v.it_interval.tv_sec = 0;
> + t->it_v.it_interval.tv_nsec = 0;
> + t->it_type = NANOSLEEP;
> + t->it_process = current;
> +
> + current->state = TASK_INTERRUPTIBLE;
> + timer_insert(&clock->pq, t);
> + schedule();
> + /*
> + * We're not supposed to wake up early. The problem is
> + * being woken by signals that are not delivered to
> + * the user -- typically debug-related signals.
> + *
> + * My plan is to leave the timer running and add a
> + * small hook in do_signal() which will complete the
> + * nanosleep. For now we just return early, in clear
> + * violation of the Posix spec.
> + */
> + active = (t->it_pq != 0);
> + if (!(flags & TIMER_ABSTIME) && active && rmtp) {
> + do_posix_gettime(clock, &ts);
> + ts.tv_sec = t->it_v.it_value.tv_sec - ts.tv_sec;
> + ts.tv_nsec = t->it_v.it_value.tv_nsec - ts.tv_nsec;
> + if (ts.tv_nsec < 0) {
> + ts.tv_nsec += NSEC_PER_SEC;
> + ts.tv_sec--;
> + }
> + if (ts.tv_sec < 0)
> + ts.tv_sec = ts.tv_nsec = 0;
> + if (copy_to_user(rmtp, &ts, sizeof(struct timespec)))
> + return -EFAULT;
> + }
> + if (active)
> + return -EINTR;
> + return 0;
> +}
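
Given the early-return caveat above, a portable caller is expected to
restart the sleep itself. A minimal sketch, assuming a library wrapper
that returns the error number rather than setting errno (the semantics
the glibc wrapper eventually adopted):

        #include <errno.h>
        #include <time.h>

        /* Sleep for the requested relative time, resuming after any
         * interruption with whatever time is still left. */
        int sleep_full(const struct timespec *req)
        {
                struct timespec want = *req, left;
                int err;

                while ((err = clock_nanosleep(CLOCK_REALTIME, 0,
                                              &want, &left)) == EINTR)
                        want = left;    /* resume with the remainder */
                return err;
        }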
> +
> +void clock_was_set(void)
> +{
> +}
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/signal.c linux.mytimers/kernel/signal.c
> --- linux.orig/kernel/signal.c Wed Oct 23 00:54:30 2002
> +++ linux.mytimers/kernel/signal.c Wed Oct 23 01:17:51 2002
> @@ -424,8 +424,6 @@
> if (!collect_signal(sig, pending, info))
> sig = 0;
>
> - /* XXX: Once POSIX.1b timers are in, if si_code == SI_TIMER,
> - we need to xchg out the timer overrun values. */
> }
> recalc_sigpending();
>
> @@ -692,6 +690,7 @@
> specific_send_sig_info(int sig, struct siginfo *info, struct task_struct *t, int shared)
> {
> int ret;
> + struct sigpending *sig_queue;
>
> if (!irqs_disabled())
> BUG();
> @@ -725,20 +724,43 @@
> if (ignored_signal(sig, t))
> goto out;
>
> + sig_queue = shared ? &t->sig->shared_pending : &t->pending;
> +
> #define LEGACY_QUEUE(sigptr, sig) \
> (((sig) < SIGRTMIN) && sigismember(&(sigptr)->signal, (sig)))
> -
> + /*
> + * Support queueing exactly one non-rt signal, so that we
> + * can get more detailed information about the cause of
> + * the signal.
> + */
> + if (LEGACY_QUEUE(sig_queue, sig))
> + goto out;
> + /*
> + * In case of a POSIX timer generated signal you must check
> + * if a signal from this timer is already in the queue.
> + * If that is true, the overrun count will be increased in
> + * itimer.c:posix_timer_fn().
> + */
> +
> + if (((unsigned long)info > 1) && (info->si_code == SI_TIMER)) {
> + struct sigqueue *q;
> + for (q = sig_queue->head; q; q = q->next) {
> + if ((q->info.si_code == SI_TIMER) &&
> + (q->info.si_tid == info->si_tid)) {
> + q->info.si_overrun += info->si_overrun + 1;
> + /*
> + * this special ret value (1) is recognized
> + * only by posix_timer_fn() in itimer.c
> + */
> + ret = 1;
> + goto out;
> + }
> + }
> + }
> if (!shared) {
> - /* Support queueing exactly one non-rt signal, so that we
> - can get more detailed information about the cause of
> - the signal. */
> - if (LEGACY_QUEUE(&t->pending, sig))
> - goto out;
>
> ret = deliver_signal(sig, info, t);
> } else {
> - if (LEGACY_QUEUE(&t->sig->shared_pending, sig))
> - goto out;
> ret = send_signal(sig, info, &t->sig->shared_pending);
> }
> out:
> @@ -1418,8 +1440,9 @@
> err |= __put_user(from->si_uid, &to->si_uid);
> break;
> case __SI_TIMER:
> - err |= __put_user(from->si_timer1, &to->si_timer1);
> - err |= __put_user(from->si_timer2, &to->si_timer2);
> + err |= __put_user(from->si_tid, &to->si_tid);
> + err |= __put_user(from->si_overrun, &to->si_overrun);
> + err |= __put_user(from->si_ptr, &to->si_ptr);
> break;
> case __SI_POLL:
> err |= __put_user(from->si_band, &to->si_band);
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/timer.c linux.mytimers/kernel/timer.c
> --- linux.orig/kernel/timer.c Wed Oct 23 00:54:21 2002
> +++ linux.mytimers/kernel/timer.c Wed Oct 23 01:17:51 2002
> @@ -47,12 +47,11 @@
> struct list_head vec[TVR_SIZE];
> } tvec_root_t;
>
> -typedef struct timer_list timer_t;
>
> struct tvec_t_base_s {
> spinlock_t lock;
> unsigned long timer_jiffies;
> - timer_t *running_timer;
> + struct timer_list *running_timer;
> tvec_root_t tv1;
> tvec_t tv2;
> tvec_t tv3;
> @@ -67,7 +66,7 @@
> /* Fake initialization needed to avoid compiler breakage */
> static DEFINE_PER_CPU(struct tasklet_struct, timer_tasklet) = { NULL };
>
> -static inline void internal_add_timer(tvec_base_t *base, timer_t *timer)
> +static inline void internal_add_timer(tvec_base_t *base, struct timer_list *timer)
> {
> unsigned long expires = timer->expires;
> unsigned long idx = expires - base->timer_jiffies;
> @@ -119,7 +118,7 @@
> * Timers with an ->expired field in the past will be executed in the next
> * timer tick. It's illegal to add an already pending timer.
> */
> -void add_timer(timer_t *timer)
> +void add_timer(struct timer_list *timer)
> {
> int cpu = get_cpu();
> tvec_base_t *base = tvec_bases + cpu;
> @@ -153,7 +152,7 @@
> * (ie. mod_timer() of an inactive timer returns 0, mod_timer() of an
> * active timer returns 1.)
> */
> -int mod_timer(timer_t *timer, unsigned long expires)
> +int mod_timer(struct timer_list *timer, unsigned long expires)
> {
> tvec_base_t *old_base, *new_base;
> unsigned long flags;
> @@ -226,7 +225,7 @@
> * (ie. del_timer() of an inactive timer returns 0, del_timer() of an
> * active timer returns 1.)
> */
> -int del_timer(timer_t *timer)
> +int del_timer(struct timer_list *timer)
> {
> unsigned long flags;
> tvec_base_t *base;
> @@ -263,7 +262,7 @@
> *
> * The function returns whether it has deactivated a pending timer or not.
> */
> -int del_timer_sync(timer_t *timer)
> +int del_timer_sync(struct timer_list *timer)
> {
> tvec_base_t *base = tvec_bases;
> int i, ret = 0;
> @@ -302,9 +301,9 @@
> * detach them individually, just clear the list afterwards.
> */
> while (curr != head) {
> - timer_t *tmp;
> + struct timer_list *tmp;
>
> - tmp = list_entry(curr, timer_t, entry);
> + tmp = list_entry(curr, struct timer_list, entry);
> if (tmp->base != base)
> BUG();
> next = curr->next;
> @@ -343,9 +342,9 @@
> if (curr != head) {
> void (*fn)(unsigned long);
> unsigned long data;
> - timer_t *timer;
> + struct timer_list *timer;
>
> - timer = list_entry(curr, timer_t, entry);
> + timer = list_entry(curr, struct timer_list, entry);
> fn = timer->function;
> data = timer->data;
>
> @@ -448,6 +447,7 @@
> if (xtime.tv_sec % 86400 == 0) {
> xtime.tv_sec--;
> time_state = TIME_OOP;
> + clock_was_set();
> printk(KERN_NOTICE "Clock: inserting leap second 23:59:60 UTC\n");
> }
> break;
> @@ -456,6 +456,7 @@
> if ((xtime.tv_sec + 1) % 86400 == 0) {
> xtime.tv_sec++;
> time_state = TIME_WAIT;
> + clock_was_set();
> printk(KERN_NOTICE "Clock: deleting leap second 23:59:59 UTC\n");
> }
> break;
> @@ -912,7 +913,7 @@
> */
> signed long schedule_timeout(signed long timeout)
> {
> - timer_t timer;
> + struct timer_list timer;
> unsigned long expire;
>
> switch (timeout)
> @@ -968,10 +969,32 @@
> return current->pid;
> }
>
> -asmlinkage long sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp)
> +#if 0
> +// This #if 0 is to keep the pretty printer/formatter happy so the indents
> +// below will be correct.
> +// The NANOSLEEP_ENTRY macro is defined in asm/signal.h and
> +// is structured to allow code as well as entry definitions, so that when
> +// we get control back here the entry parameters will be available as
> +// expected. Some systems may find these parameters in other ways than as
> +// entry parms; for example, struct pt_regs *regs is defined in i386 as
> +// the address of the first parameter, whereas other archs pass it as one
> +// of the parameters.
> +asmlinkage long sys_nanosleep(void)
> {
> - struct timespec t;
> - unsigned long expire;
> +#endif
> + NANOSLEEP_ENTRY( struct timespec t;
> + unsigned long expire;)
> +
> +#ifndef FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
> + // The following code expects rqtp and rmtp to be available
> + // as a result of the above macro. Also, any regs needed
> + // for the _do_signal() macro should be set up here.
> +
> + //asmlinkage long sys_nanosleep(struct timespec *rqtp,
> + // struct timespec *rmtp)
> + // {
> + // struct timespec t;
> + // unsigned long expire;
> +
>
> if(copy_from_user(&t, rqtp, sizeof(struct timespec)))
> return -EFAULT;
> @@ -994,6 +1017,7 @@
> }
> return 0;
> }
> +#endif // ! FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
>
> /*
> * sys_sysinfo - fill in sysinfo struct

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-10-23 19:42:47

by Jim Houston

[permalink] [raw]
Subject: Re: [PATCH] alternate Posix timer patch

george anzinger wrote:
>
> Jim Houston wrote:
> I have also looked at the timer index stuff and made a few
> changes. If I get it working today, I will include it
> also. My changes mostly revolved around not caring about
> reusing a timer id. Would you care to comment on why you
> think reuse is bad?
>
> Without this feature the code is much simpler and does not
> keep around dead trees.
>
> -g

Hi George,

I assume the rationale is that not reusing the same id immediately helps
catch errors in user code. Since the id space is global, there
is more chance that one process may be manipulating another process's
timer. Reusing the same id makes this sort of problem harder to
catch.

The main reason I changed this in my patch is to avoid the CONFIG
limit on the number of timers. Since I don't have the fixed array,
I need a way to safely translate a user-space id into a kernel pointer.

Jim Houston Concurrent Computer Corp.

2002-10-23 22:12:43

by George Anzinger

[permalink] [raw]
Subject: Re: [PATCH] alternate Posix timer patch

Jim Houston wrote:
>
> george anzinger wrote:
> >
> > Jim Houston wrote:
> > I have also looked at the timer index stuff and made a few
> > changes. If I get it working today, I will include it
> > also. My changes mostly revolved around not caring about
> > reusing a timer id. Would you care to comment on why you
> > think reuse is bad?
> >
> > Without this feature the code is much simpler and does not
> > keep around dead trees.
> >
> > -g
>
> Hi George,
>
> I assume the rationale is that not reusing the same id immediately helps
> catch errors in user code. Since the id space is global, there
> is more chance that one process may be manipulating another process's
> timer. Reusing the same id makes this sort of problem harder to
> catch.

Actually the timer itself has an owner field so if the id is
reused by a different process, the timer will belong to that
process and attempting to use the id from the prior process
will fail. If the same process gets the same id there could
be a problem of this sort, but then why would a process
release a timer and then ask for another if that is not what
it wanted to do?
>
> The main reason I changed this in my patch is to avoid the CONFIG
> limit on the number of timers. Since I don't have the fixed array,
> I need a way to safely translate a user-space id into a kernel pointer.

I think this is independent of the reuse of the timer id. I
like your id indexing except for the extra code and memory
required by the delayed reuse.

-g
>
> Jim Houston Concurrent Computer Corp.

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-10-23 22:30:32

by George Anzinger

[permalink] [raw]
Subject: Re: [PATCH] alternate Posix timer patch

mbs wrote:
>
> The way I handled this was that the timerid was the (kernel space) address
> of the dynamically allocated timer structure. This provides a fairly low
> likelihood of duplicate timerid values....

Yes, but this is a VERY messy pointer which must be verified
to not cause a kernel fault before it can even be
dereferenced... nuff said.

-g

By the way, the address [email protected] fails. Maybe you could
use something I can mail to.
>
> On Wednesday 23 October 2002 15:48, Jim Houston wrote:
> > george anzinger wrote:
> > > Jim Houston wrote:
> > > I have also looked at the timer index stuff and made a few
> > > changes. If I get it working today, I will include it
> > > also. My changes mostly revolved around not caring about
> > > reusing a timer id. Would you care to comment on why you
> > > think reuse is bad?
> > >
> > > Without this feature the code is much simpler and does not
> > > keep around dead trees.
> > >
> > > -g
> >
> > Hi George,
> >
> > I assume the rationale is that not reusing the same id immediately helps
> > catch errors in user code. Since the id space is global, there
> > is more chance that one process may be manipulating another process's
> > timer. Reusing the same id makes this sort of problem harder to
> > catch.
> >
> > The main reason I changed this in my patch is to avoid the CONFIG
> > limit on the number of timers. Since I don't have the fixed array,
> > I need a way to safely translate a user-space id into a kernel pointer.
> >
> > Jim Houston Concurrent Computer Corp.
>
> --
> /**************************************************
> ** Mark Salisbury || [email protected] **
> ** If you would like to sponsor me for the **
> ** Mass Getaway, a 150 mile bicycle ride to for **
> ** MS, contact me to donate by cash or check or **
> ** click the link below to donate by credit card **
> **************************************************/
> https://www.nationalmssociety.org/pledge/pledge.asp?participantid=86736

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml