Message-ID: <3DB6ED10.A81B0429@mvista.com>
Date: Wed, 23 Oct 2002 11:40:16 -0700
From: george anzinger
Organization: Monta Vista Software
To: jim.houston@attbi.com
CC: linux-kernel@vger.kernel.org, high-res-timers-discourse@lists.sourceforge.net,
    jim.houston@ccur.com, ak@suse.de
Subject: Re: [PATCH] alternate Posix timer patch
References: <200210230838.g9N8cac00490@linux.local>

Jim Houston wrote:
>
> Hi Everyone,
>
> This is the second version of my spin on the Posix timers.  I started
> with George Anzinger's patch, but I have made major changes.
>
> I have been using George's version of the patch and would be glad to
> see it included in the 2.5 tree.  On the other hand, since we don't
> know what might appeal to Linus, it makes sense to give him a choice.
>
> I sent out the first version of this last Friday and had useful
> comments from Andi Kleen.  I have addressed some of these, but mostly
> I have just been getting it to work.  It now passes most of the
> tests that are included in George's timers support package.
>
> Of particular interest is a race (that Andi pointed out) between
> saving a task_struct pointer, using this pointer to send signals,
> and the process exiting.  George, please look at my changes in
> sys_timer_create and exit_itimers.

Yes, I have looked and agree with your changes.  They will be in the
next version, hopefully today.

I have also looked at the timer index stuff and made a few changes.
If I get it working today, I will include it as well.  My changes
mostly revolved around not caring about reusing a timer id.  Would you
care to comment on why you think reuse is bad?  Without this feature
the code is much simpler and does not keep dead trees around.

-g

> Here is a summary of my changes:
>
> - A new queue just for Posix timers and code to handle expiring
>   timers.  This supports high resolution without having to change
>   the existing jiffy-based timers.
>
>   I implemented this priority queue as a sorted list with an rbtree
>   to index the list.  It is deterministic and fast.
>
> - Change to use the slab allocator.  This removes the CONFIG option
>   for the maximum number of timers.
>
> - A new id allocator/lookup mechanism based on a radix tree.  It
>   includes a bitmap to summarize the portion of the tree which is
>   in use.  Currently the Posix timers patch reuses the id
>   immediately.
>
> - I keep the timers in seconds and nanoseconds.  I'm hoping that the
>   system timekeeping will sort itself out and the Posix timers can
>   just be a consumer.  Posix timers need two clocks - the time since
>   boot and the wall clock time.
>
> I'm currently working on nanosleep.  I'm trying to come up with an
> alternative to the call to do_signal.  At the moment my patch may
> return from nanosleep early if it receives a debug signal.
>
> This patch should work with linux-2.5.44.
>
> Jim Houston - Concurrent Computer Corp.
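The id-reuse question above is easier to weigh with a concrete model.
The user-space sketch below is an illustration only, not code from
either patch, and every name in it is invented for the example.  It
shows the policy Jim's radix-tree allocator provides: allocation scans
forward from the last id handed out, so a freed id is not recycled
until the id space wraps, and a stale handle kept by user space keeps
failing lookup instead of quietly naming a newer timer.

	#include <stdio.h>

	#define MAX_IDS 8		/* tiny table, purely for illustration */

	static void *slot[MAX_IDS + 1];	/* ids are 1-based; 0 means "no id" */
	static int last;		/* last id handed out */

	/*
	 * Allocate the first free id strictly after `last`, wrapping
	 * around.  This delays reuse of a just-freed id for as long as
	 * possible, which is the behaviour the radix-tree allocator in
	 * the patch provides.
	 */
	static int id_new(void *ptr)
	{
		int i, id;

		for (i = 1; i <= MAX_IDS; i++) {
			id = (last + i - 1) % MAX_IDS + 1;
			if (!slot[id]) {
				slot[id] = ptr;
				last = id;
				return id;
			}
		}
		return 0;	/* table full */
	}

	static void id_release(int id)
	{
		if (id >= 1 && id <= MAX_IDS)
			slot[id] = NULL;
	}

	int main(void)
	{
		int a = id_new("timer A");

		id_release(a);
		/* With immediate reuse this could return `a` again. */
		printf("first id %d, id after free %d\n", a, id_new("timer B"));
		return 0;
	}

Under immediate reuse, the second id_new() above would be free to hand
back `a` again, so a signal or lookup against the stale id could hit
the wrong timer; that ambiguity is what the delayed-reuse policy
avoids, at the cost of the extra bookkeeping George mentions.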
> > diff -X /usr1/jhouston/dontdiff -urN linux.orig/arch/i386/kernel/entry.S linux.mytimers/arch/i386/kernel/entry.S > --- linux.orig/arch/i386/kernel/entry.S Wed Oct 23 00:54:19 2002 > +++ linux.mytimers/arch/i386/kernel/entry.S Wed Oct 23 01:17:51 2002 > @@ -737,6 +737,15 @@ > .long sys_free_hugepages > .long sys_exit_group > .long sys_lookup_dcookie > + .long sys_timer_create > + .long sys_timer_settime /* 255 */ > + .long sys_timer_gettime > + .long sys_timer_getoverrun > + .long sys_timer_delete > + .long sys_clock_settime > + .long sys_clock_gettime /* 260 */ > + .long sys_clock_getres > + .long sys_clock_nanosleep > > .rept NR_syscalls-(.-sys_call_table)/4 > .long sys_ni_syscall > diff -X /usr1/jhouston/dontdiff -urN linux.orig/arch/i386/kernel/time.c linux.mytimers/arch/i386/kernel/time.c > --- linux.orig/arch/i386/kernel/time.c Wed Oct 23 00:54:19 2002 > +++ linux.mytimers/arch/i386/kernel/time.c Wed Oct 23 01:17:51 2002 > @@ -131,6 +131,7 @@ > time_maxerror = NTP_PHASE_LIMIT; > time_esterror = NTP_PHASE_LIMIT; > write_unlock_irq(&xtime_lock); > + clock_was_set(); > } > > /* > diff -X /usr1/jhouston/dontdiff -urN linux.orig/fs/exec.c linux.mytimers/fs/exec.c > --- linux.orig/fs/exec.c Wed Oct 23 00:54:21 2002 > +++ linux.mytimers/fs/exec.c Wed Oct 23 01:37:27 2002 > @@ -756,6 +756,7 @@ > > flush_signal_handlers(current); > flush_old_files(current->files); > + exit_itimers(current, 0); > > return 0; > > diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/asm-generic/siginfo.h linux.mytimers/include/asm-generic/siginfo.h > --- linux.orig/include/asm-generic/siginfo.h Wed Oct 23 00:54:24 2002 > +++ linux.mytimers/include/asm-generic/siginfo.h Wed Oct 23 01:17:51 2002 > @@ -43,8 +43,9 @@ > > /* POSIX.1b timers */ > struct { > - unsigned int _timer1; > - unsigned int _timer2; > + timer_t _tid; /* timer id */ > + int _overrun; /* overrun count */ > + sigval_t _sigval; /* same as below */ > } _timer; > > /* POSIX.1b signals */ > @@ -86,8 +87,8 @@ > */ > #define si_pid _sifields._kill._pid > #define si_uid _sifields._kill._uid > -#define si_timer1 _sifields._timer._timer1 > -#define si_timer2 _sifields._timer._timer2 > +#define si_tid _sifields._timer._tid > +#define si_overrun _sifields._timer._overrun > #define si_status _sifields._sigchld._status > #define si_utime _sifields._sigchld._utime > #define si_stime _sifields._sigchld._stime > @@ -221,6 +222,7 @@ > #define SIGEV_SIGNAL 0 /* notify via signal */ > #define SIGEV_NONE 1 /* other notification: meaningless */ > #define SIGEV_THREAD 2 /* deliver via thread creation */ > +#define SIGEV_THREAD_ID 4 /* deliver to thread */ > > #define SIGEV_MAX_SIZE 64 > #ifndef SIGEV_PAD_SIZE > @@ -235,6 +237,7 @@ > int sigev_notify; > union { > int _pad[SIGEV_PAD_SIZE]; > + int _tid; > > struct { > void (*_function)(sigval_t); > @@ -247,6 +250,7 @@ > > #define sigev_notify_function _sigev_un._sigev_thread._function > #define sigev_notify_attributes _sigev_un._sigev_thread._attribute > +#define sigev_notify_thread_id _sigev_un._tid > > #ifdef __KERNEL__ > > diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/asm-i386/posix_types.h linux.mytimers/include/asm-i386/posix_types.h > --- linux.orig/include/asm-i386/posix_types.h Tue Jan 18 01:22:52 2000 > +++ linux.mytimers/include/asm-i386/posix_types.h Wed Oct 23 01:17:51 2002 > @@ -22,6 +22,8 @@ > typedef long __kernel_time_t; > typedef long __kernel_suseconds_t; > typedef long __kernel_clock_t; > +typedef int __kernel_timer_t; > +typedef int __kernel_clockid_t; > typedef int __kernel_daddr_t; > 
typedef char * __kernel_caddr_t; > typedef unsigned short __kernel_uid16_t; > diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/asm-i386/signal.h linux.mytimers/include/asm-i386/signal.h > --- linux.orig/include/asm-i386/signal.h Wed Oct 23 00:50:41 2002 > +++ linux.mytimers/include/asm-i386/signal.h Wed Oct 23 01:17:51 2002 > @@ -219,6 +219,73 @@ > > struct pt_regs; > extern int FASTCALL(do_signal(struct pt_regs *regs, sigset_t *oldset)); > +/* > + * These macros are used by nanosleep() and clock_nanosleep(). > + * The issue is that these functions need the *regs pointer which is > + * passed in different ways by the differing archs. > + > + * Below we do things in two differing ways. In the long run we would > + * like to see nano_sleep() go away (glibc should call clock_nanosleep > + * much as we do). When that happens and the nano_sleep() system > + * call entry is retired, there will no longer be any real need for > + * sys_nanosleep() so the FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP macro > + * could be undefined, resulting in not needing to stack all the > + * parms over again, i.e. better (faster AND smaller) code. > + > + * And while were at it, there needs to be a way to set the return code > + * on the way to do_signal(). It (i.e. do_signal()) saves the regs on > + * the callers stack to call the user handler and then the return is > + * done using those registers. This means that the error code MUST be > + * set in the register PRIOR to calling do_signal(). See our answer > + * below...thanks to Jim Houston > + */ > +#define FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP > + > + > +#ifdef FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP > +extern long do_clock_nanosleep(struct pt_regs *regs, > + clockid_t which_clock, > + int flags, > + const struct timespec *rqtp, > + struct timespec *rmtp); > + > +#define NANOSLEEP_ENTRY(a) \ > + asmlinkage long sys_nanosleep( struct timespec* rqtp, \ > + struct timespec * rmtp) \ > +{ struct pt_regs *regs = (struct pt_regs *)&rqtp; \ > + return do_clock_nanosleep(regs, CLOCK_REALTIME, 0, rqtp, rmtp); \ > +} > + > +#define CLOCK_NANOSLEEP_ENTRY(a) asmlinkage long sys_clock_nanosleep( \ > + clockid_t which_clock, \ > + int flags, \ > + const struct timespec *rqtp, \ > + struct timespec *rmtp) \ > +{ struct pt_regs *regs = (struct pt_regs *)&which_clock; \ > + return do_clock_nanosleep(regs, which_clock, flags, rqtp, rmtp); \ > +} \ > +long do_clock_nanosleep(struct pt_regs *regs, \ > + clockid_t which_clock, \ > + int flags, \ > + const struct timespec *rqtp, \ > + struct timespec *rmtp) \ > +{ a > + > +#else > +#define NANOSLEEP_ENTRY(a) \ > + asmlinkage long sys_nanosleep( struct timespec* rqtp, \ > + struct timespec * rmtp) \ > +{ struct pt_regs *regs = (struct pt_regs *)&rqtp; \ > + a > +#define CLOCK_NANOSLEEP_ENTRY(a) asmlinkage long sys_clock_nanosleep( \ > + clockid_t which_clock, \ > + int flags, \ > + const struct timespec *rqtp, \ > + struct timespec *rmtp) \ > +{ struct pt_regs *regs = (struct pt_regs *)&which_clock; \ > + a > +#endif > +#define _do_signal() (regs->eax = -EINTR, do_signal(regs, NULL)) > > #endif /* __KERNEL__ */ > > diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/asm-i386/unistd.h linux.mytimers/include/asm-i386/unistd.h > --- linux.orig/include/asm-i386/unistd.h Wed Oct 23 00:54:21 2002 > +++ linux.mytimers/include/asm-i386/unistd.h Wed Oct 23 01:17:51 2002 > @@ -258,6 +258,15 @@ > #define __NR_free_hugepages 251 > #define __NR_exit_group 252 > #define __NR_lookup_dcookie 253 > +#define __NR_timer_create 254 > +#define 
__NR_timer_settime	(__NR_timer_create+1)
> +#define __NR_timer_gettime	(__NR_timer_create+2)
> +#define __NR_timer_getoverrun	(__NR_timer_create+3)
> +#define __NR_timer_delete	(__NR_timer_create+4)
> +#define __NR_clock_settime	(__NR_timer_create+5)
> +#define __NR_clock_gettime	(__NR_timer_create+6)
> +#define __NR_clock_getres	(__NR_timer_create+7)
> +#define __NR_clock_nanosleep	(__NR_timer_create+8)
>
>
>  /* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/id2ptr.h linux.mytimers/include/linux/id2ptr.h
> --- linux.orig/include/linux/id2ptr.h	Wed Dec 31 19:00:00 1969
> +++ linux.mytimers/include/linux/id2ptr.h	Wed Oct 23 01:25:23 2002
> @@ -0,0 +1,47 @@
> +/*
> + * include/linux/id2ptr.h
> + *
> + * 2002-10-18  written by Jim Houston jim.houston@ccur.com
> + * Copyright (C) 2002 by Concurrent Computer Corporation
> + * Distributed under the GNU GPL license version 2.
> + *
> + * Small id to pointer translation service avoiding fixed sized
> + * tables.
> + */
> +
> +#define ID_BITS 5
> +#define ID_MASK ((1 << ID_BITS)-1)
> +#define ID_FULL ((1 << (1 << ID_BITS))-1)
> +
> +/* Number of id_layer structs to leave in free list */
> +#define ID_FREE_MAX 6
> +
> +struct id_layer {
> +	unsigned int	 bitmap;
> +	struct id_layer	*ary[1<<ID_BITS];
> +};
> +
> +struct id {
> +	int		 layers;
> +	int		 last;
> +	int		 count;
> +	int		 min_wrap;
> +	struct id_layer	*top;
> +};
> +
> +void *id2ptr_lookup(struct id *idp, int id);
> +int id2ptr_new(struct id *idp, void *ptr);
> +void id2ptr_remove(struct id *idp, int id);
> +void id2ptr_init(struct id *idp, int min_wrap);
> +
> +
> +static inline void update_bitmap(struct id_layer *p, int bit)
> +{
> +	if (p->ary[bit] && p->ary[bit]->bitmap == 0xffffffff)
> +		p->bitmap |= 1<<bit;
> +	else
> +		p->bitmap &= ~(1<<bit);
> +}
> +
> +extern kmem_cache_t *id_layer_cache;
> +
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/init_task.h linux.mytimers/include/linux/init_task.h
> --- linux.orig/include/linux/init_task.h	Wed Oct 23 00:54:03 2002
> +++ linux.mytimers/include/linux/init_task.h	Wed Oct 23 01:17:51 2002
> @@ -93,6 +93,7 @@
>  	.sig		= &init_signals,			\
>  	.pending	= { NULL, &tsk.pending.head, {{0}}},	\
>  	.blocked	= {{0}},				\
> +	.posix_timers	= LIST_HEAD_INIT(tsk.posix_timers),	\
>  	.alloc_lock	= SPIN_LOCK_UNLOCKED,			\
>  	.switch_lock	= SPIN_LOCK_UNLOCKED,			\
>  	.journal_info	= NULL,					\
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/posix-timers.h linux.mytimers/include/linux/posix-timers.h
> --- linux.orig/include/linux/posix-timers.h	Wed Dec 31 19:00:00 1969
> +++ linux.mytimers/include/linux/posix-timers.h	Wed Oct 23 01:25:02 2002
> @@ -0,0 +1,81 @@
> +/*
> + * include/linux/posix-timers.h
> + *
> + * 2002-10-22  written by Jim Houston jim.houston@ccur.com
> + * Copyright (C) 2002 by Concurrent Computer Corporation
> + * Distributed under the GNU GPL license version 2.
> + *
> + */
> +
> +#ifndef _linux_POSIX_TIMERS_H
> +#define _linux_POSIX_TIMERS_H
> +
> +/* This should be in posix-timers.h - but this is easier now. */
> +
> +enum timer_type {
> +	TIMER,
> +	NANOSLEEP
> +};
> +
> +struct k_itimer {
> +	struct list_head it_pq_list;	/* fields for timer priority queue. */
> +	struct rb_node	 it_pq_node;
> +	struct timer_pq	*it_pq;		/* pointer to the queue.
*/ > + > + struct list_head it_task_list; /* list for exit_itimers */ > + spinlock_t it_lock; > + clockid_t it_clock; /* which timer type */ > + timer_t it_id; /* timer id */ > + int it_overrun; /* overrun on pending signal */ > + int it_overrun_last; /* overrun on last delivered signal */ > + int it_overrun_deferred; /* overrun on pending timer interrupt */ > + int it_sigev_notify; /* notify word of sigevent struct */ > + int it_sigev_signo; /* signo word of sigevent struct */ > + sigval_t it_sigev_value; /* value word of sigevent struct */ > + struct task_struct *it_process; /* process to send signal to */ > + struct itimerspec it_v; /* expiry time & interval */ > + enum timer_type it_type; > +}; > + > +/* > + * The priority queue is a sorted doubly linked list ordered by > + * expiry time. A rbtree is used as an index in to this list > + * so that inserts are O(log2(n)). > + */ > + > +struct timer_pq { > + struct list_head head; > + struct rb_root rb_root; > +}; > + > +#define TIMER_PQ_INIT(name) { \ > + .rb_root = RB_ROOT, \ > + .head = LIST_HEAD_INIT(name.head), \ > +} > + > + > +#if 0 > +#include > +#endif > + > +struct k_clock { > + struct timer_pq pq; > + int res; /* in nano seconds */ > + int ( *clock_set)(struct timespec *tp); > + int ( *clock_get)(struct timespec *tp); > + int ( *nsleep)( int flags, > + struct timespec*new_setting, > + struct itimerspec *old_setting); > + int ( *timer_set)(struct k_itimer *timr, int flags, > + struct itimerspec *new_setting, > + struct itimerspec *old_setting); > + int ( *timer_del)(struct k_itimer *timr); > + void ( *timer_get)(struct k_itimer *timr, > + struct itimerspec *cur_setting); > +}; > + > +int do_posix_clock_monotonic_gettime(struct timespec *tp); > +int do_posix_clock_monotonic_settime(struct timespec *tp); > +asmlinkage int sys_timer_delete(timer_t timer_id); > + > +#endif > diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/sched.h linux.mytimers/include/linux/sched.h > --- linux.orig/include/linux/sched.h Wed Oct 23 00:54:28 2002 > +++ linux.mytimers/include/linux/sched.h Wed Oct 23 01:31:41 2002 > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > > struct exec_domain; > > @@ -333,6 +334,8 @@ > unsigned long it_real_value, it_prof_value, it_virt_value; > unsigned long it_real_incr, it_prof_incr, it_virt_incr; > struct timer_list real_timer; > + struct list_head posix_timers; /* POSIX.1b Interval Timers */ > + struct k_itimer nanosleep_tmr; > unsigned long utime, stime, cutime, cstime; > unsigned long start_time; > long per_cpu_utime[NR_CPUS], per_cpu_stime[NR_CPUS]; > @@ -637,6 +640,7 @@ > > extern void exit_mm(struct task_struct *); > extern void exit_files(struct task_struct *); > +extern void exit_itimers(struct task_struct *, int); > extern void exit_sighand(struct task_struct *); > extern void __exit_sighand(struct task_struct *); > > diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/signal.h linux.mytimers/include/linux/signal.h > --- linux.orig/include/linux/signal.h Wed Oct 23 00:53:01 2002 > +++ linux.mytimers/include/linux/signal.h Wed Oct 23 01:17:51 2002 > @@ -224,6 +224,36 @@ > struct pt_regs; > extern int get_signal_to_deliver(siginfo_t *info, struct pt_regs *regs); > #endif > +/* > + * We would like the asm/signal.h code to define these so that the using > + * function can call do_signal(). In loo of that, we define a genaric > + * version that pretends that do_signal() was called and delivered a signal. 
> + * To see how this is used, see nano_sleep() in timer.c and the i386 version > + * in asm_i386/signal.h. > + */ > +#ifndef PT_REGS_ENTRY > +#define PT_REGS_ENTRY(type,name,p1_type,p1, p2_type,p2) \ > +type name(p1_type p1,p2_type p2)\ > +{ > +#endif > +#ifndef _do_signal > +#define _do_signal() 1 > +#endif > +#ifndef NANOSLEEP_ENTRY > +#define NANOSLEEP_ENTRY(a) asmlinkage long sys_nanosleep( struct timespec* rqtp, \ > + struct timespec * rmtp) \ > +{ a > +#endif > +#ifndef CLOCK_NANOSLEEP_ENTRY > +#define CLOCK_NANOSLEEP_ENTRY(a) asmlinkage long sys_clock_nanosleep( \ > + clockid_t which_clock, \ > + int flags, \ > + const struct timespec *rqtp, \ > + struct timespec *rmtp) \ > +{ a > + > +#endif > + > > #endif /* __KERNEL__ */ > > diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/sys.h linux.mytimers/include/linux/sys.h > --- linux.orig/include/linux/sys.h Sun Dec 10 23:56:37 1995 > +++ linux.mytimers/include/linux/sys.h Wed Oct 23 01:17:51 2002 > @@ -4,7 +4,7 @@ > /* > * system call entry points ... but not all are defined > */ > -#define NR_syscalls 256 > +#define NR_syscalls 275 > > /* > * These are system calls that will be removed at some time > diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/time.h linux.mytimers/include/linux/time.h > --- linux.orig/include/linux/time.h Wed Oct 23 00:53:34 2002 > +++ linux.mytimers/include/linux/time.h Wed Oct 23 01:17:51 2002 > @@ -38,6 +38,19 @@ > */ > #define MAX_JIFFY_OFFSET ((~0UL >> 1)-1) > > +/* Parameters used to convert the timespec values */ > +#ifndef USEC_PER_SEC > +#define USEC_PER_SEC (1000000L) > +#endif > + > +#ifndef NSEC_PER_SEC > +#define NSEC_PER_SEC (1000000000L) > +#endif > + > +#ifndef NSEC_PER_USEC > +#define NSEC_PER_USEC (1000L) > +#endif > + > static __inline__ unsigned long > timespec_to_jiffies(struct timespec *value) > { > @@ -124,6 +137,8 @@ > #ifdef __KERNEL__ > extern void do_gettimeofday(struct timeval *tv); > extern void do_settimeofday(struct timeval *tv); > +extern int do_sys_settimeofday(struct timeval *tv, struct timezone *tz); > +extern void clock_was_set(void); // call when ever the clock is set > #endif > > #define FD_SETSIZE __FD_SETSIZE > @@ -149,5 +164,25 @@ > struct timeval it_interval; /* timer interval */ > struct timeval it_value; /* current value */ > }; > + > + > +/* > + * The IDs of the various system clocks (for POSIX.1b interval timers). > + */ > +#define CLOCK_REALTIME 0 > +#define CLOCK_MONOTONIC 1 > +#define CLOCK_PROCESS_CPUTIME_ID 2 > +#define CLOCK_THREAD_CPUTIME_ID 3 > +#define CLOCK_REALTIME_HR 4 > +#define CLOCK_MONOTONIC_HR 5 > + > +#define MAX_CLOCKS 6 > + > +/* > + * The various flags for setting POSIX.1b interval timers. > + */ > + > +#define TIMER_ABSTIME 0x01 > + > > #endif > diff -X /usr1/jhouston/dontdiff -urN linux.orig/include/linux/types.h linux.mytimers/include/linux/types.h > --- linux.orig/include/linux/types.h Wed Oct 23 00:54:17 2002 > +++ linux.mytimers/include/linux/types.h Wed Oct 23 01:17:51 2002 > @@ -23,6 +23,8 @@ > typedef __kernel_daddr_t daddr_t; > typedef __kernel_key_t key_t; > typedef __kernel_suseconds_t suseconds_t; > +typedef __kernel_timer_t timer_t; > +typedef __kernel_clockid_t clockid_t; > > #ifdef __KERNEL__ > typedef __kernel_uid32_t uid_t; > diff -X /usr1/jhouston/dontdiff -urN linux.orig/init/Config.help linux.mytimers/init/Config.help > --- linux.orig/init/Config.help Wed Oct 23 00:50:42 2002 > +++ linux.mytimers/init/Config.help Wed Oct 23 01:17:51 2002 > @@ -115,3 +115,11 @@ > replacement for kerneld.) 
Say Y here and read about configuring it > in . > > +Maximum number of POSIX timers > +CONFIG_MAX_POSIX_TIMERS > + This option allows you to configure the system wide maximum number of > + POSIX timers. Timers are allocated as needed so the only memory > + overhead this adds is about 4 bytes for every 50 or so timers to keep > + track of each block of timers. The system quietly rounds this number > + up to fill out a timer allocation block. It is ok to have several > + thousand timers as needed by your applications. > diff -X /usr1/jhouston/dontdiff -urN linux.orig/init/Config.in linux.mytimers/init/Config.in > --- linux.orig/init/Config.in Wed Oct 23 00:50:45 2002 > +++ linux.mytimers/init/Config.in Wed Oct 23 01:17:51 2002 > @@ -9,6 +9,7 @@ > bool 'System V IPC' CONFIG_SYSVIPC > bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT > bool 'Sysctl support' CONFIG_SYSCTL > +int 'System wide maximum number of POSIX timers' CONFIG_MAX_POSIX_TIMERS 3000 > endmenu > > mainmenu_option next_comment > diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/Makefile linux.mytimers/kernel/Makefile > --- linux.orig/kernel/Makefile Wed Oct 23 00:54:21 2002 > +++ linux.mytimers/kernel/Makefile Wed Oct 23 01:24:01 2002 > @@ -10,7 +10,7 @@ > module.o exit.o itimer.o time.o softirq.o resource.o \ > sysctl.o capability.o ptrace.o timer.o user.o \ > signal.o sys.o kmod.o workqueue.o futex.o platform.o pid.o \ > - rcupdate.o > + rcupdate.o posix-timers.o id2ptr.o > > obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o > obj-$(CONFIG_SMP) += cpu.o > diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/exit.c linux.mytimers/kernel/exit.c > --- linux.orig/kernel/exit.c Wed Oct 23 00:54:21 2002 > +++ linux.mytimers/kernel/exit.c Wed Oct 23 01:22:00 2002 > @@ -647,6 +647,7 @@ > __exit_files(tsk); > __exit_fs(tsk); > exit_namespace(tsk); > + exit_itimers(tsk, 1); > exit_thread(); > > if (current->leader) > diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/fork.c linux.mytimers/kernel/fork.c > --- linux.orig/kernel/fork.c Wed Oct 23 00:54:17 2002 > +++ linux.mytimers/kernel/fork.c Wed Oct 23 01:17:51 2002 > @@ -783,6 +783,7 @@ > goto bad_fork_cleanup_files; > if (copy_sighand(clone_flags, p)) > goto bad_fork_cleanup_fs; > + INIT_LIST_HEAD(&p->posix_timers); > if (copy_mm(clone_flags, p)) > goto bad_fork_cleanup_sighand; > if (copy_namespace(clone_flags, p)) > diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/id2ptr.c linux.mytimers/kernel/id2ptr.c > --- linux.orig/kernel/id2ptr.c Wed Dec 31 19:00:00 1969 > +++ linux.mytimers/kernel/id2ptr.c Wed Oct 23 01:23:24 2002 > @@ -0,0 +1,223 @@ > +/* > + * linux/kernel/id2ptr.c > + * > + * 2002-10-18 written by Jim Houston jim.houston@ccur.com > + * Copyright (C) 2002 by Concurrent Computer Corporation > + * Distributed under the GNU GPL license version 2. > + * > + * Small id to pointer translation service. > + * > + * It uses a radix tree like structure as a sparse array indexed > + * by the id to obtain the pointer. A bit map is included in each > + * level of the tree which identifies portions of the tree which > + * are completely full. This makes the process of allocating a > + * new id quick. > + */ > + > + > +#include > +#include > +#include > +#include > + > +static kmem_cache_t *id_layer_cache; > +spinlock_t id_lock = SPIN_LOCK_UNLOCKED; > + > +/* > + * Since we can't allocate memory with spinlock held and dropping the > + * lock to allocate gets ugly keep a free list which will satisfy the > + * worst case allocation. 
> + */
> +
> +struct id_layer *id_free;
> +int id_free_cnt;
> +
> +static inline struct id_layer *alloc_layer(void)
> +{
> +	struct id_layer *p;
> +
> +	if (!(p = id_free))
> +		BUG();
> +	id_free = p->ary[0];
> +	id_free_cnt--;
> +	p->ary[0] = 0;
> +	return(p);
> +}
> +
> +static inline void free_layer(struct id_layer *p)
> +{
> +	p->ary[0] = id_free;
> +	id_free = p;
> +	id_free_cnt++;
> +}
> +
> +/*
> + * Lookup the kernel pointer associated with a user supplied
> + * id value.
> + */
> +void *id2ptr_lookup(struct id *idp, int id)
> +{
> +	int n;
> +	struct id_layer *p;
> +
> +	if (id <= 0)
> +		return(NULL);
> +	id--;
> +	spin_lock_irq(&id_lock);
> +	n = idp->layers * ID_BITS;
> +	p = idp->top;
> +	if (id >= (1 << n)) {
> +		spin_unlock_irq(&id_lock);
> +		return(NULL);
> +	}
> +
> +	while (n > 0 && p) {
> +		n -= ID_BITS;
> +		p = p->ary[(id >> n) & ID_MASK];
> +	}
> +	spin_unlock_irq(&id_lock);
> +	return((void *)p);
> +}
> +
> +static int sub_alloc(struct id_layer *p, int shift, int id, void *ptr)
> +{
> +	int n = (id >> shift) & ID_MASK;
> +	int bitmap = p->bitmap;
> +	int id_base = id & ~((1 << (shift+ID_BITS))-1);
> +	int v;
> +
> +	for ( ; n <= ID_MASK; n++, id = id_base + (n << shift)) {
> +		if (bitmap & (1 << n))
> +			continue;
> +		if (shift == 0) {
> +			p->ary[n] = (struct id_layer *)ptr;
> +			p->bitmap |= 1<<n;
> +			return(id);
> +		}
> +		if (!p->ary[n])
> +			p->ary[n] = alloc_layer();
> +		if ((v = sub_alloc(p->ary[n], shift-ID_BITS, id, ptr))) {
> +			update_bitmap(p, n);
> +			return(v);
> +		}
> +	}
> +	return(0);
> +}
> +
> +/*
> + * Allocate a new id and associate the value ptr with this new id.
> + */
> +int id2ptr_new(struct id *idp, void *ptr)
> +{
> +	int n, last, id, v;
> +	struct id_layer *new;
> +
> +	spin_lock_irq(&id_lock);
> +	n = idp->layers * ID_BITS;
> +	last = idp->last;
> +	while (id_free_cnt < n+1) {
> +		spin_unlock_irq(&id_lock);
> +		new = kmem_cache_alloc(id_layer_cache, GFP_KERNEL);
> +		memset(new, 0, sizeof(struct id_layer));
> +		spin_lock_irq(&id_lock);
> +		free_layer(new);
> +	}
> +	/*
> +	 * Add a new layer if the array is full or the last id
> +	 * was at the limit and we don't want to wrap.
> +	 */
> +	if ((last == ((1 << n)-1) && last < idp->min_wrap) ||
> +	    idp->count == (1 << n)) {
> +		++idp->layers;
> +		n += ID_BITS;
> +		new = alloc_layer();
> +		new->ary[0] = idp->top;
> +		idp->top = new;
> +		update_bitmap(new, 0);
> +	}
> +	if (last >= ((1 << n)-1))
> +		last = 0;
> +
> +	/*
> +	 * Search for a free id starting after last id allocated.
> +	 * If that fails wrap back to start.
> +	 */
> +	id = last+1;
> +	if (!(v = sub_alloc(idp->top, n-ID_BITS, id, ptr)))
> +		v = sub_alloc(idp->top, n-ID_BITS, 1, ptr);
> +	idp->last = v;
> +	idp->count++;
> +	spin_unlock_irq(&id_lock);
> +	return(v+1);
> +}
> +
> +
> +static int sub_remove(struct id_layer *p, int shift, int id)
> +{
> +	int n = (id >> shift) & ID_MASK;
> +	int i, bitmap, rv;
> +
> +	rv = 0;
> +	bitmap = p->bitmap & ~(1<<n);
> +	p->bitmap = bitmap;
> +	if (shift == 0) {
> +		p->ary[n] = NULL;
> +		rv = !bitmap;
> +	} else {
> +		if (sub_remove(p->ary[n], shift-ID_BITS, id)) {
> +			free_layer(p->ary[n]);
> +			p->ary[n] = 0;
> +			for (i = 0; i < (1 << ID_BITS); i++)
> +				if (p->ary[i])
> +					break;
> +			if (i == (1 << ID_BITS))
> +				rv = 1;
> +		}
> +	}
> +	return(rv);
> +}
> +
> +/*
> + * Remove (free) an id value and break the association with
> + * the kernel pointer.
> + */ > +void id2ptr_remove(struct id *idp, int id) > +{ > + struct id_layer *p; > + > + if (id <= 0) > + return; > + id--; > + spin_lock_irq(&id_lock); > + sub_remove(idp->top, (idp->layers-1)*ID_BITS, id); > + idp->count--; > + if (id_free_cnt >= ID_FREE_MAX) { > + > + p = alloc_layer(); > + spin_unlock_irq(&id_lock); > + kmem_cache_free(id_layer_cache, p); > + return; > + } > + spin_unlock_irq(&id_lock); > +} > + > +void init_id_cache(void) > +{ > + if (!id_layer_cache) > + id_layer_cache = kmem_cache_create("id_layer_cache", > + sizeof(struct id_layer), 0, 0, 0, 0); > +} > + > +void id2ptr_init(struct id *idp, int min_wrap) > +{ > + init_id_cache(); > + idp->count = 1; > + idp->last = 0; > + idp->layers = 1; > + idp->top = kmem_cache_alloc(id_layer_cache, GFP_KERNEL); > + memset(idp->top, 0, sizeof(struct id_layer)); > + idp->top->bitmap = 0; > + idp->min_wrap = min_wrap; > +} > + > +__initcall(init_id_cache); > diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/posix-timers.c linux.mytimers/kernel/posix-timers.c > --- linux.orig/kernel/posix-timers.c Wed Dec 31 19:00:00 1969 > +++ linux.mytimers/kernel/posix-timers.c Wed Oct 23 01:56:45 2002 > @@ -0,0 +1,1109 @@ > +/* > + * linux/kernel/posix_timers.c > + * > + * > + * 2002-10-15 Posix Clocks & timers by George Anzinger > + * Copyright (C) 2002 by MontaVista Software. > + * > + * 2002-10-18 changes by Jim Houston jim.houston@attbi.com > + * Copyright (C) 2002 by Concurrent Computer Corp. > + * > + * - Add a separate queue for posix timers. Its a > + * priority queue implemented as a sorted doubly > + * linked list & a rbtree as an index into the list. > + * - Use a slab cache to allocate the timer structures. > + * - Allocate timer ids using my new id allocator. > + * This avoids the immediate reuse of timer ids. > + * - Uses seconds and nano-seconds rather than > + * jiffies and sub_jiffies. > + * > + * This is an experimental change. I'm sending it out to > + * the mailing list in the hope that it will stimulate > + * discussion. > + */ > + > +/* These are all the functions necessary to implement > + * POSIX clocks & timers > + */ > + > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > + > +#ifndef div_long_long_rem > +#include > + > +#define div_long_long_rem(dividend,divisor,remainder) ({ \ > + u64 result = dividend; \ > + *remainder = do_div(result,divisor); \ > + result; }) > + > +#endif /* ifndef div_long_long_rem */ > + > + > +/* > + * Lets keep our timers in a slab cache :-) > + */ > +static kmem_cache_t *posix_timers_cache; > +struct id posix_timers_id; > + > +/* > + * This lock portects the timer queues it is held for the > + * duration of the timer expiry process. > + */ > +spinlock_t posix_timers_lock = SPIN_LOCK_UNLOCKED; > + > +/* > + * Kluge until I can wire into the timer interrupt. > + */ > +int poll_timer_running; > +void run_posix_timers(unsigned long dummy); > +static struct timer_list poll_posix_timers = { > + .function = &run_posix_timers, > +}; > + > +struct k_clock clock_realtime = { > + .pq = TIMER_PQ_INIT(clock_realtime.pq), > + .res = NSEC_PER_SEC/HZ, > +}; > + > +struct k_clock clock_monotonic = { > + .pq = TIMER_PQ_INIT(clock_monotonic.pq), > + .res= NSEC_PER_SEC/HZ, > + .clock_get = do_posix_clock_monotonic_gettime, > + .clock_set = do_posix_clock_monotonic_settime > +}; > + > +/* > + * Insert a timer into a priority queue. This is a sorted > + * list of timers. 
A rbtree is used to index the list. > + */ > + > +static int timer_insert_nolock(struct timer_pq *pq, struct k_itimer *t) > +{ > + struct rb_node ** p = &pq->rb_root.rb_node; > + struct rb_node * parent = NULL; > + struct k_itimer *cur; > + struct list_head *prev; > + prev = &pq->head; > + > + if (t->it_pq) > + BUG(); > + t->it_pq = pq; > + while (*p) { > + parent = *p; > + cur = rb_entry(parent, struct k_itimer , it_pq_node); > + > + /* > + * We allow non unique entries. This works > + * but there might be opportunity to do something > + * clever. > + */ > + if (t->it_v.it_value.tv_sec < cur->it_v.it_value.tv_sec || > + (t->it_v.it_value.tv_sec == cur->it_v.it_value.tv_sec && > + t->it_v.it_value.tv_nsec < cur->it_v.it_value.tv_nsec)) > + p = &(*p)->rb_left; > + else { > + prev = &cur->it_pq_list; > + p = &(*p)->rb_right; > + } > + } > + /* link into rbtree. */ > + rb_link_node(&t->it_pq_node, parent, p); > + rb_insert_color(&t->it_pq_node, &pq->rb_root); > + /* link it into the list */ > + list_add(&t->it_pq_list, prev); > + /* > + * We need to setup a timer interrupt if the new timer is > + * at the head of the queue. > + */ > + return(pq->head.next == &t->it_pq_list); > +} > + > +static inline void timer_remove_nolock(struct k_itimer *t) > +{ > + struct timer_pq *pq; > + > + if (!(pq = t->it_pq)) > + return; > + rb_erase(&t->it_pq_node, &pq->rb_root); > + list_del(&t->it_pq_list); > + t->it_pq = 0; > +} > + > +static void timer_remove(struct k_itimer *t) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&posix_timers_lock, flags); > + timer_remove_nolock(t); > + spin_unlock_irqrestore(&posix_timers_lock, flags); > +} > + > + > +static int timer_insert(struct timer_pq *pq, struct k_itimer *t) > +{ > + unsigned long flags; > + int rv; > + > + spin_lock_irqsave(&posix_timers_lock, flags); > + rv = timer_insert_nolock(pq, t); > + spin_unlock_irqrestore(&posix_timers_lock, flags); > + if (!poll_timer_running) { > + poll_timer_running = 1; > + poll_posix_timers.expires = jiffies + 1; > + add_timer(&poll_posix_timers); > + } > + return(rv); > +} > + > +/* > + * If we are late delivering a periodic timer we may > + * have missed several expiries. We want to calculate the > + * number we have missed both as the overrun count but also > + * so that we can pick next expiry. > + * > + * You really need this if you schedule a high frequency timer > + * and then make a big change to the current time. 
> + */ > + > +int handle_overrun(struct k_itimer *t, struct timespec dt) > +{ > + int ovr; > +#if 0 > + long long ldt, in; > + long sec, nsec; > + > + in = (long long)t->it_v.it_interval.tv_sec*1000000000 + > + t->it_v.it_interval.tv_nsec; > + ldt = (long long)dt.tv_sec * 1000000000 + dt.tv_nsec; > + ovr = ldt/in + 1; > + ldt = (long long)t->it_v.it_interval.tv_nsec * ovr; > + nsec = ldt % 1000000000; > + sec = ldt / 1000000000; > + sec += ovr * t->it_v.it_interval.tv_sec; > + nsec += t->it_v.it_value.tv_nsec; > + sec += t->it_v.it_value.tv_sec; > + if (nsec > 1000000000) { > + sec++; > + nsec -= 1000000000; > + } > + t->it_v.it_value.tv_sec = sec; > + t->it_v.it_value.tv_nsec = nsec; > +#else > + /* Temporary hack */ > + ovr = 0; > + while (dt.tv_sec > t->it_v.it_interval.tv_sec || > + (dt.tv_sec == t->it_v.it_interval.tv_sec && > + dt.tv_nsec > t->it_v.it_interval.tv_nsec)) { > + dt.tv_sec -= t->it_v.it_interval.tv_sec; > + dt.tv_nsec -= t->it_v.it_interval.tv_nsec; > + if (dt.tv_nsec < 0) { > + dt.tv_sec--; > + dt.tv_nsec += 1000000000; > + } > + t->it_v.it_value.tv_sec += t->it_v.it_interval.tv_sec; > + t->it_v.it_value.tv_nsec += t->it_v.it_interval.tv_nsec; > + if (t->it_v.it_value.tv_nsec >= 1000000000) { > + t->it_v.it_value.tv_sec++; > + t->it_v.it_value.tv_nsec -= 1000000000; > + } > + ovr++; > + } > +#endif > + return(ovr); > +} > + > +int sending_signal_failed; > + > +/* > + * Yes I calculate an overrun but don't deliver it. I need to > + * play with this code. > + */ > +static void timer_notify_task(struct k_itimer *timr, int ovr) > +{ > + struct siginfo info; > + int ret; > + > + if (! (timr->it_sigev_notify & SIGEV_NONE)) { > + memset(&info, 0, sizeof(info)); > + /* Send signal to the process that owns this timer. */ > + info.si_signo = timr->it_sigev_signo; > + info.si_errno = 0; > + info.si_code = SI_TIMER; > + info.si_tid = timr->it_id; > + info.si_value = timr->it_sigev_value; > + info.si_overrun = timr->it_overrun_deferred; > + ret = send_sig_info(info.si_signo, &info, timr->it_process); > + switch (ret) { > + case 0: /* all's well new signal queued */ > + timr->it_overrun_last = timr->it_overrun; > + timr->it_overrun = timr->it_overrun_deferred; > + break; > + case 1: /* signal from this timer was already in the queue */ > + timr->it_overrun += timr->it_overrun_deferred + 1; > + break; > + default: > + sending_signal_failed++; > + break; > + } > + } > +} > + > +void do_expiry(struct k_itimer *t, int ovr) > +{ > + switch (t->it_type) { > + case TIMER: > + timer_notify_task(t, ovr); > + return; > + case NANOSLEEP: > + wake_up_process(t->it_process); > + return; > + } > +} > + > +/* > + * Check if the timer at the head of the priority queue has > + * expired and handle the expiry. Return time in nsec till > + * the next expiry. We only really care about expiries > + * before the next clock tick so we use a 32 bit int here. > + */ > + > +static int check_expiry(struct timer_pq *pq, struct timespec *tv) > +{ > + struct k_itimer *t; > + struct timespec dt; > + int ovr; > + long sec, nsec; > + unsigned long flags; > + > + ovr = 1; > + spin_lock_irqsave(&posix_timers_lock, flags); > + while (!list_empty(&pq->head)) { > + t = list_entry(pq->head.next, struct k_itimer, it_pq_list); > + dt.tv_sec = tv->tv_sec - t->it_v.it_value.tv_sec; > + dt.tv_nsec = tv->tv_nsec - t->it_v.it_value.tv_nsec; > + if (dt.tv_sec < 0 || (dt.tv_sec == 0 && dt.tv_nsec < 0)) { > + /* > + * It has not expired yet. Return nano-seconds > + * remaining if its less than a second. 
> + */ > + if (dt.tv_sec < -1) > + nsec = -1; > + else > + nsec = dt.tv_sec ? 1000000000-dt.tv_nsec : > + -dt.tv_nsec; > + spin_unlock_irqrestore(&posix_timers_lock, flags); > + return(nsec); > + } > + /* > + * Its expired. If this is a periodic timer we need to > + * setup for the next expiry. We also check for overrun > + * here. If the timer has already missed an expiry we want > + * deliver the overrun information and get back on schedule. > + */ > + if (dt.tv_nsec < 0) { > + dt.tv_sec--; > + dt.tv_nsec += 1000000000; > + } > + timer_remove_nolock(t); > + if (t->it_v.it_interval.tv_sec || t->it_v.it_interval.tv_nsec) { > + if (dt.tv_sec > t->it_v.it_interval.tv_sec || > + (dt.tv_sec == t->it_v.it_interval.tv_sec && > + dt.tv_nsec > t->it_v.it_interval.tv_nsec)) { > + ovr = handle_overrun(t, dt); > + } else { > + nsec = t->it_v.it_value.tv_nsec + > + t->it_v.it_interval.tv_nsec; > + sec = t->it_v.it_value.tv_sec + > + t->it_v.it_interval.tv_sec; > + if (nsec > 1000000000) { > + nsec -= 1000000000; > + sec++; > + } > + t->it_v.it_value.tv_sec = sec; > + t->it_v.it_value.tv_nsec = nsec; > + } > + /* > + * It might make sense to leave the timer queue and > + * avoid the remove/insert for timers which stay > + * at the front of the queue. > + */ > + timer_insert_nolock(pq, t); > + } > + do_expiry(t, ovr); > + } > + spin_unlock_irqrestore(&posix_timers_lock, flags); > + return(-1); > +} > + > +/* > + * kluge? We should know the offset between clock_realtime and > + * clock_monotonic so we don't need to get the time twice. > + */ > + > +void run_posix_timers(unsigned long dummy) > +{ > + struct timespec now; > + int ns, ret; > + > + ns = 0x7fffffff; > + do_posix_clock_monotonic_gettime(&now); > + ret = check_expiry(&clock_monotonic.pq, &now); > + if (ret > 0 && ret < ns) > + ns = ret; > + > + do_gettimeofday((struct timeval*)&now); > + now.tv_nsec *= NSEC_PER_USEC; > + ret = check_expiry(&clock_realtime.pq, &now); > + if (ret > 0 && ret < ns) > + ns = ret; > + poll_posix_timers.expires = jiffies + 1; > + add_timer(&poll_posix_timers); > +} > + > + > +extern rwlock_t xtime_lock; > + > +/* > + * CLOCKs: The POSIX standard calls for a couple of clocks and allows us > + * to implement others. This structure defines the various > + * clocks and allows the possibility of adding others. We > + * provide an interface to add clocks to the table and expect > + * the "arch" code to add at least one clock that is high > + * resolution. Here we define the standard CLOCK_REALTIME as a > + * 1/HZ resolution clock. > + > + * CPUTIME & THREAD_CPUTIME: We are not, at this time, definding these > + * two clocks (and the other process related clocks (Std > + * 1003.1d-1999). The way these should be supported, we think, > + * is to use large negative numbers for the two clocks that are > + * pinned to the executing process and to use -pid for clocks > + * pinned to particular pids. Calls which supported these clock > + * ids would split early in the function. > + > + * RESOLUTION: Clock resolution is used to round up timer and interval > + * times, NOT to report clock times, which are reported with as > + * much resolution as the system can muster. In some cases this > + * resolution may depend on the underlaying clock hardware and > + * may not be quantifiable until run time, and only then is the > + * necessary code is written. The standard says we should say > + * something about this issue in the documentation... 
> + > + * FUNCTIONS: The CLOCKs structure defines possible functions to handle > + * various clock functions. For clocks that use the standard > + * system timer code these entries should be NULL. This will > + * allow dispatch without the overhead of indirect function > + * calls. CLOCKS that depend on other sources (e.g. WWV or GPS) > + * must supply functions here, even if the function just returns > + * ENOSYS. The standard POSIX timer management code assumes the > + * following: 1.) The k_itimer struct (sched.h) is used for the > + * timer. 2.) The list, it_lock, it_clock, it_id and it_process > + * fields are not modified by timer code. > + * > + * Permissions: It is assumed that the clock_settime() function defined > + * for each clock will take care of permission checks. Some > + * clocks may be set able by any user (i.e. local process > + * clocks) others not. Currently the only set able clock we > + * have is CLOCK_REALTIME and its high res counter part, both of > + * which we beg off on and pass to do_sys_settimeofday(). > + */ > + > +struct k_clock *posix_clocks[MAX_CLOCKS]; > + > +#define if_clock_do(clock_fun, alt_fun,parms) (! clock_fun)? alt_fun parms :\ > + clock_fun parms > + > +#define p_timer_get( clock,a,b) if_clock_do((clock)->timer_get, \ > + do_timer_gettime, \ > + (a,b)) > + > +#define p_nsleep( clock,a,b,c) if_clock_do((clock)->nsleep, \ > + do_nsleep, \ > + (a,b,c)) > + > +#define p_timer_del( clock,a) if_clock_do((clock)->timer_del, \ > + do_timer_delete, \ > + (a)) > + > +void register_posix_clock(int clock_id, struct k_clock * new_clock); > + > +static int do_posix_gettime(struct k_clock *clock, struct timespec *tp); > + > + > +void register_posix_clock(int clock_id,struct k_clock * new_clock) > +{ > + if ((unsigned)clock_id >= MAX_CLOCKS) { > + printk("POSIX clock register failed for clock_id %d\n",clock_id); > + return; > + } > + posix_clocks[clock_id] = new_clock; > +} > + > +static __init int init_posix_timers(void) > +{ > + posix_timers_cache = kmem_cache_create("posix_timers_cache", > + sizeof(struct k_itimer), 0, 0, 0, 0); > + id2ptr_init(&posix_timers_id, 1000); > + > + register_posix_clock(CLOCK_REALTIME,&clock_realtime); > + register_posix_clock(CLOCK_MONOTONIC,&clock_monotonic); > + return 0; > +} > + > +__initcall(init_posix_timers); > + > +/* > + * For some reason mips/mips64 define the SIGEV constants plus 128. > + * Here we define a mask to get rid of the common bits. The > + * optimizer should make this costless to all but mips. 
> + */ > +#if (ARCH == mips) || (ARCH == mips64) > +#define MIPS_SIGEV ~(SIGEV_NONE & \ > + SIGEV_SIGNAL & \ > + SIGEV_THREAD & \ > + SIGEV_THREAD_ID) > +#else > +#define MIPS_SIGEV (int)-1 > +#endif > + > +static struct task_struct * good_sigevent(sigevent_t *event) > +{ > + struct task_struct * rtn = current; > + > + if (event->sigev_notify & SIGEV_THREAD_ID & MIPS_SIGEV ) { > + if ( !(rtn = find_task_by_pid(event->sigev_notify_thread_id)) || > + rtn->tgid != current->tgid){ > + return NULL; > + } > + } > + if (event->sigev_notify & SIGEV_SIGNAL & MIPS_SIGEV) { > + if ((unsigned)(event->sigev_signo > SIGRTMAX)) > + return NULL; > + } > + if (event->sigev_notify & ~(SIGEV_SIGNAL | SIGEV_THREAD_ID )) { > + return NULL; > + } > + return rtn; > +} > + > + > + > +static struct k_itimer * alloc_posix_timer(void) > +{ > + struct k_itimer *tmr; > + tmr = kmem_cache_alloc(posix_timers_cache, GFP_KERNEL); > + memset(tmr, 0, sizeof(struct k_itimer)); > + return(tmr); > +} > + > +static void release_posix_timer(struct k_itimer *tmr) > +{ > + if (tmr->it_id > 0) > + id2ptr_remove(&posix_timers_id, tmr->it_id); > + kmem_cache_free(posix_timers_cache, tmr); > +} > + > +/* Create a POSIX.1b interval timer. */ > + > +asmlinkage int > +sys_timer_create(clockid_t which_clock, struct sigevent *timer_event_spec, > + timer_t *created_timer_id) > +{ > + int error = 0; > + struct k_itimer *new_timer = NULL; > + int new_timer_id; > + struct task_struct * process = 0; > + sigevent_t event; > + > + if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock]) > + return -EINVAL; > + > + new_timer = alloc_posix_timer(); > + if (new_timer == NULL) return -EAGAIN; > + > + new_timer_id = (timer_t)id2ptr_new(&posix_timers_id, > + (void *)new_timer); > + if (!new_timer_id) { > + error = -EAGAIN; > + goto out; > + } > + new_timer->it_id = new_timer_id; > + > + if (copy_to_user(created_timer_id, &new_timer_id, > + sizeof(new_timer_id))) { > + error = -EFAULT; > + goto out; > + } > + spin_lock_init(&new_timer->it_lock); > + if (timer_event_spec) { > + if (copy_from_user(&event, timer_event_spec, sizeof(event))) { > + error = -EFAULT; > + goto out; > + } > + read_lock(&tasklist_lock); > + if ((process = good_sigevent(&event))) { > + /* > + * We may be setting up this timer for another > + * thread. It may be exitiing. To catch this > + * case the we clear posix_timers.next in > + * exit_itimers. 
> + */ > + spin_lock(&process->alloc_lock); > + if (process->posix_timers.next) { > + list_add(&new_timer->it_task_list, > + &process->posix_timers); > + spin_unlock(&process->alloc_lock); > + } else { > + spin_unlock(&process->alloc_lock); > + process = 0; > + } > + } > + read_unlock(&tasklist_lock); > + if (!process) { > + error = -EINVAL; > + goto out; > + } > + new_timer->it_sigev_notify = event.sigev_notify; > + new_timer->it_sigev_signo = event.sigev_signo; > + new_timer->it_sigev_value = event.sigev_value; > + } else { > + new_timer->it_sigev_notify = SIGEV_SIGNAL; > + new_timer->it_sigev_signo = SIGALRM; > + new_timer->it_sigev_value.sival_int = new_timer->it_id; > + process = current; > + spin_lock(¤t->alloc_lock); > + list_add(&new_timer->it_task_list, ¤t->posix_timers); > + spin_unlock(¤t->alloc_lock); > + } > + new_timer->it_clock = which_clock; > + new_timer->it_overrun = 0; > + new_timer->it_process = process; > + > + out: > + if (error) > + release_posix_timer(new_timer); > + return error; > +} > + > + > +/* > + * return timer owned by the process, used by exit and exec > + */ > +void itimer_delete(struct k_itimer *timer) > +{ > + if (sys_timer_delete(timer->it_id)){ > + BUG(); > + } > +} > + > +/* > + * This is call from both exec and exit to shutdown the > + * timers. > + */ > + > +inline void exit_itimers(struct task_struct *tsk, int exit) > +{ > + struct k_itimer *tmr; > + > + if (!tsk->posix_timers.next) > + BUG(); > + if (tsk->nanosleep_tmr.it_pq) > + timer_remove(&tsk->nanosleep_tmr); > + spin_lock(&tsk->alloc_lock); > + while (tsk->posix_timers.next != &tsk->posix_timers){ > + spin_unlock(&tsk->alloc_lock); > + tmr = list_entry(tsk->posix_timers.next,struct k_itimer, > + it_task_list); > + itimer_delete(tmr); > + spin_lock(&tsk->alloc_lock); > + } > + /* > + * sys_timer_create has the option to create a timer > + * for another thread. There is the risk that as the timer > + * is being created that the thread that was supposed to handle > + * the signal is exiting. We use the posix_timers.next field > + * as a flag so we can close this race. > +` */ > + if (exit) > + tsk->posix_timers.next = 0; > + spin_unlock(&tsk->alloc_lock); > +} > + > +/* good_timespec > + * > + * This function checks the elements of a timespec structure. > + * > + * Arguments: > + * ts : Pointer to the timespec structure to check > + * > + * Return value: > + * If a NULL pointer was passed in, or the tv_nsec field was less than 0 or > + * greater than NSEC_PER_SEC, or the tv_sec field was less than 0, this > + * function returns 0. Otherwise it returns 1. > + */ > + > +static int good_timespec(const struct timespec *ts) > +{ > + if ((ts == NULL) || > + (ts->tv_sec < 0) || > + ((unsigned)ts->tv_nsec >= NSEC_PER_SEC)) > + return 0; > + return 1; > +} > + > +static inline void unlock_timer(struct k_itimer *timr) > +{ > + spin_unlock_irq(&timr->it_lock); > +} > + > +static struct k_itimer* lock_timer( timer_t timer_id) > +{ > + struct k_itimer *timr; > + > + timr = (struct k_itimer *)id2ptr_lookup(&posix_timers_id, > + (int)timer_id); > + if (timr) > + spin_lock_irq(&timr->it_lock); > + return(timr); > +} > + > +/* > + * Get the time remaining on a POSIX.1b interval timer. > + * This function is ALWAYS called with spin_lock_irq on the timer, thus > + * it must not mess with irq. 
> + */ > +void inline do_timer_gettime(struct k_itimer *timr, > + struct itimerspec *cur_setting) > +{ > + struct timespec ts; > + > + do_posix_gettime(posix_clocks[timr->it_clock], &ts); > + ts.tv_sec = timr->it_v.it_value.tv_sec - ts.tv_sec; > + ts.tv_nsec = timr->it_v.it_value.tv_nsec - ts.tv_nsec; > + if (ts.tv_nsec < 0) { > + ts.tv_nsec += 1000000000; > + ts.tv_sec--; > + } > + if (ts.tv_sec < 0) > + ts.tv_sec = ts.tv_nsec = 0; > + cur_setting->it_value = ts; > + cur_setting->it_interval = timr->it_v.it_interval; > +} > + > +/* Get the time remaining on a POSIX.1b interval timer. */ > +asmlinkage int sys_timer_gettime(timer_t timer_id, struct itimerspec *setting) > +{ > + struct k_itimer *timr; > + struct itimerspec cur_setting; > + > + timr = lock_timer(timer_id); > + if (!timr) return -EINVAL; > + > + p_timer_get(posix_clocks[timr->it_clock],timr, &cur_setting); > + > + unlock_timer(timr); > + > + if (copy_to_user(setting, &cur_setting, sizeof(cur_setting))) > + return -EFAULT; > + > + return 0; > +} > +/* > + * Get the number of overruns of a POSIX.1b interval timer > + * This is a bit messy as we don't easily know where he is in the delivery > + * of possible multiple signals. We are to give him the overrun on the > + * last delivery. If we have another pending, we want to make sure we > + * use the last and not the current. If there is not another pending > + * then he is current and gets the current overrun. We search both the > + * shared and local queue. > + */ > + > +asmlinkage int sys_timer_getoverrun(timer_t timer_id) > +{ > + struct k_itimer *timr; > + int overrun, i; > + struct sigqueue *q; > + struct sigpending *sig_queue; > + struct task_struct * t; > + > + timr = lock_timer( timer_id); > + if (!timr) return -EINVAL; > + > + t = timr->it_process; > + overrun = timr->it_overrun; > + spin_lock_irq(&t->sig->siglock); > + for (sig_queue = &t->sig->shared_pending, i = 2; i; > + sig_queue = &t->pending, i--){ > + for (q = sig_queue->head; q; q = q->next) { > + if ((q->info.si_code == SI_TIMER) && > + (q->info.si_tid == timr->it_id)) { > + > + overrun = timr->it_overrun_last; > + goto out; > + } > + } > + } > + out: > + spin_unlock_irq(&t->sig->siglock); > + > + unlock_timer(timr); > + > + return overrun; > +} > + > +/* > + * If it is relative time, we need to add the current time to it to > + * get the proper expiry time. > + */ > +static int adjust_rel_time(struct k_clock *clock,struct timespec *tp) > +{ > + struct timespec now; > + > + > + do_posix_gettime(clock,&now); > + tp->tv_sec += now.tv_sec; > + tp->tv_nsec += now.tv_nsec; > + > + /* > + * Normalize... > + */ > + if (( tp->tv_nsec - NSEC_PER_SEC) >= 0){ > + tp->tv_nsec -= NSEC_PER_SEC; > + tp->tv_sec++; > + } > + return 0; > +} > + > +/* Set a POSIX.1b interval timer. */ > +/* timr->it_lock is taken. 
*/ > +static inline int do_timer_settime(struct k_itimer *timr, int flags, > + struct itimerspec *new_setting, > + struct itimerspec *old_setting) > +{ > + struct k_clock * clock = posix_clocks[timr->it_clock]; > + > + timer_remove(timr); > + if (old_setting) { > + do_timer_gettime(timr, old_setting); > + } > + > + > + /* switch off the timer when it_value is zero */ > + if ((new_setting->it_value.tv_sec == 0) && > + (new_setting->it_value.tv_nsec == 0)) { > + timr->it_v = *new_setting; > + return 0; > + } > + > + if (!(flags & TIMER_ABSTIME)) > + adjust_rel_time(clock, &new_setting->it_value); > + > + timr->it_v = *new_setting; > + timr->it_overrun_deferred = > + timr->it_overrun_last = > + timr->it_overrun = 0; > + timer_insert(&clock->pq, timr); > + return 0; > +} > + > + > + > +/* Set a POSIX.1b interval timer */ > +asmlinkage int sys_timer_settime(timer_t timer_id, int flags, > + const struct itimerspec *new_setting, > + struct itimerspec *old_setting) > +{ > + struct k_itimer *timr; > + struct itimerspec new_spec, old_spec; > + int error = 0; > + struct itimerspec *rtn = old_setting ? &old_spec : NULL; > + > + > + if (new_setting == NULL) { > + return -EINVAL; > + } > + > + if (copy_from_user(&new_spec, new_setting, sizeof(new_spec))) { > + return -EFAULT; > + } > + > + if ((!good_timespec(&new_spec.it_interval)) || > + (!good_timespec(&new_spec.it_value))) { > + return -EINVAL; > + } > + > + timr = lock_timer( timer_id); > + if (!timr) > + return -EINVAL; > + > + if (! posix_clocks[timr->it_clock]->timer_set) { > + error = do_timer_settime(timr, flags, &new_spec, rtn ); > + }else{ > + error = posix_clocks[timr->it_clock]->timer_set(timr, > + flags, > + &new_spec, > + rtn ); > + } > + unlock_timer(timr); > + > + if (old_setting && ! error) { > + if (copy_to_user(old_setting, &old_spec, sizeof(old_spec))) { > + error = -EFAULT; > + } > + } > + > + return error; > +} > + > +static inline int do_timer_delete(struct k_itimer *timer) > +{ > + timer_remove(timer); > + return 0; > +} > + > +/* Delete a POSIX.1b interval timer. */ > +asmlinkage int sys_timer_delete(timer_t timer_id) > +{ > + struct k_itimer *timer; > + > + timer = lock_timer( timer_id); > + if (!timer) > + return -EINVAL; > + > + p_timer_del(posix_clocks[timer->it_clock],timer); > + > + spin_lock(&timer->it_process->alloc_lock); > + list_del(&timer->it_task_list); > + spin_unlock(&timer->it_process->alloc_lock); > + > + /* > + * This keeps any tasks waiting on the spin lock from thinking > + * they got something (see the lock code above). > + */ > + timer->it_process = NULL; > + unlock_timer(timer); > + release_posix_timer(timer); > + return 0; > +} > +/* > + * And now for the "clock" calls > + * These functions are called both from timer functions (with the timer > + * spin_lock_irq() held and from clock calls with no locking. They must > + * use the save flags versions of locks. > + */ > +static int do_posix_gettime(struct k_clock *clock, struct timespec *tp) > +{ > + > + if (clock->clock_get){ > + return clock->clock_get(tp); > + } > + > + do_gettimeofday((struct timeval*)tp); > + tp->tv_nsec *= NSEC_PER_USEC; > + return 0; > +} > + > +/* > + * We do ticks here to avoid the irq lock ( they take sooo long) > + * Note also that the while loop assures that the sub_jiff_offset > + * will be less than a jiffie, thus no need to normalize the result. 
> + * Well, not really, if called with ints off :( > + */ > + > +int do_posix_clock_monotonic_gettime(struct timespec *tp) > +{ > + long sub_sec; > + u64 jiffies_64_f; > + > +#if (BITS_PER_LONG > 32) > + > + jiffies_64_f = jiffies_64; > + > +#elif defined(CONFIG_SMP) > + > + /* Tricks don't work here, must take the lock. Remember, called > + * above from both timer and clock system calls => save flags. > + */ > + { > + unsigned long flags; > + read_lock_irqsave(&xtime_lock, flags); > + jiffies_64_f = jiffies_64; > + > + > + read_unlock_irqrestore(&xtime_lock, flags); > + } > +#elif ! defined(CONFIG_SMP) && (BITS_PER_LONG < 64) > + unsigned long jiffies_f; > + do { > + jiffies_f = jiffies; > + barrier(); > + jiffies_64_f = jiffies_64; > + } while (unlikely(jiffies_f != jiffies)); > + > + > +#endif > + tp->tv_sec = div_long_long_rem(jiffies_64_f,HZ,&sub_sec); > + > + tp->tv_nsec = sub_sec * (NSEC_PER_SEC / HZ); > + return 0; > +} > + > +int do_posix_clock_monotonic_settime(struct timespec *tp) > +{ > + return -EINVAL; > +} > + > +asmlinkage int sys_clock_settime(clockid_t which_clock,const struct timespec *tp) > +{ > + struct timespec new_tp; > + > + if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock]) > + return -EINVAL; > + if (copy_from_user(&new_tp, tp, sizeof(*tp))) > + return -EFAULT; > + if ( posix_clocks[which_clock]->clock_set){ > + return posix_clocks[which_clock]->clock_set(&new_tp); > + } > + new_tp.tv_nsec /= NSEC_PER_USEC; > + return do_sys_settimeofday((struct timeval*)&new_tp,NULL); > +} > +asmlinkage int sys_clock_gettime(clockid_t which_clock, struct timespec *tp) > +{ > + struct timespec rtn_tp; > + int error = 0; > + > + if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock]) > + return -EINVAL; > + > + error = do_posix_gettime(posix_clocks[which_clock],&rtn_tp); > + > + if ( ! error) { > + if (copy_to_user(tp, &rtn_tp, sizeof(rtn_tp))) { > + error = -EFAULT; > + } > + } > + return error; > + > +} > +asmlinkage int sys_clock_getres(clockid_t which_clock, struct timespec *tp) > +{ > + struct timespec rtn_tp; > + > + if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock]) > + return -EINVAL; > + > + rtn_tp.tv_sec = 0; > + rtn_tp.tv_nsec = posix_clocks[which_clock]->res; > + if ( tp){ > + if (copy_to_user(tp, &rtn_tp, sizeof(rtn_tp))) { > + return -EFAULT; > + } > + } > + return 0; > + > +} > + > +#if 0 > +// This #if 0 is to keep the pretty printer/ formatter happy so the indents will > +// correct below. > + > +// The NANOSLEEP_ENTRY macro is defined in asm/signal.h and > +// is structured to allow code as well as entry definitions, so that when > +// we get control back here the entry parameters will be available as expected. > +// Some systems may find these paramerts in other ways than as entry parms, > +// for example, struct pt_regs *regs is defined in i386 as the address of the > +// first parameter, where as other archs pass it as one of the paramerters. > + > +asmlinkage long sys_clock_nanosleep(void) > +{ > +#endif > + CLOCK_NANOSLEEP_ENTRY( struct timespec ts; > + struct k_itimer *t; > + struct k_clock * clock; > + int active;) > + > + //asmlinkage int sys_clock_nanosleep(clockid_t which_clock, > + // int flags, > + // const struct timespec *rqtp, > + // struct timespec *rmtp) > + //{ > + > + if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock]) > + return -EINVAL; > + /* > + * See discussion below about waking up early. 
> +asmlinkage int sys_clock_settime(clockid_t which_clock,
> +				 const struct timespec *tp)
> +{
> +	struct timespec new_tp;
> +
> +	if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
> +		return -EINVAL;
> +	if (copy_from_user(&new_tp, tp, sizeof(*tp)))
> +		return -EFAULT;
> +	if (posix_clocks[which_clock]->clock_set)
> +		return posix_clocks[which_clock]->clock_set(&new_tp);
> +	new_tp.tv_nsec /= NSEC_PER_USEC;
> +	return do_sys_settimeofday((struct timeval *)&new_tp, NULL);
> +}
> +
> +asmlinkage int sys_clock_gettime(clockid_t which_clock, struct timespec *tp)
> +{
> +	struct timespec rtn_tp;
> +	int error = 0;
> +
> +	if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
> +		return -EINVAL;
> +
> +	error = do_posix_gettime(posix_clocks[which_clock], &rtn_tp);
> +
> +	if (!error) {
> +		if (copy_to_user(tp, &rtn_tp, sizeof(rtn_tp)))
> +			error = -EFAULT;
> +	}
> +	return error;
> +}
> +
> +asmlinkage int sys_clock_getres(clockid_t which_clock, struct timespec *tp)
> +{
> +	struct timespec rtn_tp;
> +
> +	if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
> +		return -EINVAL;
> +
> +	rtn_tp.tv_sec = 0;
> +	rtn_tp.tv_nsec = posix_clocks[which_clock]->res;
> +	if (tp) {
> +		if (copy_to_user(tp, &rtn_tp, sizeof(rtn_tp)))
> +			return -EFAULT;
> +	}
> +	return 0;
> +}
> +
> +#if 0
> +// This #if 0 is to keep the pretty printer/formatter happy so the indents
> +// will be correct below.
> +// The NANOSLEEP_ENTRY macro is defined in asm/signal.h and is structured
> +// to allow code as well as entry definitions, so that when we get control
> +// back here the entry parameters will be available as expected.  Some
> +// systems may find these parameters in other ways than as entry parms;
> +// for example, struct pt_regs *regs is defined on i386 as the address of
> +// the first parameter, whereas other archs pass it as one of the
> +// parameters.
> +asmlinkage long sys_clock_nanosleep(void)
> +{
> +#endif
> +	CLOCK_NANOSLEEP_ENTRY(struct timespec ts;
> +			      struct k_itimer *t;
> +			      struct k_clock *clock;
> +			      int active;)
> +
> +	//asmlinkage int sys_clock_nanosleep(clockid_t which_clock,
> +	//				   int flags,
> +	//				   const struct timespec *rqtp,
> +	//				   struct timespec *rmtp)
> +	//{
> +
> +	if ((unsigned)which_clock >= MAX_CLOCKS || !posix_clocks[which_clock])
> +		return -EINVAL;
> +	/*
> +	 * See discussion below about waking up early.
> +	 */
> +	clock = posix_clocks[which_clock];
> +	t = &current->nanosleep_tmr;
> +	if (t->it_pq)
> +		timer_remove(t);
> +
> +	if (copy_from_user(&t->it_v.it_value, rqtp, sizeof(struct timespec)))
> +		return -EFAULT;
> +
> +	if ((t->it_v.it_value.tv_nsec < 0) ||
> +	    (t->it_v.it_value.tv_nsec >= NSEC_PER_SEC) ||
> +	    (t->it_v.it_value.tv_sec < 0))
> +		return -EINVAL;
> +
> +	if (!(flags & TIMER_ABSTIME))
> +		adjust_rel_time(clock, &t->it_v.it_value);
> +	/*
> +	 * These fields don't need to be set up each time.  This
> +	 * should be in INIT_TASK() and forgotten.
> +	 */
> +	t->it_v.it_interval.tv_sec = 0;
> +	t->it_v.it_interval.tv_nsec = 0;
> +	t->it_type = NANOSLEEP;
> +	t->it_process = current;
> +
> +	current->state = TASK_INTERRUPTIBLE;
> +	timer_insert(&clock->pq, t);
> +	schedule();
> +	/*
> +	 * We're not supposed to leave early.  The problem is
> +	 * being woken by signals that are not delivered to
> +	 * the user; typically this means debug-related signals.
> +	 *
> +	 * My plan is to leave the timer running and have a
> +	 * small hook in do_signal which will complete the
> +	 * nanosleep.  For now we just return early, in clear
> +	 * violation of the POSIX spec.
> +	 */
> +	active = (t->it_pq != 0);
> +	if (!(flags & TIMER_ABSTIME) && active && rmtp) {
> +		do_posix_gettime(clock, &ts);
> +		ts.tv_sec = t->it_v.it_value.tv_sec - ts.tv_sec;
> +		ts.tv_nsec = t->it_v.it_value.tv_nsec - ts.tv_nsec;
> +		if (ts.tv_nsec < 0) {
> +			ts.tv_nsec += 1000000000;
> +			ts.tv_sec--;
> +		}
> +		if (ts.tv_sec < 0)
> +			ts.tv_sec = ts.tv_nsec = 0;
> +		if (copy_to_user(rmtp, &ts, sizeof(struct timespec)))
> +			return -EFAULT;
> +	}
> +	if (active)
> +		return -EINTR;
> +	return 0;
> +}
> +
> +void clock_was_set(void)
> +{
> +}
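
The rmtp math above is what lets a caller restart an interrupted relative
sleep.  The consumer side, in standard POSIX usage, looks roughly like this
(nothing here is from the patch):

    #include <errno.h>
    #include <time.h>

    /* Sleep for the full requested time, resuming after signals.
     * clock_nanosleep() returns the error number directly. */
    int sleep_full(clockid_t clk, struct timespec req)
    {
        struct timespec rem;
        int err;

        while ((err = clock_nanosleep(clk, 0, &req, &rem)) == EINTR)
            req = rem;  /* pick up where the signal cut us off */
        return err;
    }

With TIMER_ABSTIME the loop would simply retry the same absolute request,
which is why the kernel code only fills in rmtp for relative sleeps.
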
> diff -X /usr1/jhouston/dontdiff -urN linux.orig/kernel/signal.c linux.mytimers/kernel/signal.c
> --- linux.orig/kernel/signal.c	Wed Oct 23 00:54:30 2002
> +++ linux.mytimers/kernel/signal.c	Wed Oct 23 01:17:51 2002
> @@ -424,8 +424,6 @@
>  		if (!collect_signal(sig, pending, info))
>  			sig = 0;
>  
> -		/* XXX: Once POSIX.1b timers are in, if si_code == SI_TIMER,
> -		   we need to xchg out the timer overrun values. */
>  	}
>  	recalc_sigpending();
>  
> @@ -692,6 +690,7 @@
>  specific_send_sig_info(int sig, struct siginfo *info, struct task_struct *t, int shared)
>  {
>  	int ret;
> +	struct sigpending *sig_queue;
>  
>  	if (!irqs_disabled())
>  		BUG();
> @@ -725,20 +724,43 @@
>  	if (ignored_signal(sig, t))
>  		goto out;
>  
> +	sig_queue = shared ? &t->sig->shared_pending : &t->pending;
> +
>  #define LEGACY_QUEUE(sigptr, sig) \
>  	(((sig) < SIGRTMIN) && sigismember(&(sigptr)->signal, (sig)))
> -
> +	/*
> +	 * Support queueing exactly one non-rt signal, so that we
> +	 * can get more detailed information about the cause of
> +	 * the signal.
> +	 */
> +	if (LEGACY_QUEUE(sig_queue, sig))
> +		goto out;
> +	/*
> +	 * In the case of a POSIX-timer-generated signal we must check
> +	 * whether a signal from this timer is already in the queue.
> +	 * If so, the overrun count will be increased in
> +	 * itimer.c:posix_timer_fn().
> +	 */
> +	if (((unsigned long)info > 1) && (info->si_code == SI_TIMER)) {
> +		struct sigqueue *q;
> +		for (q = sig_queue->head; q; q = q->next) {
> +			if ((q->info.si_code == SI_TIMER) &&
> +			    (q->info.si_tid == info->si_tid)) {
> +				q->info.si_overrun += info->si_overrun + 1;
> +				/*
> +				 * This special ret value (1) is recognized
> +				 * only by posix_timer_fn() in itimer.c.
> +				 */
> +				ret = 1;
> +				goto out;
> +			}
> +		}
> +	}
>  	if (!shared) {
> -		/* Support queueing exactly one non-rt signal, so that we
> -		   can get more detailed information about the cause of
> -		   the signal. */
> -		if (LEGACY_QUEUE(&t->pending, sig))
> -			goto out;
> -
>  		ret = deliver_signal(sig, info, t);
>  	} else {
> -		if (LEGACY_QUEUE(&t->sig->shared_pending, sig))
> -			goto out;
>  		ret = send_signal(sig, info, &t->sig->shared_pending);
>  	}
>  out:
> @@ -1418,8 +1440,9 @@
>  		err |= __put_user(from->si_uid, &to->si_uid);
>  		break;
>  	case __SI_TIMER:
> -		err |= __put_user(from->si_timer1, &to->si_timer1);
> -		err |= __put_user(from->si_timer2, &to->si_timer2);
> +		err |= __put_user(from->si_tid, &to->si_tid);
> +		err |= __put_user(from->si_overrun, &to->si_overrun);
> +		err |= __put_user(from->si_ptr, &to->si_ptr);
>  		break;
>  	case __SI_POLL:
>  		err |= __put_user(from->si_band, &to->si_band);
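
The coalescing above means userspace sees at most one queued signal per
timer, with missed expirations folded into an overrun count.  On the
receiving end that looks roughly like this (standard POSIX usage, not part
of the patch; the handler must be installed with SA_SIGINFO, and whether one
reads si_overrun directly or calls timer_getoverrun() is a library detail):

    #include <signal.h>
    #include <time.h>

    /* tid is the timer created elsewhere with timer_create() */
    static timer_t tid;

    static void timer_handler(int sig, siginfo_t *si, void *uc)
    {
        if (si->si_code == SI_TIMER) {
            /* expirations that were folded into this one signal */
            int missed = timer_getoverrun(tid);
            (void)missed;   /* e.g. advance a periodic state machine */
        }
    }
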
> */ > while (curr != head) { > - timer_t *tmp; > + struct timer_list *tmp; > > - tmp = list_entry(curr, timer_t, entry); > + tmp = list_entry(curr, struct timer_list, entry); > if (tmp->base != base) > BUG(); > next = curr->next; > @@ -343,9 +342,9 @@ > if (curr != head) { > void (*fn)(unsigned long); > unsigned long data; > - timer_t *timer; > + struct timer_list *timer; > > - timer = list_entry(curr, timer_t, entry); > + timer = list_entry(curr, struct timer_list, entry); > fn = timer->function; > data = timer->data; > > @@ -448,6 +447,7 @@ > if (xtime.tv_sec % 86400 == 0) { > xtime.tv_sec--; > time_state = TIME_OOP; > + clock_was_set(); > printk(KERN_NOTICE "Clock: inserting leap second 23:59:60 UTC\n"); > } > break; > @@ -456,6 +456,7 @@ > if ((xtime.tv_sec + 1) % 86400 == 0) { > xtime.tv_sec++; > time_state = TIME_WAIT; > + clock_was_set(); > printk(KERN_NOTICE "Clock: deleting leap second 23:59:59 UTC\n"); > } > break; > @@ -912,7 +913,7 @@ > */ > signed long schedule_timeout(signed long timeout) > { > - timer_t timer; > + struct timer_list timer; > unsigned long expire; > > switch (timeout) > @@ -968,10 +969,32 @@ > return current->pid; > } > > -asmlinkage long sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp) > +#if 0 > +// This #if 0 is to keep the pretty printer/ formatter happy so the indents will > +// correct below. > +// The NANOSLEEP_ENTRY macro is defined in asm/signal.h and > +// is structured to allow code as well as entry definitions, so that when > +// we get control back here the entry parameters will be available as expected. > +// Some systems may find these paramerts in other ways than as entry parms, > +// for example, struct pt_regs *regs is defined in i386 as the address of the > +// first parameter, where as other archs pass it as one of the paramerters. > +asmlinkage long sys_nanosleep(void) > { > - struct timespec t; > - unsigned long expire; > +#endif > + NANOSLEEP_ENTRY( struct timespec t; > + unsigned long expire;) > + > +#ifndef FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP > + // The following code expects rqtp, rmtp to be available > + // as a result of the above macro. Also any regs needed > + // for the _do_signal() macro shoule be set up here. > + > + //asmlinkage long sys_nanosleep(struct timespec *rqtp, > + // struct timespec *rmtp) > + // { > + // struct timespec t; > + // unsigned long expire; > + > > if(copy_from_user(&t, rqtp, sizeof(struct timespec))) > return -EFAULT; > @@ -994,6 +1017,7 @@ > } > return 0; > } > +#endif // ! FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP > > /* > * sys_sysinfo - fill in sysinfo struct > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/