2002-02-26 17:47:41

by Tim Schmielau

Subject: [patch][rfc] enable uptime display > 497 days on 32 bit

This is a polished version of the patch that came out of the
"[Patch] Re: Nasty suprise with uptime" thread last November.

It makes the kernel export correct uptimes after jiffies wraparound
(497 days after boot with HZ=100 on 32 bit, somewhat sooner with HZ=1024)
and keeps ps output after wrap sane as well. No userland application
changes are needed.
The performance hit is minimal, as the upper 32 bits of the jiffies counter
are only updated when they are actually used (the get_jiffies64() routine
is introduced for this). A timer is used to check at least once between
two wraps, since timers are implemented very efficiently. 64-bit idle
time is handled in the same way.
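The lazy update scheme can be sketched in userspace C (the names and the fixed 32-bit low word are illustrative; the kernel version in the patch below additionally takes a spinlock and drives the check from a timer). As long as something samples the counter at least once per wrap period, the high word stays correct:

```c
#include <stdint.h>

/* Userspace model of the lazy 64-bit extension behind get_jiffies64():
 * the low word is the real (wrapping) counter; the high word is bumped
 * whenever a read observes the low word moving backwards. Single-threaded
 * sketch only -- no locking. */
static uint32_t jiffies_hi, jiffies_last;

uint64_t extend64(uint32_t jiffies_now)
{
    if (jiffies_now < jiffies_last)     /* low word wrapped since last look */
        jiffies_hi++;
    jiffies_last = jiffies_now;
    return ((uint64_t)jiffies_hi << 32) | jiffies_now;
}
```

A read shortly before the wrap and one shortly after yield monotonically increasing 64-bit values, which is exactly why one forced read per wrap interval suffices.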

Before submitting the patch to Marcelo for 2.4.19-pre, I'd like to get
some comments, especially on the following points:

1. Does the patch, in particular the introduction of get_jiffies64(),
interfere with the high-res-timers project
(http://sourceforge.net/projects/high-res-timers) in any bad way?

2. I bumped up the start_time field of struct task_struct from 32 to
64 bits when borrowing just some extra bits would suffice.
Any suggestions, where these could be stolen, and whether this
micro-optimization is worth the trouble?

3. Maybe accounting is not worth touching at all. comp_t is able
to hold values up to a little less than 2^34, so I stuffed the
elapsed time into that. CPU times however, although probably a
little less than real times, will still overflow if >= 2^32 secs.

User space accounting programs using 32-bit integers will overflow
anyway and see the same values as without the patch.
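The capacity claim for comp_t can be checked from the encoding parameters alone (constants as in the patch below; the helper name is mine): the largest encodable value is the full 13-bit mantissa shifted by the maximum base-8 exponent, which lands just under 2^34.

```c
#include <stdint.h>

/* Sketch of the comp_t capacity: 13-bit mantissa, 3-bit exponent,
 * each exponent step is a shift by 3 (base 8). */
#define MANTSIZE 13
#define EXPSIZE  3
#define EXPBASE  3
#define MAXFRACT ((1 << MANTSIZE) - 1)          /* 8191 */

uint64_t comp_t_max(void)
{
    /* largest mantissa at the largest exponent (7 steps of 3 bits) */
    return (uint64_t)MAXFRACT << (EXPBASE * ((1 << EXPSIZE) - 1));
}
```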

Some people have wondered why this has not yet gone into mainline.
Well, previous kernels sometimes locked up solidly at jiffies wraparound,
and I didn't want to imply a false feeling of safety by just fixing the
exported uptime.
As quite a few fixes in this area went into 2.4.18 (with some more to come
soon), this should not hinder inclusion anymore.
A patch for setting the jiffies counter to a pre-wrap value should
find the remaining glitches; it will follow in a separate mail.

Tim


--- linux-2.4.19-pre1/include/linux/sched.h Fri Dec 21 18:42:03 2001
+++ linux-2.4.19-pre1-j64/include/linux/sched.h Tue Feb 26 16:44:08 2002
@@ -359,7 +359,7 @@
unsigned long it_real_incr, it_prof_incr, it_virt_incr;
struct timer_list real_timer;
struct tms times;
- unsigned long start_time;
+ u64 start_time;
long per_cpu_utime[NR_CPUS], per_cpu_stime[NR_CPUS];
/* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
@@ -571,6 +571,18 @@
#include <asm/current.h>

extern unsigned long volatile jiffies;
+#if BITS_PER_LONG < 48
+# define NEEDS_JIFFIES64
+ extern u64 get_jiffies64(void);
+#else
+ /* jiffies is wide enough to not wrap for 8716 years at HZ==1024 */
+ static inline u64 get_jiffies64(void)
+ {
+ return (u64)jiffies;
+ }
+#endif
+
+
extern unsigned long itimer_ticks;
extern unsigned long itimer_next;
extern struct timeval xtime;

--- linux-2.4.19-pre1/kernel/timer.c Mon Oct 8 19:41:41 2001
+++ linux-2.4.19-pre1-j64/kernel/timer.c Tue Feb 26 16:13:35 2002
@@ -103,6 +103,8 @@

#define NOOF_TVECS (sizeof(tvecs) / sizeof(tvecs[0]))

+static inline void init_jiffieswrap_timer(void);
+
void init_timervecs (void)
{
int i;
@@ -115,6 +117,8 @@
}
for (i = 0; i < TVR_SIZE; i++)
INIT_LIST_HEAD(tv1.vec + i);
+
+ init_jiffieswrap_timer();
}

static unsigned long timer_jiffies;
@@ -683,6 +687,53 @@
if (TQ_ACTIVE(tq_timer))
mark_bh(TQUEUE_BH);
}
+
+
+#ifdef NEEDS_JIFFIES64
+
+u64 get_jiffies64(void)
+{
+ static unsigned long jiffies_hi, jiffies_last;
+ static spinlock_t jiffies64_lock = SPIN_LOCK_UNLOCKED;
+ unsigned long jiffies_tmp, flags;
+
+ spin_lock_irqsave(&jiffies64_lock, flags);
+ jiffies_tmp = jiffies; /* avoid races */
+ if (jiffies_tmp < jiffies_last) /* We have a wrap */
+ jiffies_hi++;
+ jiffies_last = jiffies_tmp;
+ spin_unlock_irqrestore(&jiffies64_lock, flags);
+
+ return (jiffies_tmp | ((u64)jiffies_hi) << BITS_PER_LONG);
+}
+
+/* use a timer to periodically check for jiffies overflow */
+
+static struct timer_list jiffieswrap_timer;
+#define CHECK_JIFFIESWRAP_INTERVAL (1ul << (BITS_PER_LONG-2))
+
+static void check_jiffieswrap(unsigned long data)
+{
+ mod_timer(&jiffieswrap_timer, jiffies + CHECK_JIFFIESWRAP_INTERVAL);
+ get_jiffies64();
+}
+
+static inline void init_jiffieswrap_timer(void)
+{
+ init_timer(&jiffieswrap_timer);
+ jiffieswrap_timer.expires = jiffies + CHECK_JIFFIESWRAP_INTERVAL;
+ jiffieswrap_timer.function = check_jiffieswrap;
+ add_timer(&jiffieswrap_timer);
+}
+
+#else
+
+static inline void init_jiffieswrap_timer(void)
+{
+}
+
+#endif /* NEEDS_JIFFIES64 */
+

#if !defined(__alpha__) && !defined(__ia64__)


--- linux-2.4.19-pre1/kernel/fork.c Sun Feb 24 19:20:43 2002
+++ linux-2.4.19-pre1-j64/kernel/fork.c Tue Feb 26 16:13:35 2002
@@ -657,7 +657,7 @@
}
#endif
p->lock_depth = -1; /* -1 = no lock */
- p->start_time = jiffies;
+ p->start_time = get_jiffies64();

INIT_LIST_HEAD(&p->local_pages);


--- linux-2.4.19-pre1/kernel/info.c Sat Apr 21 01:15:40 2001
+++ linux-2.4.19-pre1-j64/kernel/info.c Tue Feb 26 16:13:35 2002
@@ -12,15 +12,19 @@
#include <linux/smp_lock.h>

#include <asm/uaccess.h>
+#include <asm/div64.h>

asmlinkage long sys_sysinfo(struct sysinfo *info)
{
struct sysinfo val;
+ u64 uptime;

memset((char *)&val, 0, sizeof(struct sysinfo));

cli();
- val.uptime = jiffies / HZ;
+ uptime = get_jiffies64();
+ do_div(uptime, HZ);
+ val.uptime = (unsigned long) uptime;

val.loads[0] = avenrun[0] << (SI_LOAD_SHIFT - FSHIFT);
val.loads[1] = avenrun[1] << (SI_LOAD_SHIFT - FSHIFT);

--- linux-2.4.19-pre1/fs/proc/array.c Thu Oct 11 18:00:01 2001
+++ linux-2.4.19-pre1-j64/fs/proc/array.c Tue Feb 26 16:13:35 2002
@@ -343,7 +343,7 @@
ppid = task->pid ? task->p_opptr->pid : 0;
read_unlock(&tasklist_lock);
res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
-%lu %lu %lu %lu %lu %ld %ld %ld %ld %ld %ld %lu %lu %ld %lu %lu %lu %lu %lu \
+%lu %lu %lu %lu %lu %ld %ld %ld %ld %ld %ld %llu %lu %ld %lu %lu %lu %lu %lu \
%lu %lu %lu %lu %lu %lu %lu %lu %d %d\n",
task->pid,
task->comm,
@@ -366,7 +366,7 @@
nice,
0UL /* removed */,
task->it_real_value,
- task->start_time,
+ (unsigned long long)(task->start_time),
vsize,
mm ? mm->rss : 0, /* you might want to shift this left 3 */
task->rlim[RLIMIT_RSS].rlim_cur,

--- linux-2.4.19-pre1/fs/proc/proc_misc.c Wed Nov 21 06:29:09 2001
+++ linux-2.4.19-pre1-j64/fs/proc/proc_misc.c Tue Feb 26 16:50:45 2002
@@ -40,6 +40,7 @@
#include <asm/uaccess.h>
#include <asm/pgtable.h>
#include <asm/io.h>
+#include <asm/div64.h>


#define LOAD_INT(x) ((x) >> FSHIFT)
@@ -93,37 +94,82 @@
return proc_calc_metrics(page, start, off, count, eof, len);
}

+#if BITS_PER_LONG < 48
+
+u64 get_idle64(void)
+{
+ static unsigned long idle_hi, idle_last;
+ static spinlock_t idle64_lock = SPIN_LOCK_UNLOCKED;
+ unsigned long idle, flags;
+
+ spin_lock_irqsave(&idle64_lock, flags);
+ idle = init_tasks[0]->times.tms_utime + init_tasks[0]->times.tms_stime;
+ if (idle < idle_last) /* We have a wrap */
+ idle_hi++;
+ idle_last = idle;
+ spin_unlock_irqrestore(&idle64_lock, flags);
+
+ return (idle | ((u64)idle_hi) << BITS_PER_LONG);
+}
+
+/* use a timer to periodically check for idle time overflow */
+
+static struct timer_list idlewrap_timer;
+#define CHECK_IDLEWRAP_INTERVAL (1ul << (BITS_PER_LONG-2))
+
+static void check_idlewrap(unsigned long data)
+{
+ mod_timer(&idlewrap_timer, jiffies + CHECK_IDLEWRAP_INTERVAL);
+ get_idle64();
+}
+
+static inline void init_idlewrap_timer(void)
+{
+ init_timer(&idlewrap_timer);
+ idlewrap_timer.expires = jiffies + CHECK_IDLEWRAP_INTERVAL;
+ idlewrap_timer.function = check_idlewrap;
+ add_timer(&idlewrap_timer);
+}
+
+#else
+ /* Idle time won't overflow for 8716 years at HZ==1024 */
+
+static inline u64 get_idle64(void)
+{
+ return (u64)(init_tasks[0]->times.tms_utime
+ + init_tasks[0]->times.tms_stime);
+}
+
+static inline void init_idlewrap_timer(void)
+{
+}
+
+#endif /* BITS_PER_LONG < 48 */
+
static int uptime_read_proc(char *page, char **start, off_t off,
int count, int *eof, void *data)
{
- unsigned long uptime;
- unsigned long idle;
+ u64 uptime, idle;
+ unsigned long uptime_remainder, idle_remainder;
int len;

- uptime = jiffies;
- idle = init_tasks[0]->times.tms_utime + init_tasks[0]->times.tms_stime;
+ uptime = get_jiffies64();
+ uptime_remainder = (unsigned long) do_div(uptime, HZ);
+ idle = get_idle64();
+ idle_remainder = (unsigned long) do_div(idle, HZ);

- /* The formula for the fraction parts really is ((t * 100) / HZ) % 100, but
- that would overflow about every five days at HZ == 100.
- Therefore the identity a = (a / b) * b + a % b is used so that it is
- calculated as (((t / HZ) * 100) + ((t % HZ) * 100) / HZ) % 100.
- The part in front of the '+' always evaluates as 0 (mod 100). All divisions
- in the above formulas are truncating. For HZ being a power of 10, the
- calculations simplify to the version in the #else part (if the printf
- format is adapted to the same number of digits as zeroes in HZ.
- */
#if HZ!=100
len = sprintf(page,"%lu.%02lu %lu.%02lu\n",
- uptime / HZ,
- (((uptime % HZ) * 100) / HZ) % 100,
- idle / HZ,
- (((idle % HZ) * 100) / HZ) % 100);
+ (unsigned long) uptime,
+ (uptime_remainder * 100) / HZ,
+ (unsigned long) idle,
+ (idle_remainder * 100) / HZ);
#else
len = sprintf(page,"%lu.%02lu %lu.%02lu\n",
- uptime / HZ,
- uptime % HZ,
- idle / HZ,
- idle % HZ);
+ (unsigned long) uptime,
+ uptime_remainder,
+ (unsigned long) idle,
+ idle_remainder);
#endif
return proc_calc_metrics(page, start, off, count, eof, len);
}
@@ -240,7 +286,7 @@
{
int i, len;
extern unsigned long total_forks;
- unsigned long jif = jiffies;
+ u64 jif = get_jiffies64();
unsigned int sum = 0, user = 0, nice = 0, system = 0;
int major, disk;

@@ -256,17 +302,19 @@
#endif
}

- len = sprintf(page, "cpu %u %u %u %lu\n", user, nice, system,
- jif * smp_num_cpus - (user + nice + system));
+ len = sprintf(page, "cpu %u %u %u %llu\n", user, nice, system,
+ (unsigned long long) jif * smp_num_cpus
+ - user - nice - system);
for (i = 0 ; i < smp_num_cpus; i++)
- len += sprintf(page + len, "cpu%d %u %u %u %lu\n",
+ len += sprintf(page + len, "cpu%d %u %u %u %llu\n",
i,
kstat.per_cpu_user[cpu_logical_map(i)],
kstat.per_cpu_nice[cpu_logical_map(i)],
kstat.per_cpu_system[cpu_logical_map(i)],
- jif - ( kstat.per_cpu_user[cpu_logical_map(i)] \
- + kstat.per_cpu_nice[cpu_logical_map(i)] \
- + kstat.per_cpu_system[cpu_logical_map(i)]));
+ (unsigned long long) jif
+ - kstat.per_cpu_user[cpu_logical_map(i)]
+ - kstat.per_cpu_nice[cpu_logical_map(i)]
+ - kstat.per_cpu_system[cpu_logical_map(i)]);
len += sprintf(page + len,
"page %u %u\n"
"swap %u %u\n"
@@ -302,12 +350,13 @@
}
}

+ do_div(jif, HZ);
len += sprintf(page + len,
"\nctxt %u\n"
"btime %lu\n"
"processes %lu\n",
kstat.context_swtch,
- xtime.tv_sec - jif / HZ,
+ xtime.tv_sec - (unsigned long) jif,
total_forks);

return proc_calc_metrics(page, start, off, count, eof, len);
@@ -565,4 +614,6 @@
slabinfo_read_proc, NULL);
if (entry)
entry->write_proc = slabinfo_write_proc;
+
+ init_idlewrap_timer();
}

--- linux-2.4.19-pre1/mm/oom_kill.c Sun Nov 4 02:05:25 2001
+++ linux-2.4.19-pre1-j64/mm/oom_kill.c Tue Feb 26 16:13:35 2002
@@ -69,11 +69,10 @@
/*
* CPU time is in seconds and run time is in minutes. There is no
* particular reason for this other than that it turned out to work
- * very well in practice. This is not safe against jiffie wraps
- * but we don't care _that_ much...
+ * very well in practice.
*/
cpu_time = (p->times.tms_utime + p->times.tms_stime) >> (SHIFT_HZ + 3);
- run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);
+ run_time = (get_jiffies64() - p->start_time) >> (SHIFT_HZ + 10);

points /= int_sqrt(cpu_time);
points /= int_sqrt(int_sqrt(run_time));

--- linux-2.4.19-pre1/kernel/acct.c Mon Mar 19 21:35:08 2001
+++ linux-2.4.19-pre1-j64/kernel/acct.c Tue Feb 26 16:13:35 2002
@@ -56,6 +56,7 @@
#include <linux/tty.h>

#include <asm/uaccess.h>
+#include <asm/div64.h>

/*
* These constants control the amount of freespace that suspend and
@@ -227,20 +228,24 @@
* This routine has been adopted from the encode_comp_t() function in
* the kern_acct.c file of the FreeBSD operating system. The encoding
* is a 13-bit fraction with a 3-bit (base 8) exponent.
+ *
+ * Bumped up to encode 64 bit values. Unfortunately the result may
+ * overflow now.
*/

#define MANTSIZE 13 /* 13 bit mantissa. */
-#define EXPSIZE 3 /* Base 8 (3 bit) exponent. */
+#define EXPSIZE 3 /* 3 bit exponent. */
+#define EXPBASE 3 /* Base 8 (3 bit) exponent. */
#define MAXFRACT ((1 << MANTSIZE) - 1) /* Maximum fractional value. */

-static comp_t encode_comp_t(unsigned long value)
+static comp_t encode_comp_t(u64 value)
{
int exp, rnd;

exp = rnd = 0;
while (value > MAXFRACT) {
- rnd = value & (1 << (EXPSIZE - 1)); /* Round up? */
- value >>= EXPSIZE; /* Base 8 exponent == 3 bit shift. */
+ rnd = value & (1 << (EXPBASE - 1)); /* Round up? */
+ value >>= EXPBASE; /* Base 8 exponent == 3 bit shift. */
exp++;
}

@@ -248,16 +253,21 @@
* If we need to round up, do it (and handle overflow correctly).
*/
if (rnd && (++value > MAXFRACT)) {
- value >>= EXPSIZE;
+ value >>= EXPBASE;
exp++;
}

/*
* Clean it up and polish it off.
*/
- exp <<= MANTSIZE; /* Shift the exponent into place */
- exp += value; /* and add on the mantissa. */
- return exp;
+ if (exp >= (1 << EXPSIZE)) {
+ /* Overflow. Return largest representable number instead. */
+ return (1ul << (MANTSIZE + EXPSIZE)) - 1;
+ } else {
+ exp <<= MANTSIZE; /* Shift the exponent into place */
+ exp += value; /* and add on the mantissa. */
+ return exp;
+ }
}

/*
@@ -277,6 +287,7 @@
struct acct ac;
mm_segment_t fs;
unsigned long vsize;
+ u64 elapsed;

/*
* First check to see if there is enough free_space to continue
@@ -294,8 +305,10 @@
strncpy(ac.ac_comm, current->comm, ACCT_COMM);
ac.ac_comm[ACCT_COMM - 1] = '\0';

- ac.ac_btime = CT_TO_SECS(current->start_time) + (xtime.tv_sec - (jiffies / HZ));
- ac.ac_etime = encode_comp_t(jiffies - current->start_time);
+ elapsed = get_jiffies64() - current->start_time;
+ ac.ac_etime = encode_comp_t(elapsed);
+ do_div(elapsed, HZ);
+ ac.ac_btime = xtime.tv_sec - elapsed;
ac.ac_utime = encode_comp_t(current->times.tms_utime);
ac.ac_stime = encode_comp_t(current->times.tms_stime);
ac.ac_uid = current->uid;








2002-02-26 17:59:32

by Tim Schmielau

Subject: Re: [patch][rfc] enable uptime display > 497 days on 32 bit

This patch introduces a config option on 32 bit platforms to set the
jiffies counter to a value 5 minutes before wraparound. It depends on my
previous patch to "enable uptime display > 497 days on 32 bit".

This should help to find any remaining problems at jiffies wraparound.
As many fixes went into 2.4.18, this shouldn't be that dangerous anymore,
and I'd like as many people as possible to try this out.
Still, you have to be prepared in case your box does indeed struggle, so
the warning is left in.
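The reason this trick is safe for interval arithmetic can be sketched in plain C (HZ=100 assumed, helper name mine): unsigned subtraction is well-defined modulo 2^32, so starting the counter 5 minutes before the wrap changes *when* the wrap happens, but not any interval measured across it.

```c
#include <stdint.h>

/* Sketch of the CONFIG_DEBUG_JIFFIESWRAP arithmetic: a 32-bit counter
 * initialized to -300*HZ wraps 5 minutes after boot, yet "now - then"
 * remains correct across the wrap thanks to modular arithmetic. */
#define HZ 100
#define INITIAL_JIFFIES ((uint32_t)(-300 * HZ))

uint32_t elapsed_ticks(uint32_t now, uint32_t then)
{
    return now - then;      /* well-defined mod 2^32, wrap or not */
}
```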

Tim


--- linux-2.4.19-pre1-j64/include/linux/timex.h Thu Nov 22 20:46:18 2001
+++ linux-2.4.19-pre1-j64-dbg/include/linux/timex.h Tue Feb 26 16:44:08 2002
@@ -53,6 +53,13 @@

#include <asm/param.h>

+#ifdef CONFIG_DEBUG_JIFFIESWRAP
+ /* Make the jiffies counter wrap around sooner. */
+# define INITIAL_JIFFIES ((unsigned long)(-300*HZ))
+#else
+# define INITIAL_JIFFIES 0
+#endif
+
/*
* The following defines establish the engineering parameters of the PLL
* model. The HZ variable establishes the timer interrupt frequency, 100 Hz

--- linux-2.4.19-pre1-j64/kernel/timer.c Tue Feb 26 16:13:35 2002
+++ linux-2.4.19-pre1-j64-dbg/kernel/timer.c Tue Feb 26 16:39:02 2002
@@ -65,7 +65,7 @@

extern int do_setitimer(int, struct itimerval *, struct itimerval *);

-unsigned long volatile jiffies;
+unsigned long volatile jiffies = INITIAL_JIFFIES;

unsigned int * prof_buffer;
unsigned long prof_len;
@@ -118,10 +118,18 @@
for (i = 0; i < TVR_SIZE; i++)
INIT_LIST_HEAD(tv1.vec + i);

+#ifdef CONFIG_DEBUG_JIFFIESWRAP
+ tv1.index = INITIAL_JIFFIES & TVR_MASK;
+ tv2.index = (INITIAL_JIFFIES >> TVR_BITS) & TVN_MASK;
+ tv3.index = (INITIAL_JIFFIES >> (TVR_BITS + TVN_BITS)) & TVN_MASK;
+ tv4.index = (INITIAL_JIFFIES >> (TVR_BITS + 2*TVN_BITS)) & TVN_MASK;
+ tv5.index = (INITIAL_JIFFIES >> (TVR_BITS + 3*TVN_BITS)) & TVN_MASK;
+#endif
+
init_jiffieswrap_timer();
}

-static unsigned long timer_jiffies;
+static unsigned long timer_jiffies = INITIAL_JIFFIES;

static inline void internal_add_timer(struct timer_list *timer)
{
@@ -642,7 +650,7 @@
}

/* jiffies at the most recent update of wall time */
-unsigned long wall_jiffies;
+unsigned long wall_jiffies = INITIAL_JIFFIES;

/*
* This spinlock protect us from races in SMP while playing with xtime. -arca
@@ -693,7 +701,7 @@

u64 get_jiffies64(void)
{
- static unsigned long jiffies_hi, jiffies_last;
+ static unsigned long jiffies_hi, jiffies_last = INITIAL_JIFFIES;
static spinlock_t jiffies64_lock = SPIN_LOCK_UNLOCKED;
unsigned long jiffies_tmp, flags;


--- linux-2.4.19-pre1-j64/fs/proc/array.c Tue Feb 26 16:13:35 2002
+++ linux-2.4.19-pre1-j64-dbg/fs/proc/array.c Tue Feb 26 16:39:02 2002
@@ -366,7 +366,7 @@
nice,
0UL /* removed */,
task->it_real_value,
- (unsigned long long)(task->start_time),
+ (unsigned long long)(task->start_time) - INITIAL_JIFFIES,
vsize,
mm ? mm->rss : 0, /* you might want to shift this left 3 */
task->rlim[RLIMIT_RSS].rlim_cur,

--- linux-2.4.19-pre1-j64/fs/proc/proc_misc.c Tue Feb 26 16:50:45 2002
+++ linux-2.4.19-pre1-j64-dbg/fs/proc/proc_misc.c Tue Feb 26 16:50:28 2002
@@ -153,7 +153,7 @@
unsigned long uptime_remainder, idle_remainder;
int len;

- uptime = get_jiffies64();
+ uptime = get_jiffies64() - INITIAL_JIFFIES;
uptime_remainder = (unsigned long) do_div(uptime, HZ);
idle = get_idle64();
idle_remainder = (unsigned long) do_div(idle, HZ);
@@ -286,7 +286,7 @@
{
int i, len;
extern unsigned long total_forks;
- u64 jif = get_jiffies64();
+ u64 jif = get_jiffies64() - INITIAL_JIFFIES;
unsigned int sum = 0, user = 0, nice = 0, system = 0;
int major, disk;


--- linux-2.4.19-pre1-j64/kernel/info.c Tue Feb 26 16:13:35 2002
+++ linux-2.4.19-pre1-j64-dbg/kernel/info.c Tue Feb 26 16:39:02 2002
@@ -22,7 +22,7 @@
memset((char *)&val, 0, sizeof(struct sysinfo));

cli();
- uptime = get_jiffies64();
+ uptime = get_jiffies64() - INITIAL_JIFFIES;
do_div(uptime, HZ);
val.uptime = (unsigned long) uptime;


--- linux-2.4.19-pre1-j64/Documentation/Configure.help Tue Feb 26 16:12:18 2002
+++ linux-2.4.19-pre1-j64-dbg/Documentation/Configure.help Tue Feb 26 16:39:02 2002
@@ -24034,6 +24034,14 @@
of the BUG call as well as the EIP and oops trace. This aids
debugging but costs about 70-100K of memory.

+Debug jiffies counter wraparound (DANGEROUS)
+CONFIG_DEBUG_JIFFIESWRAP
+ Say Y here to initialize the jiffies counter to a value 5 minutes
+ before wraparound. This may make your system UNSTABLE and its
+ only use is to hunt down the causes of this instability.
+ If you don't know what the jiffies counter is or if you want
+ a stable system, say N.
+
Include kgdb kernel debugger
CONFIG_KGDB
Include in-kernel hooks for kgdb, the Linux kernel source level

--- linux-2.4.19-pre1-j64/arch/arm/config.in Fri Nov 9 22:58:02 2001
+++ linux-2.4.19-pre1-j64-dbg/arch/arm/config.in Tue Feb 26 16:39:02 2002
@@ -601,6 +601,7 @@
bool 'Magic SysRq key' CONFIG_MAGIC_SYSRQ
bool 'Spinlock debugging' CONFIG_DEBUG_SPINLOCK
dep_bool 'Disable pgtable cache' CONFIG_NO_PGT_CACHE $CONFIG_CPU_26
+bool 'Debug jiffies counter wraparound (DANGEROUS)' CONFIG_DEBUG_JIFFIESWRAP
# These options are only for real kernel hackers who want to get their hands dirty.
dep_bool 'Kernel low-level debugging functions' CONFIG_DEBUG_LL $CONFIG_EXPERIMENTAL
dep_bool ' Kernel low-level debugging messages via footbridge serial port' CONFIG_DEBUG_DC21285_PORT $CONFIG_DEBUG_LL $CONFIG_FOOTBRIDGE

--- linux-2.4.19-pre1-j64/arch/cris/config.in Sun Feb 24 19:20:36 2002
+++ linux-2.4.19-pre1-j64-dbg/arch/cris/config.in Tue Feb 26 16:39:02 2002
@@ -253,4 +253,5 @@
if [ "$CONFIG_PROFILE" = "y" ]; then
int ' Profile shift count' CONFIG_PROFILE_SHIFT 2
fi
+bool 'Debug jiffies counter wraparound (DANGEROUS)' CONFIG_DEBUG_JIFFIESWRAP
endmenu

--- linux-2.4.19-pre1-j64/arch/i386/config.in Sun Feb 24 19:20:36 2002
+++ linux-2.4.19-pre1-j64-dbg/arch/i386/config.in Tue Feb 26 16:39:02 2002
@@ -422,6 +422,7 @@
bool ' Magic SysRq key' CONFIG_MAGIC_SYSRQ
bool ' Spinlock debugging' CONFIG_DEBUG_SPINLOCK
bool ' Verbose BUG() reporting (adds 70K)' CONFIG_DEBUG_BUGVERBOSE
+ bool ' Debug jiffies counter wraparound (DANGEROUS)' CONFIG_DEBUG_JIFFIESWRAP
fi

endmenu

--- linux-2.4.19-pre1-j64/arch/m68k/config.in Tue Jun 12 04:15:27 2001
+++ linux-2.4.19-pre1-j64-dbg/arch/m68k/config.in Tue Feb 26 16:39:02 2002
@@ -545,4 +545,5 @@

#bool 'Debug kmalloc/kfree' CONFIG_DEBUG_MALLOC
bool 'Magic SysRq key' CONFIG_MAGIC_SYSRQ
+bool 'Debug jiffies counter wraparound (DANGEROUS)' CONFIG_DEBUG_JIFFIESWRAP
endmenu

--- linux-2.4.19-pre1-j64/arch/mips/config.in Mon Oct 15 22:41:34 2001
+++ linux-2.4.19-pre1-j64-dbg/arch/mips/config.in Tue Feb 26 16:39:02 2002
@@ -519,4 +519,5 @@
if [ "$CONFIG_SMP" != "y" ]; then
bool 'Run uncached' CONFIG_MIPS_UNCACHED
fi
+bool 'Debug jiffies counter wraparound (DANGEROUS)' CONFIG_DEBUG_JIFFIESWRAP
endmenu

--- linux-2.4.19-pre1-j64/arch/parisc/config.in Wed Apr 18 02:19:25 2001
+++ linux-2.4.19-pre1-j64-dbg/arch/parisc/config.in Tue Feb 26 16:39:02 2002
@@ -206,5 +206,6 @@

#bool 'Debug kmalloc/kfree' CONFIG_DEBUG_MALLOC
bool 'Magic SysRq key' CONFIG_MAGIC_SYSRQ
+bool 'Debug jiffies counter wraparound (DANGEROUS)' CONFIG_DEBUG_JIFFIESWRAP
endmenu


--- linux-2.4.19-pre1-j64/arch/ppc/config.in Sun Feb 24 19:20:36 2002
+++ linux-2.4.19-pre1-j64-dbg/arch/ppc/config.in Tue Feb 26 16:39:02 2002
@@ -399,4 +399,5 @@
bool 'Magic SysRq key' CONFIG_MAGIC_SYSRQ
bool 'Include kgdb kernel debugger' CONFIG_KGDB
bool 'Include xmon kernel debugger' CONFIG_XMON
+bool 'Debug jiffies counter wraparound (DANGEROUS)' CONFIG_DEBUG_JIFFIESWRAP
endmenu

--- linux-2.4.19-pre1-j64/arch/sh/config.in Sun Feb 24 19:20:37 2002
+++ linux-2.4.19-pre1-j64-dbg/arch/sh/config.in Tue Feb 26 16:39:02 2002
@@ -385,4 +385,5 @@
if [ "$CONFIG_SH_STANDARD_BIOS" = "y" ]; then
bool 'Early printk support' CONFIG_SH_EARLY_PRINTK
fi
+bool 'Debug jiffies counter wraparound (DANGEROUS)' CONFIG_DEBUG_JIFFIESWRAP
endmenu

--- linux-2.4.19-pre1-j64/arch/sparc/config.in Tue Jun 12 04:15:27 2001
+++ linux-2.4.19-pre1-j64-dbg/arch/sparc/config.in Tue Feb 26 16:39:02 2002
@@ -265,4 +265,5 @@
comment 'Kernel hacking'

bool 'Magic SysRq key' CONFIG_MAGIC_SYSRQ
+bool 'Debug jiffies counter wraparound (DANGEROUS)' CONFIG_DEBUG_JIFFIESWRAP
endmenu

2002-02-26 19:47:14

by George Anzinger

Subject: Re: [patch][rfc] enable uptime display > 497 days on 32 bit

Tim Schmielau wrote:
>
> This is a polished version of the patch that came out of the
> "[Patch] Re: Nasty suprise with uptime" thread last november.
>
> It makes the kernel export correct uptimes after jiffies wraparound
> (497 days after boot with HZ=100 on 32 bit, somewhat sooner with HZ=1024)
> and keeps ps output after wrap sane as well. No userland application
> changes are needed.
> The performance hit is minimal as the upper 32bit of the jiffies counter
> are only updated when they are actually used (the get_jiffies64() routine
> is introduced for this). A timer is used to check at least once between
> two wraps, since timers are implemented very efficiently. 64 bit idle
> time is done in the same way.
>
> Before submitting the patch to Marcelo for 2.4.19-pre, I'd like to get
> some comments, especially on the following points:
>
> 1. Does the patch, in particular the introduction of get_jiffies64(),
> interfere with the high-res-timers project
> (http://sourceforge.net/projects/high-res-timers) in any bad way?

Well, since you asked (thank you), the high-res-timers patch needs to
get the full 64-bit uptime to implement CLOCK_MONOTONIC. Also, since
the timers in the kernel are, in fact, clocked by CLOCK_MONOTONIC (read:
jiffies), the code has to resolve any wall clock time (i.e.
CLOCK_REALTIME) into CLOCK_MONOTONIC as part of the timer_settime()
call. This means that the 64-bit jiffies is used "often" in this code.
In discussions on the list with Linus, he suggested that the jiffies
counter be expanded to 64-bits in a particular way, i.e. by #define
jiffie ... taking into account the endian and 64/32 bitness of the
particular platform. I think this is what I implemented in the
high-res-timers patch. This implementation allows almost all users to
continue to work with 32-bit jiffies while providing a 64-bit value for
those who need it. Unlike this patch, it keeps the 64-bit value rational
(i.e. always current) by doing a 64-bit add, which adds an "adc" (add
carry to memory) to the timer interrupt path. This update is already
SMP protected, so no additional locking is required on update. Reading
it CAN be done without locking if it is done with care (read high, low,
high; if high1 != high2, do it again), or a read lock can be taken.
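The lockless read sequence described here (high, low, high, retry on mismatch) might look like the following userspace sketch; the variable names are illustrative, and a real SMP implementation would also need appropriate memory barriers around the loads:

```c
#include <stdint.h>

/* The writer increments ticks_lo in the tick interrupt and carries into
 * ticks_hi. A reader snapshots high, then low, then re-checks high: if
 * the high half changed in between, a carry was in flight, so retry. */
volatile uint32_t ticks_hi, ticks_lo;

uint64_t read_ticks64(void)
{
    uint32_t hi, lo;
    do {
        hi = ticks_hi;
        lo = ticks_lo;
    } while (hi != ticks_hi);   /* retry if a carry intervened */
    return ((uint64_t)hi << 32) | lo;
}
```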

The only down side I see to the suggested 64-bit jiffies as implemented
in the high-res-timers patch is the name space collision issue around
the #define of jiffies. I think I have found almost all of these,
however, it is hard to come up with a .config that compiles ALL paths
(especially considering the various archs).
>
> 2. I bumped up the start_time field of struct task_struct from 32 to
> 64 bits when borrowing just some extra bits would suffice.
> Any suggestions, where these could be stolen, and whether this
> micro-optimization is worth the trouble?
>
> 3. Maybe accounting is not worth being touched at all. comp_t is able
> to hold values up to a little less that 2^34, so I stuffed the
> elapsed time into that. CPU times however, although probably a
> little less than real times, will still overflow if >= 2^32 secs.
>
> User space accounting programs using 32bit integers will overflow
> anyways and see the same values as without the patch.
>
> Some people wondered why this not yet went into mainline.
> Well, previous kernels sometimes locked up solidly at jiffies wraparound,
> and I didn't want to imply a false feeling of safety by just fixing the
> exported uptime.
> As quite a few fixes in this area went into 2.4.18 (with some more to come
> soon), this should not hinder inclusion anymore.
> A patch for setting the jiffies counter to a pre-wrap value should
> find the remaining glitches, it will follow in a separate mail.
>
> Tim
>
> --- linux-2.4.19-pre1/include/linux/sched.h Fri Dec 21 18:42:03 2001
> +++ linux-2.4.19-pre1-j64/include/linux/sched.h Tue Feb 26 16:44:08 2002
> @@ -359,7 +359,7 @@
> unsigned long it_real_incr, it_prof_incr, it_virt_incr;
> struct timer_list real_timer;
> struct tms times;
> - unsigned long start_time;
> + u64 start_time;
> long per_cpu_utime[NR_CPUS], per_cpu_stime[NR_CPUS];
> /* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
> unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
> @@ -571,6 +571,18 @@
> #include <asm/current.h>
>
> extern unsigned long volatile jiffies;
> +#if BITS_PER_LONG < 48
> +# define NEEDS_JIFFIES64
> + extern u64 get_jiffies64(void);
> +#else
> + /* jiffies is wide enough to not wrap for 8716 years at HZ==1024 */
> + static inline u64 get_jiffies64(void)
> + {
> + return (u64)jiffies;
> + }
> +#endif
> +
> +
> extern unsigned long itimer_ticks;
> extern unsigned long itimer_next;
> extern struct timeval xtime;
>
> --- linux-2.4.19-pre1/kernel/timer.c Mon Oct 8 19:41:41 2001
> +++ linux-2.4.19-pre1-j64/kernel/timer.c Tue Feb 26 16:13:35 2002
> @@ -103,6 +103,8 @@
>
> #define NOOF_TVECS (sizeof(tvecs) / sizeof(tvecs[0]))
>
> +static inline void init_jiffieswrap_timer(void);
> +
> void init_timervecs (void)
> {
> int i;
> @@ -115,6 +117,8 @@
> }
> for (i = 0; i < TVR_SIZE; i++)
> INIT_LIST_HEAD(tv1.vec + i);
> +
> + init_jiffieswrap_timer();
> }
>
> static unsigned long timer_jiffies;
> @@ -683,6 +687,53 @@
> if (TQ_ACTIVE(tq_timer))
> mark_bh(TQUEUE_BH);
> }
> +
> +
> +#ifdef NEEDS_JIFFIES64
> +
> +u64 get_jiffies64(void)
> +{
> + static unsigned long jiffies_hi, jiffies_last;
> + static spinlock_t jiffies64_lock = SPIN_LOCK_UNLOCKED;
> + unsigned long jiffies_tmp, flags;
> +
> + spin_lock_irqsave(&jiffies64_lock, flags);
> + jiffies_tmp = jiffies; /* avoid races */
> + if (jiffies_tmp < jiffies_last) /* We have a wrap */
> + jiffies_hi++;
> + jiffies_last = jiffies_tmp;
> + spin_unlock_irqrestore(&jiffies64_lock, flags);
> +
> + return (jiffies_tmp | ((u64)jiffies_hi) << BITS_PER_LONG);
> +}
> +
> +/* use a timer to periodically check for jiffies overflow */
> +
> +static struct timer_list jiffieswrap_timer;
> +#define CHECK_JIFFIESWRAP_INTERVAL (1ul << (BITS_PER_LONG-2))
> +
> +static void check_jiffieswrap(unsigned long data)
> +{
> + mod_timer(&jiffieswrap_timer, jiffies + CHECK_JIFFIESWRAP_INTERVAL);
> + get_jiffies64();
> +}
> +
> +static inline void init_jiffieswrap_timer(void)
> +{
> + init_timer(&jiffieswrap_timer);
> + jiffieswrap_timer.expires = jiffies + CHECK_JIFFIESWRAP_INTERVAL;
> + jiffieswrap_timer.function = check_jiffieswrap;
> + add_timer(&jiffieswrap_timer);
> +}
> +
> +#else
> +
> +static inline void init_jiffieswrap_timer(void)
> +{
> +}
> +
> +#endif /* NEEDS_JIFFIES64 */
> +
>
> #if !defined(__alpha__) && !defined(__ia64__)
>
>
> --- linux-2.4.19-pre1/kernel/fork.c Sun Feb 24 19:20:43 2002
> +++ linux-2.4.19-pre1-j64/kernel/fork.c Tue Feb 26 16:13:35 2002
> @@ -657,7 +657,7 @@
> }
> #endif
> p->lock_depth = -1; /* -1 = no lock */
> - p->start_time = jiffies;
> + p->start_time = get_jiffies64();
>
> INIT_LIST_HEAD(&p->local_pages);
>
>
> --- linux-2.4.19-pre1/kernel/info.c Sat Apr 21 01:15:40 2001
> +++ linux-2.4.19-pre1-j64/kernel/info.c Tue Feb 26 16:13:35 2002
> @@ -12,15 +12,19 @@
> #include <linux/smp_lock.h>
>
> #include <asm/uaccess.h>
> +#include <asm/div64.h>
>
> asmlinkage long sys_sysinfo(struct sysinfo *info)
> {
> struct sysinfo val;
> + u64 uptime;
>
> memset((char *)&val, 0, sizeof(struct sysinfo));
>
> cli();
> - val.uptime = jiffies / HZ;
> + uptime = get_jiffies64();
> + do_div(uptime, HZ);
> + val.uptime = (unsigned long) uptime;
>
> val.loads[0] = avenrun[0] << (SI_LOAD_SHIFT - FSHIFT);
> val.loads[1] = avenrun[1] << (SI_LOAD_SHIFT - FSHIFT);
>
> --- linux-2.4.19-pre1/fs/proc/array.c Thu Oct 11 18:00:01 2001
> +++ linux-2.4.19-pre1-j64/fs/proc/array.c Tue Feb 26 16:13:35 2002
> @@ -343,7 +343,7 @@
> ppid = task->pid ? task->p_opptr->pid : 0;
> read_unlock(&tasklist_lock);
> res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
> -%lu %lu %lu %lu %lu %ld %ld %ld %ld %ld %ld %lu %lu %ld %lu %lu %lu %lu %lu \
> +%lu %lu %lu %lu %lu %ld %ld %ld %ld %ld %ld %llu %lu %ld %lu %lu %lu %lu %lu \
> %lu %lu %lu %lu %lu %lu %lu %lu %d %d\n",
> task->pid,
> task->comm,
> @@ -366,7 +366,7 @@
> nice,
> 0UL /* removed */,
> task->it_real_value,
> - task->start_time,
> + (unsigned long long)(task->start_time),
> vsize,
> mm ? mm->rss : 0, /* you might want to shift this left 3 */
> task->rlim[RLIMIT_RSS].rlim_cur,
>
> --- linux-2.4.19-pre1/fs/proc/proc_misc.c Wed Nov 21 06:29:09 2001
> +++ linux-2.4.19-pre1-j64/fs/proc/proc_misc.c Tue Feb 26 16:50:45 2002
> @@ -40,6 +40,7 @@
> #include <asm/uaccess.h>
> #include <asm/pgtable.h>
> #include <asm/io.h>
> +#include <asm/div64.h>
>
>
> #define LOAD_INT(x) ((x) >> FSHIFT)
> @@ -93,37 +94,82 @@
> return proc_calc_metrics(page, start, off, count, eof, len);
> }
>
> +#if BITS_PER_LONG < 48
> +
> +u64 get_idle64(void)
> +{
> + static unsigned long idle_hi, idle_last;
> + static spinlock_t idle64_lock = SPIN_LOCK_UNLOCKED;
> + unsigned long idle, flags;
> +
> + spin_lock_irqsave(&idle64_lock, flags);
> + idle = init_tasks[0]->times.tms_utime + init_tasks[0]->times.tms_stime;
> + if (idle < idle_last) /* We have a wrap */
> + idle_hi++;
> + idle_last = idle;
> + spin_unlock_irqrestore(&idle64_lock, flags);
> +
> + return (idle | ((u64)idle_hi) << BITS_PER_LONG);
> +}
> +
> +/* use a timer to periodically check for idle time overflow */
> +
> +static struct timer_list idlewrap_timer;
> +#define CHECK_IDLEWRAP_INTERVAL (1ul << (BITS_PER_LONG-2))
> +
> +static void check_idlewrap(unsigned long data)
> +{
> + mod_timer(&idlewrap_timer, jiffies + CHECK_IDLEWRAP_INTERVAL);
> + get_idle64();
> +}
> +
> +static inline void init_idlewrap_timer(void)
> +{
> + init_timer(&idlewrap_timer);
> + idlewrap_timer.expires = jiffies + CHECK_IDLEWRAP_INTERVAL;
> + idlewrap_timer.function = check_idlewrap;
> + add_timer(&idlewrap_timer);
> +}
> +
> +#else
> + /* Idle time won't overflow for 8716 years at HZ==1024 */
> +
> +static inline u64 get_idle64(void)
> +{
> + return (u64)(init_tasks[0]->times.tms_utime
> + + init_tasks[0]->times.tms_stime);
> +}
> +
> +static inline void init_idlewrap_timer(void)
> +{
> +}
> +
> +#endif /* BITS_PER_LONG < 48 */
> +
> static int uptime_read_proc(char *page, char **start, off_t off,
> int count, int *eof, void *data)
> {
> - unsigned long uptime;
> - unsigned long idle;
> + u64 uptime, idle;
> + unsigned long uptime_remainder, idle_remainder;
> int len;
>
> - uptime = jiffies;
> - idle = init_tasks[0]->times.tms_utime + init_tasks[0]->times.tms_stime;
> + uptime = get_jiffies64();
> + uptime_remainder = (unsigned long) do_div(uptime, HZ);
> + idle = get_idle64();
> + idle_remainder = (unsigned long) do_div(idle, HZ);
>
> - /* The formula for the fraction parts really is ((t * 100) / HZ) % 100, but
> - that would overflow about every five days at HZ == 100.
> - Therefore the identity a = (a / b) * b + a % b is used so that it is
> - calculated as (((t / HZ) * 100) + ((t % HZ) * 100) / HZ) % 100.
> - The part in front of the '+' always evaluates as 0 (mod 100). All divisions
> - in the above formulas are truncating. For HZ being a power of 10, the
> - calculations simplify to the version in the #else part (if the printf
> - format is adapted to the same number of digits as zeroes in HZ.
> - */
> #if HZ!=100
> len = sprintf(page,"%lu.%02lu %lu.%02lu\n",
> - uptime / HZ,
> - (((uptime % HZ) * 100) / HZ) % 100,
> - idle / HZ,
> - (((idle % HZ) * 100) / HZ) % 100);
> + (unsigned long) uptime,
> + (uptime_remainder * 100) / HZ,
> + (unsigned long) idle,
> + (idle_remainder * 100) / HZ);
> #else
> len = sprintf(page,"%lu.%02lu %lu.%02lu\n",
> - uptime / HZ,
> - uptime % HZ,
> - idle / HZ,
> - idle % HZ);
> + (unsigned long) uptime,
> + uptime_remainder,
> + (unsigned long) idle,
> + idle_remainder);
> #endif
> return proc_calc_metrics(page, start, off, count, eof, len);
> }
> @@ -240,7 +286,7 @@
> {
> int i, len;
> extern unsigned long total_forks;
> - unsigned long jif = jiffies;
> + u64 jif = get_jiffies64();
> unsigned int sum = 0, user = 0, nice = 0, system = 0;
> int major, disk;
>
> @@ -256,17 +302,19 @@
> #endif
> }
>
> - len = sprintf(page, "cpu %u %u %u %lu\n", user, nice, system,
> - jif * smp_num_cpus - (user + nice + system));
> + len = sprintf(page, "cpu %u %u %u %llu\n", user, nice, system,
> + (unsigned long long) jif * smp_num_cpus
> + - user - nice - system);
> for (i = 0 ; i < smp_num_cpus; i++)
> - len += sprintf(page + len, "cpu%d %u %u %u %lu\n",
> + len += sprintf(page + len, "cpu%d %u %u %u %llu\n",
> i,
> kstat.per_cpu_user[cpu_logical_map(i)],
> kstat.per_cpu_nice[cpu_logical_map(i)],
> kstat.per_cpu_system[cpu_logical_map(i)],
> - jif - ( kstat.per_cpu_user[cpu_logical_map(i)] \
> - + kstat.per_cpu_nice[cpu_logical_map(i)] \
> - + kstat.per_cpu_system[cpu_logical_map(i)]));
> + (unsigned long long) jif
> + - kstat.per_cpu_user[cpu_logical_map(i)]
> + - kstat.per_cpu_nice[cpu_logical_map(i)]
> + - kstat.per_cpu_system[cpu_logical_map(i)]);
> len += sprintf(page + len,
> "page %u %u\n"
> "swap %u %u\n"
> @@ -302,12 +350,13 @@
> }
> }
>
> + do_div(jif, HZ);
> len += sprintf(page + len,
> "\nctxt %u\n"
> "btime %lu\n"
> "processes %lu\n",
> kstat.context_swtch,
> - xtime.tv_sec - jif / HZ,
> + xtime.tv_sec - (unsigned long) jif,
> total_forks);
>
> return proc_calc_metrics(page, start, off, count, eof, len);
> @@ -565,4 +614,6 @@
> slabinfo_read_proc, NULL);
> if (entry)
> entry->write_proc = slabinfo_write_proc;
> +
> + init_idlewrap_timer();
> }
>
> --- linux-2.4.19-pre1/mm/oom_kill.c Sun Nov 4 02:05:25 2001
> +++ linux-2.4.19-pre1-j64/mm/oom_kill.c Tue Feb 26 16:13:35 2002
> @@ -69,11 +69,10 @@
> /*
> * CPU time is in seconds and run time is in minutes. There is no
> * particular reason for this other than that it turned out to work
> - * very well in practice. This is not safe against jiffie wraps
> - * but we don't care _that_ much...
> + * very well in practice.
> */
> cpu_time = (p->times.tms_utime + p->times.tms_stime) >> (SHIFT_HZ + 3);
> - run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);
> + run_time = (get_jiffies64() - p->start_time) >> (SHIFT_HZ + 10);
>
> points /= int_sqrt(cpu_time);
> points /= int_sqrt(int_sqrt(run_time));
>
> --- linux-2.4.19-pre1/kernel/acct.c Mon Mar 19 21:35:08 2001
> +++ linux-2.4.19-pre1-j64/kernel/acct.c Tue Feb 26 16:13:35 2002
> @@ -56,6 +56,7 @@
> #include <linux/tty.h>
>
> #include <asm/uaccess.h>
> +#include <asm/div64.h>
>
> /*
> * These constants control the amount of freespace that suspend and
> @@ -227,20 +228,24 @@
> * This routine has been adopted from the encode_comp_t() function in
> * the kern_acct.c file of the FreeBSD operating system. The encoding
> * is a 13-bit fraction with a 3-bit (base 8) exponent.
> + *
> + * Bumped up to encode 64 bit values. Unfortunately the result may
> + * overflow now.
> */
>
> #define MANTSIZE 13 /* 13 bit mantissa. */
> -#define EXPSIZE 3 /* Base 8 (3 bit) exponent. */
> +#define EXPSIZE 3 /* 3 bit exponent. */
> +#define EXPBASE 3 /* Base 8 (3 bit) exponent. */
> #define MAXFRACT ((1 << MANTSIZE) - 1) /* Maximum fractional value. */
>
> -static comp_t encode_comp_t(unsigned long value)
> +static comp_t encode_comp_t(u64 value)
> {
> int exp, rnd;
>
> exp = rnd = 0;
> while (value > MAXFRACT) {
> - rnd = value & (1 << (EXPSIZE - 1)); /* Round up? */
> - value >>= EXPSIZE; /* Base 8 exponent == 3 bit shift. */
> + rnd = value & (1 << (EXPBASE - 1)); /* Round up? */
> + value >>= EXPBASE; /* Base 8 exponent == 3 bit shift. */
> exp++;
> }
>
> @@ -248,16 +253,21 @@
> * If we need to round up, do it (and handle overflow correctly).
> */
> if (rnd && (++value > MAXFRACT)) {
> - value >>= EXPSIZE;
> + value >>= EXPBASE;
> exp++;
> }
>
> /*
> * Clean it up and polish it off.
> */
> - exp <<= MANTSIZE; /* Shift the exponent into place */
> - exp += value; /* and add on the mantissa. */
> - return exp;
> + if (exp >= (1 << EXPSIZE)) {
> + /* Overflow. Return largest representable number instead. */
> + return (1ul << (MANTSIZE + EXPSIZE)) - 1;
> + } else {
> + exp <<= MANTSIZE; /* Shift the exponent into place */
> + exp += value; /* and add on the mantissa. */
> + return exp;
> + }
> }
>
> /*
> @@ -277,6 +287,7 @@
> struct acct ac;
> mm_segment_t fs;
> unsigned long vsize;
> + u64 elapsed;
>
> /*
> * First check to see if there is enough free_space to continue
> @@ -294,8 +305,10 @@
> strncpy(ac.ac_comm, current->comm, ACCT_COMM);
> ac.ac_comm[ACCT_COMM - 1] = '\0';
>
> - ac.ac_btime = CT_TO_SECS(current->start_time) + (xtime.tv_sec - (jiffies / HZ));
> - ac.ac_etime = encode_comp_t(jiffies - current->start_time);
> + elapsed = get_jiffies64() - current->start_time;
> + ac.ac_etime = encode_comp_t(elapsed);
> + do_div(elapsed, HZ);
> + ac.ac_btime = xtime.tv_sec - elapsed;
> ac.ac_utime = encode_comp_t(current->times.tms_utime);
> ac.ac_stime = encode_comp_t(current->times.tms_stime);
> ac.ac_uid = current->uid;

--
George [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/

2002-02-26 20:14:57

by Tim Schmielau

[permalink] [raw]
Subject: Re: [patch][rfc] enable uptime display > 497 days on 32 bit

On Tue, 26 Feb 2002, george anzinger wrote:

> Tim Schmielau wrote:
> >
> > This is a polished version of the patch that came out of the
> > "[Patch] Re: Nasty suprise with uptime" thread last november.
> >
[...]
> > Before submitting the patch to Marcelo for 2.4.19-pre, I'd like to get
> > some comments, especially on the following points:
> >
> > 1. Does the patch, in particular the introduction of get_jiffies64(),
> > interfere with the high-res-timers project
> > (http://sourceforge.net/projects/high-res-timers) in any bad way?
>
> Well, since you asked (thank you), the high-res-timers patch needs to
> get the full 64-bit uptime to implement CLOCK_MONOTONIC. Also, since
> the timers in the kernel are, in fact clocked by CLOCK_MONOTONIC (read
> jiffies) the code has to resolve any wall clock time (i.e.
> CLOCK_REALTIME) into CLOCK_MONOTONIC as part of the timer_settimer()
> call. This means that the 64-bit jiffies is used "often" in this code.
> In discussions on the list with Linus, he suggested that the jiffies
> counter be expanded to 64-bits in a particular way, i.e. by #define
> jiffie ... taking into account the endian and 64/32 bitness of the
> particular platform. I think this is what I implemented in the
> high-res-timers patch. This implementation allows almost all users to
> continue to work with 32-bit jiffies while providing a 64-bit value for
> those who need it. Unlike this patch, it keeps the 64-bit rational
> (i.e. always current) by doing a 64-bit add which adds an "adc" (add
> carry to memory) to the timer interrupt path. This update is already
> SMP protected so no additional locking is required on update. Reading
> it CAN be done without locking if it is done with care (read high, low,
> high, if high1 != high2 do it again) or a read lock can be taken.
>
> The only down side I see to the suggested 64-bit jiffies as implemented
> in the high-res-timers patch is the name space collision issue around
> the #define of jiffies. I think I have found almost all of these,
> however, it is hard to come up with a .config that compiles ALL paths
> (especially considering the various archs).

Well, my intention was not to push this patch into high-res-timers.

Rather, once this patch has made it into the kernel, the
high-res-timers patch should back out the simple 64 bit jiffies stuff,
i.e. init_jiffieswrap_timer(), check_jiffieswrap() and get_jiffies64(),
and do all this in its own way, providing its own get_jiffies64()
function that does the (read high, low, high; compare) stuff.

If you see problems with the get_jiffies64() interface, or prefer another
name, please say so.

Since you did not raise any hard objections, I count this as a "you moron
put some extra work on me, but I can stand it, so it's OK with me" ;-)

Thanks,
Tim

2002-02-26 20:50:45

by Andreas Dilger

[permalink] [raw]
Subject: Re: [patch][rfc] enable uptime display > 497 days on 32 bit

On Feb 26, 2002 11:42 -0800, george anzinger wrote:
> Well, since you asked (thank you), the high-res-timers patch needs to
> get the full 64-bit uptime to implement CLOCK_MONOTONIC.
> This means that the 64-bit jiffies is used "often" in this code.
> Unlike this patch, it keeps the 64-bit rational (i.e. always current)

Well, if you use the get_jiffies64() interface the 64-bit value is
always coherent as well, and the direct access to the 32-bit value
is monotonic. While the high and low words of the 64-bit jiffies
values may be incoherent at times, as long as you always access the
64-bit value with the get_jiffies64() interface it should be OK.

Do you think that doing a 64-bit add-with-carry to memory on each
timer interrupt and doing multiple volatile reads is faster than
doing a spinlock with an optional 32-bit increment? Do you think
there would be a lot of contention on this lock, given that you
only need to lock when you need the full 64-bit value?

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2002-02-26 23:16:10

by George Anzinger

[permalink] [raw]
Subject: Re: [patch][rfc] enable uptime display > 497 days on 32 bit

Andreas Dilger wrote:
>
> On Feb 26, 2002 11:42 -0800, george anzinger wrote:
> > Well, since you asked (thank you), the high-res-timers patch needs to
> > get the full 64-bit uptime to implement CLOCK_MONOTONIC.
> > This means that the 64-bit jiffies is used "often" in this code.
> > Unlike this patch, it keeps the 64-bit rational (i.e. always current)
>
> Well, if you use the get_jiffies64() interface the 64-bit value is
> always coherent as well, and the direct access to the 32-bit value
> is monotonic. While the high and low words of the 64-bit jiffies
> values may be incoherent at times, as long as you always access the
> 64-bit value with the get_jiffies64() interface it should be OK.
>
> Do you think that doing a 64-bit add-with-carry to memory on each
> timer interrupt and doing multiple volatile reads is faster than
> doing a spinlock with an optional 32-bit increment?

I think the memory cycle is "almost" free as we are also updating
jiffies which is in the same cache line, so, yes, in the overall scheme
of things the overhead of the additional add-with-carry is very small.
On the read side of things, the issue is not so much the lock, but the
irq nature of it. This will be VERY long, much longer than the double
load of the high order bits, again from the same cache line.

> Do you think
> there would be a lot of contention on this lock, given that you
> only need to lock when you need the full 64-bit value?

A question that arises is whether you can use an independent lock. For the
high-res code, we need to be coherent with the sub-jiffie part also, and
this is all updated in the interrupt code, so the lock, it would seem,
should be the one that is taken there, which is the xtime read/write irq
lock. Again, it is the irq nature of things that is slow.



2002-02-26 23:34:20

by Andreas Dilger

[permalink] [raw]
Subject: Re: [patch][rfc] enable uptime display > 497 days on 32 bit

On Feb 26, 2002 15:13 -0800, george anzinger wrote:
> Andreas Dilger wrote:
> > Do you think that doing a 64-bit add-with-carry to memory on each
> > timer interrupt and doing multiple volatile reads is faster than
> > doing a spinlock with an optional 32-bit increment?
>
> I think the memory cycle is "almost" free as we are also updating
> jiffies which is in the same cache line, so, yes, in the overall scheme
> of things the overhead of the additional add-with-carry is very small.
> On the read side of things, the issue is not so much the lock, but the
> irq nature of it. This will be VERY long, much longer than the double
> load of the high order bits, again from the same cache line.

I was wondering about that myself when looking at the code again. I'm
not quite sure why we need to use the irq spinlock, since we already
make a local copy of jiffies so another timer IRQ changing the jiffies
value shouldn't affect the return value of get_jiffies64(). Then again,
that isn't exactly stuff I'm familiar with, so I could be totally
off-base here.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2002-02-27 04:34:44

by Tim Schmielau

[permalink] [raw]
Subject: Re: [patch][rfc] enable uptime display > 497 days on 32 bit

On Tue, 26 Feb 2002, Andreas Dilger wrote:
> I'm
> not quite sure why we need to use the irq spinlock, since we already
> make a local copy of jiffies so another timer IRQ changing the jiffies
> value shouldn't affect the return value of get_jiffies64(). Then again,
> that isn't exactly stuff I'm familiar with, so I could be totally
> off-base here.

Indeed, the outcome of get_jiffies64() cannot be affected. The lock is
just to prevent the tiny chance of jiffies_hi getting incremented twice.

Tim