Hi,
the commits
411187fb05cd11676b0979d9fbf3291db69dbce2 (GTOD: persistent clock support)
c1d370e167d66b10bca3b602d3740405469383de (i386: use GTOD persistent clock
support)
caused a change in the way uptime is measured and introduced a regression such
that it's no longer possible to obtain a correct process start time or the
system boot time.
These two commits make the monotonic time not jump by hudreds of seconds after
the kernel resumes from suspend, but they achieve it by moving the
wall_to_monotonic offset, which is used for all boot time / uptime
calculations. It's not possible to read the real boot time / process start
time then.
I was thinking about a solution and I am posting the least intrusive one I
found. I add a total_sleep_time variable which is to be added to
wall_to_monotonic when one wants the proper boot time. I do by no means say
it's the best one, I expect comments.
This requires a patch to procps that makes it use /proc/stat's btime instead
of "now - uptime". This was written by Tomas Smetana and I'm attaching it as
well. (It was posted upstream as it fixes another bug in ps as well.) Or we
could just make the /proc/uptime contain "now - btime" like it used to before
those two commits.
---
fs/proc/proc_misc.c | 2 +-
include/linux/time.h | 1 +
kernel/fork.c | 1 +
kernel/timer.c | 6 ++++++
4 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
index e2c4c0a..a29309e 100644
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -456,7 +456,7 @@ static int show_stat(struct seq_file *p, void *v)
user = nice = system = idle = iowait =
irq = softirq = steal = cputime64_zero;
- jif = - wall_to_monotonic.tv_sec;
+ jif = - (wall_to_monotonic.tv_sec + total_sleep_time);
if (wall_to_monotonic.tv_nsec)
--jif;
diff --git a/include/linux/time.h b/include/linux/time.h
index 8ea8dea..cb87616 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -90,6 +90,7 @@ static inline struct timespec timespec_sub(struct timespec lhs,
extern struct timespec xtime;
extern struct timespec wall_to_monotonic;
+extern unsigned long total_sleep_time;
extern seqlock_t xtime_lock __attribute__((weak));
extern unsigned long read_persistent_clock(void);
diff --git a/kernel/fork.c b/kernel/fork.c
index 6af959c..c051f17 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1056,6 +1056,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->lock_depth = -1; /* -1 = no lock */
do_posix_clock_monotonic_gettime(&p->start_time);
+ p->start_time.tv_sec += total_sleep_time;
p->security = NULL;
p->io_context = NULL;
p->io_wait = NULL;
diff --git a/kernel/timer.c b/kernel/timer.c
index dd6c2c1..13eb201 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -759,9 +759,13 @@ unsigned long next_timer_interrupt(void)
* at zero at system boot time, so wall_to_monotonic will be negative,
* however, we will ALWAYS keep the tv_nsec part positive so we can use
* the usual normalization.
+ *
+ * One needs to add total_sleep_time to wall_to_monotonic to get the real boot
+ * time.
*/
struct timespec xtime __attribute__ ((aligned (16)));
struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
+unsigned long total_sleep_time;
EXPORT_SYMBOL(xtime);
@@ -975,6 +979,7 @@ void __init timekeeping_init(void)
xtime.tv_nsec = 0;
set_normalized_timespec(&wall_to_monotonic,
-xtime.tv_sec, -xtime.tv_nsec);
+ total_sleep_time = 0;
write_sequnlock_irqrestore(&xtime_lock, flags);
}
@@ -1004,6 +1009,7 @@ static int timekeeping_resume(struct sys_device *dev)
xtime.tv_sec += sleep_length;
wall_to_monotonic.tv_sec -= sleep_length;
+ total_sleep_time += sleep_length;
}
/* re-base the last cycle value */
clock->cycle_last = clocksource_read(clock);
Regards,
--
Tomas Janousek, SW Engineer, Red Hat, Inc.
On Thu, 2007-05-03 at 17:03 +0200, Tomas Janousek wrote:
> Hi,
>
> the commits
> 411187fb05cd11676b0979d9fbf3291db69dbce2 (GTOD: persistent clock support)
> c1d370e167d66b10bca3b602d3740405469383de (i386: use GTOD persistent clock
> support)
> caused a change in the way uptime is measured and introduced a regression such
> that it's no longer possible to obtain a correct process start time or the
> system boot time.
>
> These two commits make the monotonic time not jump by hudreds of seconds after
> the kernel resumes from suspend, but they achieve it by moving the
> wall_to_monotonic offset, which is used for all boot time / uptime
> calculations. It's not possible to read the real boot time / process start
> time then.
Indeed. The monotonic clock's behavior around suspend and resume is
poorly defined. When we increased it, folks didn't like the fact that
uptime would increase while a system was suspended.
> I was thinking about a solution and I am posting the least intrusive one I
> found. I add a total_sleep_time variable which is to be added to
> wall_to_monotonic when one wants the proper boot time. I do by no means say
> it's the best one, I expect comments.
While I'd prefer wrapping it in a interface and keeping it in
timekeeping (rather then exporting a collection of values that should be
combined in varying ways), I think this approach is probably the best.
Rather then forcing one behavior or the other we can provide both smooth
monotonic and monotonic w/ sleep.
> This requires a patch to procps that makes it use /proc/stat's btime instead
> of "now - uptime". This was written by Tomas Smetana and I'm attaching it as
> well. (It was posted upstream as it fixes another bug in ps as well.) Or we
> could just make the /proc/uptime contain "now - btime" like it used to before
> those two commits.
>
> ---
> fs/proc/proc_misc.c | 2 +-
> include/linux/time.h | 1 +
> kernel/fork.c | 1 +
> kernel/timer.c | 6 ++++++
> 4 files changed, 9 insertions(+), 1 deletions(-)
>
> diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
> index e2c4c0a..a29309e 100644
> --- a/fs/proc/proc_misc.c
> +++ b/fs/proc/proc_misc.c
> @@ -456,7 +456,7 @@ static int show_stat(struct seq_file *p, void *v)
>
> user = nice = system = idle = iowait =
> irq = softirq = steal = cputime64_zero;
> - jif = - wall_to_monotonic.tv_sec;
> + jif = - (wall_to_monotonic.tv_sec + total_sleep_time);
> if (wall_to_monotonic.tv_nsec)
> --jif;
As I said, could we wrap this in a function?
Maybe something like get_monotonicwithsleep_time() (feel free to come up
with a better name :).
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 8ea8dea..cb87616 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -90,6 +90,7 @@ static inline struct timespec timespec_sub(struct timespec lhs,
>
> extern struct timespec xtime;
> extern struct timespec wall_to_monotonic;
> +extern unsigned long total_sleep_time;
> extern seqlock_t xtime_lock __attribute__((weak));
>
> extern unsigned long read_persistent_clock(void);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 6af959c..c051f17 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1056,6 +1056,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>
> p->lock_depth = -1; /* -1 = no lock */
> do_posix_clock_monotonic_gettime(&p->start_time);
> + p->start_time.tv_sec += total_sleep_time;
> p->security = NULL;
> p->io_context = NULL;
> p->io_wait = NULL;
> diff --git a/kernel/timer.c b/kernel/timer.c
> index dd6c2c1..13eb201 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -759,9 +759,13 @@ unsigned long next_timer_interrupt(void)
> * at zero at system boot time, so wall_to_monotonic will be negative,
> * however, we will ALWAYS keep the tv_nsec part positive so we can use
> * the usual normalization.
> + *
> + * One needs to add total_sleep_time to wall_to_monotonic to get the real boot
> + * time.
> */
> struct timespec xtime __attribute__ ((aligned (16)));
> struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
> +unsigned long total_sleep_time;
>
> EXPORT_SYMBOL(xtime);
>
> @@ -975,6 +979,7 @@ void __init timekeeping_init(void)
> xtime.tv_nsec = 0;
> set_normalized_timespec(&wall_to_monotonic,
> -xtime.tv_sec, -xtime.tv_nsec);
> + total_sleep_time = 0;
>
> write_sequnlock_irqrestore(&xtime_lock, flags);
> }
> @@ -1004,6 +1009,7 @@ static int timekeeping_resume(struct sys_device *dev)
>
> xtime.tv_sec += sleep_length;
> wall_to_monotonic.tv_sec -= sleep_length;
> + total_sleep_time += sleep_length;
> }
> /* re-base the last cycle value */
> clock->cycle_last = clocksource_read(clock);
Otherwise it looks good.
Thanks for bringing this up!
-john
Hi,
On Thu, May 03, 2007 at 12:59:20PM -0700, john stultz wrote:
> Indeed. The monotonic clock's behavior around suspend and resume is
> poorly defined. When we increased it, folks didn't like the fact that
> uptime would increase while a system was suspended.
Seems like your change also made accounting and OOM calculations behave more
correctly. And my previous patch broke that completely, because they would use
"your" uptime and "my" process start time, resulting in negative numbers when
calculating the process uptime. Therefore, I had to create another start_time
in the task_struct.
> > I was thinking about a solution and I am posting the least intrusive one I
> > found. I add a total_sleep_time variable which is to be added to
> > wall_to_monotonic when one wants the proper boot time. I do by no means say
> > it's the best one, I expect comments.
>
> While I'd prefer wrapping it in a interface and keeping it in
> timekeeping (rather then exporting a collection of values that should be
> combined in varying ways), I think this approach is probably the best.
> Rather then forcing one behavior or the other we can provide both smooth
> monotonic and monotonic w/ sleep.
Ok, thanks for the comment, I did so.
And here's new patch:
From: Tomas Janousek <[email protected]>
Subject: [PATCH] Fix boot time and process startup times after suspend
The commits
411187fb05cd11676b0979d9fbf3291db69dbce2 (GTOD: persistent clock support)
c1d370e167d66b10bca3b602d3740405469383de (i386: use GTOD persistent clock
support)
changed the monotonic time so that it no longer jumps after resume, but it's
not possible to use it for boot time and process start time calculations then.
I add a getboottime function to get the real boot time and a boot-based time
concept. Because we need a process start time corresponding to the monotonic
time, I add real_start_time to the task_struct which is then used in
/proc/pid/stat. The /proc/stat's btime uses getboottime instead of the
monotonic time offset as well.
Signed-off-by: Tomas Janousek <[email protected]>
Cc: Tomas Smetana <[email protected]>
---
fs/proc/array.c | 5 +++--
fs/proc/proc_misc.c | 8 +++++---
include/linux/ktime.h | 2 ++
include/linux/sched.h | 2 +-
include/linux/time.h | 2 ++
kernel/fork.c | 1 +
kernel/hrtimer.c | 27 +++++++++++++++++++++++++++
kernel/timer.c | 21 +++++++++++++++++++++
8 files changed, 62 insertions(+), 6 deletions(-)
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 07c9cdb..831734c 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -405,8 +405,9 @@ static int do_task_stat(struct task_struct *task, char * buffer, int whole)
/* Temporary variable needed for gcc-2.96 */
/* convert timespec -> nsec*/
- start_time = (unsigned long long)task->start_time.tv_sec * NSEC_PER_SEC
- + task->start_time.tv_nsec;
+ start_time =
+ (unsigned long long)task->real_start_time.tv_sec * NSEC_PER_SEC
+ + task->real_start_time.tv_nsec;
/* convert nsec -> ticks */
start_time = nsec_to_clock_t(start_time);
diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
index e2c4c0a..b1791a0 100644
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -453,12 +453,14 @@ static int show_stat(struct seq_file *p, void *v)
unsigned long jif;
cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
u64 sum = 0;
+ struct timespec boottime;
user = nice = system = idle = iowait =
irq = softirq = steal = cputime64_zero;
- jif = - wall_to_monotonic.tv_sec;
- if (wall_to_monotonic.tv_nsec)
- --jif;
+ getboottime(&boottime);
+ jif = boottime.tv_sec;
+ if (boottime.tv_nsec)
+ ++jif;
for_each_possible_cpu(i) {
int j;
diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index 248305b..346c1e1 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -269,6 +269,8 @@ static inline s64 ktime_to_ns(const ktime_t kt)
/* Get the monotonic time in timespec format: */
extern void ktime_get_ts(struct timespec *ts);
+/* Get the boot based time in timespec format: */
+extern void bbtime_get_ts(struct timespec *ts);
/* Get the real (wall-) time in timespec format: */
#define ktime_get_real_ts(ts) getnstimeofday(ts)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 49fe299..0845bde 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -884,7 +884,7 @@ struct task_struct {
unsigned long rt_priority;
cputime_t utime, stime;
unsigned long nvcsw, nivcsw; /* context switch counts */
- struct timespec start_time;
+ struct timespec start_time, real_start_time;
/* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
unsigned long min_flt, maj_flt;
diff --git a/include/linux/time.h b/include/linux/time.h
index 8ea8dea..a52f481 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -90,6 +90,7 @@ static inline struct timespec timespec_sub(struct timespec lhs,
extern struct timespec xtime;
extern struct timespec wall_to_monotonic;
+extern unsigned long total_sleep_time;
extern seqlock_t xtime_lock __attribute__((weak));
extern unsigned long read_persistent_clock(void);
@@ -116,6 +117,7 @@ extern int do_setitimer(int which, struct itimerval *value,
extern unsigned int alarm_setitimer(unsigned int seconds);
extern int do_getitimer(int which, struct itimerval *value);
extern void getnstimeofday(struct timespec *tv);
+extern void getboottime(struct timespec *ts);
extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
extern int timekeeping_is_continuous(void);
diff --git a/kernel/fork.c b/kernel/fork.c
index 6af959c..b10d9b7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1056,6 +1056,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->lock_depth = -1; /* -1 = no lock */
do_posix_clock_monotonic_gettime(&p->start_time);
+ bbtime_get_ts(&p->real_start_time);
p->security = NULL;
p->io_context = NULL;
p->io_wait = NULL;
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index b74860a..cd744a1 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -128,6 +128,33 @@ void ktime_get_ts(struct timespec *ts)
}
EXPORT_SYMBOL_GPL(ktime_get_ts);
+/**
+ * bbtime_get_ts - get the boot based clock in timespec format
+ * @ts: pointer to timespec variable
+ *
+ * The function calculates the boot based clock from the realtime
+ * clock, the wall_to_monotonic offset and the total sleep time and
+ * stores the result in normalized timespec format in the variable
+ * pointed to by @ts.
+ */
+void bbtime_get_ts(struct timespec *ts)
+{
+ struct timespec tobb;
+ unsigned long seq;
+
+ do {
+ seq = read_seqbegin(&xtime_lock);
+ getnstimeofday(ts);
+ tobb = wall_to_monotonic;
+ tobb.tv_sec += total_sleep_time;
+
+ } while (read_seqretry(&xtime_lock, seq));
+
+ set_normalized_timespec(ts, ts->tv_sec + tobb.tv_sec,
+ ts->tv_nsec + tobb.tv_nsec);
+}
+EXPORT_SYMBOL_GPL(bbtime_get_ts);
+
/*
* Get the coarse grained time at the softirq based on xtime and
* wall_to_monotonic.
diff --git a/kernel/timer.c b/kernel/timer.c
index dd6c2c1..a1a8ccb 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -759,9 +759,13 @@ unsigned long next_timer_interrupt(void)
* at zero at system boot time, so wall_to_monotonic will be negative,
* however, we will ALWAYS keep the tv_nsec part positive so we can use
* the usual normalization.
+ *
+ * We need to add total_sleep_time to wall_to_monotonic to get the real boot
+ * time.
*/
struct timespec xtime __attribute__ ((aligned (16)));
struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
+unsigned long total_sleep_time;
EXPORT_SYMBOL(xtime);
@@ -832,6 +836,21 @@ void getnstimeofday(struct timespec *ts)
EXPORT_SYMBOL(getnstimeofday);
/**
+ * getboottime - Return the real time of system boot.
+ * @ts: pointer to the timespec to be set
+ *
+ * Returns the time of day in a timespec.
+ */
+void getboottime(struct timespec *ts)
+{
+ set_normalized_timespec(ts,
+ - (wall_to_monotonic.tv_sec + total_sleep_time),
+ - wall_to_monotonic.tv_nsec);
+}
+
+EXPORT_SYMBOL(getboottime);
+
+/**
* do_gettimeofday - Returns the time of day in a timeval
* @tv: pointer to the timeval to be set
*
@@ -975,6 +994,7 @@ void __init timekeeping_init(void)
xtime.tv_nsec = 0;
set_normalized_timespec(&wall_to_monotonic,
-xtime.tv_sec, -xtime.tv_nsec);
+ total_sleep_time = 0;
write_sequnlock_irqrestore(&xtime_lock, flags);
}
@@ -1004,6 +1024,7 @@ static int timekeeping_resume(struct sys_device *dev)
xtime.tv_sec += sleep_length;
wall_to_monotonic.tv_sec -= sleep_length;
+ total_sleep_time += sleep_length;
}
/* re-base the last cycle value */
clock->cycle_last = clocksource_read(clock);
--
TJ. (Brno, CZ), BaseOS, Red Hat
On Fri, 2007-05-04 at 18:13 +0200, Tomas Janousek wrote:
> Hi,
>
> On Thu, May 03, 2007 at 12:59:20PM -0700, john stultz wrote:
> > Indeed. The monotonic clock's behavior around suspend and resume is
> > poorly defined. When we increased it, folks didn't like the fact that
> > uptime would increase while a system was suspended.
>
> Seems like your change also made accounting and OOM calculations behave more
> correctly. And my previous patch broke that completely, because they would use
> "your" uptime and "my" process start time, resulting in negative numbers when
> calculating the process uptime. Therefore, I had to create another start_time
> in the task_struct.
>
> > > I was thinking about a solution and I am posting the least intrusive one I
> > > found. I add a total_sleep_time variable which is to be added to
> > > wall_to_monotonic when one wants the proper boot time. I do by no means say
> > > it's the best one, I expect comments.
> >
> > While I'd prefer wrapping it in a interface and keeping it in
> > timekeeping (rather then exporting a collection of values that should be
> > combined in varying ways), I think this approach is probably the best.
> > Rather then forcing one behavior or the other we can provide both smooth
> > monotonic and monotonic w/ sleep.
>
> Ok, thanks for the comment, I did so.
Looks great! Just a few minor comments.
> And here's new patch:
>
>
> From: Tomas Janousek <[email protected]>
> Subject: [PATCH] Fix boot time and process startup times after suspend
>
> The commits
> 411187fb05cd11676b0979d9fbf3291db69dbce2 (GTOD: persistent clock support)
> c1d370e167d66b10bca3b602d3740405469383de (i386: use GTOD persistent clock
> support)
> changed the monotonic time so that it no longer jumps after resume, but it's
> not possible to use it for boot time and process start time calculations then.
>
> I add a getboottime function to get the real boot time and a boot-based time
> concept. Because we need a process start time corresponding to the monotonic
> time, I add real_start_time to the task_struct which is then used in
> /proc/pid/stat. The /proc/stat's btime uses getboottime instead of the
> monotonic time offset as well.
>
> Signed-off-by: Tomas Janousek <[email protected]>
> Cc: Tomas Smetana <[email protected]>
> ---
> fs/proc/array.c | 5 +++--
> fs/proc/proc_misc.c | 8 +++++---
> include/linux/ktime.h | 2 ++
> include/linux/sched.h | 2 +-
> include/linux/time.h | 2 ++
> kernel/fork.c | 1 +
> kernel/hrtimer.c | 27 +++++++++++++++++++++++++++
> kernel/timer.c | 21 +++++++++++++++++++++
> 8 files changed, 62 insertions(+), 6 deletions(-)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 07c9cdb..831734c 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -405,8 +405,9 @@ static int do_task_stat(struct task_struct *task, char * buffer, int whole)
>
> /* Temporary variable needed for gcc-2.96 */
> /* convert timespec -> nsec*/
> - start_time = (unsigned long long)task->start_time.tv_sec * NSEC_PER_SEC
> - + task->start_time.tv_nsec;
> + start_time =
> + (unsigned long long)task->real_start_time.tv_sec * NSEC_PER_SEC
> + + task->real_start_time.tv_nsec;
> /* convert nsec -> ticks */
> start_time = nsec_to_clock_t(start_time);
>
> diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
> index e2c4c0a..b1791a0 100644
> --- a/fs/proc/proc_misc.c
> +++ b/fs/proc/proc_misc.c
> @@ -453,12 +453,14 @@ static int show_stat(struct seq_file *p, void *v)
> unsigned long jif;
> cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
> u64 sum = 0;
> + struct timespec boottime;
>
> user = nice = system = idle = iowait =
> irq = softirq = steal = cputime64_zero;
> - jif = - wall_to_monotonic.tv_sec;
> - if (wall_to_monotonic.tv_nsec)
> - --jif;
> + getboottime(&boottime);
> + jif = boottime.tv_sec;
> + if (boottime.tv_nsec)
> + ++jif;
>
> for_each_possible_cpu(i) {
> int j;
> diff --git a/include/linux/ktime.h b/include/linux/ktime.h
> index 248305b..346c1e1 100644
> --- a/include/linux/ktime.h
> +++ b/include/linux/ktime.h
> @@ -269,6 +269,8 @@ static inline s64 ktime_to_ns(const ktime_t kt)
>
> /* Get the monotonic time in timespec format: */
> extern void ktime_get_ts(struct timespec *ts);
> +/* Get the boot based time in timespec format: */
> +extern void bbtime_get_ts(struct timespec *ts);
>
> /* Get the real (wall-) time in timespec format: */
> #define ktime_get_real_ts(ts) getnstimeofday(ts)
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 49fe299..0845bde 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -884,7 +884,7 @@ struct task_struct {
> unsigned long rt_priority;
> cputime_t utime, stime;
> unsigned long nvcsw, nivcsw; /* context switch counts */
> - struct timespec start_time;
> + struct timespec start_time, real_start_time;
> /* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
> unsigned long min_flt, maj_flt;
>
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 8ea8dea..a52f481 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -90,6 +90,7 @@ static inline struct timespec timespec_sub(struct timespec lhs,
>
> extern struct timespec xtime;
> extern struct timespec wall_to_monotonic;
> +extern unsigned long total_sleep_time;
> extern seqlock_t xtime_lock __attribute__((weak));
>
> extern unsigned long read_persistent_clock(void);
> @@ -116,6 +117,7 @@ extern int do_setitimer(int which, struct itimerval *value,
> extern unsigned int alarm_setitimer(unsigned int seconds);
> extern int do_getitimer(int which, struct itimerval *value);
> extern void getnstimeofday(struct timespec *tv);
> +extern void getboottime(struct timespec *ts);
>
> extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
> extern int timekeeping_is_continuous(void);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 6af959c..b10d9b7 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1056,6 +1056,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>
> p->lock_depth = -1; /* -1 = no lock */
> do_posix_clock_monotonic_gettime(&p->start_time);
> + bbtime_get_ts(&p->real_start_time);
> p->security = NULL;
> p->io_context = NULL;
> p->io_wait = NULL;
Since both do_posix_clock_monotonic_gettime and bbtime_get_ts actually
read the timekeeping hardware (which could cost around 1.5us each), we
probably will want to optimize this down to a single read, or just avoid
the hardware read and take lower-granularity xtime value.
However, I'm not sure how fine-grained the start_time values need to be.
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index b74860a..cd744a1 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -128,6 +128,33 @@ void ktime_get_ts(struct timespec *ts)
> }
> EXPORT_SYMBOL_GPL(ktime_get_ts);
>
> +/**
> + * bbtime_get_ts - get the boot based clock in timespec format
> + * @ts: pointer to timespec variable
> + *
> + * The function calculates the boot based clock from the realtime
> + * clock, the wall_to_monotonic offset and the total sleep time and
> + * stores the result in normalized timespec format in the variable
> + * pointed to by @ts.
You might want to clarify this as being the "monotonic boot time", or
the time the system has been running including sleep time and that calls
to settimeofday() will not affect this value.
> + */
> +void bbtime_get_ts(struct timespec *ts)
Any reason you added this in hrtimer.c instead of timer.c? I think
keeping the new functions closer together would make the subtle
difference between them more clear, and would allow total_sleep_time to
be static to one file.
> +{
> + struct timespec tobb;
> + unsigned long seq;
> +
> + do {
> + seq = read_seqbegin(&xtime_lock);
> + getnstimeofday(ts);
> + tobb = wall_to_monotonic;
> + tobb.tv_sec += total_sleep_time;
> +
> + } while (read_seqretry(&xtime_lock, seq));
> +
> + set_normalized_timespec(ts, ts->tv_sec + tobb.tv_sec,
> + ts->tv_nsec + tobb.tv_nsec);
> +}
> +EXPORT_SYMBOL_GPL(bbtime_get_ts);
> +
> /*
> * Get the coarse grained time at the softirq based on xtime and
> * wall_to_monotonic.
> diff --git a/kernel/timer.c b/kernel/timer.c
> index dd6c2c1..a1a8ccb 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -759,9 +759,13 @@ unsigned long next_timer_interrupt(void)
> * at zero at system boot time, so wall_to_monotonic will be negative,
> * however, we will ALWAYS keep the tv_nsec part positive so we can use
> * the usual normalization.
> + *
> + * We need to add total_sleep_time to wall_to_monotonic to get the real boot
> + * time.
> */
> struct timespec xtime __attribute__ ((aligned (16)));
> struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
> +unsigned long total_sleep_time;
>
> EXPORT_SYMBOL(xtime);
>
> @@ -832,6 +836,21 @@ void getnstimeofday(struct timespec *ts)
> EXPORT_SYMBOL(getnstimeofday);
>
> /**
> + * getboottime - Return the real time of system boot.
> + * @ts: pointer to the timespec to be set
Might want to clarify this as being the "boot time" based on the current
CLOCK_REALTIME clock, noting that calls to settimeofday() would affect
this value.
> + * Returns the time of day in a timespec.
> + */
> +void getboottime(struct timespec *ts)
> +{
> + set_normalized_timespec(ts,
> + - (wall_to_monotonic.tv_sec + total_sleep_time),
> + - wall_to_monotonic.tv_nsec);
> +}
> +
> +EXPORT_SYMBOL(getboottime);
> +
> +/**
> * do_gettimeofday - Returns the time of day in a timeval
> * @tv: pointer to the timeval to be set
> *
> @@ -975,6 +994,7 @@ void __init timekeeping_init(void)
> xtime.tv_nsec = 0;
> set_normalized_timespec(&wall_to_monotonic,
> -xtime.tv_sec, -xtime.tv_nsec);
> + total_sleep_time = 0;
>
> write_sequnlock_irqrestore(&xtime_lock, flags);
> }
> @@ -1004,6 +1024,7 @@ static int timekeeping_resume(struct sys_device *dev)
>
> xtime.tv_sec += sleep_length;
> wall_to_monotonic.tv_sec -= sleep_length;
> + total_sleep_time += sleep_length;
> }
> /* re-base the last cycle value */
> clock->cycle_last = clocksource_read(clock);
The only other gotcha is that this patch conflicts with a 2.6.22 queued
patch in -mm that moves the majority of timekeeping code in
kernel/timer.c to kernel/time/timekeeping.c
So it might be easiest to re-diff this against -mm and send it to Andrew
for inclusion along with the cleanups. However, since this is a fix, it
should have priority over cleanups. I just don't want to upset Andrew's
tree too much :)
Your thoughts?
-john
Hi,
On Fri, May 04, 2007 at 11:18:52AM -0700, john stultz wrote:
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 6af959c..b10d9b7 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -1056,6 +1056,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> >
> > p->lock_depth = -1; /* -1 = no lock */
> > do_posix_clock_monotonic_gettime(&p->start_time);
> > + bbtime_get_ts(&p->real_start_time);
> > p->security = NULL;
> > p->io_context = NULL;
> > p->io_wait = NULL;
>
> Since both do_posix_clock_monotonic_gettime and bbtime_get_ts actually
> read the timekeeping hardware (which could cost around 1.5us each), we
> probably will want to optimize this down to a single read, or just avoid
> the hardware read and take lower-granularity xtime value.
>
> However, I'm not sure how fine-grained the start_time values need to be.
We can add a function like monotonic_to_bb probably? Nothing else crossed my
mind.
> > diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> > index b74860a..cd744a1 100644
> > --- a/kernel/hrtimer.c
> > +++ b/kernel/hrtimer.c
> > @@ -128,6 +128,33 @@ void ktime_get_ts(struct timespec *ts)
> > }
> > EXPORT_SYMBOL_GPL(ktime_get_ts);
> >
> > +/**
> > + * bbtime_get_ts - get the boot based clock in timespec format
> > + * @ts: pointer to timespec variable
> > + *
> > + * The function calculates the boot based clock from the realtime
> > + * clock, the wall_to_monotonic offset and the total sleep time and
> > + * stores the result in normalized timespec format in the variable
> > + * pointed to by @ts.
>
> You might want to clarify this as being the "monotonic boot time", or
> the time the system has been running including sleep time and that calls
> to settimeofday() will not affect this value.
I just copied the text from the ktime_get_ts, but yes, why not.
> > + */
> > +void bbtime_get_ts(struct timespec *ts)
>
> Any reason you added this in hrtimer.c instead of timer.c? I think
> keeping the new functions closer together would make the subtle
> difference between them more clear, and would allow total_sleep_time to
> be static to one file.
The same as above, I copied the ktime_get_ts and put it under that. Will doing
the diff against -mm solve this automagically? :)
> > @@ -832,6 +836,21 @@ void getnstimeofday(struct timespec *ts)
> > EXPORT_SYMBOL(getnstimeofday);
> >
> > /**
> > + * getboottime - Return the real time of system boot.
> > + * @ts: pointer to the timespec to be set
>
> Might want to clarify this as being the "boot time" based on the current
> CLOCK_REALTIME clock, noting that calls to settimeofday() would affect
> this value.
Ok.
> The only other gotcha is that this patch conflicts with a 2.6.22 queued
> patch in -mm that moves the majority of timekeeping code in
> kernel/timer.c to kernel/time/timekeeping.c
>
> So it might be easiest to re-diff this against -mm and send it to Andrew
> for inclusion along with the cleanups. However, since this is a fix, it
> should have priority over cleanups. I just don't want to upset Andrew's
> tree too much :)
>
> Your thoughts?
I'll look at that and will see.
Thanks for the comments,
--
TJ. (Brno, CZ), BaseOS, Red Hat
john stultz writes:
> Indeed. The monotonic clock's behavior around suspend and resume
> is poorly defined. When we increased it, folks didn't like the
> fact that uptime would increase while a system was suspended.
The uptime really does need to increase during suspend. Otherwise,
things get really weird with devices like the OLPC XO which will be
sleeping between keystrokes. You could run the device for hours,
yet get an uptime of only a few minutes. Suspended time should get
counted as stolen time, same as when a hypervisor takes away time.
Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused boot time to move and
process start times to become invalid after suspend. Using boot based time for
those restores the old behaviour and fixes the issue.
Signed-off-by: Tomas Janousek <[email protected]>
Cc: Tomas Smetana <[email protected]>
Cc: John Stultz <[email protected]>
---
fs/proc/array.c | 5 +++--
fs/proc/proc_misc.c | 8 +++++---
include/linux/sched.h | 2 +-
kernel/fork.c | 2 ++
4 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 74f30e0..e798e11 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -405,8 +405,9 @@ static int do_task_stat(struct task_struct *task, char * buffer, int whole)
/* Temporary variable needed for gcc-2.96 */
/* convert timespec -> nsec*/
- start_time = (unsigned long long)task->start_time.tv_sec * NSEC_PER_SEC
- + task->start_time.tv_nsec;
+ start_time =
+ (unsigned long long)task->real_start_time.tv_sec * NSEC_PER_SEC
+ + task->real_start_time.tv_nsec;
/* convert nsec -> ticks */
start_time = nsec_to_clock_t(start_time);
diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
index 4636be7..0782d34 100644
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -445,12 +445,14 @@ static int show_stat(struct seq_file *p, void *v)
unsigned long jif;
cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
u64 sum = 0;
+ struct timespec boottime;
user = nice = system = idle = iowait =
irq = softirq = steal = cputime64_zero;
- jif = - wall_to_monotonic.tv_sec;
- if (wall_to_monotonic.tv_nsec)
- --jif;
+ getboottime(&boottime);
+ jif = boottime.tv_sec;
+ if (boottime.tv_nsec)
+ ++jif;
for_each_possible_cpu(i) {
int j;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 40645b4..386ff51 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -918,7 +918,7 @@ struct task_struct {
unsigned int rt_priority;
cputime_t utime, stime;
unsigned long nvcsw, nivcsw; /* context switch counts */
- struct timespec start_time;
+ struct timespec start_time, real_start_time;
/* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
unsigned long min_flt, maj_flt;
diff --git a/kernel/fork.c b/kernel/fork.c
index 4239de2..3aa82bc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1059,6 +1059,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->lock_depth = -1; /* -1 = no lock */
do_posix_clock_monotonic_gettime(&p->start_time);
+ p->real_start_time = p->start_time;
+ monotonic_to_bootbased(&p->real_start_time);
p->security = NULL;
p->io_context = NULL;
p->io_wait = NULL;
--
1.5.1.4
--
TJ. (Brno, CZ), BaseOS, Red Hat
Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused uptime not to increase
during suspend. This may cause confusion so I restore the old behaviour by
using the boot based time instead of monotonic for uptime.
Signed-off-by: Tomas Janousek <[email protected]>
---
fs/proc/proc_misc.c | 1 +
kernel/timer.c | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
index 0782d34..d84956c 100644
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -107,6 +107,7 @@ static int uptime_read_proc(char *page, char **start, off_t off,
cputime_t idletime = cputime_add(init_task.utime, init_task.stime);
do_posix_clock_monotonic_gettime(&uptime);
+ monotonic_to_bootbased(&uptime);
cputime_to_timespec(idletime, &idle);
len = sprintf(page,"%lu.%02lu %lu.%02lu\n",
(unsigned long) uptime.tv_sec,
diff --git a/kernel/timer.c b/kernel/timer.c
index d5e1d6f..3a8ba7d 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1106,6 +1106,7 @@ int do_sysinfo(struct sysinfo *info)
getnstimeofday(&tp);
tp.tv_sec += wall_to_monotonic.tv_sec;
tp.tv_nsec += wall_to_monotonic.tv_nsec;
+ monotonic_to_bootbased(&tp);
if (tp.tv_nsec - NSEC_PER_SEC >= 0) {
tp.tv_nsec = tp.tv_nsec - NSEC_PER_SEC;
tp.tv_sec++;
--
1.5.1.4
--
TJ. (Brno, CZ), BaseOS, Red Hat
The commits
411187fb05cd11676b0979d9fbf3291db69dbce2 (GTOD: persistent clock support)
c1d370e167d66b10bca3b602d3740405469383de (i386: use GTOD persistent clock
support)
changed the monotonic time so that it no longer jumps after resume, but it's
not possible to use it for boot time and process start time calculations then.
Also, the uptime no longer increases during suspend.
I add a variable to track the wall_to_monotonic changes, a function to get the
real boot time and a function to get the boot based time from the monotonic
one.
Signed-off-by: Tomas Janousek <[email protected]>
Cc: Tomas Smetana <[email protected]>
Cc: John Stultz <[email protected]>
---
include/linux/time.h | 2 ++
kernel/time/timekeeping.c | 41 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+), 0 deletions(-)
diff --git a/include/linux/time.h b/include/linux/time.h
index 8997b61..06f3eaf 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -116,6 +116,8 @@ extern int do_setitimer(int which, struct itimerval *value,
extern unsigned int alarm_setitimer(unsigned int seconds);
extern int do_getitimer(int which, struct itimerval *value);
extern void getnstimeofday(struct timespec *tv);
+extern void getboottime(struct timespec *ts);
+extern void monotonic_to_bootbased(struct timespec *ts);
extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
extern int timekeeping_is_continuous(void);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index f9217bf..dd9647a 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -36,9 +36,17 @@ EXPORT_SYMBOL(xtime_lock);
* at zero at system boot time, so wall_to_monotonic will be negative,
* however, we will ALWAYS keep the tv_nsec part positive so we can use
* the usual normalization.
+ *
+ * wall_to_monotonic is moved after resume from suspend for the monotonic
+ * time not to jump. We need to add total_sleep_time to wall_to_monotonic
+ * to get the real boot based time offset.
+ *
+ * - wall_to_monotonic is no longer the boot time, getboottime must be
+ * used instead.
*/
struct timespec xtime __attribute__ ((aligned (16)));
struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
+static unsigned long total_sleep_time;
EXPORT_SYMBOL(xtime);
@@ -251,6 +259,7 @@ void __init timekeeping_init(void)
xtime.tv_nsec = 0;
set_normalized_timespec(&wall_to_monotonic,
-xtime.tv_sec, -xtime.tv_nsec);
+ total_sleep_time = 0;
write_sequnlock_irqrestore(&xtime_lock, flags);
}
@@ -280,6 +289,7 @@ static int timekeeping_resume(struct sys_device *dev)
xtime.tv_sec += sleep_length;
wall_to_monotonic.tv_sec -= sleep_length;
+ total_sleep_time += sleep_length;
}
/* re-base the last cycle value */
clock->cycle_last = clocksource_read(clock);
@@ -474,3 +484,34 @@ void update_wall_time(void)
change_clocksource();
update_vsyscall(&xtime, clock);
}
+
+/**
+ * getboottime - Return the real time of system boot.
+ * @ts: pointer to the timespec to be set
+ *
+ * Returns the time of day in a timespec.
+ *
+ * This is based on the wall_to_monotonic offset and the total suspend
+ * time. Calls to settimeofday will affect the value returned (which
+ * basically means that however wrong your real time clock is at boot time,
+ * you get the right time here).
+ */
+void getboottime(struct timespec *ts)
+{
+ set_normalized_timespec(ts,
+ - (wall_to_monotonic.tv_sec + total_sleep_time),
+ - wall_to_monotonic.tv_nsec);
+}
+
+EXPORT_SYMBOL(getboottime);
+
+/**
+ * monotonic_to_bootbased - Convert the monotonic time to boot based.
+ * @ts: pointer to the timespec to be converted
+ */
+void monotonic_to_bootbased(struct timespec *ts)
+{
+ ts->tv_sec += total_sleep_time;
+}
+
+EXPORT_SYMBOL(monotonic_to_bootbased);
--
1.5.1.4
--
TJ. (Brno, CZ), BaseOS, Red Hat
On Thu, 2007-05-10 at 19:10 +0200, Tomas Janousek wrote:
> The commits
> 411187fb05cd11676b0979d9fbf3291db69dbce2 (GTOD: persistent clock support)
> c1d370e167d66b10bca3b602d3740405469383de (i386: use GTOD persistent clock
> support)
> changed the monotonic time so that it no longer jumps after resume, but it's
> not possible to use it for boot time and process start time calculations then.
> Also, the uptime no longer increases during suspend.
>
> I add a variable to track the wall_to_monotonic changes, a function to get the
> real boot time and a function to get the boot based time from the monotonic
> one.
>
> Signed-off-by: Tomas Janousek <[email protected]>
> Cc: Tomas Smetana <[email protected]>
> Cc: John Stultz <[email protected]>
Looks good! Thanks again for catching this!
Acked-by: John Stultz <[email protected]>
> ---
> include/linux/time.h | 2 ++
> kernel/time/timekeeping.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 43 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 8997b61..06f3eaf 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -116,6 +116,8 @@ extern int do_setitimer(int which, struct itimerval *value,
> extern unsigned int alarm_setitimer(unsigned int seconds);
> extern int do_getitimer(int which, struct itimerval *value);
> extern void getnstimeofday(struct timespec *tv);
> +extern void getboottime(struct timespec *ts);
> +extern void monotonic_to_bootbased(struct timespec *ts);
>
> extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
> extern int timekeeping_is_continuous(void);
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index f9217bf..dd9647a 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -36,9 +36,17 @@ EXPORT_SYMBOL(xtime_lock);
> * at zero at system boot time, so wall_to_monotonic will be negative,
> * however, we will ALWAYS keep the tv_nsec part positive so we can use
> * the usual normalization.
> + *
> + * wall_to_monotonic is moved after resume from suspend for the monotonic
> + * time not to jump. We need to add total_sleep_time to wall_to_monotonic
> + * to get the real boot based time offset.
> + *
> + * - wall_to_monotonic is no longer the boot time, getboottime must be
> + * used instead.
> */
> struct timespec xtime __attribute__ ((aligned (16)));
> struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
> +static unsigned long total_sleep_time;
>
> EXPORT_SYMBOL(xtime);
>
> @@ -251,6 +259,7 @@ void __init timekeeping_init(void)
> xtime.tv_nsec = 0;
> set_normalized_timespec(&wall_to_monotonic,
> -xtime.tv_sec, -xtime.tv_nsec);
> + total_sleep_time = 0;
>
> write_sequnlock_irqrestore(&xtime_lock, flags);
> }
> @@ -280,6 +289,7 @@ static int timekeeping_resume(struct sys_device *dev)
>
> xtime.tv_sec += sleep_length;
> wall_to_monotonic.tv_sec -= sleep_length;
> + total_sleep_time += sleep_length;
> }
> /* re-base the last cycle value */
> clock->cycle_last = clocksource_read(clock);
> @@ -474,3 +484,34 @@ void update_wall_time(void)
> change_clocksource();
> update_vsyscall(&xtime, clock);
> }
> +
> +/**
> + * getboottime - Return the real time of system boot.
> + * @ts: pointer to the timespec to be set
> + *
> + * Returns the time of day in a timespec.
> + *
> + * This is based on the wall_to_monotonic offset and the total suspend
> + * time. Calls to settimeofday will affect the value returned (which
> + * basically means that however wrong your real time clock is at boot time,
> + * you get the right time here).
> + */
> +void getboottime(struct timespec *ts)
> +{
> + set_normalized_timespec(ts,
> + - (wall_to_monotonic.tv_sec + total_sleep_time),
> + - wall_to_monotonic.tv_nsec);
> +}
> +
> +EXPORT_SYMBOL(getboottime);
> +
> +/**
> + * monotonic_to_bootbased - Convert the monotonic time to boot based.
> + * @ts: pointer to the timespec to be converted
> + */
> +void monotonic_to_bootbased(struct timespec *ts)
> +{
> + ts->tv_sec += total_sleep_time;
> +}
> +
> +EXPORT_SYMBOL(monotonic_to_bootbased);
> --
> 1.5.1.4
>
>
On Thu, 2007-05-10 at 19:10 +0200, Tomas Janousek wrote:
> Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused boot time to move and
> process start times to become invalid after suspend. Using boot based time for
> those restores the old behaviour and fixes the issue.
>
> Signed-off-by: Tomas Janousek <[email protected]>
> Cc: Tomas Smetana <[email protected]>
> Cc: John Stultz <[email protected]>
Acked-by: John Stultz <[email protected]>
> ---
> fs/proc/array.c | 5 +++--
> fs/proc/proc_misc.c | 8 +++++---
> include/linux/sched.h | 2 +-
> kernel/fork.c | 2 ++
> 4 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 74f30e0..e798e11 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -405,8 +405,9 @@ static int do_task_stat(struct task_struct *task, char * buffer, int whole)
>
> /* Temporary variable needed for gcc-2.96 */
> /* convert timespec -> nsec*/
> - start_time = (unsigned long long)task->start_time.tv_sec * NSEC_PER_SEC
> - + task->start_time.tv_nsec;
> + start_time =
> + (unsigned long long)task->real_start_time.tv_sec * NSEC_PER_SEC
> + + task->real_start_time.tv_nsec;
> /* convert nsec -> ticks */
> start_time = nsec_to_clock_t(start_time);
>
> diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
> index 4636be7..0782d34 100644
> --- a/fs/proc/proc_misc.c
> +++ b/fs/proc/proc_misc.c
> @@ -445,12 +445,14 @@ static int show_stat(struct seq_file *p, void *v)
> unsigned long jif;
> cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
> u64 sum = 0;
> + struct timespec boottime;
>
> user = nice = system = idle = iowait =
> irq = softirq = steal = cputime64_zero;
> - jif = - wall_to_monotonic.tv_sec;
> - if (wall_to_monotonic.tv_nsec)
> - --jif;
> + getboottime(&boottime);
> + jif = boottime.tv_sec;
> + if (boottime.tv_nsec)
> + ++jif;
>
> for_each_possible_cpu(i) {
> int j;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 40645b4..386ff51 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -918,7 +918,7 @@ struct task_struct {
> unsigned int rt_priority;
> cputime_t utime, stime;
> unsigned long nvcsw, nivcsw; /* context switch counts */
> - struct timespec start_time;
> + struct timespec start_time, real_start_time;
> /* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
> unsigned long min_flt, maj_flt;
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 4239de2..3aa82bc 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1059,6 +1059,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>
> p->lock_depth = -1; /* -1 = no lock */
> do_posix_clock_monotonic_gettime(&p->start_time);
> + p->real_start_time = p->start_time;
> + monotonic_to_bootbased(&p->real_start_time);
> p->security = NULL;
> p->io_context = NULL;
> p->io_wait = NULL;
> --
> 1.5.1.4
>
>
On Thu, 2007-05-10 at 19:10 +0200, Tomas Janousek wrote:
> Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused uptime not to increase
> during suspend. This may cause confusion so I restore the old behaviour by
> using the boot based time instead of monotonic for uptime.
>
> Signed-off-by: Tomas Janousek <[email protected]>
Acked-by: John Stultz <[email protected]>
> ---
> fs/proc/proc_misc.c | 1 +
> kernel/timer.c | 1 +
> 2 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
> index 0782d34..d84956c 100644
> --- a/fs/proc/proc_misc.c
> +++ b/fs/proc/proc_misc.c
> @@ -107,6 +107,7 @@ static int uptime_read_proc(char *page, char **start, off_t off,
> cputime_t idletime = cputime_add(init_task.utime, init_task.stime);
>
> do_posix_clock_monotonic_gettime(&uptime);
> + monotonic_to_bootbased(&uptime);
> cputime_to_timespec(idletime, &idle);
> len = sprintf(page,"%lu.%02lu %lu.%02lu\n",
> (unsigned long) uptime.tv_sec,
> diff --git a/kernel/timer.c b/kernel/timer.c
> index d5e1d6f..3a8ba7d 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1106,6 +1106,7 @@ int do_sysinfo(struct sysinfo *info)
> getnstimeofday(&tp);
> tp.tv_sec += wall_to_monotonic.tv_sec;
> tp.tv_nsec += wall_to_monotonic.tv_nsec;
> + monotonic_to_bootbased(&tp);
> if (tp.tv_nsec - NSEC_PER_SEC >= 0) {
> tp.tv_nsec = tp.tv_nsec - NSEC_PER_SEC;
> tp.tv_sec++;
> --
> 1.5.1.4
>
>
Hi Tomas,
On Thursday 10 May 2007, Tomas Janousek wrote:
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 8997b61..06f3eaf 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -116,6 +116,8 @@ extern int do_setitimer(int which, struct itimerval *value,
> extern unsigned int alarm_setitimer(unsigned int seconds);
> extern int do_getitimer(int which, struct itimerval *value);
> extern void getnstimeofday(struct timespec *tv);
> +extern void getboottime(struct timespec *ts);
> +extern void monotonic_to_bootbased(struct timespec *ts);
>
> extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
> extern int timekeeping_is_continuous(void);
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index f9217bf..dd9647a 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -36,9 +36,17 @@ EXPORT_SYMBOL(xtime_lock);
> * at zero at system boot time, so wall_to_monotonic will be negative,
> * however, we will ALWAYS keep the tv_nsec part positive so we can use
> * the usual normalization.
> + *
> + * wall_to_monotonic is moved after resume from suspend for the monotonic
> + * time not to jump. We need to add total_sleep_time to wall_to_monotonic
> + * to get the real boot based time offset.
> + *
> + * - wall_to_monotonic is no longer the boot time, getboottime must be
> + * used instead.
> */
> struct timespec xtime __attribute__ ((aligned (16)));
> struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
> +static unsigned long total_sleep_time;
Could you make that a ktime_t (or struct ktime)?
There are machines, which sleep more than they are awake.
Just imagine a surveillance camera triggered by door entrance.
Yes, these things might run Linux (e.g. on "cris" architecture).
Or your VCR.
Yes, these devices might sleep more than they are awake,
if you are not a TV junkie :-)
Best regards
Ingo Oeser
Hi Ingo,
On Thu, May 10, 2007 at 09:48:59PM +0200, Ingo Oeser wrote:
> > +static unsigned long total_sleep_time;
>
> Could you make that a ktime_t (or struct ktime)?
> There are machines, which sleep more than they are awake.
> Just imagine a surveillance camera triggered by door entrance.
>
> Yes, these things might run Linux (e.g. on "cris" architecture).
>
> Or your VCR.
> Yes, these devices might sleep more than they are awake,
> if you are not a TV junkie :-)
The value is in seconds. It can hold up to 136 years of sleep. I'd rather
spend the money on beer than on a VCR that I'm gonna suspend for 137 years :)
(or do we have a 16-bit long anywhere?)
Regards,
--
TJ. (Brno, CZ), BaseOS, Red Hat
On Thu, 2007-05-10 at 21:48 +0200, Ingo Oeser wrote:
> Hi Tomas,
>
> On Thursday 10 May 2007, Tomas Janousek wrote:
> > diff --git a/include/linux/time.h b/include/linux/time.h
> > index 8997b61..06f3eaf 100644
> > --- a/include/linux/time.h
> > +++ b/include/linux/time.h
> > @@ -116,6 +116,8 @@ extern int do_setitimer(int which, struct itimerval *value,
> > extern unsigned int alarm_setitimer(unsigned int seconds);
> > extern int do_getitimer(int which, struct itimerval *value);
> > extern void getnstimeofday(struct timespec *tv);
> > +extern void getboottime(struct timespec *ts);
> > +extern void monotonic_to_bootbased(struct timespec *ts);
> >
> > extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
> > extern int timekeeping_is_continuous(void);
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index f9217bf..dd9647a 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -36,9 +36,17 @@ EXPORT_SYMBOL(xtime_lock);
> > * at zero at system boot time, so wall_to_monotonic will be negative,
> > * however, we will ALWAYS keep the tv_nsec part positive so we can use
> > * the usual normalization.
> > + *
> > + * wall_to_monotonic is moved after resume from suspend for the monotonic
> > + * time not to jump. We need to add total_sleep_time to wall_to_monotonic
> > + * to get the real boot based time offset.
> > + *
> > + * - wall_to_monotonic is no longer the boot time, getboottime must be
> > + * used instead.
> > */
> > struct timespec xtime __attribute__ ((aligned (16)));
> > struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
> > +static unsigned long total_sleep_time;
>
> Could you make that a ktime_t (or struct ktime)?
> There are machines, which sleep more than they are awake.
> Just imagine a surveillance camera triggered by door entrance.
>
> Yes, these things might run Linux (e.g. on "cris" architecture).
>
> Or your VCR.
> Yes, these devices might sleep more than they are awake,
> if you are not a TV junkie :-)
I'm not sure I follow this.
total_sleep_time stores seconds. So on 32bit systems that's 130some
years, so it shouldn't be an issue.
Is the reason you want it to be a ktime is because you want a way to
keep sub-second sleep granularity?
thanks
-john
Hi John and Tomas,
On Thursday 10 May 2007, john stultz wrote:
> I'm not sure I follow this.
>
> total_sleep_time stores seconds. So on 32bit systems that's 130some
> years, so it shouldn't be an issue.
>
> Is the reason you want it to be a ktime is because you want a way to
> keep sub-second sleep granularity?
No, I'm just overworked and getting sloppy :-/
Sorry for the noise...
Best regards
Ingo Oeser
On Thu, 10 May 2007 19:10:25 +0200
Tomas Janousek <[email protected]> wrote:
> The commits
> 411187fb05cd11676b0979d9fbf3291db69dbce2 (GTOD: persistent clock support)
> c1d370e167d66b10bca3b602d3740405469383de (i386: use GTOD persistent clock
> support)
> changed the monotonic time so that it no longer jumps after resume, but it's
> not possible to use it for boot time and process start time calculations then.
> Also, the uptime no longer increases during suspend.
>
> I add a variable to track the wall_to_monotonic changes, a function to get the
> real boot time and a function to get the boot based time from the monotonic
> one.
From: Andrew Morton <[email protected]>
- I don't think those sybols are needed in modules.
- Document total_sleep_time units (would have been better to call it
total_sleep_time_secs, perhaps).
Cc: John Stultz <[email protected]>
Cc: Tomas Janousek <[email protected]>
Cc: Tomas Smetana <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
kernel/time/timekeeping.c | 6 +-----
1 files changed, 1 insertion(+), 5 deletions(-)
diff -puN include/linux/time.h~introduce-boot-based-time-fix include/linux/time.h
diff -puN kernel/time/timekeeping.c~introduce-boot-based-time-fix kernel/time/timekeeping.c
--- a/kernel/time/timekeeping.c~introduce-boot-based-time-fix
+++ a/kernel/time/timekeeping.c
@@ -46,7 +46,7 @@ EXPORT_SYMBOL(xtime_lock);
*/
struct timespec xtime __attribute__ ((aligned (16)));
struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
-static unsigned long total_sleep_time;
+static unsigned long total_sleep_time; /* seconds */
EXPORT_SYMBOL(xtime);
@@ -503,8 +503,6 @@ void getboottime(struct timespec *ts)
- wall_to_monotonic.tv_nsec);
}
-EXPORT_SYMBOL(getboottime);
-
/**
* monotonic_to_bootbased - Convert the monotonic time to boot based.
* @ts: pointer to the timespec to be converted
@@ -513,5 +511,3 @@ void monotonic_to_bootbased(struct times
{
ts->tv_sec += total_sleep_time;
}
-
-EXPORT_SYMBOL(monotonic_to_bootbased);
_
On Thu, 10 May 2007 19:10:42 +0200
Tomas Janousek <[email protected]> wrote:
> Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused boot time to move and
> process start times to become invalid after suspend. Using boot based time for
> those restores the old behaviour and fixes the issue.
>
> ..
>
> @@ -445,12 +445,14 @@ static int show_stat(struct seq_file *p, void *v)
> unsigned long jif;
> cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
> u64 sum = 0;
> + struct timespec boottime;
>
> user = nice = system = idle = iowait =
> irq = softirq = steal = cputime64_zero;
> - jif = - wall_to_monotonic.tv_sec;
> - if (wall_to_monotonic.tv_nsec)
> - --jif;
> + getboottime(&boottime);
> + jif = boottime.tv_sec;
> + if (boottime.tv_nsec)
> + ++jif;
>
Is the switch from --jif to ++jif a functional change? If so, how come?
> for_each_possible_cpu(i) {
> int j;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 40645b4..386ff51 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -918,7 +918,7 @@ struct task_struct {
> unsigned int rt_priority;
> cputime_t utime, stime;
> unsigned long nvcsw, nivcsw; /* context switch counts */
> - struct timespec start_time;
> + struct timespec start_time, real_start_time;
no, please prefer to do
struct timespec start_time;
struct timespec real_start_time;
which gives a nice place to add a comment documenting the field.
Please document fields.
What is the difference between start_time and real_start_time?
Hello,
On Thu, May 10, 2007 at 04:40:47PM -0700, Andrew Morton wrote:
> Tomas Janousek <[email protected]> wrote:
> > @@ -445,12 +445,14 @@ static int show_stat(struct seq_file *p, void *v)
> > unsigned long jif;
> > cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
> > u64 sum = 0;
> > + struct timespec boottime;
> >
> > user = nice = system = idle = iowait =
> > irq = softirq = steal = cputime64_zero;
> > - jif = - wall_to_monotonic.tv_sec;
> > - if (wall_to_monotonic.tv_nsec)
> > - --jif;
> > + getboottime(&boottime);
> > + jif = boottime.tv_sec;
> > + if (boottime.tv_nsec)
> > + ++jif;
> >
>
> Is the switch from --jif to ++jif a functional change? If so, how come?
Yes, I'm afraid it was.
-----------------------
From: Tomas Janousek <[email protected]>
Date: Fri, 11 May 2007 10:34:55 +0200
Subject: [PATCH] Lower the boot time to seconds, not ceil
Restores the original behaviour of boot time calculation.
Signed-off-by: Tomas Janousek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tomas Smetana <[email protected]>
Cc: John Stultz <[email protected]>
---
fs/proc/proc_misc.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
index d84956c..60b6210 100644
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -452,8 +452,6 @@ static int show_stat(struct seq_file *p, void *v)
irq = softirq = steal = cputime64_zero;
getboottime(&boottime);
jif = boottime.tv_sec;
- if (boottime.tv_nsec)
- ++jif;
for_each_possible_cpu(i) {
int j;
--
1.5.1.4
--
Tomas Janousek, SW Engineer, Red Hat, Inc.
Hi,
On Thu, May 10, 2007 at 04:40:47PM -0700, Andrew Morton wrote:
> On Thu, 10 May 2007 19:10:42 +0200
> Tomas Janousek <[email protected]> wrote:
> > @@ -918,7 +918,7 @@ struct task_struct {
> > unsigned int rt_priority;
> > cputime_t utime, stime;
> > unsigned long nvcsw, nivcsw; /* context switch counts */
> > - struct timespec start_time;
> > + struct timespec start_time, real_start_time;
>
> no, please prefer to do
>
> struct timespec start_time;
> struct timespec real_start_time;
>
> which gives a nice place to add a comment documenting the field.
>
> Please document fields.
>
> What is the difference between start_time and real_start_time?
-------------------------
From: Tomas Janousek <[email protected]>
Date: Fri, 11 May 2007 10:38:47 +0200
Subject: [PATCH] Document start times in struct task_struct
Signed-off-by: Tomas Janousek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tomas Smetana <[email protected]>
Cc: John Stultz <[email protected]>
---
include/linux/sched.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7eaa536..b852323 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -918,8 +918,8 @@ struct task_struct {
unsigned int rt_priority;
cputime_t utime, stime;
unsigned long nvcsw, nivcsw; /* context switch counts */
- struct timespec start_time;
- struct timespec real_start_time;
+ struct timespec start_time; /* monotonic time */
+ struct timespec real_start_time; /* boot based time */
/* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
unsigned long min_flt, maj_flt;
--
1.5.1.4
--
TJ. (Brno, CZ), BaseOS, Red Hat
Hi!
> >Indeed. The monotonic clock's behavior around suspend
> >and resume
> >is poorly defined. When we increased it, folks didn't
> >like the
> >fact that uptime would increase while a system was
> >suspended.
>
> The uptime really does need to increase during suspend.
> Otherwise,
> things get really weird with devices like the OLPC XO
> which will be
> sleeping between keystrokes. You could run the device
> for hours,
> yet get an uptime of only a few minutes. Suspended time
> should get
> counted as stolen time, same as when a hypervisor takes
> away time.
Yes, please. Just revert the offending patches, we don't want another
/proc/stat field.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Fri, 11 May 2007 10:45:31 +0200
Tomas Janousek <[email protected]> wrote:
> Hello,
>
> On Thu, May 10, 2007 at 04:40:47PM -0700, Andrew Morton wrote:
> > Tomas Janousek <[email protected]> wrote:
> > > @@ -445,12 +445,14 @@ static int show_stat(struct seq_file *p, void *v)
> > > unsigned long jif;
> > > cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
> > > u64 sum = 0;
> > > + struct timespec boottime;
> > >
> > > user = nice = system = idle = iowait =
> > > irq = softirq = steal = cputime64_zero;
> > > - jif = - wall_to_monotonic.tv_sec;
> > > - if (wall_to_monotonic.tv_nsec)
> > > - --jif;
> > > + getboottime(&boottime);
> > > + jif = boottime.tv_sec;
> > > + if (boottime.tv_nsec)
> > > + ++jif;
> > >
> >
> > Is the switch from --jif to ++jif a functional change? If so, how come?
>
> Yes, I'm afraid it was.
>
> -----------------------
> From: Tomas Janousek <[email protected]>
> Date: Fri, 11 May 2007 10:34:55 +0200
> Subject: [PATCH] Lower the boot time to seconds, not ceil
>
> Restores the original behaviour of boot time calculation.
>
> Signed-off-by: Tomas Janousek <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Tomas Smetana <[email protected]>
> Cc: John Stultz <[email protected]>
> ---
> fs/proc/proc_misc.c | 2 --
> 1 files changed, 0 insertions(+), 2 deletions(-)
>
> diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
> index d84956c..60b6210 100644
> --- a/fs/proc/proc_misc.c
> +++ b/fs/proc/proc_misc.c
> @@ -452,8 +452,6 @@ static int show_stat(struct seq_file *p, void *v)
> irq = softirq = steal = cputime64_zero;
> getboottime(&boottime);
> jif = boottime.tv_sec;
> - if (boottime.tv_nsec)
> - ++jif;
>
> for_each_possible_cpu(i) {
> int j;
So we've gone from --jif to ++jif to no change at all.
Are you sure that this net removal of --jif is correct?
Hi,
On Fri, May 11, 2007 at 12:51:32PM -0700, Andrew Morton wrote:
> On Fri, 11 May 2007 10:45:31 +0200
> Tomas Janousek <[email protected]> wrote:
>
> > Hello,
> >
> > On Thu, May 10, 2007 at 04:40:47PM -0700, Andrew Morton wrote:
> > > Tomas Janousek <[email protected]> wrote:
> > > > @@ -445,12 +445,14 @@ static int show_stat(struct seq_file *p, void *v)
> > > > unsigned long jif;
> > > > cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
> > > > u64 sum = 0;
> > > > + struct timespec boottime;
> > > >
> > > > user = nice = system = idle = iowait =
> > > > irq = softirq = steal = cputime64_zero;
> > > > - jif = - wall_to_monotonic.tv_sec;
> > > > - if (wall_to_monotonic.tv_nsec)
> > > > - --jif;
> > > > + getboottime(&boottime);
> > > > + jif = boottime.tv_sec;
> > > > + if (boottime.tv_nsec)
> > > > + ++jif;
> > > >
> > >
> > getboottime(&boottime);
> > jif = boottime.tv_sec;
> > - if (boottime.tv_nsec)
> > - ++jif;
> >
> So we've gone from --jif to ++jif to no change at all.
>
> Are you sure that this net removal of --jif is correct?
Yes.
Let's say wall_to_monotonic = { -10, 500000 } (which is { -9, -500000 }, and
the original code would result in - (- 10) - 1 == 9).
The getboottime calls set_normalized_timespec on { - (-10), - (500000) } which
results to { 10 - 1, - 500000 + 1000000 } = { 9, 500000 }.
tv_sec == 9 => correct.
--
TJ. (Brno, CZ), BaseOS, Red Hat