2014-04-04 03:19:21

by Lei Wen

[permalink] [raw]
Subject: [PATCH 0/3] switch printk timestamp to use booting time

It is very common to have many processor to run over the same machine
but run different OS actually, so that timestamp alignment is key
to do the right debugging when find something wrong.

Linux adopt schedule clock stopped, so that printk timestamp would get
during suspend period which break such assumption in the old days.
So this patch set is supposed to recover such behavior again.

BTW, I am not sure whether we could add additional member in printk
log structure, so that we could print out two piece of log with
one including suspend time, while another not?

Lei Wen (3):
time: create __get_monotonic_boottime for WARNless calls
timekeeping: move clocksource init to the early place
printk: using booting time as the timestamp

include/linux/time.h | 2 ++
init/main.c | 1 +
kernel/printk/printk.c | 4 ++--
kernel/time/timekeeping.c | 55 ++++++++++++++++++++++++++++++++++++-----------
4 files changed, 48 insertions(+), 14 deletions(-)

--
1.8.3.2


2014-04-04 03:19:07

by Lei Wen

[permalink] [raw]
Subject: [PATCH 2/3] timekeeping: move clocksource init to the early place

So that in the very early booting place, we could call timekeeping
code, while it would not cause system panic, since clock is not
init yet.

And for system default clock is always jiffies, so that it shall be
safe to do so.

Signed-off-by: Lei Wen <[email protected]>
---
include/linux/time.h | 1 +
init/main.c | 1 +
kernel/time/timekeeping.c | 22 +++++++++++++++-------
3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index a2f5079..e2d4899 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -127,6 +127,7 @@ extern void read_boot_clock(struct timespec *ts);
extern int persistent_clock_is_local;
extern int update_persistent_clock(struct timespec now);
void timekeeping_init(void);
+void timekeeping_init_early(void);
extern int timekeeping_suspended;

unsigned long get_seconds(void);
diff --git a/init/main.c b/init/main.c
index 9c7fd4c..5723933 100644
--- a/init/main.c
+++ b/init/main.c
@@ -494,6 +494,7 @@ asmlinkage void __init start_kernel(void)
*/
boot_init_stack_canary();

+ timekeeping_init_early();
cgroup_init_early();

local_irq_disable();
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index c196111..b8f850b 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -782,12 +782,25 @@ void __attribute__((weak)) read_boot_clock(struct timespec *ts)
}

/*
- * timekeeping_init - Initializes the clocksource and common timekeeping values
+ * timekeeping_init_early - setup clocksource early
*/
-void __init timekeeping_init(void)
+void __init timekeeping_init_early(void)
{
struct timekeeper *tk = &timekeeper;
struct clocksource *clock;
+
+ clock = clocksource_default_clock();
+ if (clock->enable)
+ clock->enable(clock);
+ tk_setup_internals(tk, clock);
+}
+
+/*
+ * timekeeping_init - Initializes common timekeeping values
+ */
+void __init timekeeping_init(void)
+{
+ struct timekeeper *tk = &timekeeper;
unsigned long flags;
struct timespec now, boot, tmp;

@@ -813,11 +826,6 @@ void __init timekeeping_init(void)
write_seqcount_begin(&timekeeper_seq);
ntp_init();

- clock = clocksource_default_clock();
- if (clock->enable)
- clock->enable(clock);
- tk_setup_internals(tk, clock);
-
tk_set_xtime(tk, &now);
tk->raw_time.tv_sec = 0;
tk->raw_time.tv_nsec = 0;
--
1.8.3.2

2014-04-04 03:19:13

by Lei Wen

[permalink] [raw]
Subject: [PATCH 1/3] time: create __get_monotonic_boottime for WARNless calls

Since sched_clock always get stopped during suspend period, it
would make it hard to use the kernel log to compare with other
procssor generated log which running over the same machine.
[Absolutely not running linux]

So we need a way to recover the printk timestamp that including
suspend time in the old way, get_monotonic_boottime is a good
candidate, but it cannot be called after suspend process has happen.
Thus, it prevent printk to be used in every corner.

Export one warn less __get_monotonic_boottime to solve this issue.

Signed-off-by: Lei Wen <[email protected]>
---
include/linux/time.h | 1 +
kernel/time/timekeeping.c | 33 ++++++++++++++++++++++++++++-----
2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index d5d229b..a2f5079 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -171,6 +171,7 @@ extern void getnstime_raw_and_real(struct timespec *ts_raw,
struct timespec *ts_real);
extern void getboottime(struct timespec *ts);
extern void monotonic_to_bootbased(struct timespec *ts);
+extern int __get_monotonic_boottime(struct timespec *ts);
extern void get_monotonic_boottime(struct timespec *ts);

extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 5b40279..c196111 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1465,23 +1465,22 @@ void getboottime(struct timespec *ts)
EXPORT_SYMBOL_GPL(getboottime);

/**
- * get_monotonic_boottime - Returns monotonic time since boot
+ * __get_monotonic_boottime - Returns monotonic time since boot
* @ts: pointer to the timespec to be set
*
- * Returns the monotonic time since boot in a timespec.
+ * Update the monotonic time since boot in a timespec.
+ * Returns 0 on success, or -ve when suspended (timespec will be undefined).
*
* This is similar to CLOCK_MONTONIC/ktime_get_ts, but also
* includes the time spent in suspend.
*/
-void get_monotonic_boottime(struct timespec *ts)
+int __get_monotonic_boottime(struct timespec *ts)
{
struct timekeeper *tk = &timekeeper;
struct timespec tomono, sleep;
s64 nsec;
unsigned int seq;

- WARN_ON(timekeeping_suspended);
-
do {
seq = read_seqcount_begin(&timekeeper_seq);
ts->tv_sec = tk->xtime_sec;
@@ -1494,6 +1493,30 @@ void get_monotonic_boottime(struct timespec *ts)
ts->tv_sec += tomono.tv_sec + sleep.tv_sec;
ts->tv_nsec = 0;
timespec_add_ns(ts, nsec + tomono.tv_nsec + sleep.tv_nsec);
+
+ /*
+ * Do not bail out early, in case there were callers still using
+ * the value, even in the face of the WARN_ON.
+ */
+ if (unlikely(timekeeping_suspended))
+ return -EAGAIN;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(__get_monotonic_boottime);
+
+/**
+ * get_monotonic_boottime - Returns monotonic time since boot
+ * @ts: pointer to the timespec to be set
+ *
+ * Returns the monotonic time since boot in a timespec.
+ * (WARN if suspended)
+ *
+ * This is similar to CLOCK_MONTONIC/ktime_get_ts, but also
+ * includes the time spent in suspend.
+ */
+void get_monotonic_boottime(struct timespec *ts)
+{
+ WARN_ON(__get_monotonic_boottime(ts));
}
EXPORT_SYMBOL_GPL(get_monotonic_boottime);

--
1.8.3.2

2014-04-04 03:19:33

by Lei Wen

[permalink] [raw]
Subject: [PATCH 3/3] printk: using booting time as the timestamp

As people may want to align the kernel log with some other processor
running over the same machine but not the same copy of linux, we
need to keep their log aligned, so that it would not make debug
process hard and confused.

Signed-off-by: Lei Wen <[email protected]>
---
kernel/printk/printk.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a45b509..af96fbd 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -349,7 +349,7 @@ static void log_store(int facility, int level,
if (ts_nsec > 0)
msg->ts_nsec = ts_nsec;
else
- msg->ts_nsec = local_clock();
+ msg->ts_nsec = ktime_to_ns(ktime_get_boottime());
memset(log_dict(msg) + dict_len, 0, pad_len);
msg->len = size;

@@ -1440,7 +1440,7 @@ static bool cont_add(int facility, int level, const char *text, size_t len)
cont.facility = facility;
cont.level = level;
cont.owner = current;
- cont.ts_nsec = local_clock();
+ cont.ts_nsec = ktime_to_ns(ktime_get_boottime());
cont.flags = 0;
cont.cons = 0;
cont.flushed = false;
--
1.8.3.2

2014-04-04 21:55:41

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH 3/3] printk: using booting time as the timestamp

On 04/03/2014 08:18 PM, Lei Wen wrote:
> As people may want to align the kernel log with some other processor
> running over the same machine but not the same copy of linux, we
> need to keep their log aligned, so that it would not make debug
> process hard and confused.
>
> Signed-off-by: Lei Wen <[email protected]>
> ---
> kernel/printk/printk.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index a45b509..af96fbd 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -349,7 +349,7 @@ static void log_store(int facility, int level,
> if (ts_nsec > 0)
> msg->ts_nsec = ts_nsec;
> else
> - msg->ts_nsec = local_clock();
> + msg->ts_nsec = ktime_to_ns(ktime_get_boottime());

No no... Since ktime_get_boottime() takes the timekeeping seqlock, this
will cause any printk (or WARN_ON, etc) that is done while holding a
write in the timekeeping seqlock to deadlock (which I mentioned last time).

I think the better solution here is to have the timekeeping code provide
the printk logic the offset value whenever the sleep time is updated.
That way the printk logic can manage its own values w/o having to take
any timekeeping locks.

That said, I'm still not really fond of trying to add this
functionality. The printk timestamping code was always a little fuzzy,
using jiffies or sched_clock in order to give a rough estimate of boot
delays (and didn't include ntp freq correction, etc). When it was
introduced, I feel like folks were concerned about this, but folks
agreed the rough estimates were worth it, and it didn't have to involve
the real timekeeping code. Now, trying to raise the bar here to make
printk timestamps actually correlate with
CLOCK_BOOTTIME/REALTIME/MONOTONIC (which I'll admit, seems quite
reasonable given we have timestamps already - but ends up being feature
creep) will add quite a bit of complexity due to the fact that printk is
supposed to be safe to call from everywhere.

And again, syslog can add timestamps to the kernel messages relatively
well in userland. So I'm not sure the need is clear.

thanks
-john

2014-04-05 08:01:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/3] printk: using booting time as the timestamp

On Fri, Apr 04, 2014 at 02:55:35PM -0700, John Stultz wrote:
> On 04/03/2014 08:18 PM, Lei Wen wrote:
> > As people may want to align the kernel log with some other processor
> > running over the same machine but not the same copy of linux, we
> > need to keep their log aligned, so that it would not make debug
> > process hard and confused.
> >
> > Signed-off-by: Lei Wen <[email protected]>
> > ---
> > kernel/printk/printk.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index a45b509..af96fbd 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -349,7 +349,7 @@ static void log_store(int facility, int level,
> > if (ts_nsec > 0)
> > msg->ts_nsec = ts_nsec;
> > else
> > - msg->ts_nsec = local_clock();
> > + msg->ts_nsec = ktime_to_ns(ktime_get_boottime());
>
> No no... Since ktime_get_boottime() takes the timekeeping seqlock, this
> will cause any printk (or WARN_ON, etc) that is done while holding a
> write in the timekeeping seqlock to deadlock (which I mentioned last time).
>
> I think the better solution here is to have the timekeeping code provide
> the printk logic the offset value whenever the sleep time is updated.
> That way the printk logic can manage its own values w/o having to take
> any timekeeping locks.
>
> That said, I'm still not really fond of trying to add this
> functionality. The printk timestamping code was always a little fuzzy,
> using jiffies or sched_clock in order to give a rough estimate of boot
> delays (and didn't include ntp freq correction, etc). When it was
> introduced, I feel like folks were concerned about this, but folks
> agreed the rough estimates were worth it, and it didn't have to involve
> the real timekeeping code. Now, trying to raise the bar here to make
> printk timestamps actually correlate with
> CLOCK_BOOTTIME/REALTIME/MONOTONIC (which I'll admit, seems quite
> reasonable given we have timestamps already - but ends up being feature
> creep) will add quite a bit of complexity due to the fact that printk is
> supposed to be safe to call from everywhere.
>
> And again, syslog can add timestamps to the kernel messages relatively
> well in userland. So I'm not sure the need is clear.

Not to mention that ktime_get*() is stupid slow for some.