Currently no trace clock can account for suspend time, using monotonic during
tracing in the suspend path means the trace times wont be advaced. Using the
boot clock with ktime_get_with_offset is not an option due to live locking
concerns in NMI context as suggested by Thomas [1].
These patches add a fast boot clock based on fast monotonic clock and adds a
trace clock based on it. Instead of bloating the optimal case [2], these
patches present a simple approach. Thanks Thomas for making documentation
suggestions for effects of this approach [3].
[1] https://lkml.org/lkml/2016/11/20/75
[2] https://www.spinics.net/lists/kernel/msg2389485.html
[3] https://lkml.org/lkml/2016/11/23/166
Joel Fernandes (3):
timekeeping: Add a fast and NMI safe boot clock
trace: Add an option for boot clock as trace clock
trace: Update documentation for mono, mono_raw and boot clock
Documentation/trace/ftrace.txt | 20 ++++++++++++++++++++
include/linux/timekeeping.h | 1 +
kernel/time/timekeeping.c | 28 ++++++++++++++++++++++++++++
kernel/trace/trace.c | 1 +
4 files changed, 50 insertions(+)
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Ingo Molnar <[email protected]>
--
2.8.0.rc3.226.g39d4020
This boot clock can be used as a tracing clock and will account for
suspend time.
To keep it NMI safe since we're accessing from tracing, we're not using a
separate timekeeper with updates to monotonic clock and boot offset
protected with seqlocks. This has the following minor side effects:
(1) Its possible that a timestamp be taken after the boot offset is updated
but before the timekeeper is updated. If this happens, the new boot offset
is added to the old timekeeping making the clock appear to update slightly
earlier:
CPU 0 CPU 1
timekeeping_inject_sleeptime64()
__timekeeping_inject_sleeptime(tk, delta);
timestamp();
timekeeping_update(tk, TK_CLEAR_NTP...);
(2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be
partially updated. Since the tk->offs_boot update is a rare event, this
should be a rare occurrence which postprocessing should be able to handle.
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
---
include/linux/timekeeping.h | 1 +
kernel/time/timekeeping.c | 29 +++++++++++++++++++++++++++++
2 files changed, 30 insertions(+)
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 09168c5..361f8bf 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -249,6 +249,7 @@ static inline u64 ktime_get_raw_ns(void)
extern u64 ktime_get_mono_fast_ns(void);
extern u64 ktime_get_raw_fast_ns(void);
+extern u64 ktime_get_boot_fast_ns(void);
/*
* Timespec interfaces utilizing the ktime based ones
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 37dec7e..b2286e9 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -425,6 +425,35 @@ u64 ktime_get_raw_fast_ns(void)
}
EXPORT_SYMBOL_GPL(ktime_get_raw_fast_ns);
+/**
+ * ktime_get_boot_fast_ns - NMI safe and fast access to boot clock.
+ *
+ * To keep it NMI safe since we're accessing from tracing, we're not using a
+ * separate timekeeper with updates to monotonic clock and boot offset
+ * protected with seqlocks. This has the following minor side effects:
+ *
+ * (1) Its possible that a timestamp be taken after the boot offset is updated
+ * but before the timekeeper is updated. If this happens, the new boot offset
+ * is added to the old timekeeping making the clock appear to update slightly
+ * earlier:
+ * CPU 0 CPU 1
+ * timekeeping_inject_sleeptime64()
+ * __timekeeping_inject_sleeptime(tk, delta);
+ * timestamp();
+ * timekeeping_update(tk, TK_CLEAR_NTP...);
+ *
+ * (2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be
+ * partially updated. Since the tk->offs_boot update is a rare event, this
+ * should be a rare occurrence which postprocessing should be able to handle.
+ */
+u64 notrace ktime_get_boot_fast_ns(void)
+{
+ struct timekeeper *tk = &tk_core.timekeeper;
+
+ return (ktime_get_mono_fast_ns() + ktime_to_ns(tk->offs_boot));
+}
+EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns);
+
/* Suspend-time cycles value for halted fast timekeeper. */
static cycle_t cycles_at_suspend;
--
2.8.0.rc3.226.g39d4020
Unlike monotonic clock, boot clock as a trace clock will account for
time spent in suspend useful for tracing suspend/resume. This uses
earlier introduced infrastructure for using the fast boot clock.
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
---
kernel/trace/trace.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8696ce6..f7b64db 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1125,6 +1125,7 @@ static struct {
{ trace_clock, "perf", 1 },
{ ktime_get_mono_fast_ns, "mono", 1 },
{ ktime_get_raw_fast_ns, "mono_raw", 1 },
+ { ktime_get_boot_fast_ns, "boot", 1 },
ARCH_TRACE_CLOCKS
};
--
2.8.0.rc3.226.g39d4020
Documentation was missing for mono and mono_raw, add them and also for
the boot clock introduced in this series.
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
---
Documentation/trace/ftrace.txt | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 185c39f..5180b09 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -362,6 +362,26 @@ of ftrace. Here is a list of some of the key files:
to correlate events across hypervisor/guest if
tb_offset is known.
+ mono: This uses the fast monotonic clock (CLOCK_MONOTONIC)
+ which is monotonic and is subject to NTP rate adjustments.
+
+ mono_raw:
+ This is the raw monotonic clock (CLOCK_MONOTONIC_RAW)
+ which is montonic but is not subject to any rate adjustments
+ and ticks at the same rate as the hardware clocksource.
+
+ boot: This is the boot clock (CLOCK_BOOTTIME) and is based on the
+ fast monotonic clock, but also accounts for time spent in
+ suspend. Since the clock access is designed for use in
+ tracing in the suspend path, some side effects are possible
+ if clock is accessed after the suspend time is accounted before
+ the fast mono clock is updated. In this case, the clock update
+ appears to happen slightly sooner than it normally would have.
+ Also on 32-bit systems, its possible that the 64-bit boot offset
+ sees a partial update. These effects are rare and post
+ processing should be able to handle them. See comments on
+ ktime_get_boot_fast_ns function for more information.
+
To set a clock, simply echo the clock name into this file.
echo global > trace_clock
--
2.8.0.rc3.226.g39d4020