Resending, fixed the subject.
Changelog
---------
v2 - v3
- Addressed comment from Thomas Gleixner
- Timestamps become available a little later in boot, but still much
  earlier than in mainline. This significantly simplified the work.
v1 - v2
In patch "x86/tsc: tsc early":
- added tsc_adjusted_early()
- fixed a 32-bit compile error by using do_div()
This series adds early boot timestamp support for x86 machines.
The SPARC patches for early boot timestamps are already integrated into
mainline Linux.
Sample output
-------------
Before:
https://hastebin.com/jadaqukubu.scala
After:
https://hastebin.com/nubipozacu.scala
As seen above, timestamps currently become available around the time the
"Security Framework" is initialized, but 26 seconds have already elapsed
by the time we reach that point.
Pavel Tatashin (2):
sched/clock: interface to allow timestamps early in boot
x86/tsc: use tsc early
arch/x86/include/asm/tsc.h | 4 +++
arch/x86/kernel/setup.c | 10 ++++++--
arch/x86/kernel/time.c | 22 ++++++++++++++++
arch/x86/kernel/tsc.c | 47 ++++++++++++++++++++++++++++++++++
include/linux/sched/clock.h | 4 +++
kernel/sched/clock.c | 61 ++++++++++++++++++++++++++++++++++++++++++++-
6 files changed, 145 insertions(+), 3 deletions(-)
--
2.14.0
tsc_early_init():
Use various methods to determine the availability of the TSC feature and
its frequency early in boot, and if that is possible, initialize the TSC
and call sched_clock_early_init() so that timestamps are available early
in boot.
tsc_early_fini():
Implement the teardown of the early TSC feature: print a message about the
offset, which can be useful to find out how much time was spent in POST and
the boot manager, and call sched_clock_early_fini() to let sched clock know
that the early clock is finished.
sched_clock_early():
TSC-based implementation of the weak function defined in sched clock.
Call tsc_early_init() to initialize the early boot timestamp functionality
on supported x86 platforms, and call tsc_early_fini() to retire this
feature after the permanent TSC has been initialized.
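For reference, the conversion behind sched_clock_early() is a fixed-point
multiply. A standalone sketch of the math (illustration only, not part of
the patch): clocks_calc_mult_shift(&mul, &shift, khz, NSEC_PER_MSEC, 0)
picks mul and shift such that mul / 2^shift is approximately
NSEC_PER_MSEC / khz, i.e. the number of nanoseconds per TSC cycle. For
example, with tsc_khz = 2500000 (2.5 GHz), each cycle is
10^6 / 2500000 = 0.4 ns, so:

static u64 cyc2ns_sketch(u64 cycles, u32 mul, u32 shift)
{
        /* (cycles * mul) >> shift, with a wide intermediate product */
        return mul_u64_u32_shr(cycles, mul, shift);
}

Since cyc2ns_offset is set to minus the reading taken in tsc_early_init(),
the offset printed by tsc_early_fini() is the time from CPU reset (TSC == 0)
until the early clock started, which is roughly the time spent in POST and
the boot manager.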
Signed-off-by: Pavel Tatashin <[email protected]>
---
arch/x86/include/asm/tsc.h | 4 ++++
arch/x86/kernel/setup.c | 10 ++++++++--
arch/x86/kernel/tsc.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 59 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index f5e6f1c417df..6dc9618b24e3 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -50,11 +50,15 @@ extern bool tsc_store_and_check_tsc_adjust(bool bootcpu);
extern void tsc_verify_tsc_adjust(bool resume);
extern void check_tsc_sync_source(int cpu);
extern void check_tsc_sync_target(void);
+void tsc_early_init(unsigned int khz);
+void tsc_early_fini(void);
#else
static inline bool tsc_store_and_check_tsc_adjust(bool bootcpu) { return false; }
static inline void tsc_verify_tsc_adjust(bool resume) { }
static inline void check_tsc_sync_source(int cpu) { }
static inline void check_tsc_sync_target(void) { }
+static inline void tsc_early_init(unsigned int khz) { }
+static inline void tsc_early_fini(void) { }
#endif
extern int notsc_setup(char *);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3486d0498800..413434d98a23 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -812,7 +812,11 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
return 0;
}
-static void __init simple_udelay_calibration(void)
+/*
+ * Initialize early tsc to show early boot timestamps, and also loops_per_jiffy
+ * for udelay
+ */
+static void __init early_clock_calibration(void)
{
unsigned int tsc_khz, cpu_khz;
unsigned long lpj;
@@ -827,6 +831,8 @@ static void __init simple_udelay_calibration(void)
if (!tsc_khz)
return;
+ tsc_early_init(tsc_khz);
+
lpj = tsc_khz * 1000;
do_div(lpj, HZ);
loops_per_jiffy = lpj;
@@ -1039,7 +1045,7 @@ void __init setup_arch(char **cmdline_p)
*/
init_hypervisor_platform();
- simple_udelay_calibration();
+ early_clock_calibration();
x86_init.resources.probe_roms();
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 796d96bb0821..bd44c2dd4235 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1263,6 +1263,53 @@ static int __init init_tsc_clocksource(void)
*/
device_initcall(init_tsc_clocksource);
+#ifdef CONFIG_X86_TSC
+
+static struct cyc2ns_data cyc2ns_early;
+static bool sched_clock_early_enabled;
+
+u64 sched_clock_early(void)
+{
+ u64 ns;
+
+ if (!sched_clock_early_enabled)
+ return 0;
+ ns = mul_u64_u32_shr(rdtsc(), cyc2ns_early.cyc2ns_mul,
+ cyc2ns_early.cyc2ns_shift);
+ return ns + cyc2ns_early.cyc2ns_offset;
+}
+
+/*
+ * Initialize clock for early time stamps
+ */
+void __init tsc_early_init(unsigned int khz)
+{
+ sched_clock_early_enabled = true;
+ clocks_calc_mult_shift(&cyc2ns_early.cyc2ns_mul,
+ &cyc2ns_early.cyc2ns_shift,
+ khz, NSEC_PER_MSEC, 0);
+ cyc2ns_early.cyc2ns_offset = -sched_clock_early();
+ sched_clock_early_init();
+}
+
+void __init tsc_early_fini(void)
+{
+ unsigned long long t;
+ unsigned long r;
+
+ /* We did not have early sched clock if multiplier is 0 */
+ if (cyc2ns_early.cyc2ns_mul == 0)
+ return;
+
+ t = -cyc2ns_early.cyc2ns_offset;
+ r = do_div(t, NSEC_PER_SEC);
+
+ sched_clock_early_fini();
+ pr_info("sched clock early is finished, offset [%lld.%09lds]\n", t, r);
+ sched_clock_early_enabled = false;
+}
+#endif /* CONFIG_X86_TSC */
+
void __init tsc_init(void)
{
u64 lpj, cyc;
--
2.14.0
In Linux, printk() can output timestamps next to every line. This is very
useful for tracking regressions and finding places that can be optimized.
However, the timestamps become available only later in boot. On smaller
machines the delay is insignificant, but on larger ones it can be many
seconds or even minutes into the boot process.
This patch adds an interface that lets platforms with an unstable sched
clock show timestamps early in boot. In order to get this functionality,
a platform must (see the sketch after this list):
- Implement u64 sched_clock_early()
Clock that returns monotonic time
- Call sched_clock_early_init()
Tells sched clock that the early clock can be used
- Call sched_clock_early_fini()
  Tells sched clock that the early clock is finished, and sched clock
  should hand over the operation to the permanent clock.
- Use the weak sched_clock_early() interface to determine the time since
  boot in the arch-specific read_boot_clock64()
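A minimal sketch of the platform side, assuming a counter that is already
monotonic early in boot (my_early_counter_ns() and the two hook names are
hypothetical; only the three sched_clock_early_* interfaces come from this
patch):

u64 sched_clock_early(void)
{
        return my_early_counter_ns();   /* monotonic ns since boot */
}

void __init my_platform_early_time_init(void)
{
        sched_clock_early_init();       /* early timestamps usable from here */
}

void __init my_platform_late_time_init(void)
{
        /* the permanent clocksource is up; hand over to sched clock */
        sched_clock_early_fini();
}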
Signed-off-by: Pavel Tatashin <[email protected]>
---
arch/x86/kernel/time.c | 22 ++++++++++++++++
include/linux/sched/clock.h | 4 +++
kernel/sched/clock.c | 61 ++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 86 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
index e0754cdbad37..6ede0da7041a 100644
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -14,6 +14,7 @@
#include <linux/i8253.h>
#include <linux/time.h>
#include <linux/export.h>
+#include <linux/sched/clock.h>
#include <asm/vsyscall.h>
#include <asm/x86_init.h>
@@ -85,6 +86,7 @@ static __init void x86_late_time_init(void)
{
x86_init.timers.timer_init();
tsc_init();
+ tsc_early_fini();
}
/*
@@ -95,3 +97,23 @@ void __init time_init(void)
{
late_time_init = x86_late_time_init;
}
+
+/*
+ * Called once during boot to initialize the boot time.
+ */
+void read_boot_clock64(struct timespec64 *ts)
+{
+ u64 ns_boot = sched_clock_early(); /* nsec from boot */
+ struct timespec64 ts_now;
+ bool valid_clock;
+
+ /* Time from epoch */
+ read_persistent_clock64(&ts_now);
+ valid_clock = ns_boot && timespec64_valid_strict(&ts_now) &&
+ (ts_now.tv_sec || ts_now.tv_nsec);
+
+ if (!valid_clock)
+ *ts = (struct timespec64){0, 0};
+ else
+ *ts = ns_to_timespec64(timespec64_to_ns(&ts_now) - ns_boot);
+}
diff --git a/include/linux/sched/clock.h b/include/linux/sched/clock.h
index a55600ffdf4b..f8291fa28c0c 100644
--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -63,6 +63,10 @@ extern void sched_clock_tick_stable(void);
extern void sched_clock_idle_sleep_event(void);
extern void sched_clock_idle_wakeup_event(void);
+void sched_clock_early_init(void);
+void sched_clock_early_fini(void);
+u64 sched_clock_early(void);
+
/*
* As outlined in clock.c, provides a fast, high resolution, nanosecond
* time source that is monotonic per cpu argument and has bounded drift
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index ca0f8fc945c6..be5b60af4ca9 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -80,9 +80,24 @@ EXPORT_SYMBOL_GPL(sched_clock);
__read_mostly int sched_clock_running;
+/*
+ * We start with sched clock early static branch enabled, and global status
+ * disabled. Early in boot it is decided whether to enable the global
+ * status as well (set sched_clock_early_running to true), and later, when
+ * early clock is no longer needed, the static branch is disabled.
+ */
+static DEFINE_STATIC_KEY_TRUE(__use_sched_clock_early);
+static bool __read_mostly sched_clock_early_running;
+
void sched_clock_init(void)
{
- sched_clock_running = 1;
+ /*
+ * We start clock only once early clock is finished, or if early clock
+ * was not running.
+ */
+ if (!sched_clock_early_running)
+ sched_clock_running = 1;
+
}
#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
@@ -362,6 +377,11 @@ u64 sched_clock_cpu(int cpu)
if (sched_clock_stable())
return sched_clock() + __sched_clock_offset;
+ if (static_branch_unlikely(&__use_sched_clock_early)) {
+ if (sched_clock_early_running)
+ return sched_clock_early();
+ }
+
if (unlikely(!sched_clock_running))
return 0ull;
@@ -444,6 +464,45 @@ void sched_clock_idle_wakeup_event(void)
}
EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
+u64 __weak sched_clock_early(void)
+{
+ return 0;
+}
+
+/*
+ * Is called when sched_clock_early() is about to be finished, notifies sched
+ * clock that after this call sched_clock_early() can't be used.
+ */
+void __init sched_clock_early_fini(void)
+{
+ struct sched_clock_data *scd = this_scd();
+ u64 now_early, now_sched;
+
+ now_early = sched_clock_early();
+ now_sched = sched_clock();
+
+ __gtod_offset = now_early - scd->tick_gtod;
+ __sched_clock_offset = now_early - now_sched;
+
+ sched_clock_early_running = false;
+ static_branch_disable(&__use_sched_clock_early);
+
+ /* Now that early clock is finished, start regular sched clock */
+ sched_clock_init();
+}
+
+/*
+ * Notifies sched clock that early boot clocksource is available, it means that
+ * the current platform has implemented sched_clock_early().
+ *
+ * The early clock is running until we switch to a stable clock, or when we
+ * learn that the stable clock is not available.
+ */
+void __init sched_clock_early_init(void)
+{
+ sched_clock_early_running = true;
+}
+
#else /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */
u64 sched_clock_cpu(int cpu)
--
2.14.0
Hi Pavel,
At 08/12/2017 02:50 AM, Pavel Tatashin wrote:
> In Linux printk() can output timestamps next to every line. This is very
> useful for tracking regressions, and finding places that can be optimized.
> However, the timestamps are available only later in boot. On smaller
> machines it is insignificant amount of time, but on larger it can be many
> seconds or even minutes into the boot process.
>
> This patch adds an interface for platforms with unstable sched clock to
> show timestamps early in boot. In order to get this functionality a
> platform must do:
>
> - Implement u64 sched_clock_early()
> Clock that returns monotonic time
>
> - Call sched_clock_early_init()
> Tells sched clock that the early clock can be used
>
> - Call sched_clock_early_fini()
> Tells sched clock that the early clock is finished, and sched clock
> should hand over the operation to permanent clock.
>
> - Use weak sched_clock_early() interface to determine time from boot in
> arch specific read_boot_clock64()
>
> Signed-off-by: Pavel Tatashin <[email protected]>
> ---
> arch/x86/kernel/time.c | 22 ++++++++++++++++
> include/linux/sched/clock.h | 4 +++
> kernel/sched/clock.c | 61 ++++++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 86 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
> index e0754cdbad37..6ede0da7041a 100644
> --- a/arch/x86/kernel/time.c
> +++ b/arch/x86/kernel/time.c
> @@ -14,6 +14,7 @@
> #include <linux/i8253.h>
> #include <linux/time.h>
> #include <linux/export.h>
> +#include <linux/sched/clock.h>
>
> #include <asm/vsyscall.h>
> #include <asm/x86_init.h>
> @@ -85,6 +86,7 @@ static __init void x86_late_time_init(void)
> {
> x86_init.timers.timer_init();
> tsc_init();
> + tsc_early_fini();
tsc_early_fini() is defined in patch 2; I guess you may have missed this
when you split your patches.
> }
>
> /*
> @@ -95,3 +97,23 @@ void __init time_init(void)
> {
> late_time_init = x86_late_time_init;
> }
> +
> +/*
> + * Called once during to boot to initialize boot time.
> + */
> +void read_boot_clock64(struct timespec64 *ts)
> +{
> + u64 ns_boot = sched_clock_early(); /* nsec from boot */
> + struct timespec64 ts_now;
> + bool valid_clock;
> +
> + /* Time from epoch */
> + read_persistent_clock64(&ts_now);
> + valid_clock = ns_boot && timespec64_valid_strict(&ts_now) &&
> + (ts_now.tv_sec || ts_now.tv_nsec);
> +
> + if (!valid_clock)
> + *ts = (struct timespec64){0, 0};
> + else
> + *ts = ns_to_timespec64(timespec64_to_ns(&ts_now) - ns_boot);
> +}
> diff --git a/include/linux/sched/clock.h b/include/linux/sched/clock.h
> index a55600ffdf4b..f8291fa28c0c 100644
> --- a/include/linux/sched/clock.h
> +++ b/include/linux/sched/clock.h
> @@ -63,6 +63,10 @@ extern void sched_clock_tick_stable(void);
> extern void sched_clock_idle_sleep_event(void);
> extern void sched_clock_idle_wakeup_event(void);
>
> +void sched_clock_early_init(void);
> +void sched_clock_early_fini(void);
> +u64 sched_clock_early(void);
> +
> /*
> * As outlined in clock.c, provides a fast, high resolution, nanosecond
> * time source that is monotonic per cpu argument and has bounded drift
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index ca0f8fc945c6..be5b60af4ca9 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -80,9 +80,24 @@ EXPORT_SYMBOL_GPL(sched_clock);
>
> __read_mostly int sched_clock_running;
>
> +/*
> + * We start with sched clock early static branch enabled, and global status
> + * disabled. Early in boot it is decided whether to enable the global
> + * status as well (set sched_clock_early_running to true), and later, when
> + * early clock is no longer needed, the static branch is disabled.
> + */
> +static DEFINE_STATIC_KEY_TRUE(__use_sched_clock_early);
> +static bool __read_mostly sched_clock_early_running;
> +
In my opinion, these two variables are redundant; I suggest removing
one, e.g. removing sched_clock_early_running as below.
First: static DEFINE_STATIC_KEY_FALSE(__use_sched_clock_early);
> void sched_clock_init(void)
We could make sched_clock_init() __init here.
> {
> - sched_clock_running = 1;
> + /*
> + * We start clock only once early clock is finished, or if early clock
> + * was not running.
> + */
> + if (!sched_clock_early_running)
s/!sched_clock_early_running/!static_branch_unlikely(&__use_sched_clock_early)/
> + sched_clock_running = 1;
> +
> }
>
> #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> @@ -362,6 +377,11 @@ u64 sched_clock_cpu(int cpu)
> if (sched_clock_stable())
> return sched_clock() + __sched_clock_offset;
>
> + if (static_branch_unlikely(&__use_sched_clock_early)) {
> + if (sched_clock_early_running)
s/if (sched_clock_early_running)//
> + return sched_clock_early();
> + }
> +
> if (unlikely(!sched_clock_running))
> return 0ull;
>
> @@ -444,6 +464,45 @@ void sched_clock_idle_wakeup_event(void)
> }
> EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
>
> +u64 __weak sched_clock_early(void)
> +{
> + return 0;
> +}
> +
> +/*
> + * Is called when sched_clock_early() is about to be finished, notifies sched
> + * clock that after this call sched_clock_early() can't be used.
> + */
> +void __init sched_clock_early_fini(void)
> +{
> + struct sched_clock_data *scd = this_scd();
> + u64 now_early, now_sched;
> +
> + now_early = sched_clock_early();
> + now_sched = sched_clock();
> +
> + __gtod_offset = now_early - scd->tick_gtod;
> + __sched_clock_offset = now_early - now_sched;
> +
> + sched_clock_early_running = false;
s/sched_clock_early_running = false;//
> + static_branch_disable(&__use_sched_clock_early);
> +
> + /* Now that early clock is finished, start regular sched clock */
> + sched_clock_init();
> +}
> +
> +/*
> + * Notifies sched clock that early boot clocksource is available, it means that
> + * the current platform has implemented sched_clock_early().
> + *
> + * The early clock is running until we switch to a stable clock, or when we
> + * learn that the stable clock is not available.
> + */
> +void __init sched_clock_early_init(void)
> +{
> + sched_clock_early_running = true;
s/sched_clock_early_running = true/static_branch_enable(&__use_sched_clock_early)/
Thanks,
dou.
> +}
> +
> #else /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */
>
> u64 sched_clock_cpu(int cpu)
>
Hi Dou,
Thank you for your comments:
>> {
>> x86_init.timers.timer_init();
>> tsc_init();
>> + tsc_early_fini();
>
> tsc_early_fini() is defined in patch 2, I guess you may miss it
> when you split your patches.
Indeed, I will move it to patch 2.
>> +static DEFINE_STATIC_KEY_TRUE(__use_sched_clock_early);
>> +static bool __read_mostly sched_clock_early_running;
>> +
>
> In my opinion, these two parameters are repetitive, I suggest remove
> one.
>
> eg. remove sched_clock_early_running like below
> First, static DEFINE_STATIC_KEY_FALSE(__use_sched_clock_early);
We can't change static branches before jump_label_init() is called, and we
start the early boot timestamps before that.
This is why having two flags is appropriate: one that can be changed early
in boot, and another that patches the hot code in order to keep good
performance after boot.
I will update the comment before __use_sched_clock_early to explain why we
need both.
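Roughly, the ordering constraint looks like this (a sketch of the existing
flow, not new code):

/*
 * setup_arch()
 *   tsc_early_init()
 *     sched_clock_early_init()    <- may only set the plain boolean;
 *                                    static keys cannot be patched yet
 * jump_label_init()               <- static branches become patchable
 * ...
 * late_time_init()
 *   tsc_early_fini()
 *     sched_clock_early_fini()
 *       static_branch_disable()   <- now safe; removes the early-clock
 *                                    check from the hot path
 */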
Thank you,
Pasha
Hi Pasha,
At 08/14/2017 11:44 PM, Pasha Tatashin wrote:
> Hi Dou,
>
> Thank you for your comments:
>
>>> {
>>> x86_init.timers.timer_init();
>>> tsc_init();
>>> + tsc_early_fini();
>>
>> tsc_early_fini() is defined in patch 2, I guess you may miss it
>> when you split your patches.
>
> Indeed, I will move it to patch 2.
>
>>> +static DEFINE_STATIC_KEY_TRUE(__use_sched_clock_early);
>>> +static bool __read_mostly sched_clock_early_running;
>>> +
>>
>> In my opinion, these two parameters are repetitive, I suggest remove
>> one.
>>
>> eg. remove sched_clock_early_running like below
>> First, static DEFINE_STATIC_KEY_FALSE(__use_sched_clock_early);
>
> We can't change the static branches before jump_label_init() is called,
> and we start early boot timestamps before that
>
Understood; I was wrong. Thanks for your explanation.
Thanks
dou.
> This is why having two booleans is appropriate: one that can be changed
> early in boot, and another to patch the hotcode in order to keep good
> performance after boot.
>
> I will update comment before __use_sched_clock_early explaining the
> reason why we need two of them.
>
> Thank you,
> Pasha