2014-12-22 05:06:39

by Cyril Bur

[permalink] [raw]
Subject: [PATCH 0/2] Quieten softlockup detector on virtualised kernels

When the hypervisor pauses a virtualised kernel the kernel will observe a jump
in timebase, this can cause spurious messages from the softlockup detector.

Whilst these messages are harmless, they are accompanied with a stack trace
which causes undue concern and more problematically the stack trace in the
guest has nothing to do with the observed problem and can only be misleading.

Futhermore, on POWER8 this is completely avoidable with the introduction of
the Virtual Time Base (VTB) register.

Cyril Bur (2):
Add another clock for use with the soft lockup watchdog.
powerpc: add running_clock for powerpc to prevent spurious softlockup
warnings

arch/powerpc/kernel/time.c | 24 ++++++++++++++++++++++++
include/linux/sched.h | 1 +
kernel/sched/clock.c | 14 ++++++++++++++
kernel/watchdog.c | 2 +-
4 files changed, 40 insertions(+), 1 deletion(-)

--
1.9.1


2014-12-22 05:06:55

by Cyril Bur

[permalink] [raw]
Subject: [PATCH 1/2] Add another clock for use with the soft lockup watchdog.

This permits the use of arch specific clocks for which virtualised kernels can
use their notion of 'running' time, not the elpased wall time which will
include host execution time.

Signed-off-by: Cyril Bur <[email protected]>
---
include/linux/sched.h | 1 +
kernel/sched/clock.c | 14 ++++++++++++++
kernel/watchdog.c | 2 +-
3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8db31ef..e400162 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2145,6 +2145,7 @@ extern unsigned long long notrace sched_clock(void);
*/
extern u64 cpu_clock(int cpu);
extern u64 local_clock(void);
+extern u64 running_clock(void);
extern u64 sched_clock_cpu(int cpu);


diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index c27e4f8..c83af4f 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -74,6 +74,20 @@ unsigned long long __weak sched_clock(void)
}
EXPORT_SYMBOL_GPL(sched_clock);

+/*
+ * Running clock - returns the time that has elapsed while a guest has been
+ * running.
+ * On a guest this value should be sched_clock minus the time the
+ * guest was suspended by the hypervisor (for any reason).
+ * On bare metal this function should return the same as sched_clock.
+ * Architectures and sub-architectures can override this.
+ */
+unsigned long long __weak running_clock(void)
+{
+ return sched_clock();
+}
+EXPORT_SYMBOL_GPL(running_clock);
+
__read_mostly int sched_clock_running;

#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 70bf118..3174bf8 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -154,7 +154,7 @@ static int get_softlockup_thresh(void)
*/
static unsigned long get_timestamp(void)
{
- return local_clock() >> 30LL; /* 2^30 ~= 10^9 */
+ return running_clock() >> 30LL; /* 2^30 ~= 10^9 */
}

static void set_sample_period(void)
--
1.9.1

2014-12-22 05:07:07

by Cyril Bur

[permalink] [raw]
Subject: [PATCH 2/2] powerpc: add running_clock for powerpc to prevent spurious softlockup warnings

On POWER8 virtualised kernels the VTB register can be read to have a view of
time that only increases while the guest is running. This will prevent guests
from seeing time jump if a guest is paused for significant amounts of time.

On POWER7 and below virtualised kernels stolen time is subtracted from
sched_clock as a best effort approximation. This will not eliminate spurious
warnings in the case of a suspended guest but may reduce the occurance in the
case of softlockups due to host over commit.

Bare metal kernels should avoid reading the VTB as KVM does not restore sane
values when not executing. sched_clock is returned in this case.

Signed-off-by: Cyril Bur <[email protected]>
---
arch/powerpc/kernel/time.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index fa7c4f1..9ba13ec 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -621,6 +621,30 @@ unsigned long long sched_clock(void)
return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
}

+unsigned long long running_clock(void)
+{
+ /*
+ * Don't read the VTB as a host since KVM does not switch in host timebase
+ * into the VTB when it takes a guest off the CPU, reading the VTB would
+ * result in reading 'last switched out' guest VTB.
+ */
+
+ if (firmware_has_feature(FW_FEATURE_LPAR)) {
+ if (cpu_has_feature(CPU_FTR_ARCH_207S))
+ return mulhdu(get_vtb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
+
+ /* This is a next best approximation without a VTB. */
+ return sched_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
+ }
+
+ /*
+ * On a host which doesn't do any virtualisation TB *should* equal VTB so
+ * it makes no difference anyway.
+ */
+
+ return sched_clock();
+}
+
static int __init get_freq(char *name, int cells, unsigned long *val)
{
struct device_node *cpu;
--
1.9.1