Hello!
If there is a sufficient delay between reading the watchdog clock and the
clock under test, the clock under test will be marked unstable through no
fault of its own. This series checks for this, doing limited retries
to get a good set of clock reads. If the clock is marked unstable
and is marked as being per-CPU, cross-CPU synchronization is checked.
This series also provides delay injection, which may be enabled via
kernel boot parameters to test the checking for delays.
Note that "sufficient delay" can be provided by SMIs, NMIs, and of course
vCPU preemption.
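To make the failure mode concrete, here is a small standalone userspace
sketch (illustration only, not kernel code; the numbers are invented):

#include <stdio.h>

int main(void)
{
	/* Nanoseconds elapsed on two perfectly synchronized clocks over
	 * one 500ms watchdog interval. */
	long long wd_nsec = 500000000, cs_nsec = 500000000;
	long long smi_delay = 300000;	/* a 300us SMI between the two reads */

	/* The delay lands between the watchdog read and the read of the
	 * clock under test, so it is charged to the clock under test. */
	cs_nsec += smi_delay;

	printf("apparent skew: %lld ns\n", cs_nsec - wd_nsec);
	/* Retrying the reads (patch 2) and bounding the read-to-read delay
	 * is what keeps this innocent clock from being marked unstable once
	 * the skew threshold is tightened (patch 6). */
	return 0;
}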
1. Provide module parameters to inject delays in watchdog.
2. Retry clock read if long delays detected.
3. Check per-CPU clock synchronization when marked unstable.
4. Provide a module parameter to fuzz per-CPU clock checking.
5. Limit number of CPUs checked for clock synchronization.
6. Reduce clocksource-skew threshold for TSC.
Changes since v10, based on feedback from Thomas Gleixner, Peter Zijlstra,
Feng Tang, Andi Kleen, Luming Yu, Xing Zhengjun, and the indefatigable
kernel test robot:
o	Automatically compute the uncertainty margin for each clocksource, and
also allow it to be specified manually before that clocksource
is registered.
o For the automatically computed uncertainty margins, bound them
below by 100 microseconds (2 * WATCHDOG_MAX_SKEW).
o For the manually specified uncertainty margins, splat (but
continue) if they are less than 100 microseconds (again 2 *
WATCHDOG_MAX_SKEW). The purpose of splatting is to discourage
production use of this clock-skew-inducing debugging technique.
o Manually set the uncertainty margin for clocksource_jiffies
(and thus refined_jiffies) to TICK_NSEC to compensate for the
very low frequency of these clocks.
o Manually set the uncertainty margin for clocksource_tsc_early
to 32 milliseconds.
o Apply numerous "Link:" fields to all patches.
o Add some acks and CCs.
Changes since v9:
o Forgive tsc_early drift, based on feedback from Feng Tang; Xing,
Zhengjun; and Thomas Gleixner.
o Improve CPU selection for clock-synchronization checking.
Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
Changes since v8, based on Thomas Gleixner feedback:
o Reduced clock-skew threshold to 200us and delay limit to 50us.
o Split out a cs_watchdog_read() function.
o Removed the pointless CLOCK_SOURCE_VERIFY_PERCPU from kvm_clock.
o	Initialized cs_nsec_max and cs_nsec_min to avoid first-time checks.
Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
Changes since v7, based on Thomas Gleixner feedback:
o Fix embarrassing git-format-patch operator error.
o Merge pairwise clock-desynchronization checking into the checking
of per-CPU clock synchronization when marked unstable.
o Do selective per-CPU checking rather than blindly checking all
CPUs. Provide a clocksource.verify_n_cpus kernel boot parameter
to control this behavior, with the value -1 choosing the old
check-all-CPUs behavior. The default is to randomly check 8 CPUs.
o Fix the clock-desynchronization checking to avoid a potential
use-after-free error for dynamically allocated clocksource
structures.
o	Remove redundant "wdagain_nsec < 0" check from clocksource_watchdog()
clocksource skew checking.
o Update commit logs and do code-style updates.
Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
Changes since v5:
o Rebased to v5.12-rc5.
Changes since v4:
o Rebased to v5.12-rc1.
Changes since v3:
o Rebased to v5.11.
o Apply Randy Dunlap feedback.
Changes since v2:
o Rebased to v5.11-rc6.
o Updated Cc: list.
Changes since v1:
o Applied feedback from Rik van Riel.
o Rebased to v5.11-rc3.
o Stripped "RFC" from the subject lines.
Thanx, Paul
------------------------------------------------------------------------
Documentation/admin-guide/kernel-parameters.txt | 32 +++
arch/x86/kernel/tsc.c | 1
b/Documentation/admin-guide/kernel-parameters.txt | 21 ++
b/arch/x86/kernel/tsc.c | 3
b/include/linux/clocksource.h | 2
b/kernel/time/clocksource.c | 23 ++
b/kernel/time/jiffies.c | 15 -
include/linux/clocksource.h | 3
kernel/time/clocksource.c | 227 ++++++++++++++++++++--
9 files changed, 304 insertions(+), 23 deletions(-)
Code that checks for clock desynchronization must itself be tested, so
create a new clocksource.inject_delay_shift_percpu= kernel boot parameter
that adds or subtracts a large value from the check read, using the
specified bit of the CPU ID to determine whether to add or to subtract.
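For illustration, a standalone userspace sketch of the sign selection
(not the kernel code; the loop bounds and the shift value of 4 are just
for demonstration):

#include <stdio.h>

static long long injected_delta(int cpu, int inject_delay_shift_percpu)
{
	int sign;

	if (inject_delay_shift_percpu < 0)
		return 0;	/* the default of -1 disables injection */
	/* The selected bit of the CPU number picks the set. */
	sign = ((cpu >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
	return sign * 1000000000LL;	/* +/- one second, as in the patch */
}

int main(void)
{
	int cpu;

	/* With the parameter set to 4, CPUs 0-15 and 32-47 go one way
	 * and CPUs 16-31 go the other. */
	for (cpu = 0; cpu < 48; cpu += 8)
		printf("CPU %2d: delta %+lld ns\n", cpu, injected_delta(cpu, 4));
	return 0;
}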
Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
Cc: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Reported-by: Chris Mason <[email protected]>
[ paulmck: Apply Randy Dunlap feedback. ]
Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 16 ++++++++++++++++
kernel/time/clocksource.c | 10 +++++++++-
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index eb40e5332236..babcc9e80fba 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -604,6 +604,22 @@
from, and complained about, and to larger values
to force the clock to be marked unstable.
+ clocksource.inject_delay_shift_percpu= [KNL]
+ Clocksource delay injection partitions the CPUs
+ into two sets, one whose clocks are moved ahead
+ and the other whose clocks are moved behind.
+ This kernel parameter selects the CPU-number
+ bit that determines which of these two sets the
+ corresponding CPU is placed into. For example,
+ setting this parameter to the value 4 will result
+ in the first set containing alternating groups
+ of 16 CPUs whose clocks are moved ahead, while
+ the second set will contain the rest of the CPUs,
+ whose clocks are moved behind.
+
+ The default value of -1 disables this type of
+ error injection.
+
clocksource.max_read_retries= [KNL]
Number of clocksource_watchdog() retries due to
external delays before the clock will be marked
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index a8d73e1f9431..584433448226 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -196,6 +196,8 @@ static ulong inject_delay_period;
module_param(inject_delay_period, ulong, 0644);
static ulong inject_delay_repeat = 1;
module_param(inject_delay_repeat, ulong, 0644);
+static int inject_delay_shift_percpu = -1;
+module_param(inject_delay_shift_percpu, int, 0644);
static ulong max_read_retries = 3;
module_param(max_read_retries, ulong, 0644);
@@ -252,8 +254,14 @@ static cpumask_t cpus_behind;
static void clocksource_verify_one_cpu(void *csin)
{
struct clocksource *cs = (struct clocksource *)csin;
+ s64 delta = 0;
+ int sign;
- csnow_mid = cs->read(cs);
+ if (inject_delay_shift_percpu >= 0) {
+ sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
+ delta = sign * NSEC_PER_SEC;
+ }
+ csnow_mid = cs->read(cs) + delta;
}
static void clocksource_verify_percpu(struct clocksource *cs)
--
2.31.1.189.g2e36527f23
Currently, WATCHDOG_THRESHOLD is set to detect a 62.5-millisecond skew in
a 500-millisecond WATCHDOG_INTERVAL. This requires that clocks be skewed
by more than 12.5% in order to be marked unstable. Except that a clock
that is skewed by that much is probably destroying unsuspecting software
right and left. And given that there are now checks for false-positive
skews due to delays between reading the two clocks, it should be possible
to greatly decrease WATCHDOG_THRESHOLD, at least for fine-grained clocks
such as TSC.
Therefore, add a new uncertainty_margin field to the clocksource
structure that contains the maximum uncertainty in nanoseconds for
the corresponding clock. This field may be initialized manually,
as it is for clocksource_tsc_early and clocksource_jiffies (which
is copied to refined_jiffies). If the field is not initialized
manually, it will be computed at clock-registration time as the period
of the clock in question, based on the scale and freq parameters to
the __clocksource_update_freq_scale() function. If either of those two
parameters is zero, the tens-of-milliseconds WATCHDOG_THRESHOLD is
used as a cowardly alternative to dividing by zero. No matter how the
uncertainty_margin field is calculated, it is bounded below by twice
WATCHDOG_MAX_SKEW, that is, by 100 microseconds.
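As a worked example of this computation, here is a standalone userspace
sketch (the constants mirror this series; the example clocks are
hypothetical):

#include <stdio.h>

#define NSEC_PER_SEC		1000000000U
#define WATCHDOG_MAX_SKEW	(50U * 1000U)		/* 50us, as in this series */
#define WATCHDOG_THRESHOLD	(NSEC_PER_SEC >> 5)	/* 31.25ms */

static unsigned int uncertainty_margin(unsigned int scale, unsigned int freq)
{
	unsigned int margin;

	if (!scale || !freq)
		return WATCHDOG_THRESHOLD;		/* cowardly fallback */
	margin = NSEC_PER_SEC / (scale * freq);		/* one clock period */
	if (margin < 2 * WATCHDOG_MAX_SKEW)
		margin = 2 * WATCHDOG_MAX_SKEW;		/* floor at 100us */
	return margin;
}

int main(void)
{
	/* A 3GHz TSC-like clock has a sub-nanosecond period, so it gets the
	 * 100us floor.  A 32kHz clock's ~30.5us period is also floored.
	 * A 100Hz clock's 10ms period stands on its own.  No frequency
	 * information at all falls back to WATCHDOG_THRESHOLD. */
	printf("3GHz clock:  %u ns\n", uncertainty_margin(1000, 3000000));
	printf("32kHz clock: %u ns\n", uncertainty_margin(1, 32768));
	printf("100Hz clock: %u ns\n", uncertainty_margin(1, 100));
	printf("no freq:     %u ns\n", uncertainty_margin(0, 0));
	return 0;
}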
Note that manually initialized uncertainty_margin fields are not adjusted,
but there is a WARN_ON_ONCE() that triggers if any such field is less than
twice WATCHDOG_MAX_SKEW. This WARN_ON_ONCE() is intended to discourage
production use of the one-nanosecond uncertainty_margin values that are
used to test the clock-skew code itself.
The actual clock-skew check uses the sum of the uncertainty_margin fields
of the two clocksource structures being compared. Integer overflow is
avoided because the largest computed value of the uncertainty_margin
fields is one billion (10^9), and double that value fits into an
unsigned int. However, if someone manually specifies (say) UINT_MAX,
they will get what they deserve.
Note that the refined_jiffies uncertainty_margin field is initialized to
TICK_NSEC, which means that skew checks involving this clocksource will
be sufficiently forgiving. In a similar vein, the clocksource_tsc_early
uncertainty_margin field is initialized to 32*NSEC_PER_MSEC, which
replicates the current behavior and allows custom setting if needed
in order to address the rare skews detected for this clocksource in
current mainline.
Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
Suggested-by: Thomas Gleixner <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Cc: Feng Tang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/x86/kernel/tsc.c | 1 +
include/linux/clocksource.h | 3 +++
kernel/time/clocksource.c | 30 ++++++++++++++++++++++++++----
kernel/time/jiffies.c | 15 ++++++++-------
4 files changed, 38 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 56289170753c..6e11c9619437 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1127,6 +1127,7 @@ static int tsc_cs_enable(struct clocksource *cs)
static struct clocksource clocksource_tsc_early = {
.name = "tsc-early",
.rating = 299,
+ .uncertainty_margin = 32 * NSEC_PER_MSEC,
.read = read_tsc,
.mask = CLOCKSOURCE_MASK(64),
.flags = CLOCK_SOURCE_IS_CONTINUOUS |
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 83a3ebff7456..8f87c1a6f323 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -42,6 +42,8 @@ struct module;
* @shift: Cycle to nanosecond divisor (power of two)
* @max_idle_ns: Maximum idle time permitted by the clocksource (nsecs)
* @maxadj: Maximum adjustment value to mult (~11%)
+ * @uncertainty_margin: Maximum uncertainty in nanoseconds per half second.
+ * Zero says to use default WATCHDOG_THRESHOLD.
* @archdata: Optional arch-specific data
* @max_cycles: Maximum safe cycle value which won't overflow on
* multiplication
@@ -93,6 +95,7 @@ struct clocksource {
u32 shift;
u64 max_idle_ns;
u32 maxadj;
+ u32 uncertainty_margin;
#ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
struct arch_clocksource_data archdata;
#endif
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index f71f375df544..c228f3727191 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -122,17 +122,17 @@ static int clocksource_watchdog_kthread(void *data);
static void __clocksource_change_rating(struct clocksource *cs, int rating);
/*
- * Interval: 0.5sec Threshold: 0.0625s
+ * Interval: 0.5sec Threshold: 0.0312s, when doubled: 0.0625s
*/
#define WATCHDOG_INTERVAL (HZ >> 1)
-#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
+#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 5)
/*
* Maximum permissible delay between two readouts of the watchdog
* clocksource surrounding a read of the clocksource being validated.
* This delay could be due to SMIs, NMIs, or to VCPU preemptions.
*/
-#define WATCHDOG_MAX_SKEW (100 * NSEC_PER_USEC)
+#define WATCHDOG_MAX_SKEW (50 * NSEC_PER_USEC)
static void clocksource_watchdog_work(struct work_struct *work)
{
@@ -377,6 +377,7 @@ static void clocksource_watchdog(struct timer_list *unused)
int next_cpu, reset_pending;
int64_t wd_nsec, cs_nsec;
struct clocksource *cs;
+ u32 md;
spin_lock(&watchdog_lock);
if (!watchdog_running)
@@ -423,7 +424,8 @@ static void clocksource_watchdog(struct timer_list *unused)
continue;
/* Check the deviation from the watchdog clocksource. */
- if (abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
+ md = cs->uncertainty_margin + watchdog->uncertainty_margin;
+ if (abs(cs_nsec - wd_nsec) > md) {
pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
smp_processor_id(), cs->name);
pr_warn(" '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
@@ -1076,6 +1078,26 @@ void __clocksource_update_freq_scale(struct clocksource *cs, u32 scale, u32 freq
clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
NSEC_PER_SEC / scale, sec * scale);
}
+
+ /*
+ * If the uncertainty margin is not specified, calculate it.
+ * If both scale and freq are non-zero, calculate the clock
+ * period, but bound below at 2*WATCHDOG_MAX_SKEW. However,
+ * if either of scale or freq is zero, be very conservative and
+ * take the tens-of-milliseconds WATCHDOG_THRESHOLD value for the
+ * uncertainty margin. Allow stupidly small uncertainty margins
+ * to be specified by the caller for testing purposes, but warn
+ * to discourage production use of this capability.
+ */
+ if (scale && freq && !cs->uncertainty_margin) {
+ cs->uncertainty_margin = NSEC_PER_SEC / (scale * freq);
+ if (cs->uncertainty_margin < 2 * WATCHDOG_MAX_SKEW)
+ cs->uncertainty_margin = 2 * WATCHDOG_MAX_SKEW;
+ } else if (!cs->uncertainty_margin) {
+ cs->uncertainty_margin = WATCHDOG_THRESHOLD;
+ }
+ WARN_ON_ONCE(cs->uncertainty_margin < 2 * WATCHDOG_MAX_SKEW);
+
/*
* Ensure clocksources that have large 'mult' values don't overflow
* when adjusted.
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index a5cffe2a1770..165b85bcdf29 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -49,13 +49,14 @@ static u64 jiffies_read(struct clocksource *cs)
* for "tick-less" systems.
*/
static struct clocksource clocksource_jiffies = {
- .name = "jiffies",
- .rating = 1, /* lowest valid rating*/
- .read = jiffies_read,
- .mask = CLOCKSOURCE_MASK(32),
- .mult = TICK_NSEC << JIFFIES_SHIFT, /* details above */
- .shift = JIFFIES_SHIFT,
- .max_cycles = 10,
+ .name = "jiffies",
+ .rating = 1, /* lowest valid rating*/
+ .uncertainty_margin = TICK_NSEC,
+ .read = jiffies_read,
+ .mask = CLOCKSOURCE_MASK(32),
+ .mult = TICK_NSEC << JIFFIES_SHIFT, /* details above */
+ .shift = JIFFIES_SHIFT,
+ .max_cycles = 10,
};
__cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock);
--
2.31.1.189.g2e36527f23
When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks. Yes, interrupts are
disabled across those two reads, but there is no shortage of things that
can delay interrupts-disabled regions of code ranging from SMI handlers
to vCPU preemption. It would be good to have some indication as to why
the clock was marked unstable.
Therefore, re-read the watchdog clock on either side of the read
from the clock under test. If the watchdog clock shows an excessive
time delta between its pair of reads, the reads are retried. The
maximum number of retries is specified by a new kernel boot parameter
clocksource.max_read_retries, which defaults to three; that is, up to four
reads, one initial read and up to three retries. If more than one retry was
required, a message is printed on the console (the occasional single retry
is expected behavior, especially in guest OSes). If the maximum number
of retries is exceeded, the clock under test will be marked unstable.
However, the probability of this happening due to various sorts of
delays is quite small. In addition, the reason (clock-read delays)
for the unstable marking will be apparent.
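To illustrate the retry logic in isolation, here is a standalone
userspace model (the clock-read helpers are stand-ins, not kernel
interfaces, and the injected stall is artificial):

#include <stdbool.h>
#include <stdio.h>

#define WATCHDOG_MAX_SKEW_NS	(100 * 1000LL)	/* 100us at this point in the series */

static long long now_ns;
static int stalls = 2;			/* pretend the first two reads hit an SMI */

static long long read_wd(void) { return now_ns; }

static long long read_cs(void)
{
	if (stalls-- > 0)
		now_ns += 250000;	/* 250us stall charged to this read */
	return now_ns;
}

static bool watchdog_read(long long *csnow, long long *wdnow, unsigned long max_read_retries)
{
	unsigned long nretries;
	long long wd_end;

	for (nretries = 0; nretries <= max_read_retries; nretries++) {
		*wdnow = read_wd();
		*csnow = read_cs();
		wd_end = read_wd();
		/* Trust the sample only if the watchdog-to-watchdog gap,
		 * which brackets the clock-under-test read, was short. */
		if (wd_end - *wdnow <= WATCHDOG_MAX_SKEW_NS)
			return true;
	}
	return false;	/* consistently delayed: the caller marks the clock unstable */
}

int main(void)
{
	long long cs, wd;

	printf("usable sample after retries? %s\n",
	       watchdog_read(&cs, &wd, 3) ? "yes" : "no");
	return 0;
}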
Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
Cc: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Acked-by: Feng Tang <[email protected]>
Reported-by: Chris Mason <[email protected]>
[ paulmck: Per-clocksource retries per Neeraj Upadhyay feedback. ]
[ paulmck: Don't reset injectfail per Neeraj Upadhyay feedback. ]
[ paulmck: Apply Thomas Gleixner feedback. ]
Signed-off-by: Paul E. McKenney <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 6 +++
kernel/time/clocksource.c | 53 ++++++++++++++++---
2 files changed, 52 insertions(+), 7 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b68cb54bc872..eb40e5332236 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -604,6 +604,12 @@
from, and complained about, and to larger values
to force the clock to be marked unstable.
+ clocksource.max_read_retries= [KNL]
+ Number of clocksource_watchdog() retries due to
+ external delays before the clock will be marked
+ unstable. Defaults to three retries, that is,
+ four attempts to read the clock under test.
+
clearcpuid=BITNUM[,BITNUM...] [X86]
Disable CPUID feature X for the kernel. See
arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index f1e1e6e4b387..94bfdb53f2f4 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -125,6 +125,13 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
#define WATCHDOG_INTERVAL (HZ >> 1)
#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
+/*
+ * Maximum permissible delay between two readouts of the watchdog
+ * clocksource surrounding a read of the clocksource being validated.
+ * This delay could be due to SMIs, NMIs, or to VCPU preemptions.
+ */
+#define WATCHDOG_MAX_SKEW (100 * NSEC_PER_USEC)
+
static void clocksource_watchdog_work(struct work_struct *work)
{
/*
@@ -189,6 +196,8 @@ static ulong inject_delay_period;
module_param(inject_delay_period, ulong, 0644);
static ulong inject_delay_repeat = 1;
module_param(inject_delay_repeat, ulong, 0644);
+static ulong max_read_retries = 3;
+module_param(max_read_retries, ulong, 0644);
static void clocksource_watchdog_inject_delay(void)
{
@@ -206,12 +215,42 @@ static void clocksource_watchdog_inject_delay(void)
invocations++;
}
+static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
+{
+ unsigned int nretries;
+ u64 wd_end, wd_delta;
+ int64_t wd_delay;
+
+ for (nretries = 0; nretries <= max_read_retries; nretries++) {
+ local_irq_disable();
+ *wdnow = watchdog->read(watchdog);
+ clocksource_watchdog_inject_delay();
+ *csnow = cs->read(cs);
+ wd_end = watchdog->read(watchdog);
+ local_irq_enable();
+
+ wd_delta = clocksource_delta(wd_end, *wdnow, watchdog->mask);
+ wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult, watchdog->shift);
+ if (wd_delay <= WATCHDOG_MAX_SKEW) {
+ if (nretries > 1 || nretries >= max_read_retries) {
+ pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n",
+ smp_processor_id(), watchdog->name, nretries);
+ }
+ return true;
+ }
+ }
+
+ pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n",
+ smp_processor_id(), watchdog->name, wd_delay, nretries);
+ return false;
+}
+
static void clocksource_watchdog(struct timer_list *unused)
{
- struct clocksource *cs;
u64 csnow, wdnow, cslast, wdlast, delta;
- int64_t wd_nsec, cs_nsec;
int next_cpu, reset_pending;
+ int64_t wd_nsec, cs_nsec;
+ struct clocksource *cs;
spin_lock(&watchdog_lock);
if (!watchdog_running)
@@ -228,11 +267,11 @@ static void clocksource_watchdog(struct timer_list *unused)
continue;
}
- local_irq_disable();
- csnow = cs->read(cs);
- clocksource_watchdog_inject_delay();
- wdnow = watchdog->read(watchdog);
- local_irq_enable();
+ if (!cs_watchdog_read(cs, &csnow, &wdnow)) {
+ /* Clock readout unreliable, so give it up. */
+ __clocksource_unstable(cs);
+ continue;
+ }
/* Clocksource initialized ? */
if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
--
2.31.1.189.g2e36527f23
Currently, if skew is detected on a clock marked CLOCK_SOURCE_VERIFY_PERCPU,
that clock is checked on all CPUs. This is thorough, but might not be
what you want on a system with a few tens of CPUs, let alone a few hundred
of them.
Therefore, by default check only up to eight randomly chosen CPUs.
Also provide a new clocksource.verify_n_cpus kernel boot parameter.
A value of -1 says to check all of the CPUs, and a non-negative value says
to randomly select that number of CPUs, without concern about selecting
the same CPU multiple times. However, make use of a cpumask so that a
given CPU will be checked at most once.
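A standalone userspace model of this selection policy (a bool array
stands in for the cpumask; the CPU count and seed are arbitrary):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_CPUS 64

int main(void)
{
	bool chosen[NR_CPUS] = { false };
	int self = 0, verify_n_cpus = 8, picked = 0;
	int i, cpu;

	srand(42);
	for (i = 0; i < verify_n_cpus; i++) {
		cpu = rand() % NR_CPUS;
		if (cpu == self || chosen[cpu])
			continue;	/* self or duplicate: no replacement pick */
		chosen[cpu] = true;
		picked++;
	}
	printf("checking %d of %d CPUs (possibly fewer than %d, by design)\n",
	       picked, NR_CPUS, verify_n_cpus);
	return 0;
}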
Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
Suggested-by: Thomas Gleixner <[email protected]> # For verify_n_cpus=1.
Cc: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 10 +++
kernel/time/clocksource.c | 74 ++++++++++++++++++-
2 files changed, 82 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index babcc9e80fba..4058a74df9ab 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -626,6 +626,16 @@
unstable. Defaults to three retries, that is,
four attempts to read the clock under test.
+ clocksource.verify_n_cpus= [KNL]
+ Limit the number of CPUs checked for clocksources
+ marked with CLOCK_SOURCE_VERIFY_PERCPU that
+ are marked unstable due to excessive skew.
+ A negative value says to check all CPUs, while
+ zero says not to check any. Values larger than
+ nr_cpu_ids are silently truncated to nr_cpu_ids.
+ The actual CPUs are chosen randomly, with
+ no replacement if the same CPU is chosen twice.
+
clearcpuid=BITNUM[,BITNUM...] [X86]
Disable CPUID feature X for the kernel. See
arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 584433448226..f71f375df544 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -15,6 +15,8 @@
#include <linux/tick.h>
#include <linux/kthread.h>
#include <linux/delay.h>
+#include <linux/prandom.h>
+#include <linux/cpu.h>
#include "tick-internal.h"
#include "timekeeping_internal.h"
@@ -200,6 +202,8 @@ static int inject_delay_shift_percpu = -1;
module_param(inject_delay_shift_percpu, int, 0644);
static ulong max_read_retries = 3;
module_param(max_read_retries, ulong, 0644);
+static int verify_n_cpus = 8;
+module_param(verify_n_cpus, int, 0644);
static void clocksource_watchdog_inject_delay(void)
{
@@ -250,6 +254,55 @@ static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
static u64 csnow_mid;
static cpumask_t cpus_ahead;
static cpumask_t cpus_behind;
+static cpumask_t cpus_chosen;
+
+static void clocksource_verify_choose_cpus(void)
+{
+ int cpu, i, n = verify_n_cpus;
+
+ if (n < 0) {
+ /* Check all of the CPUs. */
+ cpumask_copy(&cpus_chosen, cpu_online_mask);
+ cpumask_clear_cpu(smp_processor_id(), &cpus_chosen);
+ return;
+ }
+
+ /* If no checking desired, or no other CPU to check, leave. */
+ cpumask_clear(&cpus_chosen);
+ if (n == 0 || num_online_cpus() <= 1)
+ return;
+
+ /* Make sure to select at least one CPU other than the current CPU. */
+ cpu = cpumask_next(-1, cpu_online_mask);
+ if (cpu == smp_processor_id())
+ cpu = cpumask_next(cpu, cpu_online_mask);
+ if (WARN_ON_ONCE(cpu >= nr_cpu_ids))
+ return;
+ cpumask_set_cpu(cpu, &cpus_chosen);
+
+ /* Force a sane value for the boot parameter. */
+ if (n > nr_cpu_ids)
+ n = nr_cpu_ids;
+
+ /*
+ * Randomly select the specified number of CPUs. If the same
+ * CPU is selected multiple times, that CPU is checked only once,
+ * and no replacement CPU is selected. This gracefully handles
+ * situations where verify_n_cpus is greater than the number of
+ * CPUs that are currently online.
+ */
+ for (i = 1; i < n; i++) {
+ cpu = prandom_u32() % nr_cpu_ids;
+ cpu = cpumask_next(cpu - 1, cpu_online_mask);
+ if (cpu >= nr_cpu_ids)
+ cpu = cpumask_next(-1, cpu_online_mask);
+ if (!WARN_ON_ONCE(cpu >= nr_cpu_ids))
+ cpumask_set_cpu(cpu, &cpus_chosen);
+ }
+
+ /* Don't verify ourselves. */
+ cpumask_clear_cpu(smp_processor_id(), &cpus_chosen);
+}
static void clocksource_verify_one_cpu(void *csin)
{
@@ -271,12 +324,22 @@ static void clocksource_verify_percpu(struct clocksource *cs)
int cpu, testcpu;
s64 delta;
+ if (verify_n_cpus == 0)
+ return;
cpumask_clear(&cpus_ahead);
cpumask_clear(&cpus_behind);
+ get_online_cpus();
preempt_disable();
+ clocksource_verify_choose_cpus();
+ if (cpumask_weight(&cpus_chosen) == 0) {
+ preempt_enable();
+ put_online_cpus();
+ pr_warn("Not enough CPUs to check clocksource '%s'.\n", cs->name);
+ return;
+ }
testcpu = smp_processor_id();
- pr_warn("Checking clocksource %s synchronization from CPU %d.\n", cs->name, testcpu);
- for_each_online_cpu(cpu) {
+ pr_warn("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n", cs->name, testcpu, cpumask_pr_args(&cpus_chosen));
+ for_each_cpu(cpu, &cpus_chosen) {
if (cpu == testcpu)
continue;
csnow_begin = cs->read(cs);
@@ -296,6 +359,7 @@ static void clocksource_verify_percpu(struct clocksource *cs)
cs_nsec_min = cs_nsec;
}
preempt_enable();
+ put_online_cpus();
if (!cpumask_empty(&cpus_ahead))
pr_warn(" CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
cpumask_pr_args(&cpus_ahead), testcpu, cs->name);
@@ -366,6 +430,12 @@ static void clocksource_watchdog(struct timer_list *unused)
watchdog->name, wdnow, wdlast, watchdog->mask);
pr_warn(" '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
cs->name, csnow, cslast, cs->mask);
+ if (curr_clocksource == cs)
+ pr_warn(" '%s' is current clocksource.\n", cs->name);
+ else if (curr_clocksource)
+ pr_warn(" '%s' (not '%s') is current clocksource.\n", curr_clocksource->name, cs->name);
+ else
+ pr_warn(" No current clocksource.\n");
__clocksource_unstable(cs);
continue;
}
--
2.31.1.189.g2e36527f23
When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks. Yes, interrupts are disabled
across those two reads, but there is no shortage of things that can
delay interrupts-disabled regions of code ranging from SMI handlers to
vCPU preemption. It would be good to have some indication as to why
the clock was marked unstable.
The first step is a way of injecting such delays. Therefore, provide
clocksource.inject_delay_period and clocksource.inject_delay_repeat kernel
boot parameters that specify that sufficient delay be injected to cause
the clocksource_watchdog() function to mark a clock unstable. This delay
is injected every Nth set of M calls to clocksource_watchdog(), where N
is the value specified for the inject_delay_period boot parameter and M
is the value specified for the inject_delay_repeat boot parameter. Values
of zero or less for either parameter disable delay injection, and the
default for clocksource.inject_delay_period is zero, that is, disabled.
The default for clocksource.inject_delay_repeat is the value one, that
is, single-call runs.
This facility is intended for diagnostic use only, and should be avoided
on production systems.
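To show the resulting cadence, here is a standalone userspace
transcription of the invocations/injections bookkeeping in the patch
below (illustration only; the kernel code does an mdelay() where this
just reports "delayed"):

#include <stdio.h>

static unsigned long inject_delay_period = 5, inject_delay_repeat = 3;
static unsigned int invocations = 1, injections;

static int inject_delay(void)	/* returns 1 where the kernel would mdelay() */
{
	int injected = 0;

	if (!inject_delay_period || !inject_delay_repeat)
		return 0;
	if (!(invocations % inject_delay_period)) {
		injected = 1;
		if (++injections < inject_delay_repeat)
			return injected;
		injections = 0;
	}
	invocations++;
	return injected;
}

int main(void)
{
	int call;

	for (call = 1; call <= 16; call++)
		printf("watchdog call %2d: %s\n", call,
		       inject_delay() ? "delayed" : "clean");
	return 0;
}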
Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
Cc: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Xing Zhengjun <[email protected]>
[ paulmck: Apply Rik van Riel feedback. ]
[ paulmck: Apply Thomas Gleixner feedback. ]
Reported-by: Chris Mason <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 21 +++++++++++++++++
kernel/time/clocksource.c | 23 +++++++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..b68cb54bc872 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -583,6 +583,27 @@
loops can be debugged more effectively on production
systems.
+ clocksource.inject_delay_period= [KNL]
+ Number of calls to clocksource_watchdog() before
+ delays are injected between reads from the
+ two clocksources. Values of zero disable this
+ delay injection. These delays can cause clocks
+ to be marked unstable, so use of this parameter
+ should therefore be avoided on production systems.
+ Defaults to zero (disabled).
+
+ clocksource.inject_delay_repeat= [KNL]
+ Number of repeated clocksource_watchdog() delay
+ injections per period. If inject_delay_period
+ is five and inject_delay_repeat is three, there
+ will be five delay-free reads followed by three
+ delayed reads. Set to 1 to test isolated delays
+ being silently ignored and recovered from,
+ to between 2 and clocksource.max_read_retries
+ to test grouped delays being ignored, recovered
+ from, and complained about, and to larger values
+ to force the clock to be marked unstable.
+
clearcpuid=BITNUM[,BITNUM...] [X86]
Disable CPUID feature X for the kernel. See
arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cce484a2cc7c..f1e1e6e4b387 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -14,6 +14,7 @@
#include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
#include <linux/tick.h>
#include <linux/kthread.h>
+#include <linux/delay.h>
#include "tick-internal.h"
#include "timekeeping_internal.h"
@@ -184,6 +185,27 @@ void clocksource_mark_unstable(struct clocksource *cs)
spin_unlock_irqrestore(&watchdog_lock, flags);
}
+static ulong inject_delay_period;
+module_param(inject_delay_period, ulong, 0644);
+static ulong inject_delay_repeat = 1;
+module_param(inject_delay_repeat, ulong, 0644);
+
+static void clocksource_watchdog_inject_delay(void)
+{
+ static unsigned int invocations = 1, injections;
+
+ if (!inject_delay_period || !inject_delay_repeat)
+ return;
+ if (!(invocations % inject_delay_period)) {
+ pr_warn("%s(): Injecting delay.\n", __func__);
+ mdelay(2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC);
+ if (++injections < inject_delay_repeat)
+ return;
+ injections = 0;
+ }
+ invocations++;
+}
+
static void clocksource_watchdog(struct timer_list *unused)
{
struct clocksource *cs;
@@ -208,6 +230,7 @@ static void clocksource_watchdog(struct timer_list *unused)
local_irq_disable();
csnow = cs->read(cs);
+ clocksource_watchdog_inject_delay();
wdnow = watchdog->read(watchdog);
local_irq_enable();
--
2.31.1.189.g2e36527f23
Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other. However, this problem has purportedly
been solved in the past ten years. Except that it is all too possible
that the problem has instead simply been made less likely, which might
mean that some of the occasional "Marking clocksource 'tsc' as unstable"
messages might be due to desynchronization. How would anyone know?
Therefore, apply CPU-to-CPU synchronization checking to newly unstable
clocksources that are marked with the new CLOCK_SOURCE_VERIFY_PERCPU flag.
Lists of desynchronized CPUs are printed, with the caveat that if it
is the reporting CPU that is itself desynchronized, it will appear that
all the other clocks are wrong. Just like in real life.
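For illustration, a standalone userspace model of the ahead/behind
classification (plain integers stand in for the begin/mid/end clock
reads; the values are invented):

#include <stdio.h>

static void classify(long long begin, long long mid, long long end, int cpu)
{
	/* The remote CPU's read (mid) should land between the two local
	 * reads (begin and end) if the clocks are synchronized. */
	if (mid - begin < 0)
		printf("CPU %d behind (read %lld before local %lld)\n", cpu, mid, begin);
	else if (end - mid < 0)
		printf("CPU %d ahead (read %lld after local %lld)\n", cpu, mid, end);
	else
		printf("CPU %d in sync\n", cpu);
}

int main(void)
{
	classify(1000, 1005, 1010, 1);	/* sandwiched: synchronized */
	classify(1000,  900, 1010, 2);	/* remote clock lags: behind */
	classify(1000, 1200, 1010, 3);	/* remote clock leads: ahead */
	return 0;
}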
Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
Cc: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Reported-by: Chris Mason <[email protected]>
[ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test robot feedback. ]
[ paulmck: Apply Thomas Gleixner feedback. ]
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/x86/kernel/tsc.c | 3 +-
include/linux/clocksource.h | 2 +-
kernel/time/clocksource.c | 60 +++++++++++++++++++++++++++++++++++++
3 files changed, 63 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc2771f..56289170753c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
.mask = CLOCKSOURCE_MASK(64),
.flags = CLOCK_SOURCE_IS_CONTINUOUS |
CLOCK_SOURCE_VALID_FOR_HRES |
- CLOCK_SOURCE_MUST_VERIFY,
+ CLOCK_SOURCE_MUST_VERIFY |
+ CLOCK_SOURCE_VERIFY_PERCPU,
.vdso_clock_mode = VDSO_CLOCKMODE_TSC,
.enable = tsc_cs_enable,
.resume = tsc_resume,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143db6523..83a3ebff7456 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -131,7 +131,7 @@ struct clocksource {
#define CLOCK_SOURCE_UNSTABLE 0x40
#define CLOCK_SOURCE_SUSPEND_NONSTOP 0x80
#define CLOCK_SOURCE_RESELECT 0x100
-
+#define CLOCK_SOURCE_VERIFY_PERCPU 0x200
/* simplify initialization of mask field */
#define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 94bfdb53f2f4..a8d73e1f9431 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -245,6 +245,60 @@ static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
return false;
}
+static u64 csnow_mid;
+static cpumask_t cpus_ahead;
+static cpumask_t cpus_behind;
+
+static void clocksource_verify_one_cpu(void *csin)
+{
+ struct clocksource *cs = (struct clocksource *)csin;
+
+ csnow_mid = cs->read(cs);
+}
+
+static void clocksource_verify_percpu(struct clocksource *cs)
+{
+ int64_t cs_nsec, cs_nsec_max = 0, cs_nsec_min = LLONG_MAX;
+ u64 csnow_begin, csnow_end;
+ int cpu, testcpu;
+ s64 delta;
+
+ cpumask_clear(&cpus_ahead);
+ cpumask_clear(&cpus_behind);
+ preempt_disable();
+ testcpu = smp_processor_id();
+ pr_warn("Checking clocksource %s synchronization from CPU %d.\n", cs->name, testcpu);
+ for_each_online_cpu(cpu) {
+ if (cpu == testcpu)
+ continue;
+ csnow_begin = cs->read(cs);
+ smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
+ csnow_end = cs->read(cs);
+ delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
+ if (delta < 0)
+ cpumask_set_cpu(cpu, &cpus_behind);
+ delta = (csnow_end - csnow_mid) & cs->mask;
+ if (delta < 0)
+ cpumask_set_cpu(cpu, &cpus_ahead);
+ delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+ cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+ if (cs_nsec > cs_nsec_max)
+ cs_nsec_max = cs_nsec;
+ if (cs_nsec < cs_nsec_min)
+ cs_nsec_min = cs_nsec;
+ }
+ preempt_enable();
+ if (!cpumask_empty(&cpus_ahead))
+ pr_warn(" CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+ cpumask_pr_args(&cpus_ahead), testcpu, cs->name);
+ if (!cpumask_empty(&cpus_behind))
+ pr_warn(" CPUs %*pbl behind CPU %d for clocksource %s.\n",
+ cpumask_pr_args(&cpus_behind), testcpu, cs->name);
+ if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind))
+ pr_warn(" CPU %d check durations %lldns - %lldns for clocksource %s.\n",
+ testcpu, cs_nsec_min, cs_nsec_max, cs->name);
+}
+
static void clocksource_watchdog(struct timer_list *unused)
{
u64 csnow, wdnow, cslast, wdlast, delta;
@@ -469,6 +523,12 @@ static int __clocksource_watchdog_kthread(void)
unsigned long flags;
int select = 0;
+ /* Do any required per-CPU skew verification. */
+ if (curr_clocksource &&
+ curr_clocksource->flags & CLOCK_SOURCE_UNSTABLE &&
+ curr_clocksource->flags & CLOCK_SOURCE_VERIFY_PERCPU)
+ clocksource_verify_percpu(curr_clocksource);
+
spin_lock_irqsave(&watchdog_lock, flags);
list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
--
2.31.1.189.g2e36527f23
Hi "Paul,
I love your patch! Yet something to improve:
[auto build test ERROR on tip/timers/core]
[also build test ERROR on tip/x86/core linux/master linus/master v5.12]
[cannot apply to next-20210428]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Paul-E-McKenney/Do-not-mark-clocks-unstable-due-to-delays-for-v5-13/20210429-093259
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 2d036dfa5f10df9782f5278fc591d79d283c1fad
config: riscv-randconfig-p002-20210428 (attached as .config)
compiler: riscv64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/26be1e45593936a0c04aa1d268522f8c1cb646de
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Paul-E-McKenney/Do-not-mark-clocks-unstable-due-to-delays-for-v5-13/20210429-093259
git checkout 26be1e45593936a0c04aa1d268522f8c1cb646de
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 ARCH=riscv
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All errors (new ones prefixed by >>):
kernel/time/clocksource.c: In function '__clocksource_update_freq_scale':
>> kernel/time/clocksource.c:1094:36: error: 'WATCHDOG_MAX_SKEW' undeclared (first use in this function)
1094 | if (cs->uncertainty_margin < 2 * WATCHDOG_MAX_SKEW)
| ^~~~~~~~~~~~~~~~~
kernel/time/clocksource.c:1094:36: note: each undeclared identifier is reported only once for each function it appears in
>> kernel/time/clocksource.c:1097:28: error: 'WATCHDOG_THRESHOLD' undeclared (first use in this function)
1097 | cs->uncertainty_margin = WATCHDOG_THRESHOLD;
| ^~~~~~~~~~~~~~~~~~
vim +/WATCHDOG_MAX_SKEW +1094 kernel/time/clocksource.c
1039
1040 /**
1041 * __clocksource_update_freq_scale - Used update clocksource with new freq
1042 * @cs: clocksource to be registered
1043 * @scale: Scale factor multiplied against freq to get clocksource hz
1044 * @freq: clocksource frequency (cycles per second) divided by scale
1045 *
1046 * This should only be called from the clocksource->enable() method.
1047 *
1048 * This *SHOULD NOT* be called directly! Please use the
1049 * __clocksource_update_freq_hz() or __clocksource_update_freq_khz() helper
1050 * functions.
1051 */
1052 void __clocksource_update_freq_scale(struct clocksource *cs, u32 scale, u32 freq)
1053 {
1054 u64 sec;
1055
1056 /*
1057 * Default clocksources are *special* and self-define their mult/shift.
1058 * But, you're not special, so you should specify a freq value.
1059 */
1060 if (freq) {
1061 /*
1062 * Calc the maximum number of seconds which we can run before
1063 * wrapping around. For clocksources which have a mask > 32-bit
1064 * we need to limit the max sleep time to have a good
1065 * conversion precision. 10 minutes is still a reasonable
1066 * amount. That results in a shift value of 24 for a
1067 * clocksource with mask >= 40-bit and f >= 4GHz. That maps to
1068 * ~ 0.06ppm granularity for NTP.
1069 */
1070 sec = cs->mask;
1071 do_div(sec, freq);
1072 do_div(sec, scale);
1073 if (!sec)
1074 sec = 1;
1075 else if (sec > 600 && cs->mask > UINT_MAX)
1076 sec = 600;
1077
1078 clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
1079 NSEC_PER_SEC / scale, sec * scale);
1080 }
1081
1082 /*
1083 * If the uncertainty margin is not specified, calculate it.
1084 * If both scale and freq are non-zero, calculate the clock
1085 * period, but bound below at 2*WATCHDOG_MAX_SKEW. However,
1086 * if either of scale or freq is zero, be very conservative and
1087 * take the tens-of-milliseconds WATCHDOG_THRESHOLD value for the
1088 * uncertainty margin. Allow stupidly small uncertainty margins
1089 * to be specified by the caller for testing purposes, but warn
1090 * to discourage production use of this capability.
1091 */
1092 if (scale && freq && !cs->uncertainty_margin) {
1093 cs->uncertainty_margin = NSEC_PER_SEC / (scale * freq);
> 1094 if (cs->uncertainty_margin < 2 * WATCHDOG_MAX_SKEW)
1095 cs->uncertainty_margin = 2 * WATCHDOG_MAX_SKEW;
1096 } else if (!cs->uncertainty_margin) {
> 1097 cs->uncertainty_margin = WATCHDOG_THRESHOLD;
1098 }
1099 WARN_ON_ONCE(cs->uncertainty_margin < 2 * WATCHDOG_MAX_SKEW);
1100
1101 /*
1102 * Ensure clocksources that have large 'mult' values don't overflow
1103 * when adjusted.
1104 */
1105 cs->maxadj = clocksource_max_adjustment(cs);
1106 while (freq && ((cs->mult + cs->maxadj < cs->mult)
1107 || (cs->mult - cs->maxadj > cs->mult))) {
1108 cs->mult >>= 1;
1109 cs->shift--;
1110 cs->maxadj = clocksource_max_adjustment(cs);
1111 }
1112
1113 /*
1114 * Only warn for *special* clocksources that self-define
1115 * their mult/shift values and don't specify a freq.
1116 */
1117 WARN_ONCE(cs->mult + cs->maxadj < cs->mult,
1118 "timekeeping: Clocksource %s might overflow on 11%% adjustment\n",
1119 cs->name);
1120
1121 clocksource_update_max_deferment(cs);
1122
1123 pr_info("%s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n",
1124 cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns);
1125 }
1126 EXPORT_SYMBOL_GPL(__clocksource_update_freq_scale);
1127
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
On Thu, Apr 29, 2021 at 9:30 AM Paul E. McKenney <[email protected]> wrote:
Hi Paul,
Using v11, I added an "approved" flag and made it work for my early
injection test where the TSC is good, by way of a cross-TSC sync test.
Ideally, with this small tweak we would get fewer TSC issues to debug.
I'm not sure whether it would help in real troubleshooting cases, but
we will see.
My hack patch snippet is below:
static void __clocksource_unstable(struct clocksource *cs)
{
+ if (!cs->approved)
+ return;
+
cs->flags &= ~(CLOCK_SOURCE_VALID_FOR_HRES | CLOCK_SOURCE_WATCHDOG);
cs->flags |= CLOCK_SOURCE_UNSTABLE;
@@ -366,9 +369,12 @@
if (!cpumask_empty(&cpus_behind))
pr_warn(" CPUs %*pbl behind CPU %d for clocksource %s.\n",
cpumask_pr_args(&cpus_behind), testcpu, cs->name);
- if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind))
+ if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
pr_warn(" CPU %d check durations %lldns - %lldns for clocksource %s.\n",
testcpu, cs_nsec_min, cs_nsec_max, cs->name);
+ cs->approved = true;
+ __clocksource_unstable(cs);
+ }
}
static void clocksource_watchdog(struct timer_list *unused)
@@ -396,6 +402,7 @@
if (!cs_watchdog_read(cs, &csnow, &wdnow)) {
/* Clock readout unreliable, so give it up. */
+ cs->approved = false;
__clocksource_unstable(cs);
continue;
}
On Thu, Apr 29, 2021 at 03:02:36PM +0800, kernel test robot wrote:
> Hi "Paul,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on tip/timers/core]
> [also build test ERROR on tip/x86/core linux/master linus/master v5.12]
> [cannot apply to next-20210428]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/0day-ci/linux/commits/Paul-E-McKenney/Do-not-mark-clocks-unstable-due-to-delays-for-v5-13/20210429-093259
> base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 2d036dfa5f10df9782f5278fc591d79d283c1fad
> config: riscv-randconfig-p002-20210428 (attached as .config)
> compiler: riscv64-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://github.com/0day-ci/linux/commit/26be1e45593936a0c04aa1d268522f8c1cb646de
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review Paul-E-McKenney/Do-not-mark-clocks-unstable-due-to-delays-for-v5-13/20210429-093259
> git checkout 26be1e45593936a0c04aa1d268522f8c1cb646de
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 ARCH=riscv
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <[email protected]>
>
> All errors (new ones prefixed by >>):
>
> kernel/time/clocksource.c: In function '__clocksource_update_freq_scale':
> >> kernel/time/clocksource.c:1094:36: error: 'WATCHDOG_MAX_SKEW' undeclared (first use in this function)
> 1094 | if (cs->uncertainty_margin < 2 * WATCHDOG_MAX_SKEW)
> | ^~~~~~~~~~~~~~~~~
> kernel/time/clocksource.c:1094:36: note: each undeclared identifier is reported only once for each function it appears in
> >> kernel/time/clocksource.c:1097:28: error: 'WATCHDOG_THRESHOLD' undeclared (first use in this function)
> 1097 | cs->uncertainty_margin = WATCHDOG_THRESHOLD;
> | ^~~~~~~~~~~~~~~~~~
Does the patch below help?
Longer term, these definitions will likely need to instead go into
a header file, but to move testing along in the here and now...
Thanx, Paul
------------------------------------------------------------------------
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c228f3727191..328a65ddb959 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -96,6 +96,20 @@ static char override_name[CS_NAME_LEN];
static int finished_booting;
static u64 suspend_start;
+/*
+ * Threshold: 0.0312s, when doubled: 0.0625s.
+ * Also a default for cs->uncertainty_margin when registering clocks.
+ */
+#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 5)
+
+/*
+ * Maximum permissible delay between two readouts of the watchdog
+ * clocksource surrounding a read of the clocksource being validated.
+ * This delay could be due to SMIs, NMIs, or to VCPU preemptions. Used as
+ * a lower bound for cs->uncertainty_margin values when registering clocks.
+ */
+#define WATCHDOG_MAX_SKEW (50 * NSEC_PER_USEC)
+
#ifdef CONFIG_CLOCKSOURCE_WATCHDOG
static void clocksource_watchdog_work(struct work_struct *work);
static void clocksource_select(void);
@@ -122,17 +136,9 @@ static int clocksource_watchdog_kthread(void *data);
static void __clocksource_change_rating(struct clocksource *cs, int rating);
/*
- * Interval: 0.5sec Threshold: 0.0312s, when doubled: 0.0625s
+ * Interval: 0.5sec.
*/
#define WATCHDOG_INTERVAL (HZ >> 1)
-#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 5)
-
-/*
- * Maximum permissible delay between two readouts of the watchdog
- * clocksource surrounding a read of the clocksource being validated.
- * This delay could be due to SMIs, NMIs, or to VCPU preemptions.
- */
-#define WATCHDOG_MAX_SKEW (50 * NSEC_PER_USEC)
static void clocksource_watchdog_work(struct work_struct *work)
{
On Thu, Apr 29, 2021 at 07:13:40PM +0800, Luming Yu wrote:
> Hi Paul,
> Using v11, I added an "approved" flag and made it work for my early
> injection test, where the TSC comes out good from a cross-CPU TSC sync
> test. Ideally, with this small tweak, we would have fewer TSC issues to
> debug. I am not sure whether it would help in real troubleshooting
> cases, but we will see.
Thank you for the patch!
However, Thomas had me rework this code to put the error injection into
a kernel module, so this effect is now obtained in a different way.
So I am unable to make use of your patch.
Thanx, Paul
> My hack patch snippet is below:
>
>
>
> static void __clocksource_unstable(struct clocksource *cs)
> {
> +	if (!cs->approved)
> +		return;
> +
> 	cs->flags &= ~(CLOCK_SOURCE_VALID_FOR_HRES | CLOCK_SOURCE_WATCHDOG);
> 	cs->flags |= CLOCK_SOURCE_UNSTABLE;
>
> @@ -366,9 +369,12 @@
> 	if (!cpumask_empty(&cpus_behind))
> 		pr_warn(" CPUs %*pbl behind CPU %d for clocksource %s.\n",
> 			cpumask_pr_args(&cpus_behind), testcpu, cs->name);
> -	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind))
> +	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
> 		pr_warn(" CPU %d check durations %lldns - %lldns for clocksource %s.\n",
> 			testcpu, cs_nsec_min, cs_nsec_max, cs->name);
> +		cs->approved = true;
> +		__clocksource_unstable(cs);
> +	}
> }
>
> static void clocksource_watchdog(struct timer_list *unused)
> @@ -396,6 +402,7 @@
>
> 	if (!cs_watchdog_read(cs, &csnow, &wdnow)) {
> 		/* Clock readout unreliable, so give it up. */
> +		cs->approved = false;
> 		__clocksource_unstable(cs);
> 		continue;
> 	}
On Fri, Apr 30, 2021 at 1:11 PM Paul E. McKenney <[email protected]> wrote:
>
> On Thu, Apr 29, 2021 at 07:13:40PM +0800, Luming Yu wrote:
> > [ . . . ]
>
> Thank you for the patch!
>
> However, Thomas had me rework this code to put the error injection into
> a kernel module, so this effect is now obtained in a different way.
> So I am unable to make use of your patch.
np, thanks for the heads up.
We will also need to measure the TSC sync retest and prove that it is
robust enough to override bad decisions made by a clocksource watchdog
based on HPET or other slow, old clocks, while letting good decisions
pass through.
We will re-spin the TSC story once your code has settled and landed in
mainline.
BR
Luming
On Fri, Apr 30, 2021 at 02:52:58PM +0800, Luming Yu wrote:
> On Fri, Apr 30, 2021 at 1:11 PM Paul E. McKenney <[email protected]> wrote:
> > [ . . . ]
>
> np, thanks for the heads up.
>
> We will also need to measure the TSC sync retest and prove that it is
> robust enough to override bad decisions made by a clocksource watchdog
> based on HPET or other slow, old clocks, while letting good decisions
> pass through.
>
> We will re-spin the TSC story once your code has settled and landed in
> mainline.
My current series exports clocksource_verify_percpu(), which might help
with measuring TSC synchronization.
Thanx, Paul
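
For anyone who wants to poke at that export from a test module, a minimal sketch is below. It assumes the exported prototype is void clocksource_verify_percpu(struct clocksource *cs) and hands it a throwaway clocksource defined by the module itself; both details are illustrative assumptions, not code from the series:

#include <linux/module.h>
#include <linux/clocksource.h>
#include <linux/timex.h>

/* Throwaway clocksource used only to exercise the cross-CPU check. */
static u64 demo_cs_read(struct clocksource *cs)
{
	return get_cycles();	/* reads the TSC on x86 */
}

static struct clocksource demo_cs = {
	.name	= "demo-tsc",
	.rating	= 1,
	.read	= demo_cs_read,
	.mask	= CLOCKSOURCE_MASK(64),
};

static int __init verify_percpu_demo_init(void)
{
	/* Ahead/behind CPUs and check durations show up via pr_warn(). */
	clocksource_verify_percpu(&demo_cs);
	return 0;
}
module_init(verify_percpu_demo_init);

static void __exit verify_percpu_demo_exit(void)
{
}
module_exit(verify_percpu_demo_exit);

MODULE_LICENSE("GPL");
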
Hi Paul,
It appears that the patch set is not in 5.13. Will it be in 5.14?
And more data seems to indicate that the TSC is more stable than the
tsc-watchdog.
And we need the patch set to disarm wrong actions when the watchdog is
hit by a spike.
On Sat, May 1, 2021 at 12:28 PM Paul E. McKenney <[email protected]> wrote:
> [ . . . ]
On Wed, Jun 02, 2021 at 01:10:37PM +0800, Luming Yu wrote:
> Hi Paul,
>
> It appears that the patch set is not in 5.13. Will it be in 5.14?
Indeed it is not in v5.13. There were some late-breaking reviews and
changes. I am currently thinking in terms of v5.14.
> And more data seems to indicate that the TSC is more stable than the
> tsc-watchdog.
The tsc-watchdog being HPET? Or some other clocksource?
> And we need the patch set to disarm wrong actions when the watchdog is
> hit by a spike.
It does depend on the hardware. Thomas Gleixner provided a most
excellent summary of the possibilities here:
https://lore.kernel.org/lkml/[email protected]/
And then if your hardware's TSC is the most trustworthy clocksource
on your system, you can always boot with tsc=reliable and avoid the
clocksource watchdog completely, with or without this patch series.
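
To make that concrete, that is just the boot parameter

	tsc=reliable

on the kernel command line; after booting that way, /sys/devices/system/clocksource/clocksource0/current_clocksource should keep reporting "tsc" rather than the TSC being demoted by the watchdog. (An illustration only, not something this series adds.)
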
Or am I missing your point?
Thanx, Paul
> [ . . . ]
On Wed, Jun 02, 2021 at 10:46:50AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 02, 2021 at 01:10:37PM +0800, Luming Yu wrote:
> > Hi Paul,
> >
> > It appears that the patch set is not in 5.13. Will it be in 5.14?
>
> Indeed it is not in v5.13. There were some late-breaking reviews and
> changes. I am currently thinking in terms of v5.14.
>
> > And more data seems to indicate that the TSC is more stable than the
> > tsc-watchdog.
>
> The tsc-watchdog being HPET? Or some other clocksource?
>
> > And we need the patch set to disarm wrong actions when the watchdog is
> > hit by a spike.
>
> It does depend on the hardware. Thomas Gleixner provided a most
> excellent summary of the possibilities here:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> And then if your hardware's TSC is the most trustworthy clocksource
> on your system, you can always boot with tsc=reliable and avoid the
> clocksource watchdog completely, with or without this patch series.
Oh, and firmware can and apparently still sometimes does "adjust" the TSC,
and so booting with tsc=reliable can hide such adjustments from you.
> Or am I missing your point?
Thanx, Paul
> > [ . . . ]
These options work as designed, but they require manual intervention and
cannot scale across highly automated data-center fleets.
If we really do not need a watchdog at runtime, we need data to prove it.
Before we can disable the TSC watchdog by default for all Linux instances, we
still need a solution based on your patch set that keeps the watchdog from
doing the wrong thing merely because of latency issues elsewhere in the system
that might be unavoidable. We really cannot punish the TSC for latency issues
from an unknown source.
I agree that we still need to sort out the quality problems behind the latency
issues the watchdog notices, rather than simply muting and hiding them, as you
may suspect we would by abusing the thresholds and the mechanism your patch set
provides. But that should be a focus after the patch set is merged upstream.
On Thu, Jun 3, 2021 at 2:24 AM Paul E. McKenney <[email protected]> wrote:
> [ . . . ]
On Thu, Jun 03, 2021 at 12:25:57PM +0800, Luming Yu wrote:
> These options work as designed, but they require manual intervention and
> cannot scale across highly automated data-center fleets.
>
> If we really do not need a watchdog at runtime, we need data to prove it.
>
> Before we can disable the TSC watchdog by default for all Linux instances, we
> still need a solution based on your patch set that keeps the watchdog from
> doing the wrong thing merely because of latency issues elsewhere in the system
> that might be unavoidable. We really cannot punish the TSC for latency issues
> from an unknown source.
>
> I agree that we still need to sort out the quality problems behind the latency
> issues the watchdog notices, rather than simply muting and hiding them, as you
> may suspect we would by abusing the thresholds and the mechanism your patch set
> provides. But that should be a focus after the patch set is merged upstream.
Works for me!
Would you be willing to provide your Reviewed-by, Acked-by, or Tested-by
for the series?
Thanx, Paul
> On Thu, Jun 3, 2021 at 2:24 AM Paul E. McKenney <[email protected]> wrote:
> >
> > On Wed, Jun 02, 2021 at 10:46:50AM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 02, 2021 at 01:10:37PM +0800, Luming Yu wrote:
> > > > Hi Paul,
> > > >
> > > > It appears that the patch set is not in 5.13. Will it be in 5.14?
> > >
> > > Indeed it is not in v5.13. There were some late-breaking reviews and
> > > changes. I am currently thinking in terms of v5.14.
> > >
> > > > And more data proof seems to indciate that tsc is more stable than
> > > > tsc-watchdog.
> > >
> > > The tsc-watchdog being HPET? Or some other clocksource?
> > >
> > > > and we need the patch set to dis-arm wrong actions when watch dog is
> > > > hit by a spik.
> > >
> > > It does depend on the hardware. Thomas Gleixner provided a most
> > > excellent summary of the possibilities here:
> > >
> > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > And then if your hardware's TSC is the most trustworthy clocksource
> > > on your system, you can always boot with tsc=reliable and avoid the
> > > clocksource watchdog completely, with or without this patch series.
> >
> > Oh, and firmware can and apparently still sometimes does "adjust" the TSC,
> > and so booting with tsc=reliable can such adjustments from you.
> >
> > > Or am I missing your point?
> >
> > Thanx, Paul
> >
> > > > On Sat, May 1, 2021 at 12:28 PM Paul E. McKenney <[email protected]> wrote:
> > > > >
> > > > > On Fri, Apr 30, 2021 at 02:52:58PM +0800, Luming Yu wrote:
> > > > > > On Fri, Apr 30, 2021 at 1:11 PM Paul E. McKenney <[email protected]> wrote:
> > > > > > >
> > > > > > > On Thu, Apr 29, 2021 at 07:13:40PM +0800, Luming Yu wrote:
> > > > > > > > On Thu, Apr 29, 2021 at 9:30 AM Paul E. McKenney <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > [ . . . v11 cover letter and diffstat trimmed . . . ]
> > > > > > > >
> > > > > > > > Hi Paul,
> > > > > > > > Using v11, I added an approve flag and made it work for my early
> > > > > > > > inject test, where the tsc is confirmed good through a cross tsc
> > > > > > > > sync test. Ideally, with that small tweak, we could get fewer tsc
> > > > > > > > issues to debug. I'm not sure it would help in real troubleshooting
> > > > > > > > cases, but we will see.
> > > > > > >
> > > > > > > Thank you for the patch!
> > > > > > >
> > > > > > > However, Thomas had me rework this code to put the error injection into
> > > > > > > a kernel module, so this effect is now obtained in a different way.
> > > > > > > So I am unable to make use of your patch.
> > > > > >
> > > > > > No problem, thanks for the heads-up.
> > > > > >
> > > > > > We will also need to measure the tsc sync retest and prove that it is
> > > > > > robust enough to override a bad decision from a clocksource watchdog
> > > > > > based on HPET or other slow, old clocks, while letting good decisions
> > > > > > pass through.
> > > > > >
> > > > > > We will re-spin the tsc story once your code has settled and landed in
> > > > > > mainline.
> > > > >
> > > > > My current series exports clocksource_verify_percpu(), which might help
> > > > > with measuring TSC synchronization.
> > > > >
> > > > > Thanx, Paul
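As an aside, the watchdog's verdicts, its skew reports, and the results of the
clocksource_verify_percpu() check mentioned above should all land in the kernel
log, so a loose grep is an easy way to collect that kind of data (exact message
wording varies by kernel version):

    # Look for "Marking clocksource ... as unstable", skew reports, and
    # cross-CPU synchronization complaints:
    dmesg | grep -i clocksource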
On Thu, Jun 3, 2021 at 10:40 PM Paul E. McKenney <[email protected]> wrote:
>
> On Thu, Jun 03, 2021 at 12:25:57PM +0800, Luming Yu wrote:
> > These options work as designed, but they require manual intervention and
> > can't scale in highly automated data-center fleets.
> >
> > If we really don't need a watchdog at runtime, we need some data to prove it.
> >
> > Before we can disable the TSC watchdog by default for all Linux instances,
> > we still need a solution based on your patch set to restrain the watchdog
> > from doing the wrong thing just because of other latency issues in the
> > system that might be unavoidable. We really can't punish the tsc for
> > latency issues from an unknown source.
> >
> > I agree we still need to sort out the quality problems behind the latency
> > issues the watchdog notices, rather than simply muting and hiding them by
> > abusing the thresholds and the mechanism your patch set provides, as you
> > may suspect we might. But that should be a focus after the patch set is
> > merged upstream.
>
> Works for me!
>
> Would you be willing to provide your Reviewed-by, Acked-by, or Tested-by
> for the series?
Sure!

Feel free to use Reviewed-by: [email protected] or [email protected]
when requesting that the patch set be merged into mainline, as it can help us
sort out real issues while improving the overall user experience with tsc.
>
> Thanx, Paul
>
> > [ . . . older quoted thread trimmed . . . ]
On Fri, Jun 04, 2021 at 12:23:02PM +0800, Luming Yu wrote:
> On Thu, Jun 3, 2021 at 10:40 PM Paul E. McKenney <[email protected]> wrote:
> >
> > On Thu, Jun 03, 2021 at 12:25:57PM +0800, Luming Yu wrote:
> > > [ . . . ]
> >
> > Works for me!
> >
> > Would you be willing to provide your Reviewed-by, Acked-by, or Tested-by
> > for the series?
>
> Sure!
>
> Feel free to use Reviewed-by: [email protected] or [email protected]
> when requesting that the patch set be merged into mainline, as it can help us
> sort out real issues while improving the overall user experience with tsc.
Thank you! I have applied "Reviewed-by: Luming Yu <[email protected]>".
Thanx, Paul
> > > [ . . . older quoted thread trimmed . . . ]