2020-12-06 22:11:29

by Thomas Gleixner

[permalink] [raw]
Subject: [patch 5/8] ntp: Make the RTC synchronization more reliable

Miroslav reported that the periodic RTC synchronization in the NTP code
fails more often than not to hit the specified update window.

The reason is that the code uses delayed_work to schedule the update which
needs to be in thread context as the underlying RTC might be connected via
a slow bus, e.g. I2C. In the update function it verifies whether the
current time is correct vs. the requirements of the underlying RTC.

But delayed_work is using the timer wheel for scheduling which is
inaccurate by design. Depending on the distance to the expiry the wheel
gets less granular to allow batching and to avoid the cascading of the
original timer wheel. See 500462a9de65 ("timers: Switch to a non-cascading
wheel") and the code for further details.

The code already deals with this by splitting the 660 seconds period into a
long 659 seconds timer and then retrying with a smaller delta.

But looking at the actual granularities of the timer wheel (which depend on
the HZ configuration) the 659 seconds timer ends up in an outer wheel level
and is affected by a worst case granularity of:

HZ Granularity
1000 32s
250 16s
100 40s

So the initial timer can be already off by max 12.5% which is not a big
issue as the period of the sync is defined as ~11 minutes.

The fine grained second attempt schedules to the desired update point with
a timer expiring less than a second from now. Depending on the actual delta
and the HZ setting even the second attempt can end up in outer wheel levels
which have a large enough granularity to make the correctness check fail.

As this is a fundamental property of the timer wheel there is no way to
make this more accurate short of iterating in one jiffies steps towards the
update point.

Switch it to an hrtimer instead which schedules the actual update work. The
hrtimer will expire precisely (max 1 jiffie delay when high resolution
timers are not available). The actual scheduling delay of the work is the
same as before.

The update is triggered from do_adjtimex() which is a bit racy but not much
more racy than it was before:

if (ntp_synced())
queue_delayed_work(system_power_efficient_wq, &sync_work, 0);

which is racy when the work is currently executed and has not managed to
reschedule itself.

This becomes now:

if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer))
queue_work(system_power_efficient_wq, &sync_work, 0);

which is racy when the hrtimer has expired and the work is currently
executed and has not yet managed to rearm the hrtimer.

Not a big problem as it just schedules work for nothing.

The new implementation has a safe guard in place to catch the case where
the hrtimer is queued on entry to the work function and avoids an extra
update attempt of the RTC that way.

Reported-by: Miroslav Lichvar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
---
include/linux/timex.h | 1
kernel/time/ntp.c | 90 ++++++++++++++++++++++++---------------------
kernel/time/ntp_internal.h | 7 +++
3 files changed, 55 insertions(+), 43 deletions(-)

--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -157,7 +157,6 @@ extern int do_clock_adjtime(const clocki
extern void hardpps(const struct timespec64 *, const struct timespec64 *);

int read_current_timer(unsigned long *timer_val);
-void ntp_notify_cmos_timer(void);

/* The clock frequency of the i8253/i8254 PIT */
#define PIT_TICK_RATE 1193182ul
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -494,65 +494,55 @@ int second_overflow(time64_t secs)
return leap;
}

+#if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
static void sync_hw_clock(struct work_struct *work);
-static DECLARE_DELAYED_WORK(sync_work, sync_hw_clock);
-
-static void sched_sync_hw_clock(struct timespec64 now,
- unsigned long target_nsec, bool fail)
+static DECLARE_WORK(sync_work, sync_hw_clock);
+static struct hrtimer sync_hrtimer;
+#define SYNC_PERIOD_NS (11UL * 60 * NSEC_PER_SEC)

+static enum hrtimer_restart sync_timer_callback(struct hrtimer *timer)
{
- struct timespec64 next;
+ queue_work(system_power_efficient_wq, &sync_work);

- ktime_get_real_ts64(&next);
- if (!fail)
- next.tv_sec = 659;
- else {
- /*
- * Try again as soon as possible. Delaying long periods
- * decreases the accuracy of the work queue timer. Due to this
- * the algorithm is very likely to require a short-sleep retry
- * after the above long sleep to synchronize ts_nsec.
- */
- next.tv_sec = 0;
- }
+ return HRTIMER_NORESTART;
+}

- /* Compute the needed delay that will get to tv_nsec == target_nsec */
- next.tv_nsec = target_nsec - next.tv_nsec;
- if (next.tv_nsec <= 0)
- next.tv_nsec += NSEC_PER_SEC;
- if (next.tv_nsec >= NSEC_PER_SEC) {
- next.tv_sec++;
- next.tv_nsec -= NSEC_PER_SEC;
- }
+static void sched_sync_hw_clock(unsigned long offset_nsec, bool retry)
+{
+ ktime_t exp = ktime_set(ktime_get_real_seconds(), 0);

- queue_delayed_work(system_power_efficient_wq, &sync_work,
- timespec64_to_jiffies(&next));
+ if (retry)
+ exp = ktime_add_ns(exp, 2 * NSEC_PER_SEC - offset_nsec);
+ else
+ exp = ktime_add_ns(exp, SYNC_PERIOD_NS - offset_nsec);
+
+ hrtimer_start(&sync_hrtimer, exp, HRTIMER_MODE_ABS);
}

static void sync_rtc_clock(void)
{
- unsigned long target_nsec;
- struct timespec64 adjust, now;
+ unsigned long offset_nsec;
+ struct timespec64 adjust;
int rc;

if (!IS_ENABLED(CONFIG_RTC_SYSTOHC))
return;

- ktime_get_real_ts64(&now);
+ ktime_get_real_ts64(&adjust);

- adjust = now;
if (persistent_clock_is_local)
adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);

/*
- * The current RTC in use will provide the target_nsec it wants to be
- * called at, and does rtc_tv_nsec_ok internally.
+ * The current RTC in use will provide the nanoseconds offset prior
+ * to a full second it wants to be called at, and invokes
+ * rtc_tv_nsec_ok() internally.
*/
- rc = rtc_set_ntp_time(adjust, &target_nsec);
+ rc = rtc_set_ntp_time(adjust, &offset_nsec);
if (rc == -ENODEV)
return;

- sched_sync_hw_clock(now, target_nsec, rc);
+ sched_sync_hw_clock(offset_nsec, rc != 0);
}

#ifdef CONFIG_GENERIC_CMOS_UPDATE
@@ -599,7 +589,7 @@ static bool sync_cmos_clock(void)
}
}

- sched_sync_hw_clock(now, target_nsec, rc);
+ sched_sync_hw_clock(target_nsec, rc != 0);
return true;
}

@@ -613,7 +603,12 @@ static bool sync_cmos_clock(void)
*/
static void sync_hw_clock(struct work_struct *work)
{
- if (!ntp_synced())
+ /*
+ * Don't update if STA_UNSYNC is set and if ntp_notify_cmos_timer()
+ * managed to schedule the work between the timer firing and the
+ * work being able to rearm the timer. Wait for the timer to expire.
+ */
+ if (!ntp_synced() || hrtimer_is_queued(&sync_hrtimer))
return;

if (sync_cmos_clock())
@@ -624,13 +619,23 @@ static void sync_hw_clock(struct work_st

void ntp_notify_cmos_timer(void)
{
- if (!ntp_synced())
- return;
+ /*
+ * When the work is currently executed but has not yet the timer
+ * rearmed this queues the work immediately again. No big issue,
+ * just a pointless work scheduled.
+ */
+ if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer))
+ queue_work(system_power_efficient_wq, &sync_work);
+}

- if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
- IS_ENABLED(CONFIG_RTC_SYSTOHC))
- queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
+static void __init ntp_init_cmos_sync(void)
+{
+ hrtimer_init(&sync_hrtimer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
+ sync_hrtimer.function = sync_timer_callback;
}
+#else /* CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC) */
+static inline void __init ntp_init_cmos_sync(void) { }
+#endif /* !CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC) */

/*
* Propagate a new txc->status value into the NTP state:
@@ -1044,4 +1049,5 @@ static int __init ntp_tick_adj_setup(cha
void __init ntp_init(void)
{
ntp_clear();
+ ntp_init_cmos_sync();
}
--- a/kernel/time/ntp_internal.h
+++ b/kernel/time/ntp_internal.h
@@ -12,4 +12,11 @@ extern int __do_adjtimex(struct __kernel
const struct timespec64 *ts,
s32 *time_tai, struct audit_ntp_data *ad);
extern void __hardpps(const struct timespec64 *phase_ts, const struct timespec64 *raw_ts);
+
+#if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
+extern void ntp_notify_cmos_timer(void);
+#else
+static inline void ntp_notify_cmos_timer(void) { }
+#endif
+
#endif /* _LINUX_NTP_INTERNAL_H */


2020-12-07 12:53:42

by Miroslav Lichvar

[permalink] [raw]
Subject: Re: [patch 5/8] ntp: Make the RTC synchronization more reliable

On Sun, Dec 06, 2020 at 10:46:18PM +0100, Thomas Gleixner wrote:
> Switch it to an hrtimer instead which schedules the actual update work. The
> hrtimer will expire precisely (max 1 jiffie delay when high resolution
> timers are not available). The actual scheduling delay of the work is the
> same as before.

It works well in my tests.

> This becomes now:
>
> if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer))
> queue_work(system_power_efficient_wq, &sync_work, 0);
>
> which is racy when the hrtimer has expired and the work is currently
> executed and has not yet managed to rearm the hrtimer.
>
> Not a big problem as it just schedules work for nothing.

No more unexpected updates of the RTC observed.

Tested-by: Miroslav Lichvar <[email protected]>

Thanks,

--
Miroslav Lichvar

2020-12-08 01:13:01

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [patch 5/8] ntp: Make the RTC synchronization more reliable

On Sun, Dec 06, 2020 at 10:46:18PM +0100, Thomas Gleixner wrote:
> Miroslav reported that the periodic RTC synchronization in the NTP code
> fails more often than not to hit the specified update window.
>
> The reason is that the code uses delayed_work to schedule the update which
> needs to be in thread context as the underlying RTC might be connected via
> a slow bus, e.g. I2C. In the update function it verifies whether the
> current time is correct vs. the requirements of the underlying RTC.
>
> But delayed_work is using the timer wheel for scheduling which is
> inaccurate by design. Depending on the distance to the expiry the wheel
> gets less granular to allow batching and to avoid the cascading of the
> original timer wheel. See 500462a9de65 ("timers: Switch to a non-cascading
> wheel") and the code for further details.
>
> The code already deals with this by splitting the 660 seconds period into a
> long 659 seconds timer and then retrying with a smaller delta.
>
> But looking at the actual granularities of the timer wheel (which depend on
> the HZ configuration) the 659 seconds timer ends up in an outer wheel level
> and is affected by a worst case granularity of:
>
> HZ Granularity
> 1000 32s
> 250 16s
> 100 40s
>
> So the initial timer can be already off by max 12.5% which is not a big
> issue as the period of the sync is defined as ~11 minutes.
>
> The fine grained second attempt schedules to the desired update point with
> a timer expiring less than a second from now. Depending on the actual delta
> and the HZ setting even the second attempt can end up in outer wheel levels
> which have a large enough granularity to make the correctness check fail.
>
> As this is a fundamental property of the timer wheel there is no way to
> make this more accurate short of iterating in one jiffies steps towards the
> update point.
>
> Switch it to an hrtimer instead which schedules the actual update work. The
> hrtimer will expire precisely (max 1 jiffie delay when high resolution
> timers are not available). The actual scheduling delay of the work is the
> same as before.
>
> The update is triggered from do_adjtimex() which is a bit racy but not much
> more racy than it was before:
>
> if (ntp_synced())
> queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
>
> which is racy when the work is currently executed and has not managed to
> reschedule itself.
>
> This becomes now:
>
> if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer))
> queue_work(system_power_efficient_wq, &sync_work, 0);
>
> which is racy when the hrtimer has expired and the work is currently
> executed and has not yet managed to rearm the hrtimer.
>
> Not a big problem as it just schedules work for nothing.
>
> The new implementation has a safe guard in place to catch the case where
> the hrtimer is queued on entry to the work function and avoids an extra
> update attempt of the RTC that way.
>
> Reported-by: Miroslav Lichvar <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Prarit Bhargava <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> ---
> include/linux/timex.h | 1
> kernel/time/ntp.c | 90 ++++++++++++++++++++++++---------------------
> kernel/time/ntp_internal.h | 7 +++
> 3 files changed, 55 insertions(+), 43 deletions(-)

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

Subject: [tip: timers/core] ntp: Make the RTC synchronization more reliable

The following commit has been merged into the timers/core branch of tip:

Commit-ID: c9e6189fb03123a7dfb93589280347b46f30b161
Gitweb: https://git.kernel.org/tip/c9e6189fb03123a7dfb93589280347b46f30b161
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sun, 06 Dec 2020 22:46:18 +01:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Fri, 11 Dec 2020 10:40:52 +01:00

ntp: Make the RTC synchronization more reliable

Miroslav reported that the periodic RTC synchronization in the NTP code
fails more often than not to hit the specified update window.

The reason is that the code uses delayed_work to schedule the update which
needs to be in thread context as the underlying RTC might be connected via
a slow bus, e.g. I2C. In the update function it verifies whether the
current time is correct vs. the requirements of the underlying RTC.

But delayed_work is using the timer wheel for scheduling which is
inaccurate by design. Depending on the distance to the expiry the wheel
gets less granular to allow batching and to avoid the cascading of the
original timer wheel. See 500462a9de65 ("timers: Switch to a non-cascading
wheel") and the code for further details.

The code already deals with this by splitting the 660 seconds period into a
long 659 seconds timer and then retrying with a smaller delta.

But looking at the actual granularities of the timer wheel (which depend on
the HZ configuration) the 659 seconds timer ends up in an outer wheel level
and is affected by a worst case granularity of:

HZ Granularity
1000 32s
250 16s
100 40s

So the initial timer can be already off by max 12.5% which is not a big
issue as the period of the sync is defined as ~11 minutes.

The fine grained second attempt schedules to the desired update point with
a timer expiring less than a second from now. Depending on the actual delta
and the HZ setting even the second attempt can end up in outer wheel levels
which have a large enough granularity to make the correctness check fail.

As this is a fundamental property of the timer wheel there is no way to
make this more accurate short of iterating in one jiffies steps towards the
update point.

Switch it to an hrtimer instead which schedules the actual update work. The
hrtimer will expire precisely (max 1 jiffie delay when high resolution
timers are not available). The actual scheduling delay of the work is the
same as before.

The update is triggered from do_adjtimex() which is a bit racy but not much
more racy than it was before:

if (ntp_synced())
queue_delayed_work(system_power_efficient_wq, &sync_work, 0);

which is racy when the work is currently executed and has not managed to
reschedule itself.

This becomes now:

if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer))
queue_work(system_power_efficient_wq, &sync_work, 0);

which is racy when the hrtimer has expired and the work is currently
executed and has not yet managed to rearm the hrtimer.

Not a big problem as it just schedules work for nothing.

The new implementation has a safe guard in place to catch the case where
the hrtimer is queued on entry to the work function and avoids an extra
update attempt of the RTC that way.

Reported-by: Miroslav Lichvar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Miroslav Lichvar <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Acked-by: Alexandre Belloni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
include/linux/timex.h | 1 +-
kernel/time/ntp.c | 90 +++++++++++++++++++------------------
kernel/time/ntp_internal.h | 7 +++-
3 files changed, 55 insertions(+), 43 deletions(-)

diff --git a/include/linux/timex.h b/include/linux/timex.h
index ce08597..9c2e54f 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -157,7 +157,6 @@ extern int do_clock_adjtime(const clockid_t which_clock, struct __kernel_timex *
extern void hardpps(const struct timespec64 *, const struct timespec64 *);

int read_current_timer(unsigned long *timer_val);
-void ntp_notify_cmos_timer(void);

/* The clock frequency of the i8253/i8254 PIT */
#define PIT_TICK_RATE 1193182ul
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 069ca78..ff1a7b8 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -494,65 +494,55 @@ out:
return leap;
}

+#if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
static void sync_hw_clock(struct work_struct *work);
-static DECLARE_DELAYED_WORK(sync_work, sync_hw_clock);
-
-static void sched_sync_hw_clock(struct timespec64 now,
- unsigned long target_nsec, bool fail)
+static DECLARE_WORK(sync_work, sync_hw_clock);
+static struct hrtimer sync_hrtimer;
+#define SYNC_PERIOD_NS (11UL * 60 * NSEC_PER_SEC)

+static enum hrtimer_restart sync_timer_callback(struct hrtimer *timer)
{
- struct timespec64 next;
+ queue_work(system_power_efficient_wq, &sync_work);

- ktime_get_real_ts64(&next);
- if (!fail)
- next.tv_sec = 659;
- else {
- /*
- * Try again as soon as possible. Delaying long periods
- * decreases the accuracy of the work queue timer. Due to this
- * the algorithm is very likely to require a short-sleep retry
- * after the above long sleep to synchronize ts_nsec.
- */
- next.tv_sec = 0;
- }
+ return HRTIMER_NORESTART;
+}

- /* Compute the needed delay that will get to tv_nsec == target_nsec */
- next.tv_nsec = target_nsec - next.tv_nsec;
- if (next.tv_nsec <= 0)
- next.tv_nsec += NSEC_PER_SEC;
- if (next.tv_nsec >= NSEC_PER_SEC) {
- next.tv_sec++;
- next.tv_nsec -= NSEC_PER_SEC;
- }
+static void sched_sync_hw_clock(unsigned long offset_nsec, bool retry)
+{
+ ktime_t exp = ktime_set(ktime_get_real_seconds(), 0);
+
+ if (retry)
+ exp = ktime_add_ns(exp, 2 * NSEC_PER_SEC - offset_nsec);
+ else
+ exp = ktime_add_ns(exp, SYNC_PERIOD_NS - offset_nsec);

- queue_delayed_work(system_power_efficient_wq, &sync_work,
- timespec64_to_jiffies(&next));
+ hrtimer_start(&sync_hrtimer, exp, HRTIMER_MODE_ABS);
}

static void sync_rtc_clock(void)
{
- unsigned long target_nsec;
- struct timespec64 adjust, now;
+ unsigned long offset_nsec;
+ struct timespec64 adjust;
int rc;

if (!IS_ENABLED(CONFIG_RTC_SYSTOHC))
return;

- ktime_get_real_ts64(&now);
+ ktime_get_real_ts64(&adjust);

- adjust = now;
if (persistent_clock_is_local)
adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);

/*
- * The current RTC in use will provide the target_nsec it wants to be
- * called at, and does rtc_tv_nsec_ok internally.
+ * The current RTC in use will provide the nanoseconds offset prior
+ * to a full second it wants to be called at, and invokes
+ * rtc_tv_nsec_ok() internally.
*/
- rc = rtc_set_ntp_time(adjust, &target_nsec);
+ rc = rtc_set_ntp_time(adjust, &offset_nsec);
if (rc == -ENODEV)
return;

- sched_sync_hw_clock(now, target_nsec, rc);
+ sched_sync_hw_clock(offset_nsec, rc != 0);
}

#ifdef CONFIG_GENERIC_CMOS_UPDATE
@@ -599,7 +589,7 @@ static bool sync_cmos_clock(void)
}
}

- sched_sync_hw_clock(now, target_nsec, rc);
+ sched_sync_hw_clock(target_nsec, rc != 0);
return true;
}

@@ -613,7 +603,12 @@ static bool sync_cmos_clock(void)
*/
static void sync_hw_clock(struct work_struct *work)
{
- if (!ntp_synced())
+ /*
+ * Don't update if STA_UNSYNC is set and if ntp_notify_cmos_timer()
+ * managed to schedule the work between the timer firing and the
+ * work being able to rearm the timer. Wait for the timer to expire.
+ */
+ if (!ntp_synced() || hrtimer_is_queued(&sync_hrtimer))
return;

if (sync_cmos_clock())
@@ -624,13 +619,23 @@ static void sync_hw_clock(struct work_struct *work)

void ntp_notify_cmos_timer(void)
{
- if (!ntp_synced())
- return;
+ /*
+ * When the work is currently executed but has not yet the timer
+ * rearmed this queues the work immediately again. No big issue,
+ * just a pointless work scheduled.
+ */
+ if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer))
+ queue_work(system_power_efficient_wq, &sync_work);
+}

- if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
- IS_ENABLED(CONFIG_RTC_SYSTOHC))
- queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
+static void __init ntp_init_cmos_sync(void)
+{
+ hrtimer_init(&sync_hrtimer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
+ sync_hrtimer.function = sync_timer_callback;
}
+#else /* CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC) */
+static inline void __init ntp_init_cmos_sync(void) { }
+#endif /* !CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC) */

/*
* Propagate a new txc->status value into the NTP state:
@@ -1044,4 +1049,5 @@ __setup("ntp_tick_adj=", ntp_tick_adj_setup);
void __init ntp_init(void)
{
ntp_clear();
+ ntp_init_cmos_sync();
}
diff --git a/kernel/time/ntp_internal.h b/kernel/time/ntp_internal.h
index 908ecaa..23d1b74 100644
--- a/kernel/time/ntp_internal.h
+++ b/kernel/time/ntp_internal.h
@@ -12,4 +12,11 @@ extern int __do_adjtimex(struct __kernel_timex *txc,
const struct timespec64 *ts,
s32 *time_tai, struct audit_ntp_data *ad);
extern void __hardpps(const struct timespec64 *phase_ts, const struct timespec64 *raw_ts);
+
+#if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
+extern void ntp_notify_cmos_timer(void);
+#else
+static inline void ntp_notify_cmos_timer(void) { }
+#endif
+
#endif /* _LINUX_NTP_INTERNAL_H */

2020-12-29 19:45:28

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [patch 5/8] ntp: Make the RTC synchronization more reliable

Hi Thomas,

On Sun, Dec 6, 2020 at 11:10 PM Thomas Gleixner <[email protected]> wrote:
> Miroslav reported that the periodic RTC synchronization in the NTP code
> fails more often than not to hit the specified update window.
>
> The reason is that the code uses delayed_work to schedule the update which
> needs to be in thread context as the underlying RTC might be connected via
> a slow bus, e.g. I2C. In the update function it verifies whether the
> current time is correct vs. the requirements of the underlying RTC.
>
> But delayed_work is using the timer wheel for scheduling which is
> inaccurate by design. Depending on the distance to the expiry the wheel
> gets less granular to allow batching and to avoid the cascading of the
> original timer wheel. See 500462a9de65 ("timers: Switch to a non-cascading
> wheel") and the code for further details.
>
> The code already deals with this by splitting the 660 seconds period into a
> long 659 seconds timer and then retrying with a smaller delta.
>
> But looking at the actual granularities of the timer wheel (which depend on
> the HZ configuration) the 659 seconds timer ends up in an outer wheel level
> and is affected by a worst case granularity of:
>
> HZ Granularity
> 1000 32s
> 250 16s
> 100 40s
>
> So the initial timer can be already off by max 12.5% which is not a big
> issue as the period of the sync is defined as ~11 minutes.
>
> The fine grained second attempt schedules to the desired update point with
> a timer expiring less than a second from now. Depending on the actual delta
> and the HZ setting even the second attempt can end up in outer wheel levels
> which have a large enough granularity to make the correctness check fail.
>
> As this is a fundamental property of the timer wheel there is no way to
> make this more accurate short of iterating in one jiffies steps towards the
> update point.
>
> Switch it to an hrtimer instead which schedules the actual update work. The
> hrtimer will expire precisely (max 1 jiffie delay when high resolution
> timers are not available). The actual scheduling delay of the work is the
> same as before.
>
> The update is triggered from do_adjtimex() which is a bit racy but not much
> more racy than it was before:
>
> if (ntp_synced())
> queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
>
> which is racy when the work is currently executed and has not managed to
> reschedule itself.
>
> This becomes now:
>
> if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer))
> queue_work(system_power_efficient_wq, &sync_work, 0);
>
> which is racy when the hrtimer has expired and the work is currently
> executed and has not yet managed to rearm the hrtimer.
>
> Not a big problem as it just schedules work for nothing.
>
> The new implementation has a safe guard in place to catch the case where
> the hrtimer is queued on entry to the work function and avoids an extra
> update attempt of the RTC that way.
>
> Reported-by: Miroslav Lichvar <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>

Thanks for your patch, which is now commit c9e6189fb03123a7 ("ntp: Make
the RTC synchronization more reliable").

Since this commit, the I2C RTC on the R-Car M2-W Koelsch development
board is accessed every two seconds. Sticking a WARN() in the I2C
activation path gives e.g.

Modules linked in:
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted
5.10.0-rc1-koelsch-00038-gc9e6189fb031-dirty #1021
Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
Workqueue: events rtc_timer_do_work
[<c020ea7c>] (unwind_backtrace) from [<c020a4dc>] (show_stack+0x10/0x14)
[<c020a4dc>] (show_stack) from [<c090e314>] (dump_stack+0xa0/0xc8)
[<c090e314>] (dump_stack) from [<c02228a8>] (__warn+0xbc/0xfc)
[<c02228a8>] (__warn) from [<c090b434>] (warn_slowpath_fmt+0x78/0xb0)
[<c090b434>] (warn_slowpath_fmt) from [<c058f184>]
(cpg_mstp_clock_endisable+0x94/0x1f4)
[<c058f184>] (cpg_mstp_clock_endisable) from [<c05881c8>]
(clk_core_enable+0x194/0x29c)
[<c05881c8>] (clk_core_enable) from [<c05882e8>]
(clk_core_enable_lock+0x18/0x2c)
[<c05882e8>] (clk_core_enable_lock) from [<c063b5c4>]
(pm_clk_resume+0x68/0xa0)
[<c063b5c4>] (pm_clk_resume) from [<c063a738>]
(genpd_runtime_resume+0xc8/0x174)
[<c063a738>] (genpd_runtime_resume) from [<c063236c>]
(__rpm_callback+0x30/0xe0)
[<c063236c>] (__rpm_callback) from [<c063248c>] (rpm_callback+0x70/0x80)
[<c063248c>] (rpm_callback) from [<c063216c>] (rpm_resume+0x438/0x4dc)
[<c063216c>] (rpm_resume) from [<c0632274>] (__pm_runtime_resume+0x64/0x80)
[<c0632274>] (__pm_runtime_resume) from [<c0731ab4>]
(sh_mobile_i2c_xfer+0x38/0x554)
[<c0731ab4>] (sh_mobile_i2c_xfer) from [<c072a6c4>]
(__i2c_transfer+0x4e4/0x680)
[<c072a6c4>] (__i2c_transfer) from [<c072a8b8>] (i2c_transfer+0x58/0xf8)
[<c072a8b8>] (i2c_transfer) from [<c0645688>] (regmap_i2c_read+0x58/0x94)
[<c0645688>] (regmap_i2c_read) from [<c0641490>]
(_regmap_raw_read+0x19c/0x2f4)
[<c0641490>] (_regmap_raw_read) from [<c064162c>]
(_regmap_bus_read+0x44/0x68)
[<c064162c>] (_regmap_bus_read) from [<c0640308>] (_regmap_read+0x84/0x1a4)
[<c0640308>] (_regmap_read) from [<c0640984>]
(_regmap_update_bits+0xa8/0xf4)
[<c0640984>] (_regmap_update_bits) from [<c0641b5c>]
(regmap_update_bits_base+0x4c/0x70)
[<c0641b5c>] (regmap_update_bits_base) from [<c0728244>]
(regmap_update_bits+0x18/0x20)
[<c0728244>] (regmap_update_bits) from [<c072491c>]
(rtc_alarm_disable+0x28/0x38)
[<c072491c>] (rtc_alarm_disable) from [<c0726408>]
(rtc_timer_do_work+0x88/0x294)
[<c0726408>] (rtc_timer_do_work) from [<c023e27c>]
(process_one_work+0x308/0x524)
[<c023e27c>] (process_one_work) from [<c023ec80>]
(worker_thread+0x22c/0x2ec)
[<c023ec80>] (worker_thread) from [<c024417c>] (kthread+0x128/0x138)
[<c024417c>] (kthread) from [<c020010c>] (ret_from_fork+0x14/0x28)
Exception stack(0xc220ffb0 to 0xc220fff8)
ffa0: 00000000 00000000
00000000 00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
irq event stamp: 74448
hardirqs last enabled at (74447): [<c02a5f90>]
seqcount_lockdep_reader_access.constprop.0+0x58/0x68
hardirqs last disabled at (74448): [<c0919a68>]
_raw_spin_lock_irqsave+0x20/0x70
softirqs last enabled at (74360): [<c02012b0>] __do_softirq+0x1b8/0x468
softirqs last disabled at (74353): [<c0228f24>] __irq_exit_rcu+0x110/0x17c
---[ end trace 8c279400e5758606 ]---

Before, the synchronization happened only every 696s.

Worse, this synchronization may also happen while the system is partly
suspended, sometimes triggering a WARNING during resume from s2ram:

Disabling non-boot CPUs ...
Enabling non-boot CPUs ...
+------------[ cut here ]------------
+WARNING: CPU: 0 PID: 21 at drivers/i2c/i2c-core.h:54
__i2c_transfer+0x464/0x4a0
+i2c i2c-6: Transfer while suspended
+CPU: 0 PID: 21 Comm: kworker/0:1 Not tainted
5.11.0-rc1-shmobile-00107-gcf9760aa181f #829
+Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
+Workqueue: events_power_efficient sync_hw_clock
+[<c010dba4>] (unwind_backtrace) from [<c0109b28>] (show_stack+0x10/0x14)
+[<c0109b28>] (show_stack) from [<c07a120c>] (dump_stack+0x8c/0xa8)
+[<c07a120c>] (dump_stack) from [<c011c538>] (__warn+0xc0/0xec)
+[<c011c538>] (__warn) from [<c079a7bc>] (warn_slowpath_fmt+0x78/0xb0)
+[<c079a7bc>] (warn_slowpath_fmt) from [<c0566574>]
(__i2c_transfer+0x464/0x4a0)
+[<c0566574>] (__i2c_transfer) from [<c0566608>] (i2c_transfer+0x58/0xf8)
+[<c0566608>] (i2c_transfer) from [<c0489f80>] (regmap_i2c_read+0x58/0x94)
+[<c0489f80>] (regmap_i2c_read) from [<c0485e00>]
(_regmap_raw_read+0x108/0x1bc)
+[<c0485e00>] (_regmap_raw_read) from [<c0485ef8>]
(_regmap_bus_read+0x44/0x68)
+[<c0485ef8>] (_regmap_bus_read) from [<c0484018>] (_regmap_read+0x84/0x100)
+[<c0484018>] (_regmap_read) from [<c0485444>]
(_regmap_update_bits+0xa8/0xf4)
+[<c0485444>] (_regmap_update_bits) from [<c0485574>]
(_regmap_select_page+0xe4/0x100)
+[<c0485574>] (_regmap_select_page) from [<c0485664>]
(_regmap_raw_write_impl+0xd4/0x608)
+[<c0485664>] (_regmap_raw_write_impl) from [<c04863f4>]
(_regmap_raw_write+0xd8/0x114)
+[<c04863f4>] (_regmap_raw_write) from [<c0486488>]
(regmap_raw_write+0x58/0x7c)
+[<c0486488>] (regmap_raw_write) from [<c04866cc>]
(regmap_bulk_write+0x118/0x13c)
+[<c04866cc>] (regmap_bulk_write) from [<c05605b4>]
(da9063_rtc_set_time+0x44/0x8c)
+[<c05605b4>] (da9063_rtc_set_time) from [<c055e428>]
(rtc_set_time+0x8c/0x15c)
+[<c055e428>] (rtc_set_time) from [<c01872cc>] (sync_hw_clock+0x12c/0x210)
+[<c01872cc>] (sync_hw_clock) from [<c01337d0>]
(process_one_work+0x1bc/0x2ac)
+[<c01337d0>] (process_one_work) from [<c0133b18>]
(worker_thread+0x22c/0x2d0)
+[<c0133b18>] (worker_thread) from [<c01388a8>] (kthread+0x100/0x10c)
+[<c01388a8>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)
+Exception stack(0xc1195fb0 to 0xc1195ff8)
+5fa0: 00000000 00000000
00000000 00000000
+5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
+5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
+---[ end trace 5d3a7a10ee0cec3d ]---
+da9063-rtc da9063-rtc: Failed to set RTC time data: -108
+da9063-rtc da9063-rtc: Failed to read RTC time data: -108
CPU1 is up

The latter is probably a pre-existing issue, just more likely to trigger
now the sync interval is 2s.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-01-11 13:05:45

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 5/8] ntp: Make the RTC synchronization more reliable

On Tue, Dec 29 2020 at 20:41, Geert Uytterhoeven wrote:
> Hi Thomas,
>> Reported-by: Miroslav Lichvar <[email protected]>
>> Signed-off-by: Thomas Gleixner <[email protected]>
>
> Thanks for your patch, which is now commit c9e6189fb03123a7 ("ntp: Make
> the RTC synchronization more reliable").
>
> Since this commit, the I2C RTC on the R-Car M2-W Koelsch development
> board is accessed every two seconds. Sticking a WARN() in the I2C
> activation path gives e.g.

Huch? Every two seconds? The timer is armed with 11 * 60 * NSEC_PER_SEC,
which is 11 minutes. Confused....

Thanks,

tglx

2021-01-11 13:08:40

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [patch 5/8] ntp: Make the RTC synchronization more reliable

Hi Thomas,

On Mon, Jan 11, 2021 at 11:12 AM Thomas Gleixner <[email protected]> wrote:
> On Tue, Dec 29 2020 at 20:41, Geert Uytterhoeven wrote:
> >> Reported-by: Miroslav Lichvar <[email protected]>
> >> Signed-off-by: Thomas Gleixner <[email protected]>
> >
> > Thanks for your patch, which is now commit c9e6189fb03123a7 ("ntp: Make
> > the RTC synchronization more reliable").
> >
> > Since this commit, the I2C RTC on the R-Car M2-W Koelsch development
> > board is accessed every two seconds. Sticking a WARN() in the I2C
> > activation path gives e.g.
>
> Huch? Every two seconds? The timer is armed with 11 * 60 * NSEC_PER_SEC,
> which is 11 minutes. Confused....

Thanks for the hint:

#define SYNC_PERIOD_NS (11UL * 60 * NSEC_PER_SEC)

is truncated on 32-bit platforms.

Patch sent.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds