2020-12-01 14:45:20

by Miroslav Lichvar

[permalink] [raw]
Subject: [PATCH] rtc: adapt allowed RTC update error

When the system clock is marked as synchronized via adjtimex(), the
kernel is expected to copy the system time to the RTC every 11 minutes.

There are reports that it doesn't always work reliably. It seems the
current requirement for the RTC update to happen within 5 ticks of the
target time in some cases can consistently fail for hours or even days.

It is better to set the RTC with a larger error than let it drift for
too long.

Add a static variable to rtc_tv_nsec_ok() to count the checks. With each
failed check, relax the requirement by one jiffie, and reset the counter
when it finally succeeds. This should allow the RTC update to happen in
a minute at most.

Signed-off-by: Miroslav Lichvar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
---
include/linux/rtc.h | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index 22d1575e4991..8d105f10ef6a 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -218,21 +218,30 @@ static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec,
struct timespec64 *to_set,
const struct timespec64 *now)
{
- /* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */
- const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5;
struct timespec64 delay = {.tv_sec = 0,
.tv_nsec = set_offset_nsec};
+ unsigned long time_set_nsec_fuzz;
+ static unsigned int attempt;

*to_set = timespec64_add(*now, delay);

- if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) {
+ /*
+ * Determine allowed error in tv_nsec. Start at 5 jiffies and add a
+ * jiffie with each failed attempt to make sure the RTC will be set at
+ * some point, even if the update cannot be scheduled very accurately.
+ */
+ time_set_nsec_fuzz = (5 + attempt++) * TICK_NSEC;
+
+ if (to_set->tv_nsec < time_set_nsec_fuzz) {
to_set->tv_nsec = 0;
+ attempt = 0;
return true;
}

- if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) {
+ if (to_set->tv_nsec > NSEC_PER_SEC - time_set_nsec_fuzz) {
to_set->tv_sec++;
to_set->tv_nsec = 0;
+ attempt = 0;
return true;
}
return false;
--
2.26.2


2020-12-01 17:40:16

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Tue, Dec 01, 2020 at 06:14:20PM +0100, Miroslav Lichvar wrote:
> On Tue, Dec 01, 2020 at 12:12:24PM -0400, Jason Gunthorpe wrote:
> > On Tue, Dec 01, 2020 at 03:38:35PM +0100, Miroslav Lichvar wrote:
> > > + unsigned long time_set_nsec_fuzz;
> > > + static unsigned int attempt;
> >
> > Adding a static value instide a static inline should not be done
>
> Well, grepping through the other header files in include/linux, this
> would not be the first case.

Nevertheless, it should be avoided and has surprising behaviors.

> > I'm not sure using a static like this is the best idea anyhow, if you
> > want something like this it should be per-rtc, not global
>
> If there are multiple RTCs, are they all updated in this 11-minute
> sync?

Yes, but they have different offsets and thus different timers,
IIRC. Though only one gets to be the CONFIG_RTC_SYSTOHC_DEVICE

If you are going to put some static it is clearer to put it along side
the sync_rtc_clock()

> I found no good explanation. It seems to depend on what system is
> doing, if it's idle, etc. I suspect it's a property of the workqueues
> that they cannot generally guarantee the jobs to run exactly on time.
> I tried switching to the non-power-efficient and high priority
> workqueues and it didn't seem to make a big difference.

When I wrote it originally the workqueues got increasingly inaccurate
the longer the duration, so it does a very short sleep to get back on
track.

If you are missing that time target it must be constantly scheduling
new delayed work and none of it hits the target for long, long time
periods? This seems like a more fundamental problem someplace else in
the kernel.

Jason

2020-12-01 22:38:28

by Miroslav Lichvar

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Tue, Dec 01, 2020 at 12:12:24PM -0400, Jason Gunthorpe wrote:
> On Tue, Dec 01, 2020 at 03:38:35PM +0100, Miroslav Lichvar wrote:
> > + unsigned long time_set_nsec_fuzz;
> > + static unsigned int attempt;
>
> Adding a static value instide a static inline should not be done

Well, grepping through the other header files in include/linux, this
would not be the first case.

> I'm not sure using a static like this is the best idea anyhow, if you
> want something like this it should be per-rtc, not global

If there are multiple RTCs, are they all updated in this 11-minute
sync?

> Did you look at why time has become so in-accurate in your system? 5
> jiffies is usually a pretty long time?

I found no good explanation. It seems to depend on what system is
doing, if it's idle, etc. I suspect it's a property of the workqueues
that they cannot generally guarantee the jobs to run exactly on time.
I tried switching to the non-power-efficient and high priority
workqueues and it didn't seem to make a big difference.

--
Miroslav Lichvar

2020-12-01 22:39:01

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Tue, Dec 01, 2020 at 03:38:35PM +0100, Miroslav Lichvar wrote:
> When the system clock is marked as synchronized via adjtimex(), the
> kernel is expected to copy the system time to the RTC every 11 minutes.
>
> There are reports that it doesn't always work reliably. It seems the
> current requirement for the RTC update to happen within 5 ticks of the
> target time in some cases can consistently fail for hours or even days.
>
> It is better to set the RTC with a larger error than let it drift for
> too long.
>
> Add a static variable to rtc_tv_nsec_ok() to count the checks. With each
> failed check, relax the requirement by one jiffie, and reset the counter
> when it finally succeeds. This should allow the RTC update to happen in
> a minute at most.
>
> Signed-off-by: Miroslav Lichvar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Prarit Bhargava <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> include/linux/rtc.h | 17 +++++++++++++----
> 1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/rtc.h b/include/linux/rtc.h
> index 22d1575e4991..8d105f10ef6a 100644
> +++ b/include/linux/rtc.h
> @@ -218,21 +218,30 @@ static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec,
> struct timespec64 *to_set,
> const struct timespec64 *now)
> {
> - /* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */
> - const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5;
> struct timespec64 delay = {.tv_sec = 0,
> .tv_nsec = set_offset_nsec};
> + unsigned long time_set_nsec_fuzz;
> + static unsigned int attempt;

Adding a static value instide a static inline should not be done

I'm not sure using a static like this is the best idea anyhow, if you
want something like this it should be per-rtc, not global

Did you look at why time has become so in-accurate in your system? 5
jiffies is usually a pretty long time?

Jason

2020-12-02 10:05:42

by Miroslav Lichvar

[permalink] [raw]
Subject: [PATCHv2] rtc: adapt allowed RTC update error

When the system clock is marked as synchronized via adjtimex(), the
kernel is expected to copy the system time to the RTC every 11 minutes.

There are reports that it doesn't always work reliably. It seems the
current requirement for the RTC update to happen within 5 ticks of the
target time in some cases can consistently fail for hours or even days.

It is better to set the RTC with a larger error than let it drift for
too long.

Instead of increasing the constant again, use a static variable to count
the checks and with each failed check increase the allowed error by one
jiffie. Reset the counter when the check finally succeeds. This will
allow the RTC update to keep good accuracy if it can happen in the first
few attempts and it will not take more than a minute if the timing is
consistently bad for any reason.

Signed-off-by: Miroslav Lichvar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
---

Notes:
v2:
- moved the static variable to callers in ntp.c

drivers/rtc/systohc.c | 6 ++++--
include/linux/rtc.h | 14 +++++++++-----
kernel/time/ntp.c | 9 +++++++--
3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
index 8b70f0520e13..0777f590cdae 100644
--- a/drivers/rtc/systohc.c
+++ b/drivers/rtc/systohc.c
@@ -5,6 +5,7 @@
/**
* rtc_set_ntp_time - Save NTP synchronized time to the RTC
* @now: Current time of day
+ * @attempt: Number of previous failures used to adjust allowed error
* @target_nsec: pointer for desired now->tv_nsec value
*
* Replacement for the NTP platform function update_persistent_clock64
@@ -18,7 +19,8 @@
*
* If temporary failure is indicated the caller should try again 'soon'
*/
-int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec)
+int rtc_set_ntp_time(struct timespec64 now, unsigned int attempt,
+ unsigned long *target_nsec)
{
struct rtc_device *rtc;
struct rtc_time tm;
@@ -44,7 +46,7 @@ int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec)
* it does not we update target_nsec and return EPROTO to make the ntp
* code try again later.
*/
- ok = rtc_tv_nsec_ok(rtc->set_offset_nsec, &to_set, &now);
+ ok = rtc_tv_nsec_ok(rtc->set_offset_nsec, attempt, &to_set, &now);
if (!ok) {
err = -EPROTO;
goto out_close;
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index 22d1575e4991..9f3326b43620 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -165,7 +165,8 @@ int __rtc_register_device(struct module *owner, struct rtc_device *rtc);

extern int rtc_read_time(struct rtc_device *rtc, struct rtc_time *tm);
extern int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm);
-extern int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec);
+extern int rtc_set_ntp_time(struct timespec64 now, unsigned int attempt,
+ unsigned long *target_nsec);
int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm);
extern int rtc_read_alarm(struct rtc_device *rtc,
struct rtc_wkalrm *alrm);
@@ -213,24 +214,27 @@ static inline bool is_leap_year(unsigned int year)
* a zero in tv_nsecs, such that:
* to_set - set_delay_nsec == now +/- FUZZ
*
+ * The allowed error starts at 5 jiffies on the first attempt and is increased
+ * with each failed attempt to make sure the RTC will be set at some point,
+ * even if the timing is consistently inaccurate.
*/
static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec,
+ unsigned int attempt,
struct timespec64 *to_set,
const struct timespec64 *now)
{
- /* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */
- const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5;
+ unsigned long time_set_nsec_fuzz = (5 + attempt) * TICK_NSEC;
struct timespec64 delay = {.tv_sec = 0,
.tv_nsec = set_offset_nsec};

*to_set = timespec64_add(*now, delay);

- if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) {
+ if (to_set->tv_nsec < time_set_nsec_fuzz) {
to_set->tv_nsec = 0;
return true;
}

- if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) {
+ if (to_set->tv_nsec > NSEC_PER_SEC - time_set_nsec_fuzz) {
to_set->tv_sec++;
to_set->tv_nsec = 0;
return true;
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 069ca78fb0bf..893bc7ed7845 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -531,6 +531,7 @@ static void sched_sync_hw_clock(struct timespec64 now,

static void sync_rtc_clock(void)
{
+ static unsigned int attempt;
unsigned long target_nsec;
struct timespec64 adjust, now;
int rc;
@@ -548,9 +549,11 @@ static void sync_rtc_clock(void)
* The current RTC in use will provide the target_nsec it wants to be
* called at, and does rtc_tv_nsec_ok internally.
*/
- rc = rtc_set_ntp_time(adjust, &target_nsec);
+ rc = rtc_set_ntp_time(adjust, attempt++, &target_nsec);
if (rc == -ENODEV)
return;
+ if (rc != -EPROTO)
+ attempt = 0;

sched_sync_hw_clock(now, target_nsec, rc);
}
@@ -564,6 +567,7 @@ int __weak update_persistent_clock64(struct timespec64 now64)

static bool sync_cmos_clock(void)
{
+ static unsigned int attempt;
static bool no_cmos;
struct timespec64 now;
struct timespec64 adjust;
@@ -585,7 +589,8 @@ static bool sync_cmos_clock(void)
* implement this legacy API.
*/
ktime_get_real_ts64(&now);
- if (rtc_tv_nsec_ok(-1 * target_nsec, &adjust, &now)) {
+ if (rtc_tv_nsec_ok(-1 * target_nsec, attempt++, &adjust, &now)) {
+ attempt = 0;
if (persistent_clock_is_local)
adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
rc = update_persistent_clock64(adjust);
--
2.26.2

2020-12-02 13:47:41

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Tue, Dec 01 2020 at 13:35, Jason Gunthorpe wrote:
> On Tue, Dec 01, 2020 at 06:14:20PM +0100, Miroslav Lichvar wrote:
>> I found no good explanation. It seems to depend on what system is
>> doing, if it's idle, etc. I suspect it's a property of the workqueues
>> that they cannot generally guarantee the jobs to run exactly on time.
>> I tried switching to the non-power-efficient and high priority
>> workqueues and it didn't seem to make a big difference.
>
> When I wrote it originally the workqueues got increasingly inaccurate
> the longer the duration, so it does a very short sleep to get back on
> track.
>
> If you are missing that time target it must be constantly scheduling
> new delayed work and none of it hits the target for long, long time
> periods?

delayed work is based on the timer wheel which is inaccurate by
design. Looking at the whole machinery:

sync_rtc_clock()
ktime_get_real_ts64(&now);

rtc_set_ntp_time(now, &target_nsec)

set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec);
// rtc->set_offset_nsec = NSEC_PER_SEC / 2
*target_nsec = to_set.tv_nsec;

sched_sync_hw_clock(now, target_nsec, rc) <- now is unused here

ktime_get_real_ts64(&next);
if (!fail)
next.tv_sec = 659;
else
next.tv_sec = 0;

next.tv_nsec = target_nsec - next.tv_nsec;
...
queue_delayed_work(system_power_efficient_wq, &sync_work,
timespec64_to_jiffies(&next));

Let's look at the 659 seconds case first. Depending on the HZ value the
granularity of the timer wheel bucket in which that timer ends up is:

HZ Granularity
1000 32s
250 16s
100 40s

That's been true for the old timer wheel implementation as well, just
the granularity values were slighty different.

The probability to run at the expected time is pretty low. The target
time would need to be exactly aligned with the wheel level period.

Now for the 0 second X nanoseconds case.

That's affected by the bucket granularities as well depending on the
resulting nanoseconds value:

* HZ 1000
* Level Offset Granularity Range
* 0 0 1 ms 0 ms - 63 ms
* 1 64 8 ms 64 ms - 511 ms
* 2 128 64 ms 512 ms - 4095 ms (512ms - ~4s)
*
* HZ 250
* Level Offset Granularity Range
* 0 0 4 ms 0 ms - 255 ms
* 1 64 32 ms 256 ms - 2047 ms (256ms - ~2s)
*
* HZ 100
* Level Offset Granularity Range
* 0 0 10 ms 0 ms - 630 ms
* 1 64 80 ms 640 ms - 5110 ms (640ms - ~5s)

So if this ends up in level #1 then again the chance is pretty low that
the expiry time is aligned to the the level period. If this works then
it only works by chance.

> This seems like a more fundamental problem someplace else in the
> kernel.

I don't think so. :)

Something like the completely untested below should make this reliable
and only needs to retry when the work is running late (busy machine),
but the wakeup will be on time or at max 1 jiffie off when high
resolution timers are not available or disabled.

Thanks,

tglx
---
kernel/time/ntp.c | 65 +++++++++++++++++++++++-------------------------------
1 file changed, 28 insertions(+), 37 deletions(-)

--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -495,64 +495,53 @@ int second_overflow(time64_t secs)
}

static void sync_hw_clock(struct work_struct *work);
-static DECLARE_DELAYED_WORK(sync_work, sync_hw_clock);
-
-static void sched_sync_hw_clock(struct timespec64 now,
- unsigned long target_nsec, bool fail)
+static DECLARE_WORK(sync_work, sync_hw_clock);
+static struct hrtimer sync_hrtimer;
+#define SYNC_PERIOD_NS (11UL * 60 * NSEC_PER_SEC)

+static enum hrtimer_restart sync_timer_callback(struct hrtimer *timer)
{
- struct timespec64 next;
+ queue_work(system_power_efficient_wq, &sync_work);

- ktime_get_real_ts64(&next);
- if (!fail)
- next.tv_sec = 659;
- else {
- /*
- * Try again as soon as possible. Delaying long periods
- * decreases the accuracy of the work queue timer. Due to this
- * the algorithm is very likely to require a short-sleep retry
- * after the above long sleep to synchronize ts_nsec.
- */
- next.tv_sec = 0;
- }
+ return HRTIMER_NORESTART;
+}

- /* Compute the needed delay that will get to tv_nsec == target_nsec */
- next.tv_nsec = target_nsec - next.tv_nsec;
- if (next.tv_nsec <= 0)
- next.tv_nsec += NSEC_PER_SEC;
- if (next.tv_nsec >= NSEC_PER_SEC) {
- next.tv_sec++;
- next.tv_nsec -= NSEC_PER_SEC;
- }
+static void sched_sync_hw_clock(unsigned long offset_nsec, bool retry)
+{
+ ktime_t exp = ktime_set(ktime_get_real_seconds(), 0);
+
+ if (retry)
+ exp = ktime_add_ns(exp, 2 * NSEC_PER_SEC - offset_nsec);
+ else
+ exp = ktime_add_ns(exp, SYNC_PERIOD_NS - offset_nsec);

- queue_delayed_work(system_power_efficient_wq, &sync_work,
- timespec64_to_jiffies(&next));
+ hrtimer_start(&sync_hrtimer, exp, HRTIMER_MODE_ABS);
}

static void sync_rtc_clock(void)
{
- unsigned long target_nsec;
- struct timespec64 adjust, now;
+ unsigned long offset_nsec;
+ struct timespec64 adjust;
int rc;

if (!IS_ENABLED(CONFIG_RTC_SYSTOHC))
return;

- ktime_get_real_ts64(&now);
+ ktime_get_real_ts64(&adjust);

- adjust = now;
if (persistent_clock_is_local)
adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);

/*
- * The current RTC in use will provide the target_nsec it wants to be
- * called at, and does rtc_tv_nsec_ok internally.
+ * The current RTC in use will provide the nanoseconds offset prior
+ * to a full second it wants to be called at, and invokes
+ * rtc_tv_nsec_ok() internally.
*/
- rc = rtc_set_ntp_time(adjust, &target_nsec);
+ rc = rtc_set_ntp_time(adjust, &offset_nsec);
if (rc == -ENODEV)
return;

- sched_sync_hw_clock(now, target_nsec, rc);
+ sched_sync_hw_clock(offset_nsec, rc != 0);
}

#ifdef CONFIG_GENERIC_CMOS_UPDATE
@@ -599,7 +588,7 @@ static bool sync_cmos_clock(void)
}
}

- sched_sync_hw_clock(now, target_nsec, rc);
+ sched_sync_hw_clock(target_nsec, rc != 0);
return true;
}

@@ -629,7 +618,7 @@ void ntp_notify_cmos_timer(void)

if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
IS_ENABLED(CONFIG_RTC_SYSTOHC))
- queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
+ queue_work(system_power_efficient_wq, &sync_work);
}

/*
@@ -1044,4 +1033,6 @@ static int __init ntp_tick_adj_setup(cha
void __init ntp_init(void)
{
ntp_clear();
+ hrtimer_init(&sync_hrtimer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
+ sync_hrtimer.function = sync_timer_callback;
}



2020-12-02 15:11:25

by Miroslav Lichvar

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote:
> Something like the completely untested below should make this reliable
> and only needs to retry when the work is running late (busy machine),
> but the wakeup will be on time or at max 1 jiffie off when high
> resolution timers are not available or disabled.

It seems to work nicely. In my test most of the updates succeeded on
the first attempt hitting the right tick, the rest succeeding on the
second attempt. Only when the clock was set to run 10% faster, it
needed few more attempts to converge to the target time.

Thanks,

--
Miroslav Lichvar

2020-12-02 15:42:34

by Miroslav Lichvar

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Wed, Dec 02, 2020 at 04:07:28PM +0100, Miroslav Lichvar wrote:
> On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote:
> > Something like the completely untested below should make this reliable
> > and only needs to retry when the work is running late (busy machine),
> > but the wakeup will be on time or at max 1 jiffie off when high
> > resolution timers are not available or disabled.
>
> It seems to work nicely. In my test most of the updates succeeded on
> the first attempt hitting the right tick, the rest succeeding on the
> second attempt. Only when the clock was set to run 10% faster, it
> needed few more attempts to converge to the target time.

I noticed an observable change wrt adjtimex() calls though. It seems
it now reschedules the RTC update, i.e. there can be more than one
update per 11 minutes. Was this intended?

> @@ -629,7 +618,7 @@ void ntp_notify_cmos_timer(void)
>
> if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
> IS_ENABLED(CONFIG_RTC_SYSTOHC))
> - queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
> + queue_work(system_power_efficient_wq, &sync_work);
> }

--
Miroslav Lichvar

2020-12-02 16:32:24

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote:

> So if this ends up in level #1 then again the chance is pretty low that
> the expiry time is aligned to the the level period. If this works then
> it only works by chance.

Yes, I always thought it was timer wheel related, but at least in my
testing long ago it seemed reliable at the short timeout. Maybe I had
a lucky hz value.

> > This seems like a more fundamental problem someplace else in the
> > kernel.
>
> I don't think so. :)
>
> Something like the completely untested below should make this reliable
> and only needs to retry when the work is running late (busy machine),
> but the wakeup will be on time or at max 1 jiffie off when high
> resolution timers are not available or disabled.

This seems like the right approach

> @@ -629,7 +618,7 @@ void ntp_notify_cmos_timer(void)
>
> if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
> IS_ENABLED(CONFIG_RTC_SYSTOHC))
> - queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
> + queue_work(system_power_efficient_wq, &sync_work);

As Miroslav noted, probably the right thing to do here is to reset the
hrtimer and remove the sync_work? I think this code was to expedite an
RTC sync when NTP fixes the clock on boot.

IIRC this was made somewhat messy due to the dual path with rtclib and
old cmos.

It would be very nice to kill the cmos path, things are better
now.. But PPC still has a long way to go:

arch/powerpc/platforms/52xx/efika.c: .set_rtc_time = rtas_set_rtc_time,
arch/powerpc/platforms/8xx/mpc86xads_setup.c: .set_rtc_time = mpc8xx_set_rtc_time,
arch/powerpc/platforms/8xx/tqm8xx_setup.c: .set_rtc_time = mpc8xx_set_rtc_time,
arch/powerpc/platforms/cell/setup.c: .set_rtc_time = rtas_set_rtc_time,
arch/powerpc/platforms/chrp/setup.c: ppc_md.set_rtc_time = rtas_set_rtc_time;
arch/powerpc/platforms/chrp/setup.c: .set_rtc_time = chrp_set_rtc_time,
arch/powerpc/platforms/maple/setup.c: .set_rtc_time = maple_set_rtc_time,
arch/powerpc/platforms/powermac/setup.c: .set_rtc_time = pmac_set_rtc_time,
arch/powerpc/platforms/pseries/setup.c: .set_rtc_time = rtas_set_rtc_time,

Also x86 needs a touch, it already has RTC lib, no idea why it also
provides this old path too

I wonder if the cmos path could be killed off under the dead HW
principle?

Jason

2020-12-02 18:41:33

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Wed, Dec 02 2020 at 16:36, Miroslav Lichvar wrote:
> On Wed, Dec 02, 2020 at 04:07:28PM +0100, Miroslav Lichvar wrote:
>> On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote:
>> > Something like the completely untested below should make this reliable
>> > and only needs to retry when the work is running late (busy machine),
>> > but the wakeup will be on time or at max 1 jiffie off when high
>> > resolution timers are not available or disabled.
>>
>> It seems to work nicely. In my test most of the updates succeeded on
>> the first attempt hitting the right tick, the rest succeeding on the
>> second attempt. Only when the clock was set to run 10% faster, it
>> needed few more attempts to converge to the target time.
>
> I noticed an observable change wrt adjtimex() calls though. It seems
> it now reschedules the RTC update, i.e. there can be more than one
> update per 11 minutes. Was this intended?

No. Let me fix that.

Thanks,

tglx

2020-12-02 19:25:27

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Wed, Dec 02 2020 at 12:27, Jason Gunthorpe wrote:
> On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote:
>> if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
>> IS_ENABLED(CONFIG_RTC_SYSTOHC))
>> - queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
>> + queue_work(system_power_efficient_wq, &sync_work);
>
> As Miroslav noted, probably the right thing to do here is to reset the
> hrtimer and remove the sync_work? I think this code was to expedite an
> RTC sync when NTP fixes the clock on boot.

This has two purposes:

1) Initiating the update on boot once ntp is synced.

2) Reinitiating the sync after ntp lost sync and the work did not
reschedule itself because it observed !ntp_synced().

In both cases it's highly unlikely that the write actually happens when
the work is queued because do_adjtimex() would have to be exactly around
the valid update window. So it will not write immediately. It will run
through at least one retry.

I don't think the timer should be canceled if the ntp_synced() state did
not change. Otherwise every do_adtimex() call will cancel/restart
it, which does not make sense. Lemme stare at it some more.

> IIRC this was made somewhat messy due to the dual path with rtclib and
> old cmos.

:)

> It would be very nice to kill the cmos path, things are better
> now.. But PPC still has a long way to go:
>
> arch/powerpc/platforms/52xx/efika.c: .set_rtc_time = rtas_set_rtc_time,
> arch/powerpc/platforms/8xx/mpc86xads_setup.c: .set_rtc_time = mpc8xx_set_rtc_time,
> arch/powerpc/platforms/8xx/tqm8xx_setup.c: .set_rtc_time = mpc8xx_set_rtc_time,
> arch/powerpc/platforms/cell/setup.c: .set_rtc_time = rtas_set_rtc_time,
> arch/powerpc/platforms/chrp/setup.c: ppc_md.set_rtc_time = rtas_set_rtc_time;
> arch/powerpc/platforms/chrp/setup.c: .set_rtc_time = chrp_set_rtc_time,
> arch/powerpc/platforms/maple/setup.c: .set_rtc_time = maple_set_rtc_time,
> arch/powerpc/platforms/powermac/setup.c: .set_rtc_time = pmac_set_rtc_time,
> arch/powerpc/platforms/pseries/setup.c: .set_rtc_time = rtas_set_rtc_time,
>
> Also x86 needs a touch, it already has RTC lib, no idea why it also
> provides this old path too

Because nobody had the stomach and/or cycles to touch it :)

> I wonder if the cmos path could be killed off under the dead HW
> principle?

Unfortunately that code path is not that dead on x86. You need to fix
all the (ab)users first. :)

Thanks,

tglx

2020-12-02 20:59:48

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Wed, Dec 02, 2020 at 08:21:00PM +0100, Thomas Gleixner wrote:
> On Wed, Dec 02 2020 at 12:27, Jason Gunthorpe wrote:
> > On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote:
> >> if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
> >> IS_ENABLED(CONFIG_RTC_SYSTOHC))
> >> - queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
> >> + queue_work(system_power_efficient_wq, &sync_work);
> >
> > As Miroslav noted, probably the right thing to do here is to reset the
> > hrtimer and remove the sync_work? I think this code was to expedite an
> > RTC sync when NTP fixes the clock on boot.
>
> This has two purposes:
>
> 1) Initiating the update on boot once ntp is synced.
>
> 2) Reinitiating the sync after ntp lost sync and the work did not
> reschedule itself because it observed !ntp_synced().
>
> In both cases it's highly unlikely that the write actually happens when
> the work is queued because do_adjtimex() would have to be exactly around
> the valid update window.

Yes

> So it will not write immediately. It will run through at least one
> retry.

Right, bascially this is scheduling a WQ to do sched_sync_hw_clock()
which will only call hrtimer_start() - seems like jsut calling
hrtimer_start instead of queue_work above would be equivilant

> I don't think the timer should be canceled if the ntp_synced() state did
> not change. Otherwise every do_adtimex() call will cancel/restart
> it, which does not make sense. Lemme stare at it some more.

That makes sense, being conditional on the STA_UNSYNC prior to doing
any hrtimer_start seems OK?

> > Also x86 needs a touch, it already has RTC lib, no idea why it also
> > provides this old path too
>
> Because nobody had the stomach and/or cycles to touch it :)

Hahaha yes.. I vaugely remember looking at this once..

Lets see:

arch/x86/kernel/kvmclock.c: x86_platform.set_wallclock = kvm_set_wallclock;
arch/x86/kernel/x86_init.c: x86_platform.set_wallclock = set_rtc_noop;
arch/x86/xen/time.c: x86_platform.set_wallclock = xen_set_wallclock;
arch/x86/xen/time.c: x86_platform.set_wallclock = xen_set_wallclock;
All returns -ENODEV/EINVAL

arch/x86/kernel/x86_init.c: .set_wallclock = mach_set_rtc_mmss,
This is already rtclib under drivers/rtc/rtc-mc146818-lib.c

I suppose the issue here is the rtclib driver only binds via PNP and
very old x86 systems won't have the PNP tables? It seems doable to
check for a PNP device after late init and manually create a
platform_device for the RTC

arch/x86/platform/intel-mid/intel_mid_vrtc.c: x86_platform.set_wallclock = vrtc_set_mmss;
This is also already in rtclib under rtc-mrst.c, and this is already
wired to create the rtc platform device during init

So it is very close now to be able to delete all this for x86. Do you
know of something I've missed?

> > I wonder if the cmos path could be killed off under the dead HW
> > principle?
>
> Unfortunately that code path is not that dead on x86. You need to fix
> all the (ab)users first. :)

Assuming x86 can be resolved as above, that leaves two 20 year old MIPS
platforms and the PPC list from before. ARM is gone compared to last
time I looked! Progress :)

Thanks,
Jason

2020-12-02 22:11:45

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Wed, Dec 02 2020 at 16:54, Jason Gunthorpe wrote:
> On Wed, Dec 02, 2020 at 08:21:00PM +0100, Thomas Gleixner wrote:
>> So it will not write immediately. It will run through at least one
>> retry.
>
> Right, bascially this is scheduling a WQ to do sched_sync_hw_clock()
> which will only call hrtimer_start() - seems like jsut calling
> hrtimer_start instead of queue_work above would be equivilant

Something like that.

>> I don't think the timer should be canceled if the ntp_synced() state did
>> not change. Otherwise every do_adtimex() call will cancel/restart
>> it, which does not make sense. Lemme stare at it some more.
>
> That makes sense, being conditional on the STA_UNSYNC prior to doing
> any hrtimer_start seems OK?

Yeah.

>> > Also x86 needs a touch, it already has RTC lib, no idea why it also
>> > provides this old path too
>>
>> Because nobody had the stomach and/or cycles to touch it :)
>
> Hahaha yes.. I vaugely remember looking at this once..
>
> Lets see:
>
> arch/x86/kernel/kvmclock.c: x86_platform.set_wallclock = kvm_set_wallclock;
> arch/x86/kernel/x86_init.c: x86_platform.set_wallclock = set_rtc_noop;
> arch/x86/xen/time.c: x86_platform.set_wallclock = xen_set_wallclock;
> arch/x86/xen/time.c: x86_platform.set_wallclock = xen_set_wallclock;
> All returns -ENODEV/EINVAL

You forgot to stare at the .get_wallclock() functions. That's the more
interesting part, i.e. what's behind read_persistent_clock64().

> arch/x86/kernel/x86_init.c: .set_wallclock = mach_set_rtc_mmss,
> This is already rtclib under drivers/rtc/rtc-mc146818-lib.c

That's the shared library function for setting the darn thing.

> I suppose the issue here is the rtclib driver only binds via PNP and
> very old x86 systems won't have the PNP tables? It seems doable to
> check for a PNP device after late init and manually create a
> platform_device for the RTC

old crap, broken BIOSes and virt. Welcome to my wonderful world :)

> arch/x86/platform/intel-mid/intel_mid_vrtc.c: x86_platform.set_wallclock = vrtc_set_mmss;
> This is also already in rtclib under rtc-mrst.c, and this is already
> wired to create the rtc platform device during init
>
> So it is very close now to be able to delete all this for x86. Do you
> know of something I've missed?

Just the above :)

>> > I wonder if the cmos path could be killed off under the dead HW
>> > principle?
>>
>> Unfortunately that code path is not that dead on x86. You need to fix
>> all the (ab)users first. :)
>
> Assuming x86 can be resolved as above, that leaves two 20 year old MIPS
> platforms and the PPC list from before. ARM is gone compared to last
> time I looked! Progress :)

Yeah. We're zooming in ....

Thanks,

tglx

2020-12-02 23:05:29

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Wed, Dec 02, 2020 at 11:08:38PM +0100, Thomas Gleixner wrote:
> >
> > arch/x86/kernel/kvmclock.c: x86_platform.set_wallclock = kvm_set_wallclock;
> > arch/x86/kernel/x86_init.c: x86_platform.set_wallclock = set_rtc_noop;
> > arch/x86/xen/time.c: x86_platform.set_wallclock = xen_set_wallclock;
> > arch/x86/xen/time.c: x86_platform.set_wallclock = xen_set_wallclock;
> > All returns -ENODEV/EINVAL
>
> You forgot to stare at the .get_wallclock() functions. That's the more
> interesting part, i.e. what's behind read_persistent_clock64().

Small steps! I was only looking at deleting the legacy
CONFIG_GENERIC_CMOS_UPDATE from x86 which only controls the
update_persistent_clock64()

Yes, there is a similar redundancy with rtclib on the
read_persistant_clock side, and that does looks much further..

> > arch/x86/kernel/x86_init.c: .set_wallclock = mach_set_rtc_mmss,
> > This is already rtclib under drivers/rtc/rtc-mc146818-lib.c
>
> That's the shared library function for setting the darn thing.

Yes, but if a PNP entry is present then rtc-cmos will load and call
that function through the rtclib path instead of the
update_persistent_clock64() path

Jason

2020-12-03 01:19:01

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error


Cc+ RTC folks.

On Wed, Dec 02 2020 at 23:08, Thomas Gleixner wrote:
> On Wed, Dec 02 2020 at 16:54, Jason Gunthorpe wrote:
>>> I don't think the timer should be canceled if the ntp_synced() state did
>>> not change. Otherwise every do_adtimex() call will cancel/restart
>>> it, which does not make sense. Lemme stare at it some more.
>>
>> That makes sense, being conditional on the STA_UNSYNC prior to doing
>> any hrtimer_start seems OK?
>
> Yeah.

And I was staring at it some more. TBH, rtc_tv_nsec_ok() is pretty much
voodoo and wishful thinking.

The point is that the original MC146818 has an update cycle once per
second. That's what mc146818_is_updating() is checking.

Lets start right here. mc146818_get_time() is bonkers to begin with. It
does:

if (mc146818_is_updating())
mdelay(20);

lock(&rtc_lock);
read_stuff();
unlock(&rtc_lock);

That's crap, really.

The MC146818 does a periodic update of the time data once per second and
it advertises it via the UIP bit in Register A. The UIP bit is set 244us
_before_ the update starts and depending on the time base is takes
either 248us or 1984us (32kHz) until the bit goes low again.

So let's look at the time line assuming a 32kHz time base:

Time 0.000s ~0.998s 1.0s
| |
_____
UIP ____________________| |_______

Let's look at the code again and annotate it with time stamps:

0.997 read()
if (mc146818_is_updating())
lock(rtc);
read(regA);
unlock(rtc);
return regA & UIP; <- Lets assume FALSE

0.998 -> whatever happens (SMI, NMI, interrupt, preemption ...)

0.999 lock(rtc)
read_data() <- Result is undefined
because RTC is in the
middle of the update
unlock(rtc);

Seriously...

The comment above the if(...updating()) is full of wishful thinking:

/*
* read RTC once any update in progress is done. The update
* can take just over 2ms. We wait 20ms. There is no need to
* to poll-wait (up to 1s - eeccch) for the falling edge of RTC_UIP.
* If you need to know *exactly* when a second has started, enable
* periodic update complete interrupts, (via ioctl) and then
* immediately read /dev/rtc which will block until you get the IRQ.
* Once the read clears, read the RTC time (again via ioctl). Easy.
*/

- The update can take exactly up to 1984us, which is not just over 2ms.
Also the update starts 248us _after_ UIP goes high.

- Wait 20ms? What for? To make sure that the RTC has advanced by 2ms?

- Poll wait for the falling edge of UIP takes up to 1s?

I must have missed something in basic math. If UIP is high then
waiting for the falling edge takes at max 2ms.

I know what the comment tries to say, but on the first read I had one
of those forbidden-by-CoC moments.

- The need to know *exactly* part of the comment is even more amazing.

Q: Which system guarantees that the interrupt:

- will not happen _before_ read(/dev/rtc) after the ioctl() enabled
it?

- will be delivered without delay ?

- will make sure that the task in the blocking read will be
scheduled immediately and then do the ioctl based readout ?

A: None

Q: What is *exact* about this?

A: Nothing at all.

So the right thing to do here is:

read()
do {
lock(rtc)
start = ktime_get();
if (read(regA) & UIP) {
unlock(rtc);
wait(2ms);
continue;
}
read_data();
end = ktime_get();
unlock(rtc);
while (end - start > 2ms);

and return _all_ three values (start, end, rtcdata) so clueful userspace
can make sense of it. Returning nonsense as pointed off above is a non
option.

Hmm?

Now to the write side. That's really wishful thinking. Let's look at the
code:

write()

lock(rtc);
write(regB, read(reagB) | SET);
write_data();
write(regB, read(reagB) & ~SET);
unlock(rtc);

lock/unlock are irrelevant as they just serialize concurrent access to
the RTC.

The magic comment in ntp.c says that RTC will update the written value
0.5 seconds after writing it. That's pure fiction...

I have no idea where this comes from, but any spec out there says about
this:

SET - When the SET bit is a "0", the update cycle functions normally
by advancing the counts once-per-second. When the SET bit is written
to a "1", any update cycle in progress is aborted and the program may
initialize the time and calendar bytes without an update occuring in
the midst of initializing. .....

So yes, the comment is partially correct _if_ and only _if_ the time
base which schedules the update is exactly the same time base as the RTC
time base and the time on which the update is scheduled is perfectly in
sync with the RTC time base.

Plus the update must be completed _before_ then next update cycle of the
RTC starts which is what the 0.5 sec offset is trying to solve. Emphasis
on _trying_.

But with NTP that's not the case at all. The clocksource which is
adjusted by NTP (usually TSC on x86, but that does not matter) is not at
all guaranteed to run from the same crystal as the RTC. On most systems
the RTC has a seperate crystal which feeds it across poweroff.

Even if it _is_ using the same crystal, then the NTP adjustments are
going to skew the clocksource frequency against the underlying crystal
which feeds the RTC and the clocksource.

Q: So how can any assumption about update cycle correlatation between NTP
synced CLOCK_REALTIME and the RTC notion of clock realtime be correct?

A: Not at all.

Aside of that the magic correction of the time which is written to the
RTC is completely bogus. Lets start with the interface and the two
callers of it:

static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec,
struct timespec64 *to_set,
const struct timespec64 *now)

The callers are:

sync_cmos_clock() /* The legacy RTC cruft */
struct timespec64 now;
struct timespec64 adjust;
long target_nsec = NSEC_PER_SEC / 2;

ktime_get_real_ts64(&now);
if (rtc_tv_nsec_ok(-1 * target_nsec, &adjust, &now)) {
if (persistent_clock_is_local)
adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
rc = update_persistent_clock64(adjust);
}

sync_rtc_clock()
unsigned long target_nsec; <- Signed unsigned ....
struct timespec64 adjust, now;

ktime_get_real_ts64(&now);

adjust = now; <- Why the difference to the above?

if (persistent_clock_is_local) <- Again, why is the ordering different?
adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);

rc = rtc_set_ntp_time(adjust, &target_nsec)
// int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec)

struct timespec64 to_set;

set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec);
*target_nsec = to_set.tv_nsec; <- target_nsec = rtc->set_offset_nsec
because the timespec is normalized
ergo == rtc->set_offset_nsec
unless the set_offset_nsec would
be negative which makes at all.

if (rtc_tv_nsec_ok(rtc->set_offset_nsec, &to_set, &now))
update_rtc(...);

So sync_cmos_clock hands in -(NSEC_PER_SEC/2) and the rtc cruft hands in
NSEC_PER_SEC/2 by default. The comment in drivers/rtc/class.c says:

drivers/rtc/class.c- /* Drivers can revise this default after allocating the device. */
drivers/rtc/class.c: rtc->set_offset_nsec = NSEC_PER_SEC / 2;

but no driver ever bothered to change that value. Also the idea of
having this offset as type s64 is beyond my understanding. Why the heck
would any RTC require to set an offset which is _after_ the second
transition.

That aside. Looking at the above two variants let's go into
rtc_tv_nsec_ok()

static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec,
struct timespec64 *to_set,
const struct timespec64 *now)

/* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */
const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5;
struct timespec64 delay = {.tv_sec = 0,
.tv_nsec = set_offset_nsec};

*to_set = timespec64_add(*now, delay);

The scheduled point for both legacy CMOS and RTC modern is at the point
of the second minus 0.5 seconds (lets ignore that set_offset_nsec might
be different for this).

So let's assume the update was scheduled at 659s 500ms independent of
legacy or modern.

Now legacy does the following:

struct timespec64 delay = { .tv_sec = 0, tv_nsec = -5e8 }

which is an not normalized timespec to begin with but

*to_set = timespec64_add(*now , delay);

can deal with that. So the result of this computation is:

now - delay

IOW, 0.5 seconds before now: 659s 0ms

Now the same magic for the 'modern' RTC will do:

struct timespec64 delay = { .tv_sec = 0, tv_nsec = 5e8 }

so the result of the add is:

now + delay

IOW, 0.5 seconds _after_ now: 700s 0ms

Can you spot the subtle difference?

That said, can somebody answer the one million dollar question which
problem is solved by all of this magic nonsense?

If anyone involved seriously believes that any of this solves a real
world problem, then please come forth an make your case.

If not, all of this illusionary attempts to be "correct" can be removed
for good and the whole thing reduced to a

update_rtc_plus_minus_a_second()

mechanism, which is exactly what we have today just without the code
which pretends to be *exact* or whatever.

Thanks,

tglx

2020-12-03 02:09:07

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Thu, Dec 03, 2020 at 02:14:12AM +0100, Thomas Gleixner wrote:

> If anyone involved seriously believes that any of this solves a real
> world problem, then please come forth an make your case.

The original commit 0f295b0650c9 ("rtc: Allow rtc drivers to specify
the tv_nsec value for ntp") was tested by myself and RMK on various
ARM systems and did work as advertised. Here is the giant thread,
RMK's post explains the problem and gives his measurements of several
different RTCs:

https://lore.kernel.org/linux-arm-kernel/[email protected]/

And the patch that resulted:

https://lore.kernel.org/linux-arm-kernel/[email protected]/

There is a lot of detail in there.. Keep in mind none of this was for
the mc146818 style RTCs.

I can't recall any more why no drivers use the set_offset_nsec. I'm
surprised, maybe I forgot to send the patch for the RTCs I tested or
maybe it got dropped someplace.. It certainly was needed for some
maxim I2C chips.

The thread shows rmk had even written a hrtimer patch to go with this,
but it also got lost for some reason. Maybe all the arguing killed
further effort?

Jason

2020-12-03 02:13:58

by Alexandre Belloni

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

Hello Thomas,

I'll take more time to reply more in depth to the whole email but...

On 03/12/2020 02:14:12+0100, Thomas Gleixner wrote:
> Aside of that the magic correction of the time which is written to the
> RTC is completely bogus. Lets start with the interface and the two
> callers of it:
>
> static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec,
> struct timespec64 *to_set,
> const struct timespec64 *now)
>
> The callers are:
>
> sync_cmos_clock() /* The legacy RTC cruft */
> struct timespec64 now;
> struct timespec64 adjust;
> long target_nsec = NSEC_PER_SEC / 2;
>
> ktime_get_real_ts64(&now);
> if (rtc_tv_nsec_ok(-1 * target_nsec, &adjust, &now)) {
> if (persistent_clock_is_local)
> adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
> rc = update_persistent_clock64(adjust);
> }
>
> sync_rtc_clock()
> unsigned long target_nsec; <- Signed unsigned ....
> struct timespec64 adjust, now;
>
> ktime_get_real_ts64(&now);
>
> adjust = now; <- Why the difference to the above?
>
> if (persistent_clock_is_local) <- Again, why is the ordering different?
> adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
>
> rc = rtc_set_ntp_time(adjust, &target_nsec)
> // int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec)
>
> struct timespec64 to_set;
>
> set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec);
> *target_nsec = to_set.tv_nsec; <- target_nsec = rtc->set_offset_nsec
> because the timespec is normalized
> ergo == rtc->set_offset_nsec
> unless the set_offset_nsec would
> be negative which makes at all.
>
> if (rtc_tv_nsec_ok(rtc->set_offset_nsec, &to_set, &now))
> update_rtc(...);
>
> So sync_cmos_clock hands in -(NSEC_PER_SEC/2) and the rtc cruft hands in
> NSEC_PER_SEC/2 by default. The comment in drivers/rtc/class.c says:
>
> drivers/rtc/class.c- /* Drivers can revise this default after allocating the device. */
> drivers/rtc/class.c: rtc->set_offset_nsec = NSEC_PER_SEC / 2;
>
> but no driver ever bothered to change that value. Also the idea of
> having this offset as type s64 is beyond my understanding. Why the heck
> would any RTC require to set an offset which is _after_ the second
> transition.
>

This (no driver making use of set_offset_nsec) happened because it got
applied without me agreeing to the change. I did complain at the time
and RMK walked away.

[...]

> That said, can somebody answer the one million dollar question which
> problem is solved by all of this magic nonsense?
>

The goal was to remove the 500ms offset for all the RTCs but the
MC146818 because there are RTC that will reset properly their counter
when setting the time, meaning they can be set very precisely.

IIRC, used in conjunction with rtc_hctosys which also adds
inconditionnaly 500ms this can ends up with the system time
being one second away from the wall clock time which NTP will take quite
some time to remove.

> If anyone involved seriously believes that any of this solves a real
> world problem, then please come forth an make your case.
>
> If not, all of this illusionary attempts to be "correct" can be removed
> for good and the whole thing reduced to a
>
> update_rtc_plus_minus_a_second()
>
> mechanism, which is exactly what we have today just without the code
> which pretends to be *exact* or whatever.
>

Coincidentally, I was going to revert those patches for v5.11. Also,
honestly, I still don't understand why the kernel needs to set the RTC
while userspace is very well equipped to do that. chrony is able to set
the RTC time and it can do so precisely. It can even compute how that RTC is
time drifting and that value can already be used to adjust the RTC
crystal.

From my tests, with some efforts, userspace can set the RTC time with a
20?s precision, even on low end systems. To do so, it doesn't need
set_offset_nsec.

I also don't like hctosys, it is currently wrong but I can see use cases
and now systemd relies on its presence so my plan is to fix it.


--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2020-12-03 15:43:39

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

Alexandre,

On Thu, Dec 03 2020 at 03:10, Alexandre Belloni wrote:
> On 03/12/2020 02:14:12+0100, Thomas Gleixner wrote:
>> That said, can somebody answer the one million dollar question which
>> problem is solved by all of this magic nonsense?
>>
> The goal was to remove the 500ms offset for all the RTCs but the
> MC146818 because there are RTC that will reset properly their counter
> when setting the time, meaning they can be set very precisely.

The MC setting is halfways precise. The write resets the divider chain
and when the reset is removed then the next UIP will happen after the
magic 0.5 seconds. So yes, writing it 500ms _before_ the next second is
halfways correct assumed that there is no interference between
ktime_get_real() and the actual write which is a silly assumption as the
code is fully preemptible.

> IIRC, used in conjunction with rtc_hctosys which also adds
> inconditionnaly 500ms this can ends up with the system time
> being one second away from the wall clock time which NTP will take quite
> some time to remove.

The rtc_cmos() driver has a fun comment in cmos_set_time()....

The logic in sync_cmos_clock() and rtc_set_ntp_time() is different as I
pointed out: sync_cmos_clock() hands -500ms to rtc_tv_nsec_ok() and
rtc_set_ntp_time() uses +500ms, IOW exactly ONE second difference in
behaviour.

> Coincidentally, I was going to revert those patches for v5.11.

Which will break the rtc_cmos() driver in a different way. We should
really fix that proper and just have the 500ms offset for rtc_cmos,
aka. MC146818. When other drivers want a different offset, then they
still can do so.

The direct /dev/rtc settime ioctl is not using that logic anyway. Thats
the business of the user space application to get that straight which is
scheduling lottery as well.

Thanks,

tglx

2020-12-03 15:55:28

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Thu, Dec 03, 2020 at 03:10:47AM +0100, Alexandre Belloni wrote:

> IIRC, used in conjunction with rtc_hctosys which also adds
> inconditionnaly 500ms this can ends up with the system time
> being one second away from the wall clock time which NTP will take quite
> some time to remove.

I can't remember the details, but this was not the intention.

As long as systohc and hctosys exist then the design goal of rtclib
should be to provide sub-second accuracy on the round trip of time
through the RTC.

Otherwise what is the point?

Jason

2020-12-03 16:11:35

by Alexandre Belloni

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On 03/12/2020 11:52:49-0400, Jason Gunthorpe wrote:
> On Thu, Dec 03, 2020 at 03:10:47AM +0100, Alexandre Belloni wrote:
>
> > IIRC, used in conjunction with rtc_hctosys which also adds
> > inconditionnaly 500ms this can ends up with the system time
> > being one second away from the wall clock time which NTP will take quite
> > some time to remove.
>
> I can't remember the details, but this was not the intention.
>
> As long as systohc and hctosys exist then the design goal of rtclib
> should be to provide sub-second accuracy on the round trip of time
> through the RTC.
>
> Otherwise what is the point?
>

hctosys never had a sub second accuracy because there is no way to
accurately read the rtc time without being ready to wait up to a second.
I have patches doing exactly that but I'm pretty sure nobody wants to
pay the price and have a kernel that boots significantly slower.
Especially since that would basically affect everyone since systemd
requires that hctosys is enabled.


--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2020-12-03 16:20:40

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Thu, Dec 03, 2020 at 04:39:21PM +0100, Thomas Gleixner wrote:

> The logic in sync_cmos_clock() and rtc_set_ntp_time() is different as I
> pointed out: sync_cmos_clock() hands -500ms to rtc_tv_nsec_ok() and
> rtc_set_ntp_time() uses +500ms, IOW exactly ONE second difference in
> behaviour.

I understood this is because the two APIs work differently, rmk
explained this as:

> 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the
> time at around 500ms into the second.
>
> 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms,
> then we want to set the _next_ second.

ie one path is supposed to round down and one path is supposed to
round up, so you get to that 1s difference..

IIRC this is also connected to why the offset is signed..

Jason

2020-12-03 17:32:40

by Alexandre Belloni

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On 03/12/2020 16:39:21+0100, Thomas Gleixner wrote:
> Alexandre,
>
> On Thu, Dec 03 2020 at 03:10, Alexandre Belloni wrote:
> > On 03/12/2020 02:14:12+0100, Thomas Gleixner wrote:
> >> That said, can somebody answer the one million dollar question which
> >> problem is solved by all of this magic nonsense?
> >>
> > The goal was to remove the 500ms offset for all the RTCs but the
> > MC146818 because there are RTC that will reset properly their counter
> > when setting the time, meaning they can be set very precisely.
>
> The MC setting is halfways precise. The write resets the divider chain
> and when the reset is removed then the next UIP will happen after the
> magic 0.5 seconds. So yes, writing it 500ms _before_ the next second is
> halfways correct assumed that there is no interference between
> ktime_get_real() and the actual write which is a silly assumption as the
> code is fully preemptible.
>
> > IIRC, used in conjunction with rtc_hctosys which also adds
> > inconditionnaly 500ms this can ends up with the system time
> > being one second away from the wall clock time which NTP will take quite
> > some time to remove.
>
> The rtc_cmos() driver has a fun comment in cmos_set_time()....
>
> The logic in sync_cmos_clock() and rtc_set_ntp_time() is different as I
> pointed out: sync_cmos_clock() hands -500ms to rtc_tv_nsec_ok() and
> rtc_set_ntp_time() uses +500ms, IOW exactly ONE second difference in
> behaviour.
>
> > Coincidentally, I was going to revert those patches for v5.11.
>
> Which will break the rtc_cmos() driver in a different way. We should
> really fix that proper and just have the 500ms offset for rtc_cmos,
> aka. MC146818. When other drivers want a different offset, then they
> still can do so.
>

My point was to get back to the previous situation where only
rtc_cmos was supposed to work properly by removing rtc_tv_nsec_ok and
set_offset_nsec.

> The direct /dev/rtc settime ioctl is not using that logic anyway. Thats
> the business of the user space application to get that straight which is
> scheduling lottery as well.

I still don't see how userspace is worse than systohc in that regard and
why we need to do that in the kernel. Especially since hctosys is doing
a very bad job trying to read the time from the RTC. You may as well not
bother with the 500ms and just set the time to the current or next
second.

And what about the non configurable 659 period, isn't that policy that
should be left to userspace configuration?

I'm still convinced that set_offset_nsec is not needed to set the time
accurately and I still want to remove it. Also, this may be a good time
to move systohc.c to kernel/time/ntp.c as this is definitively some NTP
specific code being an RTC consumer, very much like alarmtimer.c

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2020-12-03 19:57:37

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

Alexandre,

On Thu, Dec 03 2020 at 18:29, Alexandre Belloni wrote:
> On 03/12/2020 16:39:21+0100, Thomas Gleixner wrote:
>> On Thu, Dec 03 2020 at 03:10, Alexandre Belloni wrote:
>> > On 03/12/2020 02:14:12+0100, Thomas Gleixner wrote:
>> > Coincidentally, I was going to revert those patches for v5.11.
>>
>> Which will break the rtc_cmos() driver in a different way. We should
>> really fix that proper and just have the 500ms offset for rtc_cmos,
>> aka. MC146818. When other drivers want a different offset, then they
>> still can do so.
>>
>
> My point was to get back to the previous situation where only
> rtc_cmos was supposed to work properly by removing rtc_tv_nsec_ok and
> set_offset_nsec.

If you remove the offset, then rtc_cmos() is off by ~500ms.

> I'm still convinced that set_offset_nsec is not needed to set the time
> accurately and I still want to remove it. Also, this may be a good time
> to move systohc.c to kernel/time/ntp.c as this is definitively some NTP
> specific code being an RTC consumer, very much like alarmtimer.c

No objections from my side.

The main pain point is that the periodic update is seen as ABI by some
folks. The fact that you can disable it in Kconfig does not matter. You
can disable other stuff like posix timers which is ABI as well.

So removing it is not really an option. Keeping it broken or break it
differently is neither...

Thanks,

tglx


2020-12-03 20:14:06

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Thu, Dec 03, 2020 at 05:07:53PM +0100, Alexandre Belloni wrote:
> On 03/12/2020 11:52:49-0400, Jason Gunthorpe wrote:
> > On Thu, Dec 03, 2020 at 03:10:47AM +0100, Alexandre Belloni wrote:
> >
> > > IIRC, used in conjunction with rtc_hctosys which also adds
> > > inconditionnaly 500ms this can ends up with the system time
> > > being one second away from the wall clock time which NTP will take quite
> > > some time to remove.
> >
> > I can't remember the details, but this was not the intention.
> >
> > As long as systohc and hctosys exist then the design goal of rtclib
> > should be to provide sub-second accuracy on the round trip of time
> > through the RTC.
> >
> > Otherwise what is the point?
>
> hctosys never had a sub second accuracy because there is no way to
> accurately read the rtc time without being ready to wait up to a
> second.

Yes, I know. I was talking about a goal..

If that is not the goal then just delete it all and update the docs
that userspace needs to deal with all of this and the kernel stuff
isn't supposed to be useful.

Jason

2020-12-03 21:10:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Thu, Dec 03 2020 at 12:16, Jason Gunthorpe wrote:
> On Thu, Dec 03, 2020 at 04:39:21PM +0100, Thomas Gleixner wrote:
>
>> The logic in sync_cmos_clock() and rtc_set_ntp_time() is different as I
>> pointed out: sync_cmos_clock() hands -500ms to rtc_tv_nsec_ok() and
>> rtc_set_ntp_time() uses +500ms, IOW exactly ONE second difference in
>> behaviour.
>
> I understood this is because the two APIs work differently, rmk
> explained this as:
>
>> 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the
>> time at around 500ms into the second.
>>
>> 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms,
>> then we want to set the _next_ second.
>
> ie one path is supposed to round down and one path is supposed to
> round up, so you get to that 1s difference..
>
> IIRC this is also connected to why the offset is signed..

The problem is that it is device specific and therefore having the
offset parameter is a good starting point.

Lets look at the two scenarios:

1) Direct accessible RTC:

tsched t1 t2
write(newsec) RTC increments seconds

For rtc_cmos/MC1... tinc = t2 - t1 = 500ms

There are RTCs which reset the thing on write so tinc = t2 - t1 = 1000ms

No idea what other variants are out there, but the principle is the
same for all of them.

Lets assume that the event is accurate for now and ignore the fuzz
logic, i.e. tsched == t1

tsched must be scheduled to happen tinc before wallclock increments
seconds so that the RTC increments seconds at the same time.

That means newsec = t1.tv_sec.

So now the fuzz logic for the legacy cmos path does:

newtime = t1 - tinc;

if (newtime.tv_nsec < FUZZ)
newsec = newtime.tv_sec;
else if (newtime.tv_nsec > NSEC_PER_SEC - FUZZ)
newsec = newtime.tv_sec + 1;
else
goto fail;

The first condition handles the case where t1 >= tsched and the second
one where t1 < tsched.

We need the same logic for rtc_cmos() when the update goes through
the RTC path, which is broken today. See below.

2) I2C/SPI ...

tsched t0 t1 t2
transfer(newsec) RTC update (newsec) RTC increments seconds

Lets assume that ttransfer = t1 - t0 is known.

tinc is the same as above = t2 - t1

Again, lets assume that the event is accurate for now and ignore the fuzz
logic, i.e. tsched == t0

So tsched has to be ttot = t2 - t0 _before_ wallclock reaches t2 and
increments seconds.

In this case newsec = t1.tv_sec = (t0 + ttransfer).tv_sec

So now the fuzz logic for this is:

newtime = t0 + ttransfer;

if (newtime.tv_nsec < FUZZ)
newsec = newtime.tv_sec;
else if (newtime.tv_nsec > NSEC_PER_SEC - FUZZ)
newsec = newtime.tv_sec + 1;
else
goto fail;

Again the first condition handles the case where t1 >= tsched and the
second one where t1 < tsched.

So now we have two options to fix this:

1) Use a negative sync_offset for devices which need #1 above
(rtc_cmos & similar)

That requires setting tsched to t2 - abs(sync_offset)

2) Use always a positive sync_offset and a flag which tells
rtc_tv_nsec_ok() whether it needs to add or subtract.

#1 is good enough. All it takes is a comment at the timer start code why
abs() is required.

Let me hack that up along with the hrtimer muck.

Thanks,

tglx

2020-12-03 21:33:29

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Thu, Dec 03 2020 at 22:05, Thomas Gleixner wrote:
> On Thu, Dec 03 2020 at 12:16, Jason Gunthorpe wrote:
> So now we have two options to fix this:
>
> 1) Use a negative sync_offset for devices which need #1 above
> (rtc_cmos & similar)
>
> That requires setting tsched to t2 - abs(sync_offset)
>
> 2) Use always a positive sync_offset and a flag which tells
> rtc_tv_nsec_ok() whether it needs to add or subtract.
>
> #1 is good enough. All it takes is a comment at the timer start code why
> abs() is required.
>
> Let me hack that up along with the hrtimer muck.

That comment in rtc.h makes me cry:

/* Number of nsec it takes to set the RTC clock. This influences when
* the set ops are called. An offset:
* - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s
* - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s
* - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s
*/

Setting the wall clock time 10.0 at 10.5 is only possible for time
traveling RTCs. It magically works, but come on ...

Thanks,

tglx

2020-12-03 22:03:09

by Alexandre Belloni

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On 03/12/2020 22:05:09+0100, Thomas Gleixner wrote:
> 2) I2C/SPI ...
>
> tsched t0 t1 t2
> transfer(newsec) RTC update (newsec) RTC increments seconds
>
> Lets assume that ttransfer = t1 - t0 is known.

Note that ttransfer is one of the reason why setting set_offset_nsec
from the RTC driver is not a good idea. The same RTC may be on busses
with different rates and there is no way to know that. I think that was
one of my objections at the time.

ttransfer is not a function of the RTC model but rather of how it is
integrated in the system.

>
> tinc is the same as above = t2 - t1
>
> Again, lets assume that the event is accurate for now and ignore the fuzz
> logic, i.e. tsched == t0
>
> So tsched has to be ttot = t2 - t0 _before_ wallclock reaches t2 and
> increments seconds.
>

I had a rough week and I'm probably not awake enough to follow
completely but is that thinking correct?

For the mc146818, you have t1 - t0 which is probably negligible and t2 -
T& == 500 ms

For most of the other RTCs, you have t1 - t0 is somewhat important,
probably around 100 to 150?s and t2 - t1 is 1s. I would think that what
is needed is tsched has to be t1-t0 before wallclock reaches t1. In that
case t2 doesn't matter, it will always be 1s after t1.

> In this case newsec = t1.tv_sec = (t0 + ttransfer).tv_sec
>
> So now the fuzz logic for this is:
>
> newtime = t0 + ttransfer;
>
> if (newtime.tv_nsec < FUZZ)
> newsec = newtime.tv_sec;
> else if (newtime.tv_nsec > NSEC_PER_SEC - FUZZ)
> newsec = newtime.tv_sec + 1;
> else
> goto fail;
>
> Again the first condition handles the case where t1 >= tsched and the
> second one where t1 < tsched.
>
> So now we have two options to fix this:
>
> 1) Use a negative sync_offset for devices which need #1 above
> (rtc_cmos & similar)
>
> That requires setting tsched to t2 - abs(sync_offset)
>
> 2) Use always a positive sync_offset and a flag which tells
> rtc_tv_nsec_ok() whether it needs to add or subtract.
>
> #1 is good enough. All it takes is a comment at the timer start code why
> abs() is required.
>
> Let me hack that up along with the hrtimer muck.
>
> Thanks,
>
> tglx

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2020-12-03 22:39:20

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Thu, Dec 03, 2020 at 10:31:02PM +0100, Thomas Gleixner wrote:
> On Thu, Dec 03 2020 at 22:05, Thomas Gleixner wrote:
> > On Thu, Dec 03 2020 at 12:16, Jason Gunthorpe wrote:
> > So now we have two options to fix this:
> >
> > 1) Use a negative sync_offset for devices which need #1 above
> > (rtc_cmos & similar)
> >
> > That requires setting tsched to t2 - abs(sync_offset)
> >
> > 2) Use always a positive sync_offset and a flag which tells
> > rtc_tv_nsec_ok() whether it needs to add or subtract.
> >
> > #1 is good enough. All it takes is a comment at the timer start code why
> > abs() is required.
> >
> > Let me hack that up along with the hrtimer muck.
>
> That comment in rtc.h makes me cry:
>
> /* Number of nsec it takes to set the RTC clock. This influences when
> * the set ops are called. An offset:
> * - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s
> * - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s
> * - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s
> */
>
> Setting the wall clock time 10.0 at 10.5 is only possible for time
> traveling RTCs. It magically works, but come on ...

No tardis required. You can think of storing to a RTC as including a
millisecond component, so the three examples are: 10.0 stores 9.5,
10.0 stores 8.5, 10.0 stores 10.5.

It was probably included due to cmos, either as a misunderstanding
what it does, or it does actually store 10.5 when you store 10.0..

Jason

2020-12-04 09:39:47

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Thu, Dec 03 2020 at 23:00, Alexandre Belloni wrote:
> On 03/12/2020 22:05:09+0100, Thomas Gleixner wrote:
>> 2) I2C/SPI ...
>>
>> tsched t0 t1 t2
>> transfer(newsec) RTC update (newsec) RTC increments seconds
>>
>> Lets assume that ttransfer = t1 - t0 is known.
>
> Note that ttransfer is one of the reason why setting set_offset_nsec
> from the RTC driver is not a good idea. The same RTC may be on busses
> with different rates and there is no way to know that. I think that was
> one of my objections at the time.
>
> ttransfer is not a function of the RTC model but rather of how it is
> integrated in the system.

Yes, but it's the right place to store that information.

It's a fundamental problem of the RTC driver because that's the one
which has to be able to tell the caller about it. The caller has
absolutely no way to figure it out because it does not even know what
type of RTC is there.

So either the RTC knows the requirements for tsched, e.g. the MC14xxx
datasheet, or it can retrieve that information from DT or by querying
the underlying bus mechanics for the xfer time estimate or just by
timing an xfer for reference.

Thanks,

tglx





2020-12-04 09:57:02

by Alexandre Belloni

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On 04/12/2020 10:34:13+0100, Thomas Gleixner wrote:
> On Thu, Dec 03 2020 at 23:00, Alexandre Belloni wrote:
> > On 03/12/2020 22:05:09+0100, Thomas Gleixner wrote:
> >> 2) I2C/SPI ...
> >>
> >> tsched t0 t1 t2
> >> transfer(newsec) RTC update (newsec) RTC increments seconds
> >>
> >> Lets assume that ttransfer = t1 - t0 is known.
> >
> > Note that ttransfer is one of the reason why setting set_offset_nsec
> > from the RTC driver is not a good idea. The same RTC may be on busses
> > with different rates and there is no way to know that. I think that was
> > one of my objections at the time.
> >
> > ttransfer is not a function of the RTC model but rather of how it is
> > integrated in the system.
>
> Yes, but it's the right place to store that information.
>
> It's a fundamental problem of the RTC driver because that's the one
> which has to be able to tell the caller about it. The caller has
> absolutely no way to figure it out because it does not even know what
> type of RTC is there.
>
> So either the RTC knows the requirements for tsched, e.g. the MC14xxx
> datasheet, or it can retrieve that information from DT or by querying
> the underlying bus mechanics for the xfer time estimate or just by
> timing an xfer for reference.
>

What I do from userspace is that the caller is the one estimating the
transfer time and this works very well. I really think that the ntp code
could do just the same.


--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2020-12-04 10:47:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Fri, Dec 04 2020 at 10:51, Alexandre Belloni wrote:
> On 04/12/2020 10:34:13+0100, Thomas Gleixner wrote:
>> So either the RTC knows the requirements for tsched, e.g. the MC14xxx
>> datasheet, or it can retrieve that information from DT or by querying
>> the underlying bus mechanics for the xfer time estimate or just by
>> timing an xfer for reference.
>>
>
> What I do from userspace is that the caller is the one estimating the
> transfer time and this works very well. I really think that the ntp code
> could do just the same.

For MC14xxx type RTCs tsched is defined by a constant, so heuristics are
really horrible because you have to poll the RTC to get it correct.
What's the point if the driver can just provide the value from the data
sheet?

For RTC's behind a bus the driver its pretty simple to let the driver
tell at RTC registration time that the transfer time is unknown. So you
don't have to add the estimation procedure to each driver. You simply
can add it to the core in one place and expose that info to user space
as well as a starting point.

Sticking that into the NTP code is really the wrong place. That's like
asking the users of a timer device to calibrate it before usage.

The requirements for writing a RTC are not a problem of the caller, they
are fundamental properties of the RTC itself. So why on earth are you
asking every user to implement heuristics to figure these out themself?

Having it as property of the RTC device gives at least a halfways
correct value for the periodic kernel side update and if user space
want's to do better then it still can do so.

Thanks,

tglx

2020-12-04 13:06:45

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Thu, Dec 03 2020 at 18:36, Jason Gunthorpe wrote:
> On Thu, Dec 03, 2020 at 10:31:02PM +0100, Thomas Gleixner wrote:
>> On Thu, Dec 03 2020 at 22:05, Thomas Gleixner wrote:
>> > On Thu, Dec 03 2020 at 12:16, Jason Gunthorpe wrote:
>> > So now we have two options to fix this:
>> >
>> > 1) Use a negative sync_offset for devices which need #1 above
>> > (rtc_cmos & similar)
>> >
>> > That requires setting tsched to t2 - abs(sync_offset)
>> >
>> > 2) Use always a positive sync_offset and a flag which tells
>> > rtc_tv_nsec_ok() whether it needs to add or subtract.
>> >
>> > #1 is good enough. All it takes is a comment at the timer start code why
>> > abs() is required.
>> >
>> > Let me hack that up along with the hrtimer muck.
>>
>> That comment in rtc.h makes me cry:
>>
>> /* Number of nsec it takes to set the RTC clock. This influences when
>> * the set ops are called. An offset:
>> * - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s
>> * - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s
>> * - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s
>> */
>>
>> Setting the wall clock time 10.0 at 10.5 is only possible for time
>> traveling RTCs. It magically works, but come on ...
>
> No tardis required. You can think of storing to a RTC as including a
> millisecond component, so the three examples are: 10.0 stores 9.5,
> 10.0 stores 8.5, 10.0 stores 10.5.
>
> It was probably included due to cmos, either as a misunderstanding
> what it does, or it does actually store 10.5 when you store 10.0..

Yes, it kinda stores 10.5 because after the write the next seconds
increment happens 500ms later.

But none of this magic is actually required because the behaviour of
most RTCs is that the next seconds increment happens exactly 1000ms
_after_ the write.

Which means _all_ of these offsets are positive:

tsched twrite tnextsec

For CMOS tsched == twrite and tnextsec - twrite = 500ms

For I2C tsched = tnextsec - 1000ms - ttransport

which means the formula is the same for all of them

tRTCinc = tnextsec - twrite

tsched = tnextsec - tRTCinc - ttransport

And this covers also the (probably unlikely) case where the I2C RTC has
a tRTCinc != 1000ms. Imagine a i2c based MC14xxx which would have:

offset = 500ms + ttransport

No magic sign calculation required if you look at it from the actual
timeline and account the time between write and next second increment
correctly.

Thanks,

tglx

2020-12-04 14:11:28

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Fri, Dec 04, 2020 at 02:02:57PM +0100, Thomas Gleixner wrote:

> No magic sign calculation required if you look at it from the actual
> timeline and account the time between write and next second increment
> correctly.

Yes, it is equivalent to break things into two values, and does look
to be more understandable as one can read at least one of the values
from a datasheet and the other could be estimated by timing a read
clock

Jason

2020-12-04 14:42:08

by Alexandre Belloni

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On 04/12/2020 10:08:19-0400, Jason Gunthorpe wrote:
> On Fri, Dec 04, 2020 at 02:02:57PM +0100, Thomas Gleixner wrote:
>
> > No magic sign calculation required if you look at it from the actual
> > timeline and account the time between write and next second increment
> > correctly.
>
> Yes, it is equivalent to break things into two values, and does look
> to be more understandable as one can read at least one of the values
> from a datasheet and the other could be estimated by timing a read
> clock
>

If you want to read an RTC accurately, you don't want to time a read,
what you want is to time an alarm. This is a common misconception and
is, again, why hctosys in its current state is not useful.

And because people using systohc are definitively using hctosys, this
will still result in an up to 500ms error in the current time.
As said, the price to pay for a proper solution will be an up to one
second delay when booting which is not acceptable for most users.

Is "fixing" systohc worth the effort and the maintenance cost?

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2020-12-04 14:50:49

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Fri, Dec 04, 2020 at 03:37:35PM +0100, Alexandre Belloni wrote:
> On 04/12/2020 10:08:19-0400, Jason Gunthorpe wrote:
> > On Fri, Dec 04, 2020 at 02:02:57PM +0100, Thomas Gleixner wrote:
> >
> > > No magic sign calculation required if you look at it from the actual
> > > timeline and account the time between write and next second increment
> > > correctly.
> >
> > Yes, it is equivalent to break things into two values, and does look
> > to be more understandable as one can read at least one of the values
> > from a datasheet and the other could be estimated by timing a read
> > clock
> >
>
> If you want to read an RTC accurately, you don't want to time a read,
> what you want is to time an alarm. This is a common misconception and
> is, again, why hctosys in its current state is not useful.

I mean literatally time the excution of something like a straight
read. This will give some estimate of the bus latency and it should
linearly relate to the bus latency for a write.

The driver could time configuring an alarm as well, if it likes.

> And because people using systohc are definitively using hctosys, this
> will still result in an up to 500ms error in the current time.
> As said, the price to pay for a proper solution will be an up to one
> second delay when booting which is not acceptable for most users.

IIRC I had fixed this in some embedded system long ago by having
hctosys reading seconds time during boot, then in parallel using the
'up to one second' method to get the sub-second resolution.

This means there was a sub second non-monotonicity in the realtime
clock, but the system was designed to tolerate this as it also ran a
modified NTP which would unconditionally step the clock on first sync
if it was over .1s out of alignment.

The goal was to bring the system to correct time as quickly as
possible in as many cases as possible, not to maintain the
monotonicity of the realtime clock.

> Is "fixing" systohc worth the effort and the maintenance cost?

As I said before, if there is no desire to address the read side then
the whole thing should be abandoned.

Jason

2020-12-04 15:13:04

by Alexandre Belloni

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On 04/12/2020 10:46:59-0400, Jason Gunthorpe wrote:
> > If you want to read an RTC accurately, you don't want to time a read,
> > what you want is to time an alarm. This is a common misconception and
> > is, again, why hctosys in its current state is not useful.
>
> I mean literatally time the excution of something like a straight
> read. This will give some estimate of the bus latency and it should
> linearly relate to the bus latency for a write.
>

It doesn't, some rtc will require writing dozen registers to set the
time and reading only 3 to get the time, the only accurate way is to
really time setting the time. You set the RTC time once, set up an alarm for
the next second, once you get the alarm, you get system time and you now
how far you are off.

Notice that systohc could do that if you wanted to be accurate and then
the whole issue with mc146818 is gone and this nicely solves it for all
the RTCs at once.

> The driver could time configuring an alarm as well, if it likes.

The driver should definitively not have to do the timing. the core,
maybe but I won't go over the 165 drivers to add timing.

>
> > And because people using systohc are definitively using hctosys, this
> > will still result in an up to 500ms error in the current time.
> > As said, the price to pay for a proper solution will be an up to one
> > second delay when booting which is not acceptable for most users.
>
> IIRC I had fixed this in some embedded system long ago by having
> hctosys reading seconds time during boot, then in parallel using the
> 'up to one second' method to get the sub-second resolution.
>
> This means there was a sub second non-monotonicity in the realtime
> clock, but the system was designed to tolerate this as it also ran a
> modified NTP which would unconditionally step the clock on first sync
> if it was over .1s out of alignment.
>
> The goal was to bring the system to correct time as quickly as
> possible in as many cases as possible, not to maintain the
> monotonicity of the realtime clock.
>

I'm really curious, in your use case, couldn't you have read the RTC
from userspace and set the system time from there, right before starting
NTP and other applications?
Doing so, you would have probably been able to ensure monotonicity.

> > Is "fixing" systohc worth the effort and the maintenance cost?
>
> As I said before, if there is no desire to address the read side then
> the whole thing should be abandoned.
>

What was your plan back in 2017?

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2020-12-04 15:59:59

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On Fri, Dec 04, 2020 at 04:08:57PM +0100, Alexandre Belloni wrote:

> > I mean literatally time the excution of something like a straight
> > read. This will give some estimate of the bus latency and it should
> > linearly relate to the bus latency for a write.
>
>
> It doesn't, some rtc will require writing dozen registers to set the
> time and reading only 3 to get the time, the only accurate way is to
> really time setting the time. You set the RTC time once, set up an alarm for
> the next second, once you get the alarm, you get system time and you now
> how far you are off.

This is why I said linearly related. Yes the relation is per-driver,
based on the ops required, but in principle it can get close.

> Notice that systohc could do that if you wanted to be accurate and then
> the whole issue with mc146818 is gone and this nicely solves it for all
> the RTCs at once.

Another good solution is to have systohc progam the time and then
immediately read it back (eg with an alarm). The difference between
the read back and the system RTC is the single value to plug into the
adjustment offset and naturally includes all the time factors Thomas
identified, including the additional factor of 'latency to read the
time'

> > The goal was to bring the system to correct time as quickly as
> > possible in as many cases as possible, not to maintain the
> > monotonicity of the realtime clock.
>
> I'm really curious, in your use case, couldn't you have read the RTC
> from userspace and set the system time from there, right before starting
> NTP and other applications?

This was RAM constrainted embedded, a few hundred bytes of kernel code
wins over 100k of userspace

> Doing so, you would have probably been able to ensure monotonicity.

No, this case also wasn't willing to wait the 1s to load the time. It
had to go parallel with the rest of the boot up. This was enterprise
gear, boot time to operational counts against the achievable
availability rating.

From an upstream perspective this was hacky because
- We historically try not to create non-monotinicity in CLOCK_REALTIME
because too much stuff wrongly works on CLOCK_REALTIME when they
really need CLOCK_MONOTONIC
- Having the kernel async set the clock is racy with NTP also trying
to set the clock

But in a closed system the two above were delt with reliably.

> > As I said before, if there is no desire to address the read side then
> > the whole thing should be abandoned.
>
> What was your plan back in 2017?

Well I was helping RMK because we had similar issues, but I saw some
path to achive the low offset round trip, and view this as laudable
for the embedded world.

I see I also changed jobs right around then which probably explains
why things stopped at this one patch. The fact nobody else picked it
up probably says people really just don't care about realtime
accuracy.

Jason

2020-12-04 16:39:28

by Alexandre Belloni

[permalink] [raw]
Subject: Re: [PATCH] rtc: adapt allowed RTC update error

On 04/12/2020 11:57:08-0400, Jason Gunthorpe wrote:
> On Fri, Dec 04, 2020 at 04:08:57PM +0100, Alexandre Belloni wrote:
>
> > > I mean literatally time the excution of something like a straight
> > > read. This will give some estimate of the bus latency and it should
> > > linearly relate to the bus latency for a write.
> >
> >
> > It doesn't, some rtc will require writing dozen registers to set the
> > time and reading only 3 to get the time, the only accurate way is to
> > really time setting the time. You set the RTC time once, set up an alarm for
> > the next second, once you get the alarm, you get system time and you now
> > how far you are off.
>
> This is why I said linearly related. Yes the relation is per-driver,
> based on the ops required, but in principle it can get close.
>
> > Notice that systohc could do that if you wanted to be accurate and then
> > the whole issue with mc146818 is gone and this nicely solves it for all
> > the RTCs at once.
>
> Another good solution is to have systohc progam the time and then
> immediately read it back (eg with an alarm).

This is what I was suggesting, with an alarm.

> The difference between
> the read back and the system RTC is the single value to plug into the
> adjustment offset and naturally includes all the time factors Thomas
> identified, including the additional factor of 'latency to read the
> time'

There is no 'latency to read the time' because you don't have to read
the time. You already know what the time will be when the alarm fires.
That is when you read the system time and you can figure out what the
offset is. But you never need to read the time.

[...]

> I see I also changed jobs right around then which probably explains
> why things stopped at this one patch. The fact nobody else picked it
> up probably says people really just don't care about realtime
> accuracy.

Or those that do care do it from userspace.

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com