Miroslav ran into a situation where the periodic RTC synchronization almost
never was able to hit the time window for the update. That happens due the
usage of delayed_work and the properties of the timer wheel.
While that particular problem is halfways simple to fix this started to
unearth other problems with that code particularly with rtc_set_npt_time()
but expanded into other things as well.
1) The update offset for rtc-cmos is off by a full second
2) The readout of MC146818 (rtc-cmos and arch code) is broken and can
return garbage.
2) Alexandre questioned the approach in general and wants to get rid of
it. Of course there are better methods to do that and it can be
completely done in user space.
Unfortunately it's not that simple as this would be a user visible
change, so making it at least halfways correct.
3) Alexandre requested to move that code into the NTP code as this is not
really RTC functionality and just usage of the RTC API.
4) The update offset itself was questioned, but the time between the
write and the next seconds increment in the RTC is fundamentaly a
hardware property. The transport time, which is pretty irrelevant for
direct accessible RTCs (rtc-cmos), but relevant for RTC behind i2c/SPI
needs to be added on top.
It's undebated that this transport time cannot be correctly estimated,
but right now it's 500ms which is far off. The correct transport time
can be calibrated, a halfways correct value supplied via DT, but
that's an orthogonal problem.
The following series addresses the above:
1) Fix the readout function of MC146818
2) Fix the rtc-cmos offset
3) Reduce the default transport time
4) Switch the NTP periodic sync code to a hrtimer/work combo
5) Move rtc_set_npt_time() to the ntp code
6) Make the update offset more intuitive
7) Simplify the whole machinery
Thanks,
tglx
---
a/drivers/rtc/systohc.c | 61 ----------
drivers/rtc/Makefile | 1
drivers/rtc/class.c | 9 +
drivers/rtc/rtc-cmos.c | 3
drivers/rtc/rtc-mc146818-lib.c | 70 +++++++-----
include/linux/rtc.h | 69 +++++-------
include/linux/timex.h | 1
kernel/time/ntp.c | 232 ++++++++++++++++++++++++-----------------
kernel/time/ntp_internal.h | 7 +
9 files changed, 225 insertions(+), 228 deletions(-)
Alexandre,
On Sun, Dec 06 2020 at 22:46, Thomas Gleixner wrote:
> Miroslav ran into a situation where the periodic RTC synchronization almost
> never was able to hit the time window for the update. That happens due the
> usage of delayed_work and the properties of the timer wheel.
>
> While that particular problem is halfways simple to fix this started to
> unearth other problems with that code particularly with rtc_set_npt_time()
> but expanded into other things as well.
>
> 1) The update offset for rtc-cmos is off by a full second
>
> 2) The readout of MC146818 (rtc-cmos and arch code) is broken and can
> return garbage.
>
> 2) Alexandre questioned the approach in general and wants to get rid of
> it. Of course there are better methods to do that and it can be
> completely done in user space.
>
> Unfortunately it's not that simple as this would be a user visible
> change, so making it at least halfways correct.
>
> 3) Alexandre requested to move that code into the NTP code as this is not
> really RTC functionality and just usage of the RTC API.
>
> 4) The update offset itself was questioned, but the time between the
> write and the next seconds increment in the RTC is fundamentaly a
> hardware property. The transport time, which is pretty irrelevant for
> direct accessible RTCs (rtc-cmos), but relevant for RTC behind i2c/SPI
> needs to be added on top.
>
> It's undebated that this transport time cannot be correctly estimated,
> but right now it's 500ms which is far off. The correct transport time
> can be calibrated, a halfways correct value supplied via DT, but
> that's an orthogonal problem.
>
> The following series addresses the above:
>
> 1) Fix the readout function of MC146818
> 2) Fix the rtc-cmos offset
> 3) Reduce the default transport time
>
> 4) Switch the NTP periodic sync code to a hrtimer/work combo
>
> 5) Move rtc_set_npt_time() to the ntp code
> 6) Make the update offset more intuitive
> 7) Simplify the whole machinery
any opinion on this?
Thanks,
tglx
On 09/12/2020 01:33:08+0100, Thomas Gleixner wrote:
> Alexandre,
>
> On Sun, Dec 06 2020 at 22:46, Thomas Gleixner wrote:
> > Miroslav ran into a situation where the periodic RTC synchronization almost
> > never was able to hit the time window for the update. That happens due the
> > usage of delayed_work and the properties of the timer wheel.
> >
> > While that particular problem is halfways simple to fix this started to
> > unearth other problems with that code particularly with rtc_set_npt_time()
> > but expanded into other things as well.
> >
> > 1) The update offset for rtc-cmos is off by a full second
> >
> > 2) The readout of MC146818 (rtc-cmos and arch code) is broken and can
> > return garbage.
> >
> > 2) Alexandre questioned the approach in general and wants to get rid of
> > it. Of course there are better methods to do that and it can be
> > completely done in user space.
> >
> > Unfortunately it's not that simple as this would be a user visible
> > change, so making it at least halfways correct.
> >
> > 3) Alexandre requested to move that code into the NTP code as this is not
> > really RTC functionality and just usage of the RTC API.
> >
> > 4) The update offset itself was questioned, but the time between the
> > write and the next seconds increment in the RTC is fundamentaly a
> > hardware property. The transport time, which is pretty irrelevant for
> > direct accessible RTCs (rtc-cmos), but relevant for RTC behind i2c/SPI
> > needs to be added on top.
> >
> > It's undebated that this transport time cannot be correctly estimated,
> > but right now it's 500ms which is far off. The correct transport time
> > can be calibrated, a halfways correct value supplied via DT, but
> > that's an orthogonal problem.
> >
> > The following series addresses the above:
> >
> > 1) Fix the readout function of MC146818
> > 2) Fix the rtc-cmos offset
> > 3) Reduce the default transport time
> >
> > 4) Switch the NTP periodic sync code to a hrtimer/work combo
> >
> > 5) Move rtc_set_npt_time() to the ntp code
> > 6) Make the update offset more intuitive
> > 7) Simplify the whole machinery
>
> any opinion on this?
>
This looks very good to me, however, I think the 10ms offset is a bit
too much. Do you mind waiting one or two days so I can get my test setup
back up?
--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
On Wed, Dec 09 2020 at 05:01, Alexandre Belloni wrote:
> On 09/12/2020 01:33:08+0100, Thomas Gleixner wrote:
>> > The following series addresses the above:
>> >
>> > 1) Fix the readout function of MC146818
>> > 2) Fix the rtc-cmos offset
>> > 3) Reduce the default transport time
>> >
>> > 4) Switch the NTP periodic sync code to a hrtimer/work combo
>> >
>> > 5) Move rtc_set_npt_time() to the ntp code
>> > 6) Make the update offset more intuitive
>> > 7) Simplify the whole machinery
>>
>> any opinion on this?
>>
> This looks very good to me, however, I think the 10ms offset is a bit
> too much.
I pulled it out of thin air TBH. It's at least 50 times more accurate
than what we had before :)
> Do you mind waiting one or two days so I can get my test setup
> back up?
No problem.
Thanks,
tglx