Daphne,
On Sat, Mar 13 2021 at 17:44, bugzilla-daemon wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=212265
I'm leaving the text from the BZ entry untrimmed so everyone on Cc is on
the same page.
> In order for CLOCK_TAI to function properly, a program (usually ntpd)
> has to use the adjtimex family of system calls in order to tell the
> kernel what the difference is between TAI and UTC. ntpd will do this
> as long as it has been configured with a leap seconds file.
>
> Unfortunately, although the majority of distributions ship with a leap
> second file from the zoneinfo database, many or most of them (I have
> Arch here) do not configure ntpd to know about it, so ntpd does not
> set things up properly for CLOCK_TAI to work. Calling
> clock_gettime(CLOCK_TAI, ...) produces the same result as
> clock_gettime(CLOCK_REALTIME, ...), yielding UTC instead of the
> requested TAI.
>
> The result is that CLOCK_TAI, which one would usually wish to use to
> improve the correctness of a program’s date and time handling,
> produces utterly incorrect behaviours on the vast majority of boxes,
> unless the system administrator is conscientious enough to configure it.
Yes, that's unfortunate, but pretty much historical behaviour and I fear
it's not really documented either.
> I would like to suggest that clock_gettime(CLOCK_TAI, ...) and friends
> should return an error (EINVAL? ENOTSUP?) when it would return the
> same as CLOCK_REALTIME, so that programs can detect when it’s not been
> set up correctly and either tell users to go and set up their leap
> second data file properly,
That would be a user visible change and might hit existing user space by
surprise, so that's not necessarily a good option.
Of course it could be argued that a given kernel can return -ENOTSUP or
whatever is appropriate for any CLOCK id, but that really needs some
deep thoughts and analysis vs. eventual disruption.
> or try to improvise TAI on top of UTC (at the cost of not being
> able to be accurate during leap seconds themselves), or both.
The problem with TAI is that determining the number of leap seconds which
need to be accounted for at a given date/time, i.e. when the machine
boots, looks simple, but leap seconds are not predictable due to the
non-linear behaviour of the Earth's rotation.
So we'd need to have an up-to-date leap seconds table:
https://www.ietf.org/timezones/data/leap-seconds.list
which is not rocket science, but there is this little spoilsport in that
file:
File expires on: 28 December 2021
and the kernel on its own has no way to check for and retrieve an
up-to-date version. That's why it is delegated to user space.
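For illustration, this is roughly what the user space side has to do with that file: find the TAI-UTC offset in effect at a given moment and refuse to answer once the "#@" expiry stamp has passed. A minimal sketch, assuming the file text is already in memory and its data lines ("<NTP seconds> <TAI-UTC> # comment") are sorted by date, as they are in the published list; the function and macro names are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01). */
#define NTP_UNIX_DELTA 2208988800LL

/*
 * Hypothetical helper: look up the TAI-UTC offset in effect at 'ntp_secs'
 * (seconds since 1900) in the text of a leap-seconds.list file.
 * Stores the "#@" expiry stamp in *expires (0 if absent).
 * Returns the offset, or -1 if no entry applies or the table has expired.
 */
static int leap_table_lookup(const char *text, long long ntp_secs,
                             long long *expires)
{
    int offset = -1;

    *expires = 0;
    while (text && *text) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        char line[256];

        if (len >= sizeof(line))
            len = sizeof(line) - 1;          /* truncate overlong lines */
        memcpy(line, text, len);
        line[len] = '\0';

        if (strncmp(line, "#@", 2) == 0) {
            /* Expiry line: "#@ <NTP seconds>" */
            *expires = strtoll(line + 2, NULL, 10);
        } else if (line[0] != '#' && line[0] != '\0') {
            /* Data line; entries are sorted, so the last match wins. */
            long long when;
            int tai_utc;

            if (sscanf(line, "%lld %d", &when, &tai_utc) == 2 &&
                when <= ntp_secs)
                offset = tai_utc;
        }
        text = eol ? eol + 1 : NULL;
    }

    if (*expires != 0 && ntp_secs >= *expires)
        return -1;   /* stale table: the answer can no longer be trusted */
    return offset;
}
```

A caller would pass `(long long)time(NULL) + NTP_UNIX_DELTA` as the current NTP time; the -1 after expiry is exactly the "spoilsport" case above.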
No idea, though, why this is not enabled by default in distros when NTP
is on. Of course I did not notice, because I have had that entry in my
ntpd/chrony configs ever since we started to hack on it.
> A workaround for programs which want to detect when CLOCK_TAI is wrong is to
> try to detect when it hasn't been set up properly by getting both CLOCK_TAI and
> CLOCK_REALTIME and falling back to trying to emulate TAI on top of time_t when
> the difference between the tv_sec values is ≤ 1 second (not = 0, because it
> could happen that the first clock was checked at .00001 seconds before a whole
> second and the latter one at .00001 seconds after the whole second). But even
> that has edge cases — putting similar logic in the kernel could make it work
> correctly all the time.
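The heuristic from the report can be sketched like this. It is only a sketch: the function names are hypothetical, the CLOCK_TAI fallback value 11 is the Linux clock id, and the check misjudges any system whose real TAI-UTC offset is 0 or 1 second.

```c
#define _GNU_SOURCE            /* glibc exposes CLOCK_TAI only with _GNU_SOURCE */
#include <stdbool.h>
#include <time.h>

#ifndef CLOCK_TAI
#define CLOCK_TAI 11           /* Linux clock id from <linux/time.h> */
#endif

/*
 * Decide from two tv_sec readings whether the TAI offset looks unset.
 * A tolerance of +/-1 s covers reads that straddle a second boundary.
 */
static bool tai_offset_looks_unset(long long tai_sec, long long utc_sec)
{
    long long diff = tai_sec - utc_sec;

    return diff >= -1 && diff <= 1;
}

/* Read both clocks and apply the heuristic; an error counts as "unset". */
static bool clock_tai_looks_unset(void)
{
    struct timespec tai, utc;

    if (clock_gettime(CLOCK_TAI, &tai) != 0 ||
        clock_gettime(CLOCK_REALTIME, &utc) != 0)
        return true;

    return tai_offset_looks_unset((long long)tai.tv_sec,
                                  (long long)utc.tv_sec);
}
```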
adjtimex()/ntp_adjtime() allow you to read out the TAI offset (the tai
field of struct timex) race-free. Whether that's a good answer is a
different question.
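A minimal sketch of such a read, assuming glibc's <sys/timex.h> (`adjtimex()`, `struct timex` with its `tai`, `status` and `time` fields). With `modes` set to 0 the call only queries, so it needs no privileges, and the wall clock time and the offset come back from the same call. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/timex.h>

/*
 * Hypothetical helper: fetch the TAI-UTC offset, the STA_* status word and
 * the current time converted to TAI seconds from a single adjtimex() call.
 * Returns 0 on success or -errno on failure.
 */
static int read_tai_snapshot(int *tai_offset, int *status, long long *tai_sec)
{
    struct timex tx = { .modes = 0 };    /* query only, nothing is changed */

    if (adjtimex(&tx) == -1)             /* success returns a clock state, e.g. TIME_OK */
        return -errno;

    *tai_offset = tx.tai;                /* 0 if nobody ever set it */
    *status = tx.status;
    /* UTC seconds from the same call, shifted by the offset. */
    *tai_sec = (long long)tx.time.tv_sec + tx.tai;
    return 0;
}
```

Note that an offset of 0 is ambiguous: it is what an unconfigured kernel reports.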
My initial takeaway is that at least the documentation sucks.
I hope the NTP/TAI wizards have some more insight/opinions on this.
Thanks,
tglx
On Fri, Mar 26, 2021 at 12:13:43PM +0100, Thomas Gleixner wrote:
> On Sat, Mar 13 2021 at 17:44, bugzilla-daemon wrote:
> > Unfortunately, although the majority of distributions ship with a leap
> > second file from the zoneinfo database, many or most of them (I have
> > Arch here) do not configure ntpd to know about it, so ntpd does not
> > set things up properly for CLOCK_TAI to work.
I'm not sure about "many or most" distros. In Debian, the ntp package
depends on tzdata, and the default /etc/ntp.conf does use the leap
seconds file.
> That would be a user visible change and might hit existing user space by
> surprise, so that's not necessarily a good option.
Agreed.
> and the kernel on it's own has no way to check for and retrieve an
> up-to-date version. That's why it is delegated to user space.
Right, the kernel can't make any assumptions about the TAI-UTC offset.
> I hope the NTP/TAI wizards have some more insight/opinions on this.
I agree that ntpd and the current distros don't handle this very well,
but all the pieces are there to allow user space to handle TAI and
leap seconds as reasonably as possible. The fundamental issue is that
there is no way to determine the TAI-UTC offset without some kind of
input from the real world.
Even with GPS, after a cold boot you cannot know the offset
immediately, because the leap second information is broadcast in the
almanac only every 12.5 minutes, and so you can be left in suspense
for a long time.
Using ntpd on Debian, the service will set the offset, but only after
synchronization with the upstream server has been established, and
this takes about five minutes, IIRC.
If waiting 5 or 12.5 minutes is too long for your requirements, you
can bootstrap the time from RTC [1] and then consult the leap seconds
table [2] to set the TAI-UTC offset in the kernel via adjtimex().
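Setting the offset boils down to one adjtimex() call with ADJ_TAI and the value in the `constant` field, which requires CAP_SYS_TIME. A sketch under these assumptions: the helper name is hypothetical, and TAI_OFFSET_MAX is assumed to mirror the kernel's MAX_TAI_OFFSET. As the ntp.c hunks in this thread show, out-of-range values are silently ignored by the kernel, so the sketch rejects them itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/timex.h>

/* Assumption: mirrors MAX_TAI_OFFSET in kernel/time/ntp.c. */
#define TAI_OFFSET_MAX 100000

/*
 * Hypothetical helper: set the kernel's TAI-UTC offset.
 * Returns 0 on success or a negative errno value.
 */
static int set_tai_offset(long offset)
{
    struct timex tx = { 0 };

    if (offset < 0 || offset > TAI_OFFSET_MAX)
        return -EINVAL;         /* the kernel would ignore it without an error */

    tx.modes = ADJ_TAI;
    tx.constant = offset;       /* ADJ_TAI takes the offset in 'constant' */
    if (adjtimex(&tx) == -1)
        return -errno;          /* typically -EPERM without CAP_SYS_TIME */
    return 0;
}
```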
Unfortunately there is no user space utility for setting TAI-UTC, but
I hacked up an 'adjtimex' program for this purpose:
https://github.com/richardcochran/ntpclient-2015
Getting back to the original point of the kernel returning an error,
I don't see a need for this. Applications that require correct leap
seconds can simply call adjtimex() and wait until the initial zero
value is changed by ntpd/etc to the correct offset. That isn't
fundamentally harder than calling clock_gettime() and waiting until
the error would go away.
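The waiting approach could look like the sketch below (hypothetical helper, glibc's `adjtimex()` assumed). It polls once per second and treats a non-zero `tai` as "set", which is exactly the heuristic whose weaknesses are discussed in the replies.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/timex.h>
#include <time.h>

/*
 * Hypothetical helper: poll until the TAI-UTC offset becomes non-zero or
 * 'timeout_sec' seconds have passed. Returns the offset, 0 on timeout,
 * or -errno if adjtimex() fails.
 */
static int wait_for_tai_offset(int timeout_sec)
{
    const struct timespec delay = { .tv_sec = 1, .tv_nsec = 0 };

    for (int waited = 0; ; waited++) {
        struct timex tx = { .modes = 0 };   /* query only */

        if (adjtimex(&tx) == -1)
            return -errno;
        if (tx.tai != 0)
            return tx.tai;
        if (waited >= timeout_sec)
            return 0;                        /* still zero: give up */
        nanosleep(&delay, NULL);             /* a signal may cut this short */
    }
}
```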
Thanks,
Richard
1. Assuming the RTC was set and has a fresh battery, and assuming no
leap seconds occurred while your computer was off!
2. Assuming the RTC value is not newer than the expiration date of the
leap seconds file.
On Fri, Mar 26, 2021 at 08:28:59PM -0700, Richard Cochran wrote:
> Using ntpd on Debian, the service will set the offset, but only after
> synchronization with the upstream server has been established, and
> this takes about five minutes, IIRC.
With the iburst option it shouldn't take more than 10 seconds. There
might be an issue wrt stepping the clock when the initial offset is
large. In Fedora and derived distros using chrony by default the
TAI-UTC offset should be set right on the first update of the clock as
expected.
> Getting back to the original point of the kernel returning an error,
> I don't see a need for this. Applications that require correct leap
> seconds can simply call adjtimex() and wait until the initial zero
> value is changed by ntpd/etc to the correct offset. That isn't
> fundamentally harder than calling clock_gettime() and waiting until
> the error would go away.
There are at least two issues with handling a zero offset as a special
value. One is that zero could potentially be a valid value in distant
future. The other is that the kernel updates the offset when a leap
second is inserted/deleted even if the original offset is zero, so
checking for zero (in the kernel or an application) works only until
the first leap second after boot.
The kernel would need to set a flag that the offset was set. Returning
an error in clock_gettime() until the offset is set sounds reasonable
to me, but I have no idea how many of the existing applications it
would break.
--
Miroslav Lichvar
On 29 Mar 2021, at 11:16, Miroslav Lichvar <[email protected]> wrote:
> On Fri, Mar 26, 2021 at 08:28:59PM -0700, Richard Cochran wrote:
>> Using ntpd on Debian, the service will set the offset, but only after
>> synchronization with the upstream server has been established, and
>> this takes about five minutes, IIRC.
>
> With the iburst option it shouldn't take more than 10 seconds. There
> might be an issue wrt stepping the clock when the initial offset is
> large. In Fedora and derived distros using chrony by default the
> TAI-UTC offset should be set right on the first update of the clock as
> expected.
Yeah, I personally am not really concerned about the immediate post-boot environment. As long as it’s ready by the time userland services are starting, I think most applications that need TAI will be satisfied.
>> Getting back to the original point of the kernel returning an error,
>> I don't see a need for this. Applications that require correct leap
>> seconds can simply call adjtimex() and wait until the initial zero
>> value is changed by ntpd/etc to the correct offset. That isn't
>> fundamentally harder than calling clock_gettime() and waiting until
>> the error would go away.
>
> There are at least two issues with handling a zero offset as a special
> value. One is that zero could potentially be a valid value in distant
> future.
Since even a single negative leap second was, until recently, considered (quite literally) astronomically unlikely, and even now (where the earth is spinning faster than ever hitherto expected) the most likely scenario by far seems to be that it’ll just be a longer wait than usual for the next positive leap second, I’d say minus 37 leap seconds is a prospect for the very very distant future indeed. But in theory, yes.
> The other is that the kernel updates the offset when a leap
> second is inserted/deleted even if the original offset is zero, so
> checking for zero (in the kernel or an application) works only until
> the first leap second after boot.
This is a problem and definitely speaks for having a way to tell whether CLOCK_TAI has been set up at all.
> The kernel would need to set a flag that the offset was set. Returning
> an error in clock_gettime() until the offset is set sounds reasonable
> to me, but I have no idea how many of the existing applications it
> would break.
Given that CLOCK_TAI doesn’t exist except on Linux, any portable Unix application is likely to have a fallback of some kind, though perhaps only at compile time.
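Such a fallback might look like the following sketch: compile-time via `#ifdef CLOCK_TAI`, plus a runtime path that is only reached if the kernel rejects the clock id. The helper name and the `fallback_offset` parameter (e.g. taken from a leap seconds table) are hypothetical. Applications with only the compile-time branch would be the ones hurt by a new error return.

```c
#define _GNU_SOURCE            /* glibc exposes CLOCK_TAI only with _GNU_SOURCE */
#include <time.h>

/*
 * Hypothetical helper: read TAI, falling back to CLOCK_REALTIME plus a
 * caller-supplied TAI-UTC offset. Returns 0 on success, -1 on failure.
 */
static int get_tai_time(struct timespec *ts, int fallback_offset)
{
#ifdef CLOCK_TAI
    if (clock_gettime(CLOCK_TAI, ts) == 0)
        return 0;
    /* Runtime fallback: only reached if the kernel rejects CLOCK_TAI. */
#endif
    if (clock_gettime(CLOCK_REALTIME, ts) != 0)
        return -1;
    ts->tv_sec += fallback_offset;
    return 0;
}
```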
Daphne Preston-Kendal
On Mon, Mar 29, 2021 at 11:56:31AM +0200, Daphne Preston-Kendal wrote:
> > The other is that the kernel updates the offset when a leap
> > second is inserted/deleted even if the original offset is zero, so
> > checking for zero (in the kernel or an application) works only until
> > the first leap second after boot.
>
> This is a problem and definitely speaks for having a way to tell whether CLOCK_TAI has been set up at all.
+1
Thanks,
Richard
On Mon, Mar 29, 2021 at 11:16:48AM +0200, Miroslav Lichvar wrote:
> On Fri, Mar 26, 2021 at 08:28:59PM -0700, Richard Cochran wrote:
> > Using ntpd on Debian, the service will set the offset, but only after
> > synchronization with the upstream server has been established, and
> > this takes about five minutes, IIRC.
>
> With the iburst option it shouldn't take more than 10 seconds. There
> might be an issue wrt stepping the clock when the initial offset is
> large.
Really? Debian has
# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
pool 0.debian.pool.ntp.org iburst
pool 1.debian.pool.ntp.org iburst
pool 2.debian.pool.ntp.org iburst
pool 3.debian.pool.ntp.org iburst
I guess I'll measure again, but I'm pretty sure it took a long time to
get to TAI being set.
> In Fedora and derived distros using chrony by default the
> TAI-UTC offset should be set right on the first update of the clock as
> expected.
(Maybe it is time to switch to chrony ;)
Thanks,
Richard
On Mon, Mar 29, 2021 at 11:16:48AM +0200, Miroslav Lichvar wrote:
> There are at least two issues with handling a zero offset as a special
> value. One is that zero could potentially be a valid value in distant
> future.
I'm not losing sleep over that, but
> The other is that the kernel updates the offset when a leap
> second is inserted/deleted even if the original offset is zero, so
> checking for zero (in the kernel or an application) works only until
> the first leap second after boot.
oh, I didn't think of that. I hate leap seconds. Good thing Earth is
picking up the pace again!
> The kernel would need to set a flag that the offset was set. Returning
> an error in clock_gettime() until the offset is set sounds reasonable
> to me, but I have no idea how many of the existing applications it
> would break.
I think it would be wiser to provide another way, sysfs or something else.
Thanks,
Richard
On Mon, Mar 29 2021 at 07:26, Richard Cochran wrote:
> On Mon, Mar 29, 2021 at 11:16:48AM +0200, Miroslav Lichvar wrote:
>> There are at least two issues with handling a zero offset as a special
>> value. One is that zero could potentially be a valid value in distant
>> future.
>
> I'm not losing sleep over that, but
>
>> The other is that the kernel updates the offset when a leap
>> second is inserted/deleted even if the original offset is zero, so
>> checking for zero (in the kernel or an application) works only until
>> the first leap second after boot.
>
> oh, I didn't think of that. I hate leap seconds. Good thing Earth is
> picking up the pace again!
>
>> The kernel would need to set a flag that the offset was set. Returning
>> an error in clock_gettime() until the offset is set sounds reasonable
>> to me, but I have no idea how many of the existing applications it
>> would break.
>
> I think it would be wiser to provide another way, sysfs or something else.
I think adjtimex is the right place and not yet another random file
somewhere. Something like the below.
Thanks,
tglx
---
include/uapi/linux/timex.h | 7 +++++--
kernel/time/ntp.c | 4 +++-
2 files changed, 8 insertions(+), 3 deletions(-)
--- a/include/uapi/linux/timex.h
+++ b/include/uapi/linux/timex.h
@@ -188,9 +188,12 @@ struct __kernel_timex {
 #define STA_MODE 0x4000 /* mode (0 = PLL, 1 = FLL) (ro) */
 #define STA_CLK 0x8000 /* clock source (0 = A, 1 = B) (ro) */
 
+#define STA_TAISET 0x10000 /* TAI offset was set via adjtimex (ro) */
+
 /* read-only bits */
-#define STA_RONLY (STA_PPSSIGNAL | STA_PPSJITTER | STA_PPSWANDER | \
- STA_PPSERROR | STA_CLOCKERR | STA_NANO | STA_MODE | STA_CLK)
+#define STA_RONLY (STA_PPSSIGNAL | STA_PPSJITTER | STA_PPSWANDER | \
+ STA_PPSERROR | STA_CLOCKERR | STA_NANO | STA_MODE | \
+ STA_CLK | STA_TAISET)
 
 /*
  * Clock states (time_state)
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -741,8 +741,10 @@ static inline void process_adjtimex_mode
 }
 
 if (txc->modes & ADJ_TAI &&
- txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET)
+ txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET) {
 *time_tai = txc->constant;
+ time_status |= STA_TAISET;
+ }
 
 if (txc->modes & ADJ_OFFSET)
 ntp_update_offset(txc->offset);
On Mon, Mar 29, 2021 at 04:57:55PM +0200, Thomas Gleixner wrote:
> I think adjtimex is the right place and not yet another random file
> somewhere. Something like the below.
Perfect.
Acked-by: Richard Cochran <[email protected]>
> ---
> include/uapi/linux/timex.h | 7 +++++--
> kernel/time/ntp.c | 4 +++-
> 2 files changed, 8 insertions(+), 3 deletions(-)
>
> --- a/include/uapi/linux/timex.h
> +++ b/include/uapi/linux/timex.h
> @@ -188,9 +188,12 @@ struct __kernel_timex {
> #define STA_MODE 0x4000 /* mode (0 = PLL, 1 = FLL) (ro) */
> #define STA_CLK 0x8000 /* clock source (0 = A, 1 = B) (ro) */
>
> +#define STA_TAISET 0x10000 /* TAI offset was set via adjtimex (ro) */
> +
> /* read-only bits */
> -#define STA_RONLY (STA_PPSSIGNAL | STA_PPSJITTER | STA_PPSWANDER | \
> - STA_PPSERROR | STA_CLOCKERR | STA_NANO | STA_MODE | STA_CLK)
> +#define STA_RONLY (STA_PPSSIGNAL | STA_PPSJITTER | STA_PPSWANDER | \
> + STA_PPSERROR | STA_CLOCKERR | STA_NANO | STA_MODE | \
> + STA_CLK | STA_TAISET)
>
> /*
> * Clock states (time_state)
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -741,8 +741,10 @@ static inline void process_adjtimex_mode
> }
>
> if (txc->modes & ADJ_TAI &&
> - txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET)
> + txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET) {
> *time_tai = txc->constant;
> + time_status |= STA_TAISET;
> + }
>
> if (txc->modes & ADJ_OFFSET)
> ntp_update_offset(txc->offset);
On Mon, Mar 29 2021 at 08:36, Richard Cochran wrote:
> On Mon, Mar 29, 2021 at 04:57:55PM +0200, Thomas Gleixner wrote:
>> I think adjtimex is the right place and not yet another random file
>> somewhere. Something like the below.
>
> Perfect.
>
> Acked-by: Richard Cochran <[email protected]>
But one problem with that trivial bit is that the interface does not
tell you whether the bit will ever be set or not, i.e. it won't be on an
older kernel even if TAI was set by ntp/chrony/...
So that needs some thought. The trivial hack is in the updated patch
below. If you want to save the extra bits in status, then you could use
one of the spare ints at the end of struct timex for this.
If someone has cycles and can turn that into a proper patch with all the
bells and whistles (changelog, manpage update, example ..), that would
be appreciated. Otherwise I'll add it to the other things on that
ever-growing todo list and tend to it Mañana. :)
Thanks,
tglx
---
include/uapi/linux/timex.h | 8 ++++++--
kernel/time/ntp.c | 6 ++++--
2 files changed, 10 insertions(+), 4 deletions(-)
--- a/include/uapi/linux/timex.h
+++ b/include/uapi/linux/timex.h
@@ -188,9 +188,13 @@ struct __kernel_timex {
 #define STA_MODE 0x4000 /* mode (0 = PLL, 1 = FLL) (ro) */
 #define STA_CLK 0x8000 /* clock source (0 = A, 1 = B) (ro) */
 
+#define STA_TAIREF 0x10000 /* Set TAI offset is reflected in STA_TAISET (ro) */
+#define STA_TAISET 0x20000 /* TAI offset was set via adjtimex (ro) */
+
 /* read-only bits */
-#define STA_RONLY (STA_PPSSIGNAL | STA_PPSJITTER | STA_PPSWANDER | \
- STA_PPSERROR | STA_CLOCKERR | STA_NANO | STA_MODE | STA_CLK)
+#define STA_RONLY (STA_PPSSIGNAL | STA_PPSJITTER | STA_PPSWANDER | \
+ STA_PPSERROR | STA_CLOCKERR | STA_NANO | STA_MODE | \
+ STA_CLK | STA_TAIREF | STA_TAISET)
 
 /*
  * Clock states (time_state)
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -57,7 +57,7 @@ static u64 tick_length_base;
 static int time_state = TIME_OK;
 
 /* clock status bits: */
-static int time_status = STA_UNSYNC;
+static int time_status = STA_UNSYNC | STA_TAIREF;
 
 /* time adjustment (nsecs): */
 static s64 time_offset;
@@ -741,8 +741,10 @@ static inline void process_adjtimex_mode
 }
 
 if (txc->modes & ADJ_TAI &&
- txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET)
+ txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET) {
 *time_tai = txc->constant;
+ time_status |= STA_TAISET;
+ }
 
 if (txc->modes & ADJ_OFFSET)
 ntp_update_offset(txc->offset);
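From user space, the two proposed bits give a tri-state answer: STA_TAIREF clear means "this kernel does not report it", otherwise STA_TAISET says whether the offset was set. A sketch of that interpretation, assuming glibc's `adjtimex()`; the bit values are taken from the patch above and may not exist in your system headers, so they are defined locally, and the enum and function names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/timex.h>

/* Values from the proposed patch; not assumed to be in released headers. */
#ifndef STA_TAIREF
#define STA_TAIREF 0x10000   /* kernel reflects TAI state in STA_TAISET */
#endif
#ifndef STA_TAISET
#define STA_TAISET 0x20000   /* TAI offset was set via adjtimex */
#endif

enum tai_state { TAI_UNKNOWN, TAI_NOT_SET, TAI_SET };

/* Classify a struct timex status word under the proposed semantics. */
static enum tai_state classify_tai_status(int status)
{
    if (!(status & STA_TAIREF))
        return TAI_UNKNOWN;          /* older kernel: no information */
    return (status & STA_TAISET) ? TAI_SET : TAI_NOT_SET;
}

/* Query the running kernel; returns 0 on success or -errno. */
static int query_tai_state(enum tai_state *state)
{
    struct timex tx = { .modes = 0 };   /* query only */

    if (adjtimex(&tx) == -1)
        return -errno;
    *state = classify_tai_status(tx.status);
    return 0;
}
```

An application seeing TAI_UNKNOWN would still have to fall back to one of the heuristics discussed earlier in the thread.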