LinuxLists.cc - [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]

2015-02-12 13:58:52

Subject: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]

During leap second insertion testing it was noticed that a small window
exists where the time_state could be reset such that
time_state = TIME_OK, which then causes the leap second to not occur, or
causes the entire leap second state machine to fail with time_state =
TIME_INS at the end of the leap second.

The test did the following in userspace:

	tx.modes = ADJ_STATUS;
	tx.status = STA_INS;

	/* send leap second request */
	ret = adjtimex(&tx);

	/* Check adjtimex output every half second */
	now = tx.time.tv_sec;
	while (now < next_leap+2) {
		char buf[26];
		ret = adjtimex(&tx);

		ctime_r(&tx.time.tv_sec, buf);
		buf[strlen(buf)-1] = 0; /* remove trailing \n */

		printf("%s + %6ld us\t%s\n",
				buf,
				tx.time.tv_usec,
				time_state_str(ret));
		now = tx.time.tv_sec;
		/* Sleep for another half second */
		ts.tv_sec = 0;
		ts.tv_nsec = NSEC_PER_SEC/2;
		clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
	}

which was intended to mimic the insertion of a leap second. A
successful run of the test would result in the time_state transitioning
from TIME_OK to TIME_INS, then to TIME_OOP when the leap second was
inserted, and then to TIME_WAIT when the leap second was completed. While
running this code failures were seen in which the time_state remained TIME_INS,
even though the leap second had occurred.

After some investigation it was noted that the test contained a small error:
the test does not reinitialize tx.status and reissues the STA_INS every
1/2 second. As a result of this broken test, the following failure was noticed
(the output below is a mix of kernel messages and the output from the test
program, the remaining annotations are printk's in the code and my own
additional notes):

[ 942.952833] time_state [1] change from TIME_OK to TIME_INS

Fri Feb 13 18:59:51 2015 + 318126 us TIME_INS
Fri Feb 13 18:59:51 2015 + 818167 us TIME_INS
Fri Feb 13 18:59:52 2015 + 318208 us TIME_INS
Fri Feb 13 18:59:52 2015 + 818248 us TIME_INS
Fri Feb 13 18:59:53 2015 + 318290 us TIME_INS
Fri Feb 13 18:59:53 2015 + 818331 us TIME_INS
Fri Feb 13 18:59:54 2015 + 318372 us TIME_INS
Fri Feb 13 18:59:54 2015 + 818413 us TIME_INS
Fri Feb 13 18:59:55 2015 + 318454 us TIME_INS
Fri Feb 13 18:59:55 2015 + 818495 us TIME_INS
Fri Feb 13 18:59:56 2015 + 318534 us TIME_INS
Fri Feb 13 18:59:56 2015 + 818575 us TIME_INS
Fri Feb 13 18:59:57 2015 + 318617 us TIME_INS
Fri Feb 13 18:59:57 2015 + 818660 us TIME_INS
Fri Feb 13 18:59:58 2015 + 318702 us TIME_INS
Fri Feb 13 18:59:58 2015 + 818744 us TIME_INS
Fri Feb 13 18:59:59 2015 + 318785 us TIME_INS
Fri Feb 13 18:59:59 2015 + 818837 us TIME_INS

[ 952.953143] time_state [4] change from TIME_INS to TIME_OOP
[ 952.953150] Clock: inserting leap second 23:59:60 UTC
[ 953.299905] process_adj_status: insert_leap_sec[1223] setting time_state back
to TIME_OK [1, 1] <<< adjtimex() call every 1/2 second
[ 953.299913] time_state [9] change from TIME_OOP to TIME_OK

Fri Feb 13 18:59:59 2015 + 318878 us TIME_OK
Fri Feb 13 18:59:59 2015 + 818931 us TIME_OK

[ 954.064237] time_state [1] change from TIME_OK to TIME_INS

Fri Feb 13 19:00:00 2015 + 318972 us TIME_INS
Fri Feb 13 19:00:00 2015 + 819012 us TIME_INS
Fri Feb 13 19:00:01 2015 + 319051 us TIME_INS
Fri Feb 13 19:00:01 2015 + 819089 us TIME_INS
Fri Feb 13 19:00:02 2015 + 319128 us TIME_INS

As previously stated, the time_state remains TIME_INS even though the leap
second has already occurred @ 952.953150.

The test was changed to reset tx.status to 0 in the loop, and the test then
succeeded with a 100% rate with the time state ending in TIME_WAIT.
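
For illustration only, here is a minimal sketch of a poll that cannot re-submit a stale status word. It is not necessarily the change that was made to the test; it assumes the documented adjtimex() behaviour that a call with modes == 0 only reads the kernel state. The helper names time_state_str() (borrowed from the test above, body assumed) and query_time_state() (hypothetical) are illustrative.

```c
#include <assert.h>
#include <string.h>
#include <sys/timex.h>

/* Map an adjtimex() return value to a printable name. */
const char *time_state_str(int state)
{
	switch (state) {
	case TIME_OK:		return "TIME_OK";
	case TIME_INS:		return "TIME_INS";
	case TIME_DEL:		return "TIME_DEL";
	case TIME_OOP:		return "TIME_OOP";
	case TIME_WAIT:		return "TIME_WAIT";
	case TIME_ERROR:	return "TIME_ERROR";
	default:		return "UNKNOWN";
	}
}

/*
 * Query the clock state without requesting any change.
 * Zeroing the struct leaves modes == 0, so whatever status the
 * previous call returned is never written back to the kernel.
 * Returns the TIME_* state, or -1 on error.
 */
int query_time_state(struct timex *tx)
{
	assert(tx != NULL);
	memset(tx, 0, sizeof(*tx));
	return adjtimex(tx);
}
```

Inside the polling loop, ret = query_time_state(&tx) would replace the bare adjtimex(&tx) call, and tx.time is still filled in for printing.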

While this is highly unlikely to ever happen in the real world it is
still something we should protect against, as breaking the state machine
is bad.

If the time_state == TIME_OOP (ie, the leap second is in progress) do not
allow an external update to time_state in process_adj_status(). This will
prevent external adjtimex() calls from breaking the leap second state
machine.

[v2]: Only block time_state change when TIME_OOP
[v3]: Write a much more detailed explanation of the bug.

Signed-off-by: Prarit Bhargava <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Miroslav Lichvar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
kernel/time/ntp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 28bf91c..6ff5cd5 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -535,7 +535,8 @@ void ntp_notify_cmos_timer(void) { }
 static inline void process_adj_status(struct timex *txc, struct timespec64 *ts)
 {
 	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
-		time_state = TIME_OK;
+		if (time_state != TIME_OOP)
+			time_state = TIME_OK;
 		time_status = STA_UNSYNC;
 		/* restart PPS frequency calibration */
 		pps_reset_freq_interval();
--
1.7.9.3

2015-02-17 23:16:21

by John Stultz

Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]

On Thu, Feb 12, 2015 at 5:58 AM, Prarit Bhargava <[email protected]> wrote:
> During leap second insertion testing it was noticed that a small window
> exists where the time_state could be reset such that
> time_state = TIME_OK, which then causes the leap second to not occur, or
> causes the entire leap second state machine to fail with time_state =
> TIME_INS at the end of the leap second.
>
> The test did the following in userspace:
>
> 	tx.modes = ADJ_STATUS;
> 	tx.status = STA_INS;
>
> 	/* send leap second request */
> 	ret = adjtimex(&tx);
>
> 	/* Check adjtimex output every half second */
> 	now = tx.time.tv_sec;
> 	while (now < next_leap+2) {
> 		char buf[26];
> 		ret = adjtimex(&tx);
>
> 		ctime_r(&tx.time.tv_sec, buf);
> 		buf[strlen(buf)-1] = 0; /* remove trailing \n */
>
> 		printf("%s + %6ld us\t%s\n",
> 				buf,
> 				tx.time.tv_usec,
> 				time_state_str(ret));
> 		now = tx.time.tv_sec;
> 		/* Sleep for another half second */
> 		ts.tv_sec = 0;
> 		ts.tv_nsec = NSEC_PER_SEC/2;
> 		clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
> 	}
>
> which was intended to mimic the insertion of a leap second. A
> successful run of the test would result in the time_state transitioning
> from TIME_OK to TIME_INS, then to TIME_OOP when the leap second was
> inserted, and then to TIME_WAIT when the leap second was completed. While
> running this code failures were seen in which the time_state remained TIME_INS,
> even though the leap second had occurred.
>
> After some investigation it was noted that the test contained a small error:
> the test does not reinitialize tx.status and reissues the STA_INS every
> 1/2 second. As a result of this broken test, the following failure was noticed
> (the output below is a mix of kernel messages and the output from the test
> program, the remaining annotations are printk's in the code and my own
> additional notes):
>
> [ 942.952833] time_state [1] change from TIME_OK to TIME_INS
>
> Fri Feb 13 18:59:51 2015 + 318126 us TIME_INS
> Fri Feb 13 18:59:51 2015 + 818167 us TIME_INS
> Fri Feb 13 18:59:52 2015 + 318208 us TIME_INS
> Fri Feb 13 18:59:52 2015 + 818248 us TIME_INS
> Fri Feb 13 18:59:53 2015 + 318290 us TIME_INS
> Fri Feb 13 18:59:53 2015 + 818331 us TIME_INS
> Fri Feb 13 18:59:54 2015 + 318372 us TIME_INS
> Fri Feb 13 18:59:54 2015 + 818413 us TIME_INS
> Fri Feb 13 18:59:55 2015 + 318454 us TIME_INS
> Fri Feb 13 18:59:55 2015 + 818495 us TIME_INS
> Fri Feb 13 18:59:56 2015 + 318534 us TIME_INS
> Fri Feb 13 18:59:56 2015 + 818575 us TIME_INS
> Fri Feb 13 18:59:57 2015 + 318617 us TIME_INS
> Fri Feb 13 18:59:57 2015 + 818660 us TIME_INS
> Fri Feb 13 18:59:58 2015 + 318702 us TIME_INS
> Fri Feb 13 18:59:58 2015 + 818744 us TIME_INS
> Fri Feb 13 18:59:59 2015 + 318785 us TIME_INS
> Fri Feb 13 18:59:59 2015 + 818837 us TIME_INS
>
> [ 952.953143] time_state [4] change from TIME_INS to TIME_OOP
> [ 952.953150] Clock: inserting leap second 23:59:60 UTC
> [ 953.299905] process_adj_status: insert_leap_sec[1223] setting time_state back
> to TIME_OK [1, 1] <<< adjtimex() call every 1/2 second
> [ 953.299913] time_state [9] change from TIME_OOP to TIME_OK
>
> Fri Feb 13 18:59:59 2015 + 318878 us TIME_OK
> Fri Feb 13 18:59:59 2015 + 818931 us TIME_OK
>
> [ 954.064237] time_state [1] change from TIME_OK to TIME_INS
>
> Fri Feb 13 19:00:00 2015 + 318972 us TIME_INS
> Fri Feb 13 19:00:00 2015 + 819012 us TIME_INS
> Fri Feb 13 19:00:01 2015 + 319051 us TIME_INS
> Fri Feb 13 19:00:01 2015 + 819089 us TIME_INS
> Fri Feb 13 19:00:02 2015 + 319128 us TIME_INS
>
> As previously stated, the time_state remains TIME_INS even though the leap
> second has already occurred @ 952.953150.
>
> The test was changed to reset tx.status to 0 in the loop, and the test then
> succeeded with a 100% rate with the time state ending in TIME_WAIT.
>
> While this is highly unlikely to ever happen in the real world it is
> still something we should protect against, as breaking the state machine
> is bad.
>
> If the time_state == TIME_OOP (ie, the leap second is in progress) do not
> allow an external update to time_state in process_adj_status(). This will
> prevent external adjtimex() calls from breaking the leap second state
> machine.
>
> [v2]: Only block time_state change when TIME_OOP
> [v3]: Write a much more detailed explanation of the bug.

Ok, thanks for the more verbose explanation. Although this is more a
history of what you've seen rather than the crux of the change.

To distill this down just a bit, the point is the usual mode for NTP
time_state machine looks like:

TIME_OK -> TIME_INS -> TIME_OOP
   |                      |
   v                      v
TIME_DEL ------------> TIME_WAIT -(back)-> TIME_OK

(hopefully the ascii art survives here)

Now, from any of these states, currently if adjtimex is called w/ the
STA_PLL bit cleared (after STA_PLL was set), we reset back to TIME_OK,
effectively cancelling any transitions. (You'll have to imagine a line
from any of the states back to TIME_OK, since that's going to be too
ugly to do in ascii)

Your patch is trying to remove the line back from TIME_OOP back to
TIME_OK. Basically stopping the ability to reset the ntp state during
a leapsecond.

I do get that the behavior seen was strange due to a bug in the test
code which caused unexpected cancellation of state, but I'm not sure
if we should change the behavior to enforce that cancellation not be
possible. I could imagine some logic which really wants to reset the
state, which just by chance lands during a leap second, and the
application is confused since the state change didn't occur as
expected.

So I guess I'm not seeing that the state machine is actually "broken"
in this case that you've outlined. If you can articulate better why
the OOP -> OK transition is truly invalid, I'd be interested in
hearing, but I'm not sure I want to risk a behavioral change unless
there's wide agreement.

thanks
-john

2015-02-18 17:14:11

by Jiri Bohac

Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]

On Tue, Feb 17, 2015 at 03:16:18PM -0800, John Stultz wrote:
> Ok, thanks for the more verbose explanation. Although this is more a
> history of what you've seen rather than the crux of the change.
>
> To distill this down just a bit, the point is the usual mode for NTP
> time_state machine looks like:
>
> TIME_OK -> TIME_INS -> TIME_OOP
>    |                      |
>    v                      v
> TIME_DEL ------------> TIME_WAIT -(back)-> TIME_OK
>
> (hopefully the ascii art survives here)
>
> Now, from any of these states, currently if adjtimex is called w/ the
> STA_PLL bit cleared (after STA_PLL was set), we reset back to TIME_OK,
> effectively cancelling any transitions. (You'll have to imagine a line
> from any of the states back to TIME_OK, since that's going to be too
> ugly to do in ascii)
>
> Your patch is trying to remove the line back from TIME_OOP back to
> TIME_OK. Basically stopping the ability to reset the ntp state during
> a leapsecond.
>
> I do get that the behavior seen was strange due to a bug in the test
> code which caused unexpected cancellation of state, but I'm not sure
> if we should change the behavior to enforce that cancellation not be
> possible. I could imagine some logic which really wants to reset the
> state, which just by chance lands during a leap second, and the
> application is confused since the state change didn't occur as
> expected.
>
> So I guess I'm not seeing that the state machine is actually "broken"
> in this case that you've outlined. If you can articulate better why
> the OOP -> OK transition is truly invalid, I'd be interested in
> hearing, but I'm not sure I want to risk a behavioral change unless
> there's wide agreement.

I think the only real problem occurs when the adjtimex is called in
the TIME_OOP state with STA_PLL cleared _and_ STA_INS set.
In this case the state machine is reset to TIME_OK but goes back
to TIME_INS on the next second_overflow, potentially causing
another false leap second to be inserted on the following
midnight.

The state machine is meant to only go back to TIME_INS once STA_INS is
cleared and then set again - this is what the TIME_WAIT state is
for.

In fact, I don't see a reason why the STA_PLL -> !STA_PLL transition should
ever set the time_state to TIME_OK.
- When the STA_INS/STA_DEL flag is removed from the status, the state
machine will end up in TIME_OK from any state.
- When STA_INS/STA_DEL is set in
the status, the state machine will transition from TIME_OK to
TIME_INS/TIME_DEL anyway.

I think the "time_state = TIME_OK" should be just dropped.

It has been added by eea83d896e318bda54be2d2770d2c5d6668d11db
(ntp: NTP4 user space bits update) and it's not clear why.
Roman?

--
Jiri Bohac <[email protected]>
SUSE Labs, SUSE CZ

2015-02-18 17:38:57

by Jiri Bohac

Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]

On Wed, Feb 18, 2015 at 06:14:04PM +0100, Jiri Bohac wrote:
> I think the only real problem occurs when the adjtimex is called in
> the TIME_OOP state

... or the TIME_WAIT state ...

> with STA_PLL cleared _and_ STA_INS set.
> In this case the state machine is reset to TIME_OK but goes back
> to TIME_INS on the next second_overflow, potentially causing
> another false leap second to be inserted on the following
> midnight.
>
> The state machine is meant to only go back to TIME_INS once STA_INS is
> cleared and then set again - this is what the TIME_WAIT state is
> for.
>
> In fact, I don't see a reason why the STA_PLL -> !STA_PLL transition should
> ever set the time_state to TIME_OK.
> - When the STA_INS/STA_DEL flag is removed from the status, the state
> machine will end up in TIME_OK from any state.
> - When STA_INS/STA_DEL is set in
> the status, the state machine will transition from TIME_OK to
> TIME_INS/TIME_DEL anyway.
>
> I think the "time_state = TIME_OK" should be just dropped.
>
> It has been added by eea83d896e318bda54be2d2770d2c5d6668d11db
> (ntp: NTP4 user space bits update) and it's not clear why.
> Roman?

--
Jiri Bohac <[email protected]>
SUSE Labs, SUSE CZ

2015-02-19 17:00:51

by Jiri Bohac

Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]

Hi,

I'm trying to understand what exactly is going on here...

On Thu, Feb 12, 2015 at 08:58:19AM -0500, Prarit Bhargava wrote:
> The test did the following in userspace:
>
> 	tx.modes = ADJ_STATUS;
> 	tx.status = STA_INS;
>
> 	/* send leap second request */
> 	ret = adjtimex(&tx);
>
> 	/* Check adjtimex output every half second */
> 	now = tx.time.tv_sec;
> 	while (now < next_leap+2) {
> 		char buf[26];
> 		ret = adjtimex(&tx);
>
> 		ctime_r(&tx.time.tv_sec, buf);
> 		buf[strlen(buf)-1] = 0; /* remove trailing \n */
>
> 		printf("%s + %6ld us\t%s\n",
> 				buf,
> 				tx.time.tv_usec,
> 				time_state_str(ret));
> 		now = tx.time.tv_sec;
> 		/* Sleep for another half second */
> 		ts.tv_sec = 0;
> 		ts.tv_nsec = NSEC_PER_SEC/2;
> 		clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
> 	}
>
> After some investigation it was noted that the test contained a small error:
> the test does not reinitialize tx.status and reissues the STA_INS every
> 1/2 second.

Prarit, can you explain who sets the STA_PLL flag, so that
process_adj_status() detects a STA_PLL->!STA_PLL transition and
goes to the branch that sets time_state = TIME_OK?

Is that ntpd running in parallel with your test program? If that
is the case, you would eventually end up oscillating between the
TIME_INS and TIME_OK states anyway, even with your patch.
ntpd will clear the STA_INS flag after midnight, the state
machine will transition from TIME_WAIT to TIME_OK and your test
program will set STA_INS again (fighting with ntpd which will
clear the flag from time to time) ... right?

Thanks,

--
Jiri Bohac <[email protected]>
SUSE Labs, SUSE CZ

2015-02-20 14:12:51

by Prarit Bhargava

Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]

On 02/17/2015 06:16 PM, John Stultz wrote:
> On Thu, Feb 12, 2015 at 5:58 AM, Prarit Bhargava <[email protected]> wrote:

>>
>> which was intended to mimic the insertion of a leap second. A
>> successful run of the test would result in the time_state transitioning
>> from TIME_OK to TIME_INS, then to TIME_OOP when the leap second was
>> inserted, and then to TIME_WAIT when the leap second was completed. While
>> running this code failures were seen in which the time_state remained TIME_INS,
>> even though the leap second had occurred.
>>
>
>
> Ok, thanks for the more verbose explanation. Although this is more a
> history of what you've seen rather than the crux of the change.
>
> To distill this down just a bit, the point is the usual mode for NTP
> time_state machine looks like:
>
> TIME_OK -> TIME_INS -> TIME_OOP
>    |                      |
>    v                      v
> TIME_DEL ------------> TIME_WAIT -(back)-> TIME_OK
>
> (hopefully the ascii art survives here)
>
> Now, from any of these states, currently if adjtimex is called w/ the
> STA_PLL bit cleared (after STA_PLL was set), we reset back to TIME_OK,
> effectively cancelling any transitions. (You'll have to imagine a line
> from any of the states back to TIME_OK, since that's going to be too
> ugly to do in ascii)
>
> Your patch is trying to remove the line back from TIME_OOP back to
> TIME_OK. Basically stopping the ability to reset the ntp state during
> a leapsecond.

Correct.

>
> I do get that the behavior seen was strange due to a bug in the test
> code which caused unexpected cancellation of state, but I'm not sure
> if we should change the behavior to enforce that cancellation not be
> possible. I could imagine some logic which really wants to reset the
> state, which just by chance lands during a leap second, and the
> application is confused since the state change didn't occur as
> expected.

I think setting it in the middle of the leap second should be a NOOP. We all
know how fragile this code has been in the past and allowing a state transition
at that particular time isn't a good idea given the outcome that the state may
remain TIME_INS.

>
> So I guess I'm not seeing that the state machine is actually "broken"
> in this case that you've outlined. If you can articulate better why
> the OOP -> OK transition is truly invalid, I'd be interested in
> hearing, but I'm not sure I want to risk a behavioral change unless
> there's wide agreement.

I understand -- After thinking about it from your point of view I agree that
calling it "broken" is not right. Perhaps a better way of looking at it is, as
you also point out, if OOP -> OK is truly valid.

P.

>
> thanks
> -john
>

2015-02-20 14:15:49

by Prarit Bhargava

Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]

On 02/19/2015 12:00 PM, Jiri Bohac wrote:
> Hi,
>
> I'm trying to understand what exactly is going on here...
>
> On Thu, Feb 12, 2015 at 08:58:19AM -0500, Prarit Bhargava wrote:
>> The test did the following in userspace:
>>
>> 	tx.modes = ADJ_STATUS;
>> 	tx.status = STA_INS;
>>
>> 	/* send leap second request */
>> 	ret = adjtimex(&tx);
>>
>> 	/* Check adjtimex output every half second */
>> 	now = tx.time.tv_sec;
>> 	while (now < next_leap+2) {
>> 		char buf[26];
>> 		ret = adjtimex(&tx);
>>
>> 		ctime_r(&tx.time.tv_sec, buf);
>> 		buf[strlen(buf)-1] = 0; /* remove trailing \n */
>>
>> 		printf("%s + %6ld us\t%s\n",
>> 				buf,
>> 				tx.time.tv_usec,
>> 				time_state_str(ret));
>> 		now = tx.time.tv_sec;
>> 		/* Sleep for another half second */
>> 		ts.tv_sec = 0;
>> 		ts.tv_nsec = NSEC_PER_SEC/2;
>> 		clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
>> 	}
>>
>> After some investigation it was noted that the test contained a small error:
>> the test does not reinitialize tx.status and reissues the STA_INS every
>> 1/2 second.
>
> Prarit, can you explain who sets the STA_PLL flag, so that
> process_adj_status() detects a STA_PLL->!STA_PLL transition and
> goes to the branch that sets time_state = TIME_OK?

Jiri,

The test being run is:

https://github.com/johnstultz-work/timetests/blob/master/leap-a-day.c

prior to commit

https://github.com/johnstultz-work/timetests/commit/be4526e8b5d48cd108a8d2cf7f5c8fd763acf421

>
> Is that ntpd running in parallel with your test program? If that

No -- ntpd is disabled (chronyd in the case of systemd + current Fedora).

P.

2015-02-20 17:19:18

by Jiri Bohac

Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3]

On Fri, Feb 20, 2015 at 09:15:23AM -0500, Prarit Bhargava wrote:
> On 02/19/2015 12:00 PM, Jiri Bohac wrote:
> > Prarit, can you explain who sets the STA_PLL flag, so that
> > process_adj_status() detects a STA_PLL->!STA_PLL transition and
> > goes to the branch that sets time_state = TIME_OK?
>
> Jiri,
>
> The test being run is:
>
> https://github.com/johnstultz-work/timetests/blob/master/leap-a-day.c
>
> prior to commit
>
> https://github.com/johnstultz-work/timetests/commit/be4526e8b5d48cd108a8d2cf7f5c8fd763acf421

I can't make sense of the output of your test:

On Thu, Feb 12, 2015 at 08:58:19AM -0500, Prarit Bhargava wrote:
> [ 942.952833] time_state [1] change from TIME_OK to TIME_INS
>
> Fri Feb 13 18:59:51 2015 + 318126 us TIME_INS
> Fri Feb 13 18:59:51 2015 + 818167 us TIME_INS
> Fri Feb 13 18:59:52 2015 + 318208 us TIME_INS
> Fri Feb 13 18:59:52 2015 + 818248 us TIME_INS
> Fri Feb 13 18:59:53 2015 + 318290 us TIME_INS
> Fri Feb 13 18:59:53 2015 + 818331 us TIME_INS
> Fri Feb 13 18:59:54 2015 + 318372 us TIME_INS
> Fri Feb 13 18:59:54 2015 + 818413 us TIME_INS
> Fri Feb 13 18:59:55 2015 + 318454 us TIME_INS
> Fri Feb 13 18:59:55 2015 + 818495 us TIME_INS
> Fri Feb 13 18:59:56 2015 + 318534 us TIME_INS
> Fri Feb 13 18:59:56 2015 + 818575 us TIME_INS

Why did the test program print the above lines? It's supposed to
sleep until 3 seconds prior to the midnight:

	/* Wake up 3 seconds before leap */
	ts.tv_sec = next_leap - 3;
	ts.tv_nsec = 0;
	while(clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &ts, NULL))
		printf("Something woke us up, returning to sleep\n");

> Fri Feb 13 18:59:57 2015 + 318617 us TIME_INS
> Fri Feb 13 18:59:57 2015 + 818660 us TIME_INS
> Fri Feb 13 18:59:58 2015 + 318702 us TIME_INS
> Fri Feb 13 18:59:58 2015 + 818744 us TIME_INS
> Fri Feb 13 18:59:59 2015 + 318785 us TIME_INS
> Fri Feb 13 18:59:59 2015 + 818837 us TIME_INS
>
> [ 952.953143] time_state [4] change from TIME_INS to TIME_OOP
> [ 952.953150] Clock: inserting leap second 23:59:60 UTC
> [ 953.299905] process_adj_status: insert_leap_sec[1223] setting time_state back
> to TIME_OK [1, 1] <<< adjtimex() call every 1/2 second
> [ 953.299913] time_state [9] change from TIME_OOP to TIME_OK

2) The only place where the test program sets STA_PLL is in
clear_time_state(); It clears it right after that.

clear_time_state() is not called inside the while "(now < next_leap+2)" loop,
except in the SIGINT/SIGKILL handler. Did you send signals to the program
at this point?

If not, I can't understand how the status went from STA_PLL to !STA_PLL
and thus why time_state went to TIME_OK

--
Jiri Bohac <[email protected]>
SUSE Labs, SUSE CZ