MIME-Version: 1.0
In-Reply-To: <1423749499-18520-1-git-send-email-prarit@redhat.com>
References: <1423749499-18520-1-git-send-email-prarit@redhat.com>
Date: Tue, 17 Feb 2015 15:16:18 -0800
Message-ID: <CALAqxLU5iKW_VS-ukULUGwaPj3ZgHUUHyyQt_fKHat__K6NO1w@mail.gmail.com>
Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap
 second [v3]
From: John Stultz <john.stultz@linaro.org>
To: Prarit Bhargava <prarit@redhat.com>
Cc: lkml <linux-kernel@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
        Miroslav Lichvar <mlichvar@redhat.com>,
        Peter Zijlstra <peterz@infradead.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 6529
Lines: 150

On Thu, Feb 12, 2015 at 5:58 AM, Prarit Bhargava <prarit@redhat.com> wrote:
> During leap second insertion testing it was noticed that a small window
> exists where the time_state could be reset such that
> time_state = TIME_OK, which then causes the leap second to not occur, or
> causes the entire leap second state machine to fail with time_state =
> TIME_INS at the end of the leap second.
>
> The test did the following in userspace:
>
>         tx.modes = ADJ_STATUS;
>         tx.status = STA_INS;
>
>         /* send leap second request */
>         ret = adjtimex(&tx);
>
>         /* Check adjtimex output every half second */
>         now = tx.time.tv_sec;
>         while (now < next_leap+2) {
>                 char buf[26];
>                 ret = adjtimex(&tx);
>
>                 ctime_r(&tx.time.tv_sec, buf);
>                 buf[strlen(buf)-1] = 0; /*remove trailing\n */
>
>                 printf("%s + %6ld us\t%s\n",
>                                 buf,
>                                 tx.time.tv_usec,
>                                 time_state_str(ret));
>                 now = tx.time.tv_sec;
>                 /* Sleep for another half second */
>                 ts.tv_sec = 0;
>                 ts.tv_nsec = NSEC_PER_SEC/2;
>                 clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
>         }
>
> which was intended to mimic the insertion of a leap second.  A
> successful run of the test would result in the time_state transitioning
> from TIME_OK to TIME_INS, then to TIME_OOP when the leap second was
> inserted, and then to TIME_WAIT when the leap second was completed.  While
> running this code failures were seen in which the time_state remained TIME_INS,
> even though the leap second had occurred.
>
> After some investigation it was noted that the test contained a small error:
> the test does not reinitialize tx.status and reissues the STA_INS every
> 1/2 second.  As a result of this broken test, the following failure was noticed
> (the output below is a mix of kernel messages and the output from the test
> program, the remaining annotations are printk's in the code and my own
> additional notes):
>
> [  942.952833] time_state [1] change from TIME_OK to TIME_INS
>
> Fri Feb 13 18:59:51 2015 + 318126 us    TIME_INS
> Fri Feb 13 18:59:51 2015 + 818167 us    TIME_INS
> Fri Feb 13 18:59:52 2015 + 318208 us    TIME_INS
> Fri Feb 13 18:59:52 2015 + 818248 us    TIME_INS
> Fri Feb 13 18:59:53 2015 + 318290 us    TIME_INS
> Fri Feb 13 18:59:53 2015 + 818331 us    TIME_INS
> Fri Feb 13 18:59:54 2015 + 318372 us    TIME_INS
> Fri Feb 13 18:59:54 2015 + 818413 us    TIME_INS
> Fri Feb 13 18:59:55 2015 + 318454 us    TIME_INS
> Fri Feb 13 18:59:55 2015 + 818495 us    TIME_INS
> Fri Feb 13 18:59:56 2015 + 318534 us    TIME_INS
> Fri Feb 13 18:59:56 2015 + 818575 us    TIME_INS
> Fri Feb 13 18:59:57 2015 + 318617 us    TIME_INS
> Fri Feb 13 18:59:57 2015 + 818660 us    TIME_INS
> Fri Feb 13 18:59:58 2015 + 318702 us    TIME_INS
> Fri Feb 13 18:59:58 2015 + 818744 us    TIME_INS
> Fri Feb 13 18:59:59 2015 + 318785 us    TIME_INS
> Fri Feb 13 18:59:59 2015 + 818837 us    TIME_INS
>
> [  952.953143] time_state [4] change from TIME_INS to TIME_OOP
> [  952.953150] Clock: inserting leap second 23:59:60 UTC
> [  953.299905] process_adj_status: insert_leap_sec[1223] setting time_state back
> to TIME_OK [1, 1]   <<< adjtimex() call every 1/2 second
> [  953.299913] time_state [9] change from TIME_OOP to TIME_OK
>
> Fri Feb 13 18:59:59 2015 + 318878 us    TIME_OK
> Fri Feb 13 18:59:59 2015 + 818931 us    TIME_OK
>
> [  954.064237] time_state [1] change from TIME_OK to TIME_INS
>
> Fri Feb 13 19:00:00 2015 + 318972 us    TIME_INS
> Fri Feb 13 19:00:00 2015 + 819012 us    TIME_INS
> Fri Feb 13 19:00:01 2015 + 319051 us    TIME_INS
> Fri Feb 13 19:00:01 2015 + 819089 us    TIME_INS
> Fri Feb 13 19:00:02 2015 + 319128 us    TIME_INS
>
> As previously stated, the time_state remains TIME_INS even though the leap
> second has already occurred @ 952.953150.
>
> The test was changed to reset tx.status to 0 in the loop, and the test then
> succeeded with a 100% rate with the time state ending in TIME_WAIT.
>
> While this is highly unlikely to ever happen in the real world it is
> still something we should protect against, as breaking the state machine
> is bad.
>
> If the time_state == TIME_OOP (ie, the leap second is in progress) do not
> allow an external update to time_state in process_adj_status().  This will
> prevent external adjtimex() calls from breaking the leap second state
> machine.
>
> [v2]: Only block time_state change when TIME_OOP
> [v3]: Write a much more detailed explanation of the bug.


Ok, thanks for the more verbose explanation. Although this is more a
history of what you've seen rather then the crux of the change.

To distill this down just a bit, the point is the usual mode for NTP
time_state machine looks like:

TIME_OK -> TIME_INS -> TIME_OOP
  |                       |
  v                       v
TIME_DEL ------------> TIME_WAIT  -(back)-> TIME_OK

(hopefully the ascii art survives here)

Now, from any of these states, currently if adjtimex is called w/ the
STA_PLL bit cleared (after STA_PLL was set), we reset back to TIME_OK,
effectively cancelling any transitions. (You'll have to imagine a line
from any of the states back to TIME_OK, since that's going to be too
ugly to do in ascii)

Your patch is trying to remove the line back from TIME_OOP back to
TIME_OK. Basically stopping the ability to reset the ntp state during
a leapsecond.

I do get that the behavior seen was strange due to a bug in the test
code which caused unexpected cancellation of state, but I'm not sure
if we should change the behavior to enforce that cancellation not be
possible. I could imagine some logic which really wants to reset the
state, which just by chance lands during a leap second, and the
application is confused since the state change didn't occur as
expected.

So I guess I'm not seeing that the state machine is actually "broken"
in this case that you've outlined.  If you can articulate better why
the OOP -> OK transition is truly invalid, I'd be interested in
hearing, but I'm not sure I want to risk a behavioral change unless
there's wide agreement.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/