2015-02-04 12:29:01

by Prarit Bhargava

Subject: [PATCH] time, ntp: Do not update time_state in middle of leap second

Resending ...

P.

----8<----

During leap second insertion testing it was noticed that a small window
exists in which time_state can be reset to TIME_OK while the leap second
is in progress. This either prevents the leap second from occurring or
causes the entire leap second state machine to fail.

While this is highly unlikely to ever happen in the real world it is
still something we should protect against, as breaking the state machine
is obviously bad.

If time_state == TIME_OOP (i.e., the leap second is in progress), do not
allow an external update to time_state.

Signed-off-by: Prarit Bhargava <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
 kernel/time/ntp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 28bf91c..f9ebf06 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -534,7 +534,8 @@ void ntp_notify_cmos_timer(void) { }
  */
 static inline void process_adj_status(struct timex *txc, struct timespec64 *ts)
 {
-	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
+	if ((time_status & STA_PLL) && !(txc->status & STA_PLL) &&
+	    (time_state != TIME_OOP)) {
 		time_state = TIME_OK;
 		time_status = STA_UNSYNC;
 		/* restart PPS frequency calibration */
--
1.7.9.3
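
The failure mode described in the commit message can be illustrated with a
small userspace model. This is a sketch, not the kernel code: the m_/M_
names, the once-per-second tick function, and the guard flag are assumptions
made for illustration, and the status word is reduced to two bits. It only
shows why forcing time_state back to TIME_OK during TIME_OOP re-arms the
insertion when the insert flag is still set, and how the added check avoids
that.

```c
#include <assert.h>
#include <stdbool.h>

/* Model-only names (m_/M_ prefix); these are not the kernel's symbols. */
enum m_state { M_TIME_OK, M_TIME_INS, M_TIME_OOP, M_TIME_WAIT };

#define M_STA_PLL 0x0001	/* PLL updates enabled */
#define M_STA_INS 0x0010	/* leap second insertion requested */

struct m_ntp {
	enum m_state state;
	int status;
};

/* One step of the leap state machine, run once per second in this model. */
static void m_second_tick(struct m_ntp *n, bool at_midnight)
{
	switch (n->state) {
	case M_TIME_OK:
		if (n->status & M_STA_INS)
			n->state = M_TIME_INS;	/* insertion armed */
		break;
	case M_TIME_INS:
		if (!(n->status & M_STA_INS))
			n->state = M_TIME_OK;	/* request withdrawn */
		else if (at_midnight)
			n->state = M_TIME_OOP;	/* 23:59:60 in progress */
		break;
	case M_TIME_OOP:
		n->state = M_TIME_WAIT;		/* leap second finished */
		break;
	case M_TIME_WAIT:
		if (!(n->status & M_STA_INS))
			n->state = M_TIME_OK;
		break;
	}
}

/* Model of the status update; 'guard' selects the patched condition. */
static void m_adj_status(struct m_ntp *n, int new_status, bool guard)
{
	bool pll_dropped = (n->status & M_STA_PLL) && !(new_status & M_STA_PLL);

	if (pll_dropped && !(guard && n->state == M_TIME_OOP))
		n->state = M_TIME_OK;
	n->status = new_status;	/* simplification: no read-only bits */
}

/* Disable the PLL during the leap second, then let one more second pass. */
static enum m_state m_scenario(bool guard)
{
	struct m_ntp n = { M_TIME_INS, M_STA_PLL | M_STA_INS };

	m_second_tick(&n, true);
	assert(n.state == M_TIME_OOP);		/* leap second now in progress */
	m_adj_status(&n, M_STA_INS, guard);	/* PLL cleared, insert flag kept */
	m_second_tick(&n, false);
	return n.state;
}
```

In this model m_scenario(false) ends in M_TIME_INS, i.e. the insertion is
armed again, while m_scenario(true) ends in M_TIME_WAIT.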


2015-02-04 16:30:06

by Miroslav Lichvar

Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap

Prarit Bhargava wrote:
> While this is highly unlikely to ever happen in the real world it is
> still something we should protect against, as breaking the state machine
> is obviously bad.

I'm not sure what exactly breaks here. If the PLL is disabled before
time_state is set to TIME_OOP, the insertion/deletion will be aborted.
If after that, adjtimex() will return with TIME_ERROR as expected, or
not?

> static inline void process_adj_status(struct timex *txc, struct timespec64 *ts)
> {
> - if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
> + if ((time_status & STA_PLL) && !(txc->status & STA_PLL) &&
> + (time_state != TIME_OOP)) {
> time_state = TIME_OK;
> time_status = STA_UNSYNC;
> /* restart PPS frequency calibration */

Shouldn't be time_status reset and the PPS calibration restarted even
when state is TIME_OOP?

--
Miroslav Lichvar

2015-02-05 13:20:17

by Prarit Bhargava

Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap



On 02/04/2015 11:30 AM, Miroslav Lichvar wrote:
> Prarit Bhargava wrote:
>> While this is highly unlikely to ever happen in the real world it is
>> still something we should protect against, as breaking the state machine
>> is obviously bad.
>
> I'm not sure what exactly breaks here. If the PLL is disabled before
> time_state is set to TIME_OOP, the insertion/deletion will be aborted.

Yes, that is correct.

> If after that, adjtimex() will return with TIME_ERROR as expected, or
> not?

It is possible that an adjtimex() will set the time_state here back to TIME_OK
and return TIME_OK to userspace. Again, and I want to stress this, this is
extremely unlikely to happen. I only hit this due to a bug in a test program.
But at the end of the day, it is possible that this happens and we should
protect against it.


[ 942.952833] time_state [1] change from TIME_OK to TIME_INS

Fri Feb 13 18:59:51 2015 + 318126 us TIME_INS
Fri Feb 13 18:59:51 2015 + 818167 us TIME_INS
Fri Feb 13 18:59:52 2015 + 318208 us TIME_INS
Fri Feb 13 18:59:52 2015 + 818248 us TIME_INS
Fri Feb 13 18:59:53 2015 + 318290 us TIME_INS
Fri Feb 13 18:59:53 2015 + 818331 us TIME_INS
Fri Feb 13 18:59:54 2015 + 318372 us TIME_INS
Fri Feb 13 18:59:54 2015 + 818413 us TIME_INS
Fri Feb 13 18:59:55 2015 + 318454 us TIME_INS
Fri Feb 13 18:59:55 2015 + 818495 us TIME_INS
Fri Feb 13 18:59:56 2015 + 318534 us TIME_INS
Fri Feb 13 18:59:56 2015 + 818575 us TIME_INS
Fri Feb 13 18:59:57 2015 + 318617 us TIME_INS
Fri Feb 13 18:59:57 2015 + 818660 us TIME_INS
Fri Feb 13 18:59:58 2015 + 318702 us TIME_INS
Fri Feb 13 18:59:58 2015 + 818744 us TIME_INS
Fri Feb 13 18:59:59 2015 + 318785 us TIME_INS
Fri Feb 13 18:59:59 2015 + 818837 us TIME_INS

[ 952.953143] time_state [4] change from TIME_INS to TIME_OOP
[ 952.953150] Clock: inserting leap second 23:59:60 UTC
[ 953.299905] process_adj_status: insert_leap_sec[1223] setting time_state back
to TIME_OK [1, 1] <<< adjtimex() call
[ 953.299913] time_state [9] change from TIME_OOP to TIME_OK

Fri Feb 13 18:59:59 2015 + 318878 us TIME_OK
Fri Feb 13 18:59:59 2015 + 818931 us TIME_OK

[ 954.064237] time_state [1] change from TIME_OK to TIME_INS

Fri Feb 13 19:00:00 2015 + 318972 us TIME_INS
Fri Feb 13 19:00:00 2015 + 819012 us TIME_INS
Fri Feb 13 19:00:01 2015 + 319051 us TIME_INS
Fri Feb 13 19:00:01 2015 + 819089 us TIME_INS
Fri Feb 13 19:00:02 2015 + 319128 us TIME_INS

P.
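
A state trace like the one above can be gathered from userspace with a
read-only adjtimex() call, whose return value is the current clock state.
This is a sketch only: it is not the test program used in this thread, and
the helper names (clock_state_name, format_clock_line, query_clock_state)
and the output layout are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/timex.h>

/* Map the clock state returned by adjtimex() to its symbolic name. */
const char *clock_state_name(int state)
{
	switch (state) {
	case TIME_OK:    return "TIME_OK";
	case TIME_INS:   return "TIME_INS";
	case TIME_DEL:   return "TIME_DEL";
	case TIME_OOP:   return "TIME_OOP";
	case TIME_WAIT:  return "TIME_WAIT";
	case TIME_ERROR: return "TIME_ERROR";
	default:         return "unknown";
	}
}

/* Format one report line; the layout is illustrative only. */
int format_clock_line(char *buf, size_t len, const struct timex *tx, int state)
{
	long frac_us;

	assert(buf != NULL && len > 0);
	frac_us = (long)tx->time.tv_usec;
	if (tx->status & STA_NANO)	/* field holds nanoseconds in this mode */
		frac_us /= 1000;
	return snprintf(buf, len, "%lld s + %ld us %s",
			(long long)tx->time.tv_sec, frac_us,
			clock_state_name(state));
}

/* Read-only query: modes == 0 asks the kernel to change nothing. */
int query_clock_state(char *buf, size_t len)
{
	struct timex tx;
	int state;

	memset(&tx, 0, sizeof(tx));
	state = adjtimex(&tx);
	if (state < 0)
		return -1;	/* errno describes the failure */
	format_clock_line(buf, len, &tx, state);
	return state;
}
```

Calling query_clock_state() every 500 ms around midnight UTC and printing
the buffer gives one line per sample. With modes left at 0 the call needs no
privileges and does not modify the clock.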

>
>> static inline void process_adj_status(struct timex *txc, struct timespec64 *ts)
>> {
>> - if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
>> + if ((time_status & STA_PLL) && !(txc->status & STA_PLL) &&
>> + (time_state != TIME_OOP)) {
>> time_state = TIME_OK;
>> time_status = STA_UNSYNC;
>> /* restart PPS frequency calibration */
>
> Shouldn't be time_status reset and the PPS calibration restarted even
> when state is TIME_OOP?

No, this should only happen after the leap second is done IMO (which should be
no more than 2 seconds later).

>

2015-02-06 10:37:21

by Miroslav Lichvar

Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap

On Thu, Feb 05, 2015 at 08:20:08AM -0500, Prarit Bhargava wrote:
> On 02/04/2015 11:30 AM, Miroslav Lichvar wrote:
> > If after that, adjtimex() will return with TIME_ERROR as expected, or
> > not?
>
> It is possible that an adjtimex() will set the time_state here back to TIME_OK
> and return TIME_OK to userspace. Again, and I want to stress this, this is
> extremely unlikely to happen. I only hit this due to a bug in a test program.
> But at the end of the day, it is possible that this happens and we should
> protect against it.

Could it break any applications? I guess PLL is normally disabled only
when a time synchronization process ends. FWIW, the reference
nanokernel implementation has this too.

> >> - if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
> >> + if ((time_status & STA_PLL) && !(txc->status & STA_PLL) &&
> >> + (time_state != TIME_OOP)) {
> >> time_state = TIME_OK;
> >> time_status = STA_UNSYNC;
> >> /* restart PPS frequency calibration */
> >
> > Shouldn't be time_status reset and the PPS calibration restarted even
> > when state is TIME_OOP?
>
> No, this should only happen after the leap second is done IMO (which should be
> no more than 2 seconds later).

But that will not happen automatically, the application would have to
enable and disable the PLL again. Interestingly, the "time_status =
STA_UNSYNC" assignment doesn't seem to do anything here, as the
variable is always reset couple lines after that, STA_UNSYNC is not a
readonly flag.
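
The point about the overwritten assignment can be checked with a small
model. The following is a sketch written from the description above, not a
copy of kernel/time/ntp.c: the M_ constants and m_apply_status() are
hypothetical, and the mask-then-merge step is an assumption about what the
lines following the branch do.

```c
#include <assert.h>

/* Model-only constants; the values are stand-ins chosen for this sketch. */
#define M_STA_PLL    0x0001
#define M_STA_INS    0x0010
#define M_STA_UNSYNC 0x0040
#define M_STA_RONLY  0xff00	/* bits userspace may not write */

/* Returns the status word after a caller passes txc_status. */
static int m_apply_status(int time_status, int txc_status)
{
	/* UNSYNC must be a writable bit for the argument to hold. */
	assert((M_STA_UNSYNC & M_STA_RONLY) == 0);

	if ((time_status & M_STA_PLL) && !(txc_status & M_STA_PLL))
		time_status = M_STA_UNSYNC;	/* the assignment in question */

	/* Keep only read-only bits: the UNSYNC bit set above is cleared. */
	time_status &= M_STA_RONLY;
	/* Writable bits, UNSYNC included, come from the caller alone. */
	time_status |= txc_status & ~M_STA_RONLY;
	return time_status;
}
```

In this model the STA_UNSYNC bit written inside the branch is gone after the
mask-and-merge step unless the caller passes it in, so userspace decides
whether the flag ends up set.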

--
Miroslav Lichvar

2015-02-06 10:50:50

by Prarit Bhargava

Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap



On 02/06/2015 05:38 AM, Miroslav Lichvar wrote:
> On Thu, Feb 05, 2015 at 08:20:08AM -0500, Prarit Bhargava wrote:
>> On 02/04/2015 11:30 AM, Miroslav Lichvar wrote:
>>> If after that, adjtimex() will return with TIME_ERROR as expected, or
>>> not?
>>
>> It is possible that an adjtimex() will set the time_state here back to TIME_OK
>> and return TIME_OK to userspace. Again, and I want to stress this, this is
>> extremely unlikely to happen. I only hit this due to a bug in a test program.
>> But at the end of the day, it is possible that this happens and we should
>> protect against it.
>
> Could it break any applications? I guess PLL is normally disabled only
> when a time synchronization process ends. FWIW, the reference
> nanokernel implementation has this too.

Not that I saw. I did take a look with top, etc., to see if anything in
userspace went bad, and I ran programs that were calling gettimeofday() and
clock_gettime() to see if there were any problems. I didn't see anything. I
also played around with a program to see if the timer expiry failed but again,
didn't see anything.

AFAICT the only outcome of TIME_INS->TIME_OOP->TIME_OK->TIME_INS was that
TIME_INS was left set, which could lead to another leap second insertion
down the road unless ntp (or some other program) resets the state.

>
>>>> - if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
>>>> + if ((time_status & STA_PLL) && !(txc->status & STA_PLL) &&
>>>> + (time_state != TIME_OOP)) {
>>>> time_state = TIME_OK;
>>>> time_status = STA_UNSYNC;
>>>> /* restart PPS frequency calibration */
>>>
>>> Shouldn't be time_status reset and the PPS calibration restarted even
>>> when state is TIME_OOP?
>>
>> No, this should only happen after the leap second is done IMO (which should be
>> no more than 2 seconds later).
>
> But that will not happen automatically, the application would have to
> enable and disable the PLL again. Interestingly, the "time_status =
> STA_UNSYNC" assignment doesn't seem to do anything here, as the

Hmmm ... good point. I didn't think of that. Let me go back and change the
code to do the reset.

> variable is always reset couple lines after that, STA_UNSYNC is not a
> readonly flag.
>

P.