Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753949AbbBRROL (ORCPT ); Wed, 18 Feb 2015 12:14:11 -0500 Received: from cantor2.suse.de ([195.135.220.15]:52012 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752120AbbBRROH (ORCPT ); Wed, 18 Feb 2015 12:14:07 -0500 Date: Wed, 18 Feb 2015 18:14:04 +0100 From: Jiri Bohac To: John Stultz , Roman Zippel Cc: Prarit Bhargava , lkml , Thomas Gleixner , Miroslav Lichvar , Peter Zijlstra Subject: Re: [PATCH] time, ntp: Do not update time_state in middle of leap second [v3] Message-ID: <20150218171404.GA31353@midget.suse.cz> References: <1423749499-18520-1-git-send-email-prarit@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3028 Lines: 73 On Tue, Feb 17, 2015 at 03:16:18PM -0800, John Stultz wrote: > Ok, thanks for the more verbose explanation. Although this is more a > history of what you've seen rather then the crux of the change. > > To distill this down just a bit, the point is the usual mode for NTP > time_state machine looks like: > > TIME_OK -> TIME_INS -> TIME_OOP > | | > v v > TIME_DEL ------------> TIME_WAIT -(back)-> TIME_OK > > (hopefully the ascii art survives here) > > Now, from any of these states, currently if adjtimex is called w/ the > STA_PLL bit cleared (after STA_PLL was set), we reset back to TIME_OK, > effectively cancelling any transitions. (You'll have to imagine a line > from any of the states back to TIME_OK, since that's going to be too > ugly to do in ascii) > > Your patch is trying to remove the line back from TIME_OOP back to > TIME_OK. Basically stopping the ability to reset the ntp state during > a leapsecond. > > I do get that the behavior seen was strange due to a bug in the test > code which caused unexpected cancellation of state, but I'm not sure > if we should change the behavior to enforce that cancellation not be > possible. I could imagine some logic which really wants to reset the > state, which just by chance lands during a leap second, and the > application is confused since the state change didn't occur as > expected. > > So I guess I'm not seeing that the state machine is actually "broken" > in this case that you've outlined. If you can articulate better why > the OOP -> OK transition is truly invalid, I'd be interested in > hearing, but I'm not sure I want to risk a behavioral change unless > there's wide agreement. I think the only real problem occurs when the adjtimex is called in the the TIME_OOP state with STA_PLL cleared _and_ STA_INS set. In this case the state machine is reset to TIME_OK but goes back to TIME_INS on the next second_overflow, potentially causing another false leap second to be inserted on the following midnight. The state machine is meant to only go back to TIME_INS once STA_INS is cleared and then set again - this is what the TIME_WAIT state is for. In fact, I don't see a reason why the STA_PLL -> !STA_PLL transition should ever set the time_state to TIME_OK. - When the STA_INS/STA_DEL flag is removed from the status, the state machine will end up in TIME_OK from any state. - When STA_INS/STA_DEL is set in the status, the state mchine will transition from TIME_OK to TIME_INS/TIME_DEL anyway. I think the "time_status = TIME_OK" should be just dropped. It has been added by eea83d896e318bda54be2d2770d2c5d6668d11db (ntp: NTP4 user space bits update) and it's not clear why. Roman? -- Jiri Bohac SUSE Labs, SUSE CZ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/