2012-06-15 18:57:11

by john stultz

[permalink] [raw]
Subject: [PATCH -stable] ntp: Correct TAI offset during leap second

Hey Greg,
Forgot to cc stable on this one. Thanks to Dave Jones for noticing!
-john

------------------

From: Richard Cochran<[email protected]>

commit dd48d708ff3e917f6d6b6c2b696c3f18c019feed upstream.

When repeating a UTC time value during a leap second (when the UTC

time should be 23:59:60), the TAI timescale should not stop. The kernel

NTP code increments the TAI offset one second too late. This patch fixes

the issue by incrementing the offset during the leap second itself.

Signed-off-by: Richard Cochran<[email protected]>

Signed-off-by: John Stultz<[email protected]>

---

kernel/time/ntp.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c

index f03fd83..e8c8671 100644

--- a/kernel/time/ntp.c

+++ b/kernel/time/ntp.c

@@ -412,6 +412,7 @@ int second_overflow(unsigned long secs)

if (secs % 86400 == 0) {

leap = -1;

time_state = TIME_OOP;

+ time_tai++;

printk(KERN_NOTICE

"Clock: inserting leap second 23:59:60 UTC\n");

}

@@ -426,7 +427,6 @@ int second_overflow(unsigned long secs)

}

break;

case TIME_OOP:

- time_tai++;

time_state = TIME_WAIT;

break;

--

1.7.9.5


2012-06-15 19:06:01

by john stultz

[permalink] [raw]
Subject: Re: [PATCH -stable] ntp: Correct TAI offset during leap second

On 06/15/2012 11:56 AM, John Stultz wrote:
> Hey Greg,
> Forgot to cc stable on this one. Thanks to Dave Jones for noticing!
> -john

Gah. Thunderbird's preformat wrecked that one. Sorry.
Here it is again.
-john

------------------

From: Richard Cochran<[email protected]>

commit dd48d708ff3e917f6d6b6c2b696c3f18c019feed upstream.

When repeating a UTC time value during a leap second (when the UTC
time should be 23:59:60), the TAI timescale should not stop. The kernel
NTP code increments the TAI offset one second too late. This patch fixes
the issue by incrementing the offset during the leap second itself.

Signed-off-by: Richard Cochran<[email protected]>
Signed-off-by: John Stultz<[email protected]>
---
kernel/time/ntp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index f03fd83..e8c8671 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -412,6 +412,7 @@ int second_overflow(unsigned long secs)
if (secs % 86400 == 0) {
leap = -1;
time_state = TIME_OOP;
+ time_tai++;
printk(KERN_NOTICE
"Clock: inserting leap second 23:59:60 UTC\n");
}
@@ -426,7 +427,6 @@ int second_overflow(unsigned long secs)
}
break;
case TIME_OOP:
- time_tai++;
time_state = TIME_WAIT;
break;

--
1.7.9.5

2012-06-17 14:43:58

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH -stable] ntp: Correct TAI offset during leap second

On Fri, 2012-06-15 at 11:56 -0700, John Stultz wrote:
> Hey Greg,
> Forgot to cc stable on this one. Thanks to Dave Jones for noticing!
> -john
>
> ------------------
>
> From: Richard Cochran<[email protected]>
>
> commit dd48d708ff3e917f6d6b6c2b696c3f18c019feed upstream.
>
> When repeating a UTC time value during a leap second (when the UTC
>
> time should be 23:59:60), the TAI timescale should not stop. The kernel
>
> NTP code increments the TAI offset one second too late. This patch fixes
>
> the issue by incrementing the offset during the leap second itself.
[...]

This doesn't apply to 3.2.y, unsurprisingly. Let me know if there are
any urgent leap second fixes that will be needed there.

Ben.

--
Ben Hutchings
Time is nature's way of making sure that everything doesn't happen at once.


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part

2012-06-17 16:47:58

by Jonathan Nieder

[permalink] [raw]
Subject: Re: [PATCH -stable] ntp: Correct TAI offset during leap second

Ben Hutchings wrote:
> On Fri, 2012-06-15 at 11:56 -0700, John Stultz wrote:

>> commit dd48d708ff3e917f6d6b6c2b696c3f18c019feed upstream.
[...]
> This doesn't apply to 3.2.y, unsurprisingly. Let me know if there are
> any urgent leap second fixes that will be needed there.

6b43ae8a619d (ntp: Fix leap-second hrtimer livelock) sounds important,
but the patch depends on bd3312681f69 (ntp: Add ntp_lock to replace
xtime_locking) which does not have a commit message explaining its
purpose (and that patch in turn depends on ea7cf49a7633).

John, is that bug present in 3.2.y and 3.0.y, too? Any hints for
fixing it?

Thanks,
Jonathan

2012-06-17 17:35:10

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH -stable] ntp: Correct TAI offset during leap second

On Sun, Jun 17, 2012 at 11:47:51AM -0500, Jonathan Nieder wrote:
> Ben Hutchings wrote:
> > On Fri, 2012-06-15 at 11:56 -0700, John Stultz wrote:
>
> >> commit dd48d708ff3e917f6d6b6c2b696c3f18c019feed upstream.
> [...]
> > This doesn't apply to 3.2.y, unsurprisingly. Let me know if there are
> > any urgent leap second fixes that will be needed there.
>
> 6b43ae8a619d (ntp: Fix leap-second hrtimer livelock) sounds important,
> but the patch depends on bd3312681f69 (ntp: Add ntp_lock to replace
> xtime_locking) which does not have a commit message explaining its
> purpose (and that patch in turn depends on ea7cf49a7633).
>
> John, is that bug present in 3.2.y and 3.0.y, too? Any hints for
> fixing it?

It looks like incrementing the TAI offset was wrong even before

6b43ae8a ntp: Fix leap-second hrtimer livelock v3.4-rc1~44^2~9

The offset should change upon entering state OOP, so something like
the following (untested) patch should fix it for 3.2.9.

Thanks,
Richard

---
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index f6117a4..d7bd953 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -365,6 +365,7 @@ static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer)
break;
case TIME_INS:
timekeeping_leap_insert(-1);
+ time_tai++;
time_state = TIME_OOP;
printk(KERN_NOTICE
"Clock: inserting leap second 23:59:60 UTC\n");
@@ -379,7 +380,6 @@ static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer)
"Clock: deleting leap second 23:59:59 UTC\n");
break;
case TIME_OOP:
- time_tai++;
time_state = TIME_WAIT;
/* fall through */
case TIME_WAIT:

2012-06-18 13:55:30

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH -stable] ntp: Correct TAI offset during leap second

On Sun, 2012-06-17 at 19:34 +0200, Richard Cochran wrote:
> On Sun, Jun 17, 2012 at 11:47:51AM -0500, Jonathan Nieder wrote:
> > Ben Hutchings wrote:
> > > On Fri, 2012-06-15 at 11:56 -0700, John Stultz wrote:
> >
> > >> commit dd48d708ff3e917f6d6b6c2b696c3f18c019feed upstream.
> > [...]
> > > This doesn't apply to 3.2.y, unsurprisingly. Let me know if there are
> > > any urgent leap second fixes that will be needed there.
> >
> > 6b43ae8a619d (ntp: Fix leap-second hrtimer livelock) sounds important,
> > but the patch depends on bd3312681f69 (ntp: Add ntp_lock to replace
> > xtime_locking) which does not have a commit message explaining its
> > purpose (and that patch in turn depends on ea7cf49a7633).

If I understand the commit message for 6b43ae8a619d correctly, the
livelock results from ntp_lock and xtime_lock being acquired in opposite
orders in two threads. Which means it wasn't possible before ntp_lock
was introduced in bd3312681f69.

> > John, is that bug present in 3.2.y and 3.0.y, too? Any hints for
> > fixing it?
>
> It looks like incrementing the TAI offset was wrong even before
>
> 6b43ae8a ntp: Fix leap-second hrtimer livelock v3.4-rc1~44^2~9
>
> The offset should change upon entering state OOP, so something like
> the following (untested) patch should fix it for 3.2.9.
[...]

It looks like this patch just changes the offset reported by adjtimex()
during an inserted second; is that right?

Other than that, is 3.2.y likely to be OK? Is there a good way to test
that in advance; does
<http://codemonkey.org.uk/2012/06/15/testing-leap-code/> look
reasonable?

Ben.

--
Ben Hutchings
If more than one person is responsible for a bug, no one is at fault.


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part

2012-06-18 16:29:09

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH -stable] ntp: Correct TAI offset during leap second

On Mon, Jun 18, 2012 at 02:55:11PM +0100, Ben Hutchings wrote:
> On Sun, 2012-06-17 at 19:34 +0200, Richard Cochran wrote:
> > The offset should change upon entering state OOP, so something like
> > the following (untested) patch should fix it for 3.2.9.
> [...]
>
> It looks like this patch just changes the offset reported by adjtimex()
> during an inserted second; is that right?

Right, nothing really terrible will happen. The worst that I can
imagine is that ntpd will set the new TAI offset during OOP, and then
the kernel will add one to it, resulting in the TAI offset being off
by one.

But I really doubt any software makes use of this information.

> Other than that, is 3.2.y likely to be OK? Is there a good way to test
> that in advance; does
> <http://codemonkey.org.uk/2012/06/15/testing-leap-code/> look
> reasonable?

Well, if you want to wait all night then that is one way to do it.
Here is a little test program I have been using:

https://github.com/richardcochran/leap

Thanks,
Richard

2012-06-18 18:20:41

by john stultz

[permalink] [raw]
Subject: Re: [PATCH -stable] ntp: Correct TAI offset during leap second

On 06/18/2012 06:55 AM, Ben Hutchings wrote:
> On Sun, 2012-06-17 at 19:34 +0200, Richard Cochran wrote:
>> On Sun, Jun 17, 2012 at 11:47:51AM -0500, Jonathan Nieder wrote:
>>> Ben Hutchings wrote:
>>>
>>> 6b43ae8a619d (ntp: Fix leap-second hrtimer livelock) sounds important,
>>> but the patch depends on bd3312681f69 (ntp: Add ntp_lock to replace
>>> xtime_locking) which does not have a commit message explaining its
>>> purpose (and that patch in turn depends on ea7cf49a7633).
> If I understand the commit message for 6b43ae8a619d correctly, the
> livelock results from ntp_lock and xtime_lock being acquired in opposite
> orders in two threads. Which means it wasn't possible before ntp_lock
> was introduced in bd3312681f69.
Yes, I think Ben is right that before the ntp_lock split the potential
deadlock couldn't happen.


>>> John, is that bug present in 3.2.y and 3.0.y, too? Any hints for
>>> fixing it?
>> It looks like incrementing the TAI offset was wrong even before
>>
>> 6b43ae8a ntp: Fix leap-second hrtimer livelock v3.4-rc1~44^2~9
>>
>> The offset should change upon entering state OOP, so something like
>> the following (untested) patch should fix it for 3.2.9.
> [...]
>
> It looks like this patch just changes the offset reported by adjtimex()
> during an inserted second; is that right?

Yep. It just makes sure the TAI offset is adjusted at the same point
that the leapsecond is inserted (as opposed to a second late).

>
> Other than that, is 3.2.y likely to be OK? Is there a good way to test
> that in advance; does
> <http://codemonkey.org.uk/2012/06/15/testing-leap-code/> look
> reasonable?
Attached is a simple leap second test you can play with.

thanks
-john


Attachments:
leaptest.c (2.03 kB)

2012-06-19 11:55:09

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH -stable] ntp: Correct TAI offset during leap second

On Mon, 2012-06-18 at 18:28 +0200, Richard Cochran wrote:
> On Mon, Jun 18, 2012 at 02:55:11PM +0100, Ben Hutchings wrote:
> > On Sun, 2012-06-17 at 19:34 +0200, Richard Cochran wrote:
> > > The offset should change upon entering state OOP, so something like
> > > the following (untested) patch should fix it for 3.2.9.
> > [...]
> >
> > It looks like this patch just changes the offset reported by adjtimex()
> > during an inserted second; is that right?
>
> Right, nothing really terrible will happen. The worst that I can
> imagine is that ntpd will set the new TAI offset during OOP, and then
> the kernel will add one to it, resulting in the TAI offset being off
> by one.
>
> But I really doubt any software makes use of this information.
>
> > Other than that, is 3.2.y likely to be OK? Is there a good way to test
> > that in advance; does
> > <http://codemonkey.org.uk/2012/06/15/testing-leap-code/> look
> > reasonable?
>
> Well, if you want to wait all night then that is one way to do it.

I was intending to change the clock too...

> Here is a little test program I have been using:
>
> https://github.com/richardcochran/leap

Thanks, that runs without incident but does show the incorrect offset
during OOP.

Ben.

--
Ben Hutchings
If more than one person is responsible for a bug, no one is at fault.


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part

2012-06-19 11:57:58

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH -stable] ntp: Correct TAI offset during leap second

On Mon, 2012-06-18 at 11:20 -0700, John Stultz wrote:
> On 06/18/2012 06:55 AM, Ben Hutchings wrote:
[...]
> >>> John, is that bug present in 3.2.y and 3.0.y, too? Any hints for
> >>> fixing it?
> >> It looks like incrementing the TAI offset was wrong even before
> >>
> >> 6b43ae8a ntp: Fix leap-second hrtimer livelock v3.4-rc1~44^2~9
> >>
> >> The offset should change upon entering state OOP, so something like
> >> the following (untested) patch should fix it for 3.2.9.
> > [...]
> >
> > It looks like this patch just changes the offset reported by adjtimex()
> > during an inserted second; is that right?
>
> Yep. It just makes sure the TAI offset is adjusted at the same point
> that the leapsecond is inserted (as opposed to a second late).
>
> >
> > Other than that, is 3.2.y likely to be OK? Is there a good way to test
> > that in advance; does
> > <http://codemonkey.org.uk/2012/06/15/testing-leap-code/> look
> > reasonable?
> Attached is a simple leap second test you can play with.

Thanks. That also detects inconsistency on some runs, but I don't see
anything worse. So I don't intend to apply any of the ntp fixes to
3.2.y.

Ben.

--
Ben Hutchings
If more than one person is responsible for a bug, no one is at fault.


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part

2012-06-19 17:26:59

by john stultz

[permalink] [raw]
Subject: Re: [PATCH -stable] ntp: Correct TAI offset during leap second

On 06/19/2012 04:54 AM, Ben Hutchings wrote:
> On Mon, 2012-06-18 at 18:28 +0200, Richard Cochran wrote:
>> On Mon, Jun 18, 2012 at 02:55:11PM +0100, Ben Hutchings wrote:
>>> On Sun, 2012-06-17 at 19:34 +0200, Richard Cochran wrote:
>>>> The offset should change upon entering state OOP, so something like
>>>> the following (untested) patch should fix it for 3.2.9.
>>> [...]
>>>
>>> It looks like this patch just changes the offset reported by adjtimex()
>>> during an inserted second; is that right?
>> Right, nothing really terrible will happen. The worst that I can
>> imagine is that ntpd will set the new TAI offset during OOP, and then
>> the kernel will add one to it, resulting in the TAI offset being off
>> by one.
>>
>> But I really doubt any software makes use of this information.
>>
>>> Other than that, is 3.2.y likely to be OK? Is there a good way to test
>>> that in advance; does
>>> <http://codemonkey.org.uk/2012/06/15/testing-leap-code/> look
>>> reasonable?
>> Well, if you want to wait all night then that is one way to do it.
> I was intending to change the clock too...
>
>> Here is a little test program I have been using:
>>
>> https://github.com/richardcochran/leap
> Thanks, that runs without incident but does show the incorrect offset
> during OOP.
Yep. That's a long-standing issue, due to the leap-second processing
happening at tick time (further complicated since to handle NOHZ, we
accumulate in fixed-tick intervals that don't exactly line up with the
interrupt edge), so we'll usually get one to two ticks into the second
before we apply the leap second.

Richard has recently taken a stab at correcting this.

Richard, are you planning on taking another go at those changes? Or were
my last suggestions just too daft? ;)

thanks
-john

2012-06-20 16:25:42

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH -stable] ntp: Correct TAI offset during leap second

On Tue, Jun 19, 2012 at 10:26:49AM -0700, John Stultz wrote:
>
> Richard, are you planning on taking another go at those changes? Or
> were my last suggestions just too daft? ;)

I wonder if it is really worth the effort to fix up adjtimex.

But I am strongly in favor of adding CLOCK_TAI, since that would be
easy to add (as you showed), is free from the tick issue, and would be
really useful for Test and Measurement style applications

Thanks,
Richard

2012-06-20 16:43:32

by john stultz

[permalink] [raw]
Subject: Re: [PATCH -stable] ntp: Correct TAI offset during leap second

On 06/20/2012 09:25 AM, Richard Cochran wrote:
> On Tue, Jun 19, 2012 at 10:26:49AM -0700, John Stultz wrote:
>> Richard, are you planning on taking another go at those changes? Or
>> were my last suggestions just too daft? ;)
> I wonder if it is really worth the effort to fix up adjtimex.
>
> But I am strongly in favor of adding CLOCK_TAI, since that would be
> easy to add (as you showed), is free from the tick issue, and would be
> really useful for Test and Measurement style applications
I've got the CLOCK_TAI patches queued for 3.6. I should send them to
-tip soon here.

thanks
-john

2012-07-01 01:28:44

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH -stable] ntp: Correct TAI offset during leap second

On Mon, 2012-06-18 at 14:55 +0100, Ben Hutchings wrote:
> On Sun, 2012-06-17 at 19:34 +0200, Richard Cochran wrote:
> > On Sun, Jun 17, 2012 at 11:47:51AM -0500, Jonathan Nieder wrote:
> > > Ben Hutchings wrote:
> > > > On Fri, 2012-06-15 at 11:56 -0700, John Stultz wrote:
> > >
> > > >> commit dd48d708ff3e917f6d6b6c2b696c3f18c019feed upstream.
> > > [...]
> > > > This doesn't apply to 3.2.y, unsurprisingly. Let me know if there are
> > > > any urgent leap second fixes that will be needed there.
> > >
> > > 6b43ae8a619d (ntp: Fix leap-second hrtimer livelock) sounds important,
> > > but the patch depends on bd3312681f69 (ntp: Add ntp_lock to replace
> > > xtime_locking) which does not have a commit message explaining its
> > > purpose (and that patch in turn depends on ea7cf49a7633).
>
> If I understand the commit message for 6b43ae8a619d correctly, the
> livelock results from ntp_lock and xtime_lock being acquired in opposite
> orders in two threads. Which means it wasn't possible before ntp_lock
> was introduced in bd3312681f69.
[...]

Apparently some other livelock was possible, though I was unable to
reproduce it myself. Some proportion of systems running 2.6.32 or 3.2
(not sure about the intermediate stable-supported versions) are reported
to have locked up, either in adjtimex or during the leap second.

I understand that we might not have so long to wait for the next leap
second, so if anyone understands what fixes are still needed in stable
updates I would really appreciate that.

Ben.

--
Ben Hutchings
If you seem to know what you are doing, you'll be given more to do.


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part