2009-06-01 23:22:46

by Ingo Molnar

Subject: Re: [tip:timers/ntp] ntp: adjust SHIFT_PLL to improve NTP convergence


* John Stultz <[email protected]> wrote:

> On Wed, 2009-05-06 at 09:46 +0000, tip-bot for john stultz wrote:
> > Commit-ID: 22cfbbfd9f67b67fe073010f51cb71d3632387d5
> > Gitweb: http://git.kernel.org/tip/22cfbbfd9f67b67fe073010f51cb71d3632387d5
> > Author: john stultz <[email protected]>
> > AuthorDate: Wed, 6 May 2009 11:43:57 +0200
> > Committer: Ingo Molnar <[email protected]>
> > CommitDate: Wed, 6 May 2009 11:44:02 +0200
> >
> > ntp: adjust SHIFT_PLL to improve NTP convergence
> >
> > The conversion to the ntpv4 reference model
> > f19923937321244e7dc334767eb4b67e0e3d5c74 ("ntp: convert to the NTP4
> > reference model") in 2.6.19 added nanosecond resolution to the adjtimex
> > interface, but also changed the "stiffness" of the frequency adjustments,
> > causing NTP convergence time to greatly increase.
> >
> > SHIFT_PLL, which reduces the stiffness of the freq adjustments, was
> > designed to be inversely linked to HZ, and the reference value of 4 was
> > designed for Unix systems using HZ=100. However Linux's clock steering
> > code is mostly independent of HZ.
> >
> > So this patch reduces the SHIFT_PLL value from 4 to 2, which causes NTPd
> > behavior to match kernels prior to 2.6.19, greatly reducing convergence
> > times, and improving close synchronization through environmental thermal
> > changes.
> >
> > The patch also changes some l's to L's in nearby code to avoid misreading
> > 50l as 501.
> >
> > [ Impact: tweak NTP algorithm for faster convergence ]
>
> Hey Ingo,
>
> So I've been speaking with Miroslav (cc'ed) who maintains
> the RH ntpd packages, and he's concerned that this patch takes us
> out of NTP's expected behavior, which may cause problems when
> dealing with non-linux systems using NTP.

I might be missing something here - but Linux converging faster
seems like a genuinely good thing. What non-Linux problem could
there be? Linux's convergence is really Linux's private issue.

(since things like the PIT calibration are random noise for which
there can be no external expectation about our convergence speed
anyway.)

Ingo


2009-06-01 23:58:45

by john stultz

Subject: Re: [tip:timers/ntp] ntp: adjust SHIFT_PLL to improve NTP convergence

On Tue, 2009-06-02 at 01:22 +0200, Ingo Molnar wrote:
> * John Stultz <[email protected]> wrote:
> > On Wed, 2009-05-06 at 09:46 +0000, tip-bot for john stultz wrote:
> > > ntp: adjust SHIFT_PLL to improve NTP convergence
> > >
> > > The conversion to the ntpv4 reference model
> > > f19923937321244e7dc334767eb4b67e0e3d5c74 ("ntp: convert to the NTP4
> > > reference model") in 2.6.19 added nanosecond resolution to the adjtimex
> > > interface, but also changed the "stiffness" of the frequency adjustments,
> > > causing NTP convergence time to greatly increase.
> > >
> > > SHIFT_PLL, which reduces the stiffness of the freq adjustments, was
> > > designed to be inversely linked to HZ, and the reference value of 4 was
> > > designed for Unix systems using HZ=100. However Linux's clock steering
> > > code is mostly independent of HZ.
> > >
> > > So this patch reduces the SHIFT_PLL value from 4 to 2, which causes NTPd
> > > behavior to match kernels prior to 2.6.19, greatly reducing convergence
> > > times, and improving close synchronization through environmental thermal
> > > changes.
> > >
> > >
> > > [ Impact: tweak NTP algorithm for faster convergence ]
> >
> > So I've been speaking with Miroslav (cc'ed) who maintains
> > the RH ntpd packages, and he's concerned that this patch takes us
> > out of NTP's expected behavior, which may cause problems when
> > dealing with non-linux systems using NTP.
>
> I might be missing something here - but Linux converging faster
> seems like a genuinely good thing. What non-Linux problem could
> there be? Linux's convergence is really Linux's private issue.

Yea. It does seem that way. Miroslav can likely expand on the issue to
help clarify, but as I understand it, the example is if you have a
number of systems that are peers in an NTP network. All of them are
using the same userland NTP daemon. However, if the rate of change that
corrections are applied is different in half of them, you will have
problems getting all the systems to converge together.

A rough analogy might be creating an automatic control system to
drive an RC car, and then letting it control a Ferrari.

Now, that said, I have yet to actually see or hear of any negative
effects of the patch in testing, but I'd prefer to try to keep to the
NTP spec if we can.

So Miroslav is helping by looking for similar changes we can possibly
make from the userland side. This not only lets us keep the adjtimex()
interface stable while we sort out what options we have, but also, by
trying to tune the NTP knobs in userland instead of kernel space, we are
more likely to get feedback and hopefully constructive solutions from the
NTP gurus who in the past have maybe been less interested in
Linux's behavior.

So yea, we'll see what comes of it. In the meantime, I know some folks
who will continue to use this patch on their systems because it really
improves things for them. So I'm not abandoning the patch just yet, but
I want to really make sure we're not needlessly changing the kernel
behavior.

thanks
-john

2009-06-02 00:07:09

by Rik van Riel

Subject: Re: [tip:timers/ntp] ntp: adjust SHIFT_PLL to improve NTP convergence

John Stultz wrote:
> On Tue, 2009-06-02 at 01:22 +0200, Ingo Molnar wrote:

>> I might be missing something here - but Linux converging faster
>> seems like a genuinely good thing. What non-Linux problem could
>> there be? Linux's convergence is really Linux's private issue.
>
> Yea. It does seem that way. Miroslav can likely expand on the issue to
> help clarify, but as I understand it, the example is if you have a
> number of systems that are peers in an NTP network. All of them are
> using the same userland NTP daemon. However, if the rate of change that
> corrections are applied is different in half of them, you will have
> problems getting all the systems to converge together.

Would this not be true already, because the convergence
of Linux systems suddenly became a lot slower in 2.6.19?

Damned if we do, damned if we don't - except the new
behaviour introduced by your patches is nicer.

--
All rights reversed.

2009-06-02 00:21:06

by Ingo Molnar

Subject: Re: [tip:timers/ntp] ntp: adjust SHIFT_PLL to improve NTP convergence


* Rik van Riel <[email protected]> wrote:

> John Stultz wrote:
>> On Tue, 2009-06-02 at 01:22 +0200, Ingo Molnar wrote:
>
>>> I might be missing something here - but Linux converging faster
>>> seems like a genuinely good thing. What non-Linux problem could
>>> there be? Linux's convergence is really Linux's private issue.
>>
>> Yea. It does seem that way. Miroslav can likely expand on the
>> issue to help clarify, but as I understand it, the example is if
>> you have a number of systems that are peers in an NTP network.
>> All of them are using the same userland NTP daemon. However, if
>> the rate of change that corrections are applied is different in
>> half of them, you will have problems getting all the systems to
>> converge together.
>
> Would this not be true already, because the convergence of Linux
> systems suddenly became a lot slower in 2.6.19?
>
> Damned if we do, damned if we don't - except the new behaviour
> introduced by your patches is nicer.

Not just that - but there's calibration noise during bootup that can
cause randomly distributed recalibrations as well. So other hosts in
a mixed environment will see inconsistencies anyway, after every
bootup.

NTP is all about being able to be resilient against time noise and
being able to sync up to a common time base ASAP.

Ingo

2009-06-02 00:30:12

by john stultz

Subject: Re: [tip:timers/ntp] ntp: adjust SHIFT_PLL to improve NTP convergence

On Mon, 2009-06-01 at 20:06 -0400, Rik van Riel wrote:
> John Stultz wrote:
> > On Tue, 2009-06-02 at 01:22 +0200, Ingo Molnar wrote:
>
> >> I might be missing something here - but Linux converging faster
> >> seems like a genuinely good thing. What non-Linux problem could
> >> there be? Linux's convergence is really Linux's private issue.
> >
> > Yea. It does seem that way. Miroslav can likely expand on the issue to
> > help clarify, but as I understand it, the example is if you have a
> > number of systems that are peers in an NTP network. All of them are
> > using the same userland NTP daemon. However, if the rate of change that
> > corrections are applied is different in half of them, you will have
> > problems getting all the systems to converge together.
>
> Would this not be true already, because the convergence
> of Linux systems suddenly became a lot slower in 2.6.19?

Yes, this is true. But some folks have considered Linux to have had a
faulty NTP implementation up until 2.6.19.

> Damned if we do, damned if we don't - except the new
> behaviour introduced by your patches is nicer.

It would seem this way, so I'm not throwing the patch out yet. I'm just
suggesting we hold off including it until we've tried attacking the
issue from a few other angles. Miroslav understands the details behind
the NTP protocol much better than I, so I'd like to try to address them
before going out on our own.

I just want to keep the kernel from oscillating between fast (and maybe
incorrect) convergence and NTP spec compliance.

thanks
-john

2009-06-02 03:39:45

by Ray Lee

Subject: Re: [tip:timers/ntp] ntp: adjust SHIFT_PLL to improve NTP convergence

On Mon, Jun 1, 2009 at 4:58 PM, John Stultz <[email protected]> wrote:
> On Tue, 2009-06-02 at 01:22 +0200, Ingo Molnar wrote:
>> * John Stultz <[email protected]> wrote:
>> > On Wed, 2009-05-06 at 09:46 +0000, tip-bot for john stultz wrote:
>> > > ntp: adjust SHIFT_PLL to improve NTP convergence
>> > >
>> > > The conversion to the ntpv4 reference model
>> > > f19923937321244e7dc334767eb4b67e0e3d5c74 ("ntp: convert to the NTP4
>> > > reference model") in 2.6.19 added nanosecond resolution to the adjtimex
>> > > interface, but also changed the "stiffness" of the frequency adjustments,
>> > > causing NTP convergence time to greatly increase.
>> > >
>> > > SHIFT_PLL, which reduces the stiffness of the freq adjustments, was
>> > > designed to be inversely linked to HZ, and the reference value of 4 was
>> > > designed for Unix systems using HZ=100.  However Linux's clock steering
>> > > code is mostly independent of HZ.
>> > >
>> > > So this patch reduces the SHIFT_PLL value from 4 to 2, which causes NTPd
>> > > behavior to match kernels prior to 2.6.19, greatly reducing convergence
>> > > times, and improving close synchronization through environmental thermal
>> > > changes.
>> > >
>> > >
>> > > [ Impact: tweak NTP algorithm for faster convergence ]
>> >
>> >     So I've been speaking with Miroslav (cc'ed) who maintains
>> > the RH ntpd packages, and he's concerned that this patch takes us
>> > out of NTP's expected behavior, which may cause problems when
>> > dealing with non-linux systems using NTP.
>>
>> I might be missing something here - but Linux converging faster
>> seems like a genuinely good thing. What non-Linux problem could
>> there be? Linux's convergence is really Linux's private issue.
>
> Yea. It does seem that way. Miroslav can likely expand on the issue to
> help clarify, but as I understand it, the example is if you have a
> number of systems that are peers in an NTP network. All of them are
> using the same userland NTP daemon. However, if the rate of change that
> corrections are applied is different in half of them, you will have
> problems getting all the systems to converge together.

Your point is clear, however -- reasonably speaking -- how many
instances will there be out there of networks of peers partially
upgraded versus lone systems slowly or never converging off of
masters?

By my naive understanding, the latter would strongly outnumber the former.

2009-06-02 16:22:40

by Miroslav Lichvar

Subject: Re: [tip:timers/ntp] ntp: adjust SHIFT_PLL to improve NTP convergence

On Tue, Jun 02, 2009 at 02:20:39AM +0200, Ingo Molnar wrote:
> > Would this not be true already, because the convergence of Linux
> > systems suddenly became a lot slower in 2.6.19?
> >
> > Damned if we do, damned if we don't - except the new behaviour
> > introduced by your patches is nicer.
>
> Not just that - but there's calibration noise during bootup that can
> cause randomly distributed recalibrations as well. So other hosts in
> a mixed environment will see inconsistencies anyway, after every
> bootup.
>
> NTP is all about being able to be resilient against time noise and
> being able to sync up to a common time base ASAP.

There has to be a compromise between frequency and offset noise. When
SHIFT_PLL is set to 2 the frequency noise will be higher and that will
have a negative impact on the long-term ability to keep the clock
accurate. The error will grow faster when the network connection is
suspended.
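
The trade-off can be illustrated with a toy model. The sketch below is not
the kernel's PLL: it assumes a first-order low-pass filter with gain
2^-shift as a stand-in for the loop, feeds it noisy frequency samples, and
measures how much the resulting estimate wanders. The function names, the
noise level, and the sample count are all illustrative assumptions.

```python
import random
import statistics


def frequency_jitter(shift, noise_ppm=1.0, samples=20000, seed=0):
    """Return the standard deviation (in ppm) of a filtered frequency estimate.

    Toy model: the true frequency error is 0 ppm, every measurement carries
    Gaussian noise, and the estimate is an exponential average with
    gain 2**-shift. A smaller shift means a larger gain (a less stiff loop).
    """
    rng = random.Random(seed)
    gain = 2.0 ** -shift
    estimate = 0.0
    history = []
    for _ in range(samples):
        measurement = rng.gauss(0.0, noise_ppm)
        # Move the estimate a fraction `gain` of the way to the new sample.
        estimate += gain * (measurement - estimate)
        history.append(estimate)
    return statistics.pstdev(history)


def holdover_error_us(freq_error_ppm, outage_seconds):
    """Offset (microseconds) accumulated while the clock free-runs."""
    # 1 ppm of frequency error accumulates 1 microsecond of offset per second.
    return freq_error_ppm * outage_seconds


if __name__ == "__main__":
    for shift in (2, 4):
        jitter = frequency_jitter(shift)
        print(f"shift={shift}: jitter ~{jitter:.3f} ppm, "
              f"1h holdover error ~{holdover_error_us(jitter, 3600):.0f} us")
```

In this model the less stiff loop (smaller shift) tracks changes sooner but
leaves more noise in the frequency estimate, and whatever frequency error is
present when the connection drops turns into offset linearly with time.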

The PLL response can be configured to be the same as the proposed
SHIFT_PLL 2 by decreasing the time constant value in the adjtimex
structure, so I'd rather keep following the NTP specification and
control it from userspace if necessary.
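
A minimal sketch of why the two knobs are interchangeable, assuming the
phase correction applied each second is the remaining offset shifted right
by SHIFT_PLL plus the time constant, so that only their sum matters. This is
a simplified first-order model and not the kernel implementation; the
function name, parameters, and threshold are made up for illustration.

```python
def seconds_to_converge(offset_ns, shift_pll, time_constant, threshold_ns=1000.0):
    """Count seconds until the modeled offset drops below threshold_ns.

    Assumption: each second the loop removes offset / 2**(shift_pll +
    time_constant) from the remaining offset.
    """
    shift = shift_pll + time_constant
    if shift < 0:
        raise ValueError("shift_pll + time_constant must not be negative")
    offset = float(offset_ns)
    seconds = 0
    while abs(offset) >= threshold_ns:
        offset -= offset / (2 ** shift)
        seconds += 1
    return seconds


if __name__ == "__main__":
    start = 1_000_000  # 1 ms initial offset, in nanoseconds
    # Same sum (6), so the modeled response is identical.
    print(seconds_to_converge(start, shift_pll=2, time_constant=4))
    print(seconds_to_converge(start, shift_pll=4, time_constant=2))
    # Stock SHIFT_PLL with an unchanged time constant converges more slowly.
    print(seconds_to_converge(start, shift_pll=4, time_constant=4))
```

Under this assumption, lowering the time constant by 2 from userspace gives
the same response as changing SHIFT_PLL from 4 to 2 in the kernel.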

As for the calibration issue, would it be possible to export the
information that an unstable clocksource is used and when was the last
time it was calibrated? Then we'd know when the drift file should not
be trusted and let NTP calculate the frequency directly (it takes
about 15 minutes).
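
The decision this would enable can be sketched as follows. The inputs are
hypothetical: nothing in this thread says the kernel exports clocksource
stability or a calibration timestamp, so `clocksource_is_stable` and
`last_calibration` stand in for whatever interface might provide them.

```python
from datetime import datetime


def should_trust_drift_file(clocksource_is_stable, last_calibration, drift_file_written):
    """Decide whether a saved drift value still describes the current clock.

    Hypothetical inputs:
      clocksource_is_stable -- True if the clock frequency is not the result
                               of a noisy calibration
      last_calibration      -- when the clocksource was last calibrated
      drift_file_written    -- when the drift file was last updated
    """
    if clocksource_is_stable:
        return True
    # A recalibration after the drift file was written may have changed the
    # clock frequency, so the saved drift value would be stale.
    return last_calibration <= drift_file_written


if __name__ == "__main__":
    boot_calibration = datetime(2009, 6, 2, 8, 0)
    old_drift_file = datetime(2009, 6, 1, 12, 0)
    if not should_trust_drift_file(False, boot_calibration, old_drift_file):
        print("ignore drift file, measure the frequency directly")
```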

--
Miroslav Lichvar

2009-06-02 18:07:28

by Mr. James W. Laferriere

Subject: Re: [tip:timers/ntp] ntp: adjust SHIFT_PLL to improve NTP convergence

Hello All ,

On Mon, 1 Jun 2009, Ray Lee wrote:
> On Mon, Jun 1, 2009 at 4:58 PM, John Stultz <[email protected]> wrote:
>> On Tue, 2009-06-02 at 01:22 +0200, Ingo Molnar wrote:
>>> * John Stultz <[email protected]> wrote:
>>>> On Wed, 2009-05-06 at 09:46 +0000, tip-bot for john stultz wrote:
>>>>> ntp: adjust SHIFT_PLL to improve NTP convergence
>>>>>
>>>>> The conversion to the ntpv4 reference model
>>>>> f19923937321244e7dc334767eb4b67e0e3d5c74 ("ntp: convert to the NTP4
>>>>> reference model") in 2.6.19 added nanosecond resolution to the adjtimex
>>>>> interface, but also changed the "stiffness" of the frequency adjustments,
>>>>> causing NTP convergence time to greatly increase.
>>>>>
>>>>> SHIFT_PLL, which reduces the stiffness of the freq adjustments, was
>>>>> designed to be inversely linked to HZ, and the reference value of 4 was
>>>>> designed for Unix systems using HZ=100.  However Linux's clock steering
>>>>> code is mostly independent of HZ.
>>>>>
>>>>> So this patch reduces the SHIFT_PLL value from 4 to 2, which causes NTPd
>>>>> behavior to match kernels prior to 2.6.19, greatly reducing convergence
>>>>> times, and improving close synchronization through environmental thermal
>>>>> changes.
>>>>>
>>>>>
>>>>> [ Impact: tweak NTP algorithm for faster convergence ]
>>>>
>>>>     So I've been speaking with Miroslav (cc'ed) who maintains
>>>> the RH ntpd packages, and he's concerned that this patch takes us
>>>> out of NTP's expected behavior, which may cause problems when
>>>> dealing with non-linux systems using NTP.
>>>
>>> I might be missing something here - but Linux converging faster
>>> seems like a genuinely good thing. What non-Linux problem could
>>> there be? Linux's convergence is really Linux's private issue.
>>
>> Yea. It does seem that way. Miroslav can likely expand on the issue to
>> help clarify, but as I understand it, the example is if you have a
>> number of systems that are peers in an NTP network. All of them are
>> using the same userland NTP daemon. However, if the rate of change that
>> corrections are applied is different in half of them, you will have
>> problems getting all the systems to converge together.
>
> Your point is clear, however -- reasonably speaking -- how many
> instances will there be out there of networks of peers partially
> upgraded versus lone systems slowly or never converging off of
> masters?
A site with three or four different system types , ie: SPARC running
Solaris , PC running OpenBSD , DEC(HP) running VMS , DEC(HP) Alpha running
Linux , ...
This spreading across different hardware archs & OSes is sometimes done to
limit hacks & software glitches by NOT using the same hardware arch or OS .

> By my naive understanding, the latter would strongly outnumber the former.
You'd be VERY unhappily surprised .

Hth , JimL
--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network&System Engineer | 2133 McCullam Ave | Give me Linux |
| [email protected] | Fairbanks, AK. 99701 | only on AXP |
+------------------------------------------------------------------+

2009-06-02 20:55:45

by john stultz

Subject: Re: [tip:timers/ntp] ntp: adjust SHIFT_PLL to improve NTP convergence

On Tue, 2009-06-02 at 18:22 +0200, Miroslav Lichvar wrote:
> On Tue, Jun 02, 2009 at 02:20:39AM +0200, Ingo Molnar wrote:
> >
> > Not just that - but there's calibration noise during bootup that can
> > cause randomly distributed recalibrations as well. So other hosts in
> > a mixed environment will see inconsistencies anyway, after every
> > bootup.
[snip]
> As for the calibration issue, would it be possible to export the
> information that an unstable clocksource is used and when was the last
> time it was calibrated? Then we'd know when the drift file should not
> be trusted and let NTP calculate the frequency directly (it takes
> about 15 minutes).

Just to de-thread the issues here, the calibration noise issue really is
separate from the SHIFT_PLL convergence issue.

I'd really prefer the calibration noise issue to be resolved by the
kernel, as it's really only an issue on a subset of x86 machines. The
tsc_khz= boot option I proposed earlier for folks who really care seems
to me like a good route.

The only NTPd-side change that might help with the calibration issue
would be an explicit ntp option to force NTP to always calculate the
freq on startup, whether or not the drift file is present. Anything else
would be way too much of a hack to get around bad kernel behavior.

thanks
-john