LinuxLists.cc - [GIT pull] ntp updates for 2.6.31

Subject: Re: [GIT pull] ntp updates for 2.6.31

On Mon, Jun 15, 2009 at 7:06 AM, Thomas Gleixner<[email protected]> wrote:
> Linus,
>
> Please pull the latest timers-for-linus-ntp git tree from:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-for-linus-ntp
>
> Thanks,
>
>         tglx
>
> ------------------>
> John Stultz (2):
>       ntp: adjust SHIFT_PLL to improve NTP convergence
>       ntp: fix comment typos

Thomas,
Could we hold off on pushing this? I'm still working with Miroslav
to try to work out a solution here from ntpd user-side.

thanks
-john

2009-06-15 23:41:49

Subject: Re: [GIT pull] ntp updates for 2.6.31

On Mon, Jun 15, 2009 at 1:16 PM, john stultz<[email protected]> wrote:
> On Mon, Jun 15, 2009 at 7:06 AM, Thomas Gleixner<[email protected]> wrote:
>> Linus,
>>
>> Please pull the latest timers-for-linus-ntp git tree from:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-for-linus-ntp
>>
>> Thanks,
>>
>>         tglx
>>
>> ------------------>
>> John Stultz (2):
>>       ntp: adjust SHIFT_PLL to improve NTP convergence
>>       ntp: fix comment typos
>
> Thomas,
>   Could we hold off on pushing this? I'm still working with Miroslav
> to try to work out a solution here from ntpd user-side.

Linus,
You probably didn't see this before merging. Could you yank the
above two patches? Miroslav (RH package maintainer for ntpd) has
voiced concerns that the SHIFT_PLL patch breaks the NTP design and is
worried it may negatively affect NTP networks of systems running with
different SHIFT_PLL values.

While the patch does greatly improve NTP convergence times, and so far
no negative results have been seen in tests, it's out of an abundance
of caution and a desire to keep the adjtimex behavior stable that I
requested Thomas and Ingo to hold off on merging this patch, while I
work with Miroslav to see if we cannot get the same benefit by
adjusting the userspace NTPd.

So if you could revert the two patches until we either sort things out
in userspace or I resubmit, I'd appreciate it.

Sorry for the mixup here.

thanks
-john

2009-06-16 09:07:24

Subject: Re: [GIT pull] ntp updates for 2.6.31

* john stultz <[email protected]> wrote:

> Linus,
> You probably didn't see this before merging. Could you yank the
> above two patches? Miroslav (RH package maintainer for ntpd) has
> voiced concerns that the SHIFT_PLL patch breaks the NTP design and is
> worried it may negatively affect NTP networks of systems running with
> different SHIFT_PLL values.
>
> While the patch does greatly improve NTP convergence times, and so
> far no negative results have been seen in tests, it's out of an
> abundance of caution and a desire to keep the adjtimex behavior
> stable that I requested Thomas and Ingo to hold off on merging
> this patch, while I work with Miroslav to see if we cannot get the
> same benefit by adjusting the userspace NTPd.

As I explained in previous threads, I disagree. The only
technically correct direction is to improve NTP stabilization and
convergence times as much as possible. [*]

( [*] Without getting into over-compensation and without starting to
oscillate instead of converging - that would be a bug, but
such a bug has not been reported so far. )

The 'concern' voiced was that: "what if other OSs converge slower in
a cluster and now we have a faster OS in the mix". This absolutely
ignores the other 99% of cases where people would have crappier
convergence after the revert and for no good reason.

And even regarding that 1% example, well, duh: different OSes have
different convergence times, fundamentally so - for example, Linux had a
very slow convergence time from about 2.6.18 up to recent kernels
due to a bug. Now it's converging even faster ...

So I don't think that "Linux is too good" is a good basis to
artificially make Linux's NTP code crappier. Really. We don't 'play
nice' by being equally crappy.

Each OS should converge back to the correct time _as fast as
physically possible_. If this is a problem and if someone wants
crappy time and longer periods of convergence for some odd reason
then that header file can even be edited by hand. It's not
like it's that hard to change, if there's genuine interest.

So I'm against any revert on this basis. If another basis comes up
we can reconsider of course. What do you think?

Ingo

2009-06-16 11:30:29

by Thomas Gleixner

Subject: Re: [GIT pull] ntp updates for 2.6.31

On Tue, 16 Jun 2009, Ingo Molnar wrote:
> Each OS should converge back to the correct time _as fast as
> physically possible_. If this is a problem and if someone wants
> crappy time and longer periods of convergence for some odd reason
> then that header file can even be edited by hand. It's not
> like it's that hard to change, if there's genuine interest.
>
> So I'm against any revert on this basis. If another basis comes up
> we can reconsider of course. What do you think?

I completely agree.

Consistent convergence across different OSs is a wet dream.

We even see different behaviour across kernel versions :) Also I
recently looked at an embedded system running the same kernel version
as a PC in the same network, with the same version of the user space
tools. The main difference, aside from the architecture, was HZ (100
vs. 1000). The PC's convergence time was about 40% higher than the
embedded system's.

Thanks,

tglx

2009-06-16 12:53:19

Subject: Re: [GIT pull] ntp updates for 2.6.31

On Tue, Jun 16, 2009 at 11:06:47AM +0200, Ingo Molnar wrote:
>
> * john stultz <[email protected]> wrote:
>
> > Linus,
> > You probably didn't see this before merging. Could you yank the
> > above two patches? Miroslav (RH package maintainer for ntpd) has
> > voiced concerns that the SHIFT_PLL patch breaks the NTP design and is
> > worried it may negatively affect NTP networks of systems running with
> > different SHIFT_PLL values.
> >
> > While the patch does greatly improve NTP convergence times, and so
> > far no negative results have been seen in tests, it's out of an
> > abundance of caution and a desire to keep the adjtimex behavior
> > stable that I requested Thomas and Ingo to hold off on merging
> > this patch, while I work with Miroslav to see if we cannot get the
> > same benefit by adjusting the userspace NTPd.

[..]

> Each OS should converge back to the correct time _as fast as
> physically possible_. If this is a problem and if someone wants
> crappy time and longer periods of convergence for some odd reason
> then that header file can even be edited by hand. It's not
> like it's that hard to change, if there's genuine interest.
>
> So I'm against any revert on this basis. If another basis comes up
> we can reconsider of course. What do you think?

I think the most important one is following the NTP specification.

If Linux really needs to have the fastest PLL, could it be done by
modifying the time constant passed in the adjtimex structure instead of
changing SHIFT_PLL? The PLL response will be exactly the same, but it
will allow the applications (and admins) to detect that it is
different than expected.

Something like:

--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -425,6 +425,8 @@
 		time_constant = txc->constant;
 		if (!(time_status & STA_NANO))
 			time_constant += 4;
+		/* We want faster PLL */
+		time_constant -= 2;
 		time_constant = min(time_constant, (long)MAXTC);
 		time_constant = max(time_constant, 0l);
 	}
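John's remark that the two approaches look mathematically equivalent can be checked with a small model. The C sketch below is hypothetical and rests on assumptions that are not stated in the thread: that the loop response depends only on the sum SHIFT_PLL + time_constant, that SHIFT_PLL is 4 before the patch and 2 after it, and that MAXTC is 10. The helper names are made up for illustration. Under these assumptions the two configurations agree only while the biased constant stays inside [0, MAXTC]; at the clamps they diverge.

```c
#include <stdbool.h>

/* Assumed constants for illustration (not taken from the thread). */
#define MAXTC         10 /* upper clamp on time_constant */
#define SHIFT_PLL_OLD 4  /* assumed value before the SHIFT_PLL patch */
#define SHIFT_PLL_NEW 2  /* assumed value after the SHIFT_PLL patch */

/* Mirrors the ADJ_TIMECONST hunk: +4 in microsecond (!STA_NANO) mode,
 * subtract an optional bias (2 in the proposal), then clamp to [0, MAXTC]. */
static long effective_time_constant(long user_constant, bool sta_nano, long bias)
{
	long tc = user_constant;

	if (!sta_nano)
		tc += 4;
	tc -= bias;
	if (tc > MAXTC)
		tc = MAXTC;
	if (tc < 0)
		tc = 0;
	return tc;
}

/* Assumption: the loop response is set by the shift SHIFT_PLL + time_constant. */
static long loop_shift(long shift_pll, long time_constant)
{
	return shift_pll + time_constant;
}

/* True when "SHIFT_PLL = 2, no bias" and "SHIFT_PLL = 4, bias of 2"
 * produce the same loop shift for a given user-supplied constant. */
static bool configs_equivalent(long user_constant, bool sta_nano)
{
	long patched = loop_shift(SHIFT_PLL_NEW,
				  effective_time_constant(user_constant, sta_nano, 0));
	long biased = loop_shift(SHIFT_PLL_OLD,
				 effective_time_constant(user_constant, sta_nano, 2));

	return patched == biased;
}
```

For example, with STA_NANO set, configs_equivalent(4, true) holds, while configs_equivalent(0, true) does not: the patched SHIFT_PLL gives a shift of 2, but the bias of 2 is clamped away and the shift stays at 4.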

Thanks,

--
Miroslav Lichvar

2009-06-17 15:38:55

Subject: Re: [GIT pull] ntp updates for 2.6.31

On Tue, 2009-06-16 at 14:52 +0200, Miroslav Lichvar wrote:
> On Tue, Jun 16, 2009 at 11:06:47AM +0200, Ingo Molnar wrote:
> >
> > * john stultz <[email protected]> wrote:
> >
> > > Linus,
> > > You probably didn't see this before merging. Could you yank the
> > > above two patches? Miroslav (RH package maintainer for ntpd) has
> > > voiced concerns that the SHIFT_PLL patch breaks the NTP design and is
> > > worried it may negatively affect NTP networks of systems running with
> > > different SHIFT_PLL values.
> > >
> > > While the patch does greatly improve NTP convergence times, and so
> > > far no negative results have been seen in tests, it's out of an
> > > abundance of caution and a desire to keep the adjtimex behavior
> > > stable that I requested Thomas and Ingo to hold off on merging
> > > this patch, while I work with Miroslav to see if we cannot get the
> > > same benefit by adjusting the userspace NTPd.
>
> [..]
>
> > Each OS should converge back to the correct time _as fast as
> > physically possible_. If this is a problem and if someone wants
> > crappy time and longer periods of convergence for some odd reason
> > then that header file can even be edited by hand. It's not
> > like it's that hard to change, if there's genuine interest.
> >
> > So I'm against any revert on this basis. If another basis comes up
> > we can reconsider of course. What do you think?
>
> I think the most important one is following the NTP specification.
>
> If Linux really needs to have the fastest PLL, could it be done by
> modifying the time constant passed in the adjtimex structure instead of
> changing SHIFT_PLL? The PLL response will be exactly the same, but it
> will allow the applications (and admins) to detect that it is
> different than expected.
>
> Something like:
>
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -425,6 +425,8 @@
>  		time_constant = txc->constant;
>  		if (!(time_status & STA_NANO))
>  			time_constant += 4;
> +		/* We want faster PLL */
> +		time_constant -= 2;
>  		time_constant = min(time_constant, (long)MAXTC);
>  		time_constant = max(time_constant, 0l);
>  	}

It looks mathematically equivalent, although I've not had time to test
it yet. Probably needs a bigger comment :)

The nice thing with this version is that we're able to expose that the
behavior would be different than other systems, but the other side of
that coin might be that when the user specifies a time_constant value,
the interface will show a different one being used. This might cause
some bug reports saying the interface isn't responding properly, or
something. Although this is already the case for !STA_NANO, and so far
few have noticed.

thanks
-john

2009-06-17 16:51:22

Subject: Re: [GIT pull] ntp updates for 2.6.31

* John Stultz <[email protected]> wrote:

> On Tue, 2009-06-16 at 14:52 +0200, Miroslav Lichvar wrote:
> > On Tue, Jun 16, 2009 at 11:06:47AM +0200, Ingo Molnar wrote:
> > >
> > > * john stultz <[email protected]> wrote:
> > >
> > > > Linus,
> > > > You probably didn't see this before merging. Could you yank the
> > > > above two patches? Miroslav (RH package maintainer for ntpd) has
> > > > voiced concerns that the SHIFT_PLL patch breaks the NTP design and is
> > > > worried it may negatively affect NTP networks of systems running with
> > > > different SHIFT_PLL values.
> > > >
> > > > While the patch does greatly improve NTP convergence times, and so
> > > > far no negative results have been seen in tests, it's out of an
> > > > abundance of caution and a desire to keep the adjtimex behavior
> > > > stable that I requested Thomas and Ingo to hold off on merging
> > > > this patch, while I work with Miroslav to see if we cannot get the
> > > > same benefit by adjusting the userspace NTPd.
> >
> > [..]
> >
> > > Each OS should converge back to the correct time _as fast as
> > > physically possible_. If this is a problem and if someone wants
> > > crappy time and longer periods of convergence for some odd reason
> > > then that header file can even be edited by hand. It's not
> > > like it's that hard to change, if there's genuine interest.
> > >
> > > So I'm against any revert on this basis. If another basis comes up
> > > we can reconsider of course. What do you think?
> >
> > I think the most important one is following the NTP specification.
> >
> > If Linux really needs to have the fastest PLL, could it be done by
> > modifying the time constant passed in the adjtimex structure instead of
> > changing SHIFT_PLL? The PLL response will be exactly the same, but it
> > will allow the applications (and admins) to detect that it is
> > different than expected.
> >
> > Something like:
> >
> > --- a/kernel/time/ntp.c
> > +++ b/kernel/time/ntp.c
> > @@ -425,6 +425,8 @@
> >  		time_constant = txc->constant;
> >  		if (!(time_status & STA_NANO))
> >  			time_constant += 4;
> > +		/* We want faster PLL */
> > +		time_constant -= 2;
> >  		time_constant = min(time_constant, (long)MAXTC);
> >  		time_constant = max(time_constant, 0l);
> >  	}
>
>
> It looks mathematically equivalent, although I've not had time to
> test it yet. Probably needs a bigger comment :)
>
> The nice thing with this version is that we're able to expose that
> the behavior would be different than other systems, but the other
> side of that coin might be that when the user specifies a
> time_constant value, the interface will show a different one being
> used. This might cause some bug reports saying the interface isn't
> responding properly, or something. Although this is already the
> case for !STA_NANO, and so far few have noticed.

Sounds good to me. It feels a bit quirky that we 'correct' the
user-space provided parameter by 2 ... Definitely needs a big
comment.

Ingo

2009-06-17 17:23:57

Subject: Re: [GIT pull] ntp updates for 2.6.31

On Wed, Jun 17, 2009 at 08:38:22AM -0700, John Stultz wrote:
> On Tue, 2009-06-16 at 14:52 +0200, Miroslav Lichvar wrote:
> > If Linux really needs to have the fastest PLL, could it be done by
> > modifying the time constant passed in the adjtimex structure instead of
> > changing SHIFT_PLL? The PLL response will be exactly the same, but it
> > will allow the applications (and admins) to detect that it is
> > different than expected.
> >
> > Something like:
> >
> > --- a/kernel/time/ntp.c
> > +++ b/kernel/time/ntp.c
> > @@ -425,6 +425,8 @@
> >  		time_constant = txc->constant;
> >  		if (!(time_status & STA_NANO))
> >  			time_constant += 4;
> > +		/* We want faster PLL */
> > +		time_constant -= 2;
> >  		time_constant = min(time_constant, (long)MAXTC);
> >  		time_constant = max(time_constant, 0l);
> >  	}
>
>
> It looks mathematically equivalent, although I've not had time to test
> it yet. Probably needs a bigger comment :)
>
> The nice thing with this version is that we're able to expose that the
> behavior would be different than other systems, but the other side of
> that coin might be that when the user specifies a time_constant value,
> the interface will show a different one being used. This might cause
> some bug reports saying the interface isn't responding properly, or
> something. Although this is already the case for !STA_NANO, and so far
> few have noticed.

I have checked the NTP sources and the returned time constant is used
only for reporting, so at least for NTP it shouldn't cause any problems.

Returning the correct time constant will be very useful if NTP developers
decide to use lower values or make it configurable, as decreasing the
constant by another two would make the PLL unstable.

Still, I'd really like to see the original behavior restored. Most of
the users complaining about slow convergence are probably just hitting
the calibration problem, which needs to be fixed by other means than
making PLL faster. Also, users of other systems seem to be happy with
their slow convergence. At least that's the impression I have from NTP
lists.

Thanks,

--
Miroslav Lichvar

2009-06-17 17:26:21

Subject: Re: [GIT pull] ntp updates for 2.6.31

* Miroslav Lichvar <[email protected]> wrote:

> Still, I'd really like to see the original behavior restored. Most
> of the users complaining about slow convergence are probably just
> hitting the calibration problem, which needs to be fixed by other
> means than making PLL faster. Also, users of other systems seem to
> be happy with their slow convergence. At least that's the
> impression I have from NTP lists.

Wouldn't the goal be to calibrate as fast as possible? (Without any
bad oscillation)

Ingo

2009-06-17 17:56:44

Subject: Re: [GIT pull] ntp updates for 2.6.31

On Wed, 2009-06-17 at 19:26 +0200, Ingo Molnar wrote:
> * Miroslav Lichvar <[email protected]> wrote:
>
> > Still, I'd really like to see the original behavior restored. Most
> > of the users complaining about slow convergence are probably just
> > hitting the calibration problem, which needs to be fixed by other
> > means than making PLL faster. Also, users of other systems seem to
> > be happy with their slow convergence. At least that's the
> > impression I have from NTP lists.
>
> Wouldn't the goal be to calibrate as fast as possible? (Without any
> bad oscillation)

I believe he means the TSC calibration error issue, where the TSC
calibration varies by 30-80 ppm from boot to boot. This makes it hard for
systems to stay in NTP sync after a reboot, because ntpd has to search for
a new frequency (and the SHIFT_PLL & time_constant values control how fast
that happens).
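To put the 30-80 ppm spread in perspective: a frequency error of 1 ppm accumulates 1 microsecond of offset per second of elapsed time. A minimal sketch of that arithmetic (drift_seconds is a hypothetical helper, and the model ignores any correction ntpd applies):

```c
/* Offset (in seconds) accumulated by an uncorrected frequency error.
 * 1 ppm = 1e-6 seconds of drift per second of elapsed time. */
static double drift_seconds(double error_ppm, double elapsed_seconds)
{
	return error_ppm * 1e-6 * elapsed_seconds;
}
```

Over one day (86400 s), drift_seconds(30.0, 86400.0) is about 2.59 s and drift_seconds(80.0, 86400.0) about 6.91 s, so the frequency has to be re-learned after each reboot before tight sync is possible.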

While the TSC calibration is an issue, there is also the fact that NTP's
slow convergence model (which is "by design", for good or bad) doesn't
seem to handle thermal environment changes quickly enough to keep close
sync.

Now, whether we make the change by tweaking ntpd or the kernel is, I still
think, a question that we've not answered well. Even though I'm of the
opinion something needs to change, I'm not yet convinced which side
is the right side to fix. And that is why I requested we hold off on
merging the SHIFT_PLL patch.

And really, if you look at Miroslav's patch, which is mathematically
equivalent to the SHIFT_PLL change, all we're doing is decreasing the
time_constant value ntpd gave us by two. So the question is, why is
that fix best done in the kernel, instead of making ntpd reduce what it
passes in to the kernel?

thanks
-john

2009-06-18 12:13:49

Subject: Re: [GIT pull] ntp updates for 2.6.31

On Wed, Jun 17, 2009 at 07:26:01PM +0200, Ingo Molnar wrote:
> * Miroslav Lichvar <[email protected]> wrote:
>
> > Still, I'd really like to see the original behavior restored. Most
> > of the users complaining about slow convergence are probably just
> > hitting the calibration problem, which needs to be fixed by other
> > means than making PLL faster. Also, users of other systems seem to
> > be happy with their slow convergence. At least that's the
> > impression I have from NTP lists.
>
> Wouldn't the goal be to calibrate as fast as possible? (Without any
> bad oscillation)

Not really. It depends on how noisy the input signal is. On an idle
LAN the jitter is just a few microseconds, but over the internet it easily
reaches milliseconds. Beyond a certain point, a faster PLL will just make
things worse.

PLL is mainly about handling the signal noise, frequency adjusting is
secondary. When the noise is very low or the update interval is long
enough, the frequency variations caused by temperature changes will
dominate the signal noise and this is where FLL should kick in.

The PLL/FLL switching is controlled by update interval. Ideally it
would be adaptive, but NTP is not that sophisticated. By default, FLL
is enabled when the interval is longer than 2048 seconds. This is of
course not the optimal value for all systems.

Unfortunately, in the kernel it can be configured only to 2048 or 256,
and NTP never uses the shorter one. The NTP daemon has its own loop, which
can be used instead and allows arbitrary values, though.
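The threshold rule described above can be sketched as a predicate. This is a simplified model rather than the kernel's code: it assumes that the STA_FLL status bit is what selects the shorter 256 s threshold, and MINSEC, MAXSEC and fll_active are names used here for illustration.

```c
#include <stdbool.h>

#define MINSEC 256  /* assumed shorter FLL threshold, in seconds (via STA_FLL) */
#define MAXSEC 2048 /* default FLL threshold, in seconds */

/* Returns true if the frequency-lock loop would contribute for an update
 * arriving update_interval seconds after the previous one. */
static bool fll_active(long update_interval, bool sta_fll)
{
	if (update_interval < MINSEC)
		return false;   /* short intervals: PLL only */
	if (!sta_fll && update_interval <= MAXSEC)
		return false;   /* default mode: PLL up to 2048 s */
	return true;
}
```

With STA_FLL clear, an update interval of 1024 s stays in PLL mode and only intervals above 2048 s engage the FLL; with STA_FLL set, the switch happens from 256 s.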

--
Miroslav Lichvar

2009-06-23 09:58:17

[permalink] [raw]

Subject: Re: [GIT pull] ntp updates for 2.6.31

* Miroslav Lichvar <[email protected]> wrote:

> On Wed, Jun 17, 2009 at 07:26:01PM +0200, Ingo Molnar wrote:
> > * Miroslav Lichvar <[email protected]> wrote:
> >
> > > Still, I'd really like to see the original behavior restored.
> > > Most of the users complaining about slow convergence are
> > > probably just hitting the calibration problem, which needs to
> > > be fixed by other means than making PLL faster. Also, users of
> > > other systems seem to be happy with their slow convergence. At
> > > least that's the impression I have from NTP lists.
> >
> > Wouldn't the goal be to calibrate as fast as possible? (Without
> > any bad oscillation)
>
> Not really. It depends on how noisy the input signal is. On an
> idle LAN the jitter is just a few microseconds, but over the internet it
> easily reaches milliseconds. Beyond a certain point, a faster PLL will
> just make things worse.

That is what I called 'bad oscillation' - a 'too fast' PLL that
over-compensates and does not converge well enough.

Is there a claim that this change causes that? (John's testing
suggested that there's no such effect)

> PLL is mainly about handling the signal noise, frequency adjusting
> is secondary. When the noise is very low or the update interval is
> long enough, the frequency variations caused by temperature
> changes will dominate the signal noise and this is where FLL
> should kick in.
>
> The PLL/FLL switching is controlled by update interval. Ideally it
> would be adaptive, but NTP is not that sophisticated. By default,
> FLL is enabled when the interval is longer than 2048 seconds. This
> is of course not the optimal value for all systems.
>
> Unfortunately, in the kernel it can be configured only to 2048 or 256,
> and NTP never uses the shorter one. The NTP daemon has its own
> loop, which can be used instead and allows arbitrary
> values, though.

How about going towards the ideal, adaptive design, to which ntpd
passes in time samples and which observes noise and converges as
quickly as possible (given the noise level) and stays stable once
there? I guess we need extensions to the NTP syscall for that.

The NTP code in kernel/time/ntp.c is now reasonably clean for
efforts like that.

It would also pave the way to properly support PPS devices in the
kernel. Would you be interested in things like this?

Ingo

2009-06-23 13:16:49

Subject: Re: [GIT pull] ntp updates for 2.6.31

On Tue, Jun 23, 2009 at 11:57:45AM +0200, Ingo Molnar wrote:
> > > Wouldn't the goal be to calibrate as fast as possible? (Without
> > > any bad oscillation)
> >
> > Not really. It depends on how noisy the input signal is. On an
> > idle LAN the jitter is just a few microseconds, but over the internet it
> > easily reaches milliseconds. Beyond a certain point, a faster PLL will
> > just make things worse.
>
> That is what I called 'bad oscillation' - a 'too fast' PLL that
> over-compensates and does not converge well enough.
>
> Is there a claim that this change causes that? (John's testing
> suggested that there's no such effect)

I think John's tests were done on a LAN and in an environment with
sudden temperature changes. This is the case where frequency
variations strongly dominate the noise and a faster PLL performs better.

On the opposite side is an idle machine in a room with stable
temperature syncing over wireless or dial-up. I don't have access to
such a machine, but in simulations (noise with exponential distribution)
I see that offset RMS doubles when the time constant is decreased by 2.
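Why a faster loop hurts under noise can be illustrated with a deliberately simplified model: a first-order loop that removes a fraction g of each measured offset, where each measurement carries independent noise of variance s2. Then x[n+1] = (1 - g) x[n] - g w[n], and the stationary offset variance is g s2 / (2 - g). This is a hypothetical toy, not Miroslav's simulation and not the actual second-order NTP PLL; it only shows the direction of the effect. Decreasing the shift by 2 makes g four times larger, which for small g roughly quadruples the variance, i.e. roughly doubles the RMS offset.

```c
/* Loop gain for a power-of-two shift (hypothetically SHIFT_PLL + time_constant). */
static double loop_gain(int shift)
{
	return 1.0 / (double)(1L << shift);
}

/* Stationary variance of x[n+1] = (1 - g) * x[n] - g * w[n] for i.i.d. noise
 * w[n] of variance noise_var: solving Var = (1-g)^2 Var + g^2 noise_var
 * gives g * noise_var / (2 - g). Valid for 0 < g < 2. */
static double steady_state_offset_variance(double gain, double noise_var)
{
	return gain * noise_var / (2.0 - gain);
}
```

With shifts of 8 and 6 and unit noise variance, the variance ratio comes out at about 4.02.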

Maybe for most of the users the change would be an improvement. I
don't have any statistics to back it up or claim otherwise. However,
if the constant needs to be adjusted, it's better to do it in NTP.

> > PLL is mainly about handling the signal noise, frequency adjusting
> > is secondary. When the noise is very low or the update interval is
> > long enough, the frequency variations caused by temperature
> > changes will dominate the signal noise and this is where FLL
> > should kick in.
> >
> > The PLL/FLL switching is controlled by update interval. Ideally it
> > would be adaptive, but NTP is not that sophisticated. By default,
> > FLL is enabled when the interval is longer than 2048 seconds. This
> > is of course not the optimal value for all systems.
> >
> > Unfortunately in kernel it can be configured only to 2048 or 256
> > and NTP never uses the shorter one. The NTP daemon has its own
> > loop which can be used instead and it allows to use arbitrary
> > values though.
>
> How about going towards the ideal, adaptive design, to which ntpd
> passes in time samples and which observes noise and converges as
> quickly as possible (given the noise level) and stays stable once
> there? I guess we need extensions to the NTP syscall for that.

Not sure how hard that would be. The ntp-hackers list is a better
place to discuss such modifications.

Other NTP clients don't have to use the PLL interface. For example,
chrony uses only the SINGLESHOT mode and sets the frequency directly.
It has an adaptive model using linear regression, it converges really
fast and in my tests performs better than NTP.
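The thread only says chrony uses linear regression. As a generic illustration (not chrony's actual algorithm, which is more elaborate), the frequency error can be estimated as the least-squares slope of measured offset against time; estimate_frequency_error is a hypothetical name.

```c
#include <stddef.h>

/* Ordinary least-squares slope of offset[] (seconds) against t[] (seconds).
 * The slope is the estimated frequency error in s/s (multiply by 1e6 for ppm).
 * Returns 0 on success, -1 if the fit is undefined. */
static int estimate_frequency_error(const double *t, const double *offset,
				    size_t n, double *slope)
{
	double mean_t = 0.0, mean_o = 0.0, sxy = 0.0, sxx = 0.0;
	size_t i;

	if (n < 2)
		return -1;
	for (i = 0; i < n; i++) {
		mean_t += t[i];
		mean_o += offset[i];
	}
	mean_t /= (double)n;
	mean_o /= (double)n;
	for (i = 0; i < n; i++) {
		double dt = t[i] - mean_t;

		sxy += dt * (offset[i] - mean_o);
		sxx += dt * dt;
	}
	if (sxx == 0.0)
		return -1;      /* all samples taken at the same time */
	*slope = sxy / sxx;
	return 0;
}
```

A client in the style described could then slew away the current offset with a one-shot adjustment and set the frequency correction directly from this estimate, instead of feeding offsets through the kernel PLL.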

> The NTP code in kernel/time/ntp.c is now reasonably clean for
> efforts like that.
>
> It would also pave the way to properly support PPS devices in the
> kernel. Would you be interested in things like this?

I'm not very familiar with the PPS API, is there something wrong with
it?

--
Miroslav Lichvar

2009-06-23 13:37:01

Subject: Re: [GIT pull] ntp updates for 2.6.31

* Miroslav Lichvar <[email protected]> wrote:

> On Tue, Jun 23, 2009 at 11:57:45AM +0200, Ingo Molnar wrote:
> > > > Wouldn't the goal be to calibrate as fast as possible? (Without
> > > > any bad oscillation)
> > >
> > > Not really. It depends on how noisy the input signal is. On an
> > > idle LAN the jitter is just a few microseconds, but over the internet it
> > > easily reaches milliseconds. Beyond a certain point, a faster PLL will
> > > just make things worse.
> >
> > That is what I called 'bad oscillation' - a 'too fast' PLL that
> > over-compensates and does not converge well enough.
> >
> > Is there a claim that this change causes that? (John's testing
> > suggested that there's no such effect)
>
> I think John's tests were done on a LAN and in an environment with
> sudden temperature changes. This is the case where frequency
> variations strongly dominate the noise and a faster PLL performs
> better.

I'd also expect this to be quite similar to most everyday Linux
uses.

> On the opposite side is an idle machine in a room with stable
> temperature syncing over wireless or dial-up. I don't have access
> to such a machine, but in simulations (noise with exponential
> distribution) I see that offset RMS doubles when the time constant
> is decreased by 2.

The thing is, an idle machine in a room with stable temperature is
in a good position anyway to have stable time, right? We should
rather care about the common case of temperature variations,
reboots, etc.

That is where NTP _helps the most_ - as the physical environment is
very entropy-laden to begin with.

> Maybe for most of the users the change would be an improvement. I
> don't have any statistics to back it up or claim otherwise.
> However, if the constant needs to be adjusted, it's better to do
> it in NTP.
>
> > > PLL is mainly about handling the signal noise, frequency adjusting
> > > is secondary. When the noise is very low or the update interval is
> > > long enough, the frequency variations caused by temperature
> > > changes will dominate the signal noise and this is where FLL
> > > should kick in.
> > >
> > > The PLL/FLL switching is controlled by update interval. Ideally it
> > > would be adaptive, but NTP is not that sophisticated. By default,
> > > FLL is enabled when the interval is longer than 2048 seconds. This
> > > is of course not the optimal value for all systems.
> > >
> > > Unfortunately, in the kernel it can be configured only to 2048 or 256,
> > > and NTP never uses the shorter one. The NTP daemon has its own
> > > loop, which can be used instead and allows arbitrary
> > > values, though.
> >
> > How about going towards the ideal, adaptive design, to which ntpd
> > passes in time samples and which observes noise and converges as
> > quickly as possible (given the noise level) and stays stable once
> > there? I guess we need extensions to the NTP syscall for that.
>
> Not sure how hard that would be. The ntp-hackers list is a better
> place to discuss such modifications.
>
> Other NTP clients don't have to use the PLL interface. For
> example, chrony uses only the SINGLESHOT mode and sets the
> frequency directly. It has an adaptive model using linear
> regression, it converges really fast and in my tests performs
> better than NTP.

That's good. Could this be integrated into the kernel, for even
better results?

> > The NTP code in kernel/time/ntp.c is now reasonably clean for
> > efforts like that.
> >
> > It would also pave the way to properly support PPS devices in
> > the kernel. Would you be interested in things like this?
>
> I'm not very familiar with the PPS API, is there something wrong
> with it?

The PPS patches I've seen just export IRQ timestamps to user-space.

That is not very robust in my opinion when it comes to time
approximations - to get quick, low-latency action and precise
measurements it's best to keep the critical path as short as
possible, and within a single source code repository: i.e. within
the kernel.

There's little policy really, other than setting some general
parameters. NTPd can still provide the raw _network time_
timestamps, as that is probably best fetched by user-space and fed
to the kernel.

Ingo

2009-06-23 14:34:30

Subject: Re: [GIT pull] ntp updates for 2.6.31

On Tue, Jun 23, 2009 at 03:36:25PM +0200, Ingo Molnar wrote:
> > I think John's tests were done on a LAN and in an environment with
> > sudden temperature changes. This is the case where frequency
> > variations strongly dominate the noise and a faster PLL performs
> > better.
>
> I'd also expect this to be quite similar to most everyday Linux
> uses.

I'd say that most users keep the default distribution configs, i.e.
syncing over the internet to servers from pool.ntp.org. The network jitter
is in the hundreds of microseconds or even milliseconds, and the temperature
changes are dominated by the noise.

> > Other NTP clients don't have to use the PLL interface. For
> > example, chrony uses only the SINGLESHOT mode and sets the
> > frequency directly. It has an adaptive model using linear
> > regression, it converges really fast and in my tests performs
> > better than NTP.
>
> That's good. Could this be integrated into the kernel, for even
> better results?

The code is quite complex, with possibly a lot of room for improvement.
I think it's better to keep it in userspace. There are two things that
would help chrony on the kernel side, though: supporting nanosecond offsets
in SINGLESHOT mode, and updating the reported offset with every
adjtimex call, not only once per second, so chrony would know exactly
how much of the offset was already applied.

> > I'm not very familiar with the PPS API, is there something wrong
> > with it?
>
> The PPS patches I've seen just export IRQ timestamps to user-space.
>
> That is not very robust in my opinion when it comes to time
> approximations - to get quick, low-latency action and precise
> measurements it's best to keep the critical path as short as
> possible, and within a single source code repository: i.e. within
> the kernel.

That's what the kernel PPS discipline does; it will probably be included
later. Its performance is one or two orders of magnitude better than the
PLL/FLL discipline.

--
Miroslav Lichvar

2009-06-23 19:19:03

Subject: Re: [GIT pull] ntp updates for 2.6.31

* Miroslav Lichvar <[email protected]> wrote:

> > > I'm not very familiar with the PPS API, is there something
> > > wrong with it?
> >
> > The PPS patches I've seen just export IRQ timestamps to
> > user-space.
> >
> > That is not very robust in my opinion when it comes to time
> > approximations - to get quick, low-latency action and precise
> > measurements it's best to keep the critical path as short as
> > possible, and within a single source code repository: i.e.
> > within the kernel.
>
> That's what the kernel PPS discipline does; it will probably be
> included later. Its performance is one or two orders of magnitude
> better than the PLL/FLL discipline.

Is there some kernel patch I can look at?

Ingo

2009-06-23 19:49:39

[permalink] [raw]

Subject: Re: [GIT pull] ntp updates for 2.6.31

On Tue, Jun 23, 2009 at 09:18:38PM +0200, Ingo Molnar wrote:
>
> * Miroslav Lichvar <[email protected]> wrote:
>
> > > > I'm not very familiar with the PPS API, is there something
> > > > wrong with it?
> > >
> > > The PPS patches I've seen just export IRQ timestamps to
> > > user-space.
> > >
> > > That is not very robust in my opinion when it comes to time
> > > approximations - to get quick, low-latency action and precise
> > > measurements it's best to keep the critical path as short as
> > > possible, and within a single source code repository: i.e.
> > > within the kernel.
> >
> > That's what the kernel PPS discipline does; it will probably be
> > included later. Its performance is one or two orders of magnitude
> > better than the PLL/FLL discipline.
>
> Is there some kernel patch I can look at?

It's in the old PPSkit patches for 2.4 kernels, function hardpps().

--
Miroslav Lichvar

2009-06-23 21:42:11