Got a few cycles to take another look at this, and tried to address
Miroslav's latest comments. Please let me know if you have further
thoughts!
thanks
-john
The existing timekeeping_adjust logic has always been complicated
to understand. Further, since it was developed prior to NOHZ becoming
common, its not surprising it performs poorly when NOHZ is enabled.
Since Miroslav pointed out the problematic nature of the existing code
in the NOHZ case, I've tried to refactor the code to perform better.
The problem with the previous approach was that it tried to adjust
for the total cumulative error using a scaled dampening factor. This
resulted in large errors to be corrected slowly, while small errors
were corrected quickly. With NOHZ the timekeeping code doesn't know
how far out the next tick will be, so this results in bad
over-correction to small errors, and insufficient correction to large
errors.
Inspired by Miroslav's patch, I've refactored the code to try to
address the correction in two steps.
1) Check the future freq error for the next tick, and if the frequency
error is large, try to make sure we correct that error as best we can.
2) Then make a small single unit adjustment to correct any cumulative
error that we've collected.
This method performs fairly well in the simulator Miroslav created.
I also dropped the tk->ntp_error adjustments, as I've never quite
been able to recall how that was correct, and it doesn't seem to
have any affect in the simulator. Will have to proceed carefully
with testing here.
I still think this is probably 3.15 or later material, but I'd be
very interested in feedback, thoughts, and testing.
Cc: Miroslav Lichvar <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Reported-by: Miroslav Lichvar <[email protected]>
Signed-off-by: John Stultz <[email protected]>
---
kernel/time/timekeeping.c | 163 +++++++++++++++++-----------------------------
1 file changed, 61 insertions(+), 102 deletions(-)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 87b4f00..15354d4 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1048,54 +1048,50 @@ static int __init timekeeping_init_ops(void)
device_initcall(timekeeping_init_ops);
/*
- * If the error is already larger, we look ahead even further
- * to compensate for late or lost adjustments.
+ * Calculate the future error caused by incorrect freq value
+ * and adjust the timekeeper to correct that.
*/
-static __always_inline int timekeeping_bigadjust(struct timekeeper *tk,
- s64 error, s64 *interval,
- s64 *offset)
+static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
+ s64 interval,
+ s64 offset)
{
s64 tick_error, i;
- u32 look_ahead, adj;
- s32 error2, mult;
+ u32 adj;
+ s32 mult = 1;
- /*
- * Use the current error value to determine how much to look ahead.
- * The larger the error the slower we adjust for it to avoid problems
- * with losing too many ticks, otherwise we would overadjust and
- * produce an even larger error. The smaller the adjustment the
- * faster we try to adjust for it, as lost ticks can do less harm
- * here. This is tuned so that an error of about 1 msec is adjusted
- * within about 1 sec (or 2^20 nsec in 2^SHIFT_HZ ticks).
- */
- error2 = tk->ntp_error >> (NTP_SCALE_SHIFT + 22 - 2 * SHIFT_HZ);
- error2 = abs(error2);
- for (look_ahead = 0; error2 > 0; look_ahead++)
- error2 >>= 2;
+ /* Calculate current error per tick */
+ tick_error = ntp_tick_length() >> tk->ntp_error_shift;
+ tick_error -= tk->xtime_interval;
- /*
- * Now calculate the error in (1 << look_ahead) ticks, but first
- * remove the single look ahead already included in the error.
- */
- tick_error = ntp_tick_length() >> (tk->ntp_error_shift + 1);
- tick_error -= tk->xtime_interval >> 1;
- error = ((error - tick_error) >> look_ahead) + tick_error;
+ /* Don't worry about correcting it if its small */
+ if (likely(abs(tick_error) < 2*interval))
+ return;
- /* Finally calculate the adjustment shift value. */
- i = *interval;
- mult = 1;
- if (error < 0) {
- error = -error;
- *interval = -*interval;
- *offset = -*offset;
- mult = -1;
+ if (tick_error < 0) {
+ interval = -interval;
+ offset = -offset;
+ mult = -mult;
}
- for (adj = 0; error > i; adj++)
- error >>= 1;
- *interval <<= adj;
- *offset <<= adj;
- return mult << adj;
+ /* Sort out the magnitude of the correction */
+ tick_error = abs(tick_error);
+ i = abs(interval);
+ for (adj = 0; tick_error > i; adj++)
+ tick_error >>= 1;
+
+ /* scale the corrections */
+ interval <<= adj;
+ offset <<= adj;
+ mult <<= adj;
+
+ /*
+ * Apply the correction to the timekeeper.
+ * See long comment in timekeeping_adjust to explain the math.
+ */
+ tk->mult += mult;
+ tk->xtime_interval += interval;
+ tk->xtime_nsec -= offset;
+
}
/*
@@ -1108,65 +1104,23 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
s64 error, interval = tk->cycle_interval;
int adj;
- /*
- * The point of this is to check if the error is greater than half
- * an interval.
- *
- * First we shift it down from NTP_SHIFT to clocksource->shifted nsecs.
- *
- * Note we subtract one in the shift, so that error is really error*2.
- * This "saves" dividing(shifting) interval twice, but keeps the
- * (error > interval) comparison as still measuring if error is
- * larger than half an interval.
- *
- * Note: It does not "save" on aggravation when reading the code.
- */
- error = tk->ntp_error >> (tk->ntp_error_shift - 1);
- if (error > interval) {
- /*
- * We now divide error by 4(via shift), which checks if
- * the error is greater than twice the interval.
- * If it is greater, we need a bigadjust, if its smaller,
- * we can adjust by 1.
- */
- error >>= 2;
- /*
- * XXX - In update_wall_time, we round up to the next
- * nanosecond, and store the amount rounded up into
- * the error. This causes the likely below to be unlikely.
- *
- * The proper fix is to avoid rounding up by using
- * the high precision tk->xtime_nsec instead of
- * xtime.tv_nsec everywhere. Fixing this will take some
- * time.
- */
- if (likely(error <= interval))
- adj = 1;
- else
- adj = timekeeping_bigadjust(tk, error, &interval, &offset);
- } else {
- if (error < -interval) {
- /* See comment above, this is just switched for the negative */
- error >>= 2;
- if (likely(error >= -interval)) {
- adj = -1;
- interval = -interval;
- offset = -offset;
- } else {
- adj = timekeeping_bigadjust(tk, error, &interval, &offset);
- }
- } else {
- goto out_adjust;
- }
- }
+ /* First correct for the current frequency error */
+ timekeeping_freqadjust(tk, interval, offset);
+
+
+ /* Next make a small adjustment to fix any cumulative error */
+ error = tk->ntp_error >> tk->ntp_error_shift;
+ if (likely(abs(error) <= interval/2))
+ goto out_adjust;
+
+ if (error < 0) {
+ adj = -1;
+ interval = -interval;
+ offset = -offset;
+ } else
+ adj = 1;
+
- if (unlikely(tk->clock->maxadj &&
- (tk->mult + adj > tk->clock->mult + tk->clock->maxadj))) {
- printk_once(KERN_WARNING
- "Adjusting %s more than 11%% (%ld vs %ld)\n",
- tk->clock->name, (long)tk->mult + adj,
- (long)tk->clock->mult + tk->clock->maxadj);
- }
/*
* So the following can be confusing.
*
@@ -1213,15 +1167,21 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
* xtime_nsec_2 = xtime_nsec_1 - offset
* Which simplfies to:
* xtime_nsec -= offset
- *
- * XXX - TODO: Doc ntp_error calculation.
*/
tk->mult += adj;
tk->xtime_interval += interval;
tk->xtime_nsec -= offset;
- tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
out_adjust:
+
+ if (unlikely(tk->clock->maxadj &&
+ (tk->mult > tk->clock->mult + tk->clock->maxadj))) {
+ printk_once(KERN_WARNING
+ "Adjusting %s more than 11%% (%ld vs %ld)\n",
+ tk->clock->name, (long)tk->mult,
+ (long)tk->clock->mult + tk->clock->maxadj);
+ }
+
/*
* It may be possible that when we entered this function, xtime_nsec
* was very small. Further, if we're slightly speeding the clocksource
@@ -1241,7 +1201,6 @@ out_adjust:
tk->xtime_nsec = 0;
tk->ntp_error += neg << tk->ntp_error_shift;
}
-
}
/**
--
1.8.3.2
On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
>
> I still think this is probably 3.15 or later material, but I'd be
> very interested in feedback, thoughts, and testing.
Over the weekend I did a short test of this new approach. I compiled
two kernels, one plain v3.12.7 and one with the proposed fix. The
test was run on three different kernel variants, the current nohz
kernel, the same with nohz=off, and the same but with John's patch.
I used a simple test script (see below). Using a PCIe express card as
a PHC, the Intel Corporation 82574L, I simply let the this clock run
free (not sync'ed to gps or anything), and then synchronized the Linux
system clock to the PHC using the phc2sys program with a sample rate
of once every 32 seconds. Each of the tests ran for at least three
hours on a system without any other load.
- Linux 3.12.7-nohz-plain-20140106 nohz-plain.log
- Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF periodic-plain.log
- Linux 3.12.7-nohz-fix-20140106-00001-gd753140 nohz-fix.log
The performance in the log files as reflected in the clock offset is
summarized in this table. The values are in nanoseconds.
| | periodic-plain | nohz-fix | nohz-plain |
|---------+----------------+---------------+---------------|
| minimum | -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
| maximum | +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
| mean | +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
| stddev | +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |
I also made two graphs.
http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png
(The log files and scripts are also in that directory.)
So in this test, it looks like the new approach performed at least as
well as using regular, periodic ticks.
Thanks,
Richard
---
# nohz-fix-testing.sh
#
# set the ptp clock time from the system time
#
./testptp -s; ./testptp -g; date
#
# wait a bit to let the clocks drift
#
sleep 10
#
# start the servo
#
./phc2sys -s eth4 -m -q -l7 -O0 -P0 -I0 -R.03125
On 01/13/2014 09:51 AM, Richard Cochran wrote:
> On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
>> I still think this is probably 3.15 or later material, but I'd be
>> very interested in feedback, thoughts, and testing.
> Over the weekend I did a short test of this new approach. I compiled
> two kernels, one plain v3.12.7 and one with the proposed fix. The
> test was run on three different kernel variants, the current nohz
> kernel, the same with nohz=off, and the same but with John's patch.
>
> I used a simple test script (see below). Using a PCIe express card as
> a PHC, the Intel Corporation 82574L, I simply let the this clock run
> free (not sync'ed to gps or anything), and then synchronized the Linux
> system clock to the PHC using the phc2sys program with a sample rate
> of once every 32 seconds. Each of the tests ran for at least three
> hours on a system without any other load.
>
> - Linux 3.12.7-nohz-plain-20140106 nohz-plain.log
> - Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF periodic-plain.log
> - Linux 3.12.7-nohz-fix-20140106-00001-gd753140 nohz-fix.log
>
> The performance in the log files as reflected in the clock offset is
> summarized in this table. The values are in nanoseconds.
>
> | | periodic-plain | nohz-fix | nohz-plain |
> |---------+----------------+---------------+---------------|
> | minimum | -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
> | maximum | +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
> | mean | +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
> | stddev | +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |
>
> I also made two graphs.
>
> http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
> http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png
>
> (The log files and scripts are also in that directory.)
>
> So in this test, it looks like the new approach performed at least as
> well as using regular, periodic ticks.
That's great to hear! Thanks so much, I really appreciate the testing!
And this is with HZ=?
If you do get a chance to look again, I'd also be interested if running
with nohz=off w/ the fix doesn't show any regression compared to the
unmodified nohz=off case.
I've been trying to validate the ntpd case to ensure there aren't
regressions when there's more erratic steering, but unfortunately got
side-tracked with what seems to be an ntpd/virtualization issue.
thanks
-john
On Mon, Jan 13, 2014 at 10:15:18AM -0800, John Stultz wrote:
> That's great to hear! Thanks so much, I really appreciate the testing!
> And this is with HZ=?
HZ=1000
> If you do get a chance to look again, I'd also be interested if running
> with nohz=off w/ the fix doesn't show any regression compared to the
> unmodified nohz=off case.
Okay, will do.
Thanks,
Richard
I tested for a regression using the patched kernel with the nohz=off
command line option...
On Mon, Jan 13, 2014 at 10:15:18AM -0800, John Stultz wrote:
> On 01/13/2014 09:51 AM, Richard Cochran wrote:
> >
> > - Linux 3.12.7-nohz-plain-20140106 nohz-plain.log
> > - Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF periodic-plain.log
> > - Linux 3.12.7-nohz-fix-20140106-00001-gd753140 nohz-fix.log
Regression check:
- Linux homeboy 3.12.7-nohz-fix-20140106-00001-gd753140 NOHZ=OFF
nohz-fix-periodic.log
> > The performance in the log files as reflected in the clock offset is
> > summarized in this table. The values are in nanoseconds.
> >
> > | | periodic-plain | nohz-fix | nohz-plain |
> > |---------+----------------+---------------+---------------|
> > | minimum | -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
> > | maximum | +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
> > | mean | +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
> > | stddev | +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |
Comparing the nohz=off case with and without the patch, the three hour
test looks like this.
| | periodic-plain | nohz-fix-periodic |
|---------+----------------+-------------------+
| minimum | -1.599000e+03 | -1.427000e+03 |
| maximum | +1.311000e+03 | +1.279000e+03 |
| mean | +9.880240e-02 | -2.710778e+01 |
| stddev | +4.610021e+02 | +3.974372e+02 |
> > http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
> > http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png
I also made a third graph showing before and after the patch.
http://linuxptp.sourceforge.net/nohz-fix/nohz_regression.png
> If you do get a chance to look again, I'd also be interested if running
> with nohz=off w/ the fix doesn't show any regression compared to the
> unmodified nohz=off case.
Looks like there is no regression.
Thanks,
Richard
On Tue, Jan 28, 2014 at 9:58 AM, Richard Cochran
<[email protected]> wrote:
> I tested for a regression using the patched kernel with the nohz=off
> command line option...
>
> On Mon, Jan 13, 2014 at 10:15:18AM -0800, John Stultz wrote:
>> On 01/13/2014 09:51 AM, Richard Cochran wrote:
>> >
>> > - Linux 3.12.7-nohz-plain-20140106 nohz-plain.log
>> > - Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF periodic-plain.log
>> > - Linux 3.12.7-nohz-fix-20140106-00001-gd753140 nohz-fix.log
>
> Regression check:
>
> - Linux homeboy 3.12.7-nohz-fix-20140106-00001-gd753140 NOHZ=OFF
> nohz-fix-periodic.log
>
>> > The performance in the log files as reflected in the clock offset is
>> > summarized in this table. The values are in nanoseconds.
>> >
>> > | | periodic-plain | nohz-fix | nohz-plain |
>> > |---------+----------------+---------------+---------------|
>> > | minimum | -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
>> > | maximum | +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
>> > | mean | +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
>> > | stddev | +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |
>
> Comparing the nohz=off case with and without the patch, the three hour
> test looks like this.
>
> | | periodic-plain | nohz-fix-periodic |
> |---------+----------------+-------------------+
> | minimum | -1.599000e+03 | -1.427000e+03 |
> | maximum | +1.311000e+03 | +1.279000e+03 |
> | mean | +9.880240e-02 | -2.710778e+01 |
> | stddev | +4.610021e+02 | +3.974372e+02 |
>
>> > http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
>> > http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png
>
> I also made a third graph showing before and after the patch.
>
> http://linuxptp.sourceforge.net/nohz-fix/nohz_regression.png
>
>> If you do get a chance to look again, I'd also be interested if running
>> with nohz=off w/ the fix doesn't show any regression compared to the
>> unmodified nohz=off case.
>
> Looks like there is no regression.
That's great to hear! Looks like we might be able to queue it for 3.15...
Thanks so much again for the detailed testing! I really appreciate it!
thanks
-john
On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
> Got a few cycles to take another look at this, and tried to address
> Miroslav's latest comments. Please let me know if you have further
> thoughts!
I've had finally some time to look at this, sorry for the delay.
> I also dropped the tk->ntp_error adjustments, as I've never quite
> been able to recall how that was correct, and it doesn't seem to
> have any affect in the simulator. Will have to proceed carefully
> with testing here.
I see some effect of the ntp_error correction in the simulator, but it
seems rather disruptive than helpful. Perhaps the correction was meant
to account the fact that for the ntp_error accumulation ntp_tick
should change at the time when mult is changed instead of start of the
tick. I tried to find that correction and came up with this:
(ntp_tick1 - ntp_tick2) * offset / interval + offset
That doesn't look usable. But I don't think it's really necessary to
have any correction for that and I'm for dropping that line from the
code as your patch does.
Anyway, it seems there is a different problem in the code. I modified
the simulator to see how the code works when the clocksource frequency
is not divisible by HZ. With TSC_FREQ set to 3579545 (acpi_pm freq)
ntp_error doesn't settle down and the clock has a large frequency
error.
On top of your patch, this fixes the problem for me:
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1065,7 +1065,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
/* Calculate current error per tick */
tick_error = ntp_tick_length() >> tk->ntp_error_shift;
- tick_error -= tk->xtime_interval;
+ tick_error -= tk->xtime_interval + tk->xtime_remainder;
/* Don't worry about correcting it if its small */
if (likely(abs(tick_error) < 2*interval))
My patch has this problem too. The original code seems to be affected
to some extent, it's able to converge to the correct frequency, but
there is some residual ntp error. Adding xtime_remainder to
timekeeping_bigadjust() fixes that.
I've some comments on the performance of the proposed code, I'll send
them in a separate mail later.
Thanks,
--
Miroslav Lichvar
On 02/07/2014 03:45 AM, Miroslav Lichvar wrote:
> On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
>> Got a few cycles to take another look at this, and tried to address
>> Miroslav's latest comments. Please let me know if you have further
>> thoughts!
> I've had finally some time to look at this, sorry for the delay.
>
>> I also dropped the tk->ntp_error adjustments, as I've never quite
>> been able to recall how that was correct, and it doesn't seem to
>> have any affect in the simulator. Will have to proceed carefully
>> with testing here.
> I see some effect of the ntp_error correction in the simulator, but it
> seems rather disruptive than helpful. Perhaps the correction was meant
> to account the fact that for the ntp_error accumulation ntp_tick
> should change at the time when mult is changed instead of start of the
> tick. I tried to find that correction and came up with this:
Yes, so I actually re-added the logic a few days after sending that
patch out.
I realized the issue is that for the accumulation point, we're adjusting
the time forward or backwards, so with the new freq the non-accumulated
offset lines up with the current time. Thus the ntp_error is the error
at that accumulation point, which also needs to be adjusted
appropriately (apologies its much easier to see when drawn).
> (ntp_tick1 - ntp_tick2) * offset / interval + offset
>
> That doesn't look usable. But I don't think it's really necessary to
> have any correction for that and I'm for dropping that line from the
> code as your patch does.
I'll have to try to look over your suggestion here another day. I
unfortunately have a head cold at the moment, so any complicated math is
a bit treacherous. :)
> Anyway, it seems there is a different problem in the code. I modified
> the simulator to see how the code works when the clocksource frequency
> is not divisible by HZ. With TSC_FREQ set to 3579545 (acpi_pm freq)
> ntp_error doesn't settle down and the clock has a large frequency
> error.
>
> On top of your patch, this fixes the problem for me:
>
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -1065,7 +1065,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
>
> /* Calculate current error per tick */
> tick_error = ntp_tick_length() >> tk->ntp_error_shift;
> - tick_error -= tk->xtime_interval;
> + tick_error -= tk->xtime_interval + tk->xtime_remainder;
>
> /* Don't worry about correcting it if its small */
> if (likely(abs(tick_error) < 2*interval))
>
> My patch has this problem too. The original code seems to be affected
> to some extent, it's able to converge to the correct frequency, but
> there is some residual ntp error. Adding xtime_remainder to
> timekeeping_bigadjust() fixes that.
>
> I've some comments on the performance of the proposed code, I'll send
> them in a separate mail later.
Ok.. I'll look into this as well. Thanks so much for your review and fixes!
thanks!
-john
On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
> Got a few cycles to take another look at this, and tried to address
> Miroslav's latest comments. Please let me know if you have further
> thoughts!
In the simulations this version of the patch does indeed work better
than the previous one. I tested it with the ntp_error correction added
back as per your previous mail. It converges, the measured frequency
matches the value set by adjtimex and the ntp error stays close to
zero. I'm concerned about two things now and I'm not sure if they can
be fixed easily.
One is that the convergence still seems too slow to me.
./tk_test -t 500 -n 4000
samples: 1-500 reg: 1-500 slope: 1.00 dev: 1947.5 max: 36940.9 freq: -100.08081
samples: 501-1000 reg: 501-1000 slope: 1.00 dev: 430.7 max: 1669.6 freq: -100.07168
samples: 1001-1500 reg: 1001-1500 slope: 1.00 dev: 675.3 max: 3172.9 freq: -100.07393
samples: 1501-2000 reg: 1501-2000 slope: 1.00 dev: 453.9 max: 2223.4 freq: -100.07177
samples: 2001-2500 reg: 2001-2500 slope: 1.00 dev: 2601.2 max: 10875.4 freq: -100.03978
samples: 2501-3000 reg: 2501-3000 slope: 1.00 dev: 185.6 max: 1251.6 freq: -99.99987
samples: 3001-3500 reg: 3001-3500 slope: 1.00 dev: 160.1 max: 1181.7 freq: -99.99996
samples: 3501-4000 reg: 3501-4000 slope: 1.00 dev: 150.7 max: 1103.2 freq: -99.99990
You can see in this test it takes about 2500 updates to correct the
initial ntp error and settle down. That's with 1GHz clocksource. In
some tests I did with smaller clock frequencies or different frequency
offsets it took much longer than that.
$ ./tk_test -s 2500 -n 5000
samples: 1-5000 reg: 2501-5000 slope: 1.00 dev: 135942.4 max: 390761.8 freq: -100.00000
Here the regression line is calculated only through the latter half of
the samples (where it was already settled down) and we can see the
total ntp error corrected since the first update is around 390
microseconds.
I'm not saying ntpd or linuxptp can't work with this. The PLL and PI
controllers seem to handle this slowness in the frequency adjustment
well. I'm more worried about designs that may change the frequency
quickly and rely on accurate prediction of the clock in their feedback
loop.
The other thing I'm concerned about is that the multiplier doesn't
change only between two closest values when the loop has settled down,
but in a larger range. With the 1GHz clock I see the multiplier is
moving in range of 8387767-8387772, which makes the ntp error several
times larger than it would be if the multiplier switched just between
8387769 and 8387770.
Let's make a comparison between the current kernel (A), your patch
(B), and my patch (C). The script used for these tests is in the
tktest git as test1.sh. The first two columns are errors in the
measured frequency offsets (in ppm) from the first 10 and 100 samples. The
other two columns are time error deviation and maximum (in
nanoseconds) when the loop has converged. Both nohz and nohz off are
tested. The tests are repeated with 100 slightly different frequencies
and the mean value or stddev is presented.
freq10 freq100 dev max
A, nohz on 38.38368 2.72579 1468940.9 9846788.2
B, nohz on 1.37940 0.15085 298.5 1312.5
C, nohz on 0.00601 0.00028 74.0 279.4
A, nohz off 3.89181 0.10436 0.2 0.6
B, nohz off 1.29396 0.14372 0.2 0.6
C, nohz off 0.05867 0.00204 0.2 0.6
In these tests A with nohz is really bad. C is much better than B in
the frequency control with both nohz enabled and disabled. As for the
time error, with nohz disabled all perform similarly, with nohz
enabled C is about 4 times better than B.
I've attached the latest version of my patch. Some bugfixes and
optimizations were made, but that division is still used in the code.
I've noticed there is a division in logarithmic_accumulation(), which
is used as a workaround for an older compiler bug. Perhaps it could be
removed now, so there is more space for this one?
Please reconsider.
Thanks,
--
Miroslav Lichvar
Hey Miroslav!
Once again, a few months pass and I finally get some more time to look
at this. :( Sorry for how slow this has been going.
On 02/12/2014 07:42 AM, Miroslav Lichvar wrote:
> On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
>> Got a few cycles to take another look at this, and tried to address
>> Miroslav's latest comments. Please let me know if you have further
>> thoughts!
> In the simulations this version of the patch does indeed work better
> than the previous one. I tested it with the ntp_error correction added
> back as per your previous mail. It converges, the measured frequency
> matches the value set by adjtimex and the ntp error stays close to
> zero. I'm concerned about two things now and I'm not sure if they can
> be fixed easily.
>
> One is that the convergence still seems too slow to me.
>
> ./tk_test -t 500 -n 4000
> samples: 1-500 reg: 1-500 slope: 1.00 dev: 1947.5 max: 36940.9 freq: -100.08081
> samples: 501-1000 reg: 501-1000 slope: 1.00 dev: 430.7 max: 1669.6 freq: -100.07168
> samples: 1001-1500 reg: 1001-1500 slope: 1.00 dev: 675.3 max: 3172.9 freq: -100.07393
> samples: 1501-2000 reg: 1501-2000 slope: 1.00 dev: 453.9 max: 2223.4 freq: -100.07177
> samples: 2001-2500 reg: 2001-2500 slope: 1.00 dev: 2601.2 max: 10875.4 freq: -100.03978
> samples: 2501-3000 reg: 2501-3000 slope: 1.00 dev: 185.6 max: 1251.6 freq: -99.99987
> samples: 3001-3500 reg: 3001-3500 slope: 1.00 dev: 160.1 max: 1181.7 freq: -99.99996
> samples: 3501-4000 reg: 3501-4000 slope: 1.00 dev: 150.7 max: 1103.2 freq: -99.99990
>
> You can see in this test it takes about 2500 updates to correct the
> initial ntp error and settle down. That's with 1GHz clocksource. In
> some tests I did with smaller clock frequencies or different frequency
> offsets it took much longer than that.
So I started to look into this slow update issue. I was a little
confused, as the logarithmic approximation done in the frequency
correction shouldn't let *that* much initial error accumulate before get
to the +1/0 adjustments.
It ends up this is more a reflection of a different part of your patch.
Particularly the tk->ntp_tick storage. Bascially the ntp_tick variable
is a cache of the ntp_tick_length() value, however it doesn't get set to
ntp_tick_length() until *after* you do the first frequency correction.
Basically this avoids accumulating any error until after the first
correction is made.
My main concern is that this seems like an accounting error. By
basically avoiding accumulating the initial error it seems it would
never be corrected, no?
If I add a similar ntp_tick caching to my current implementation, it
converges practically as fast (with the only initial error coming from
the logarithmic approximation of the divide being quite small).
Similarly, if you remove the usage of tk->ntp_tick in the error
accumulation loop, using ntp_tick_length() your patch behaves quite
similarly to mine.
Does this align with your view of the code? Or am I misunderstanding
something?
> $ ./tk_test -s 2500 -n 5000
> samples: 1-5000 reg: 2501-5000 slope: 1.00 dev: 135942.4 max: 390761.8 freq: -100.00000
>
> Here the regression line is calculated only through the latter half of
> the samples (where it was already settled down) and we can see the
> total ntp error corrected since the first update is around 390
> microseconds.
>
> I'm not saying ntpd or linuxptp can't work with this. The PLL and PI
> controllers seem to handle this slowness in the frequency adjustment
> well. I'm more worried about designs that may change the frequency
> quickly and rely on accurate prediction of the clock in their feedback
> loop.
>
> The other thing I'm concerned about is that the multiplier doesn't
> change only between two closest values when the loop has settled down,
> but in a larger range. With the 1GHz clock I see the multiplier is
> moving in range of 8387767-8387772, which makes the ntp error several
> times larger than it would be if the multiplier switched just between
> 8387769 and 8387770.
So this point is valid. I spent some time reworking things and I'm not
totally happy with it currently (there's a little mess to it), but its
much closer to your implementation logically, but again just avoids the
divide and keeps the accounting closer to what we already have.
I'll hopefully clean up my current work and send it out tomorrow.
Also I do really appreciate your submissions here and apologize I've not
been able to put more time towards it recently. I also know having me
rewrite your patch over and over with various mistakes is probably
frustrating, but I do really want to make sure I both understand all the
subtleties of your changes (which resynthesizing helps) as well as make
sure the accounting is really correct (or at least not changed subtlety
without clear reason).
Thanks so much again!
-john
On Wed, Apr 23, 2014 at 09:22:45PM -0700, John Stultz wrote:
> On 02/12/2014 07:42 AM, Miroslav Lichvar wrote:
> > You can see in this test it takes about 2500 updates to correct the
> > initial ntp error and settle down. That's with 1GHz clocksource. In
> > some tests I did with smaller clock frequencies or different frequency
> > offsets it took much longer than that.
>
> So I started to look into this slow update issue. I was a little
> confused, as the logarithmic approximation done in the frequency
> correction shouldn't let *that* much initial error accumulate before get
> to the +1/0 adjustments.
>
> It ends up this is more a reflection of a different part of your patch.
> Particularly the tk->ntp_tick storage. Bascially the ntp_tick variable
> is a cache of the ntp_tick_length() value, however it doesn't get set to
> ntp_tick_length() until *after* you do the first frequency correction.
> Basically this avoids accumulating any error until after the first
> correction is made.
Yes, that was one of the main points of the patch. Postponing the tick
length change should remove the biggest source of the ntp error.
> My main concern is that this seems like an accounting error. By
> basically avoiding accumulating the initial error it seems it would
> never be corrected, no?
Not if we don't consider it to be something that should be corrected.
When the ntp tick length is changed (by adjtimex call or on second
overflow), to which ticks should be this change applied? The current
code always accumulates ntp error with the current tick length, i.e.
the change is effectively applied to already passed ticks since last
accumulation. I'm not saying this is necessarily wrong, but it causes
large ntp errors. I'm proposing to look at the frequency change
in a different way and apply it at the current time in the current
tick when the clock is updated.
In my understanding, there are three sources of ntp error in the
current code:
- change in the tick length is not effectively applied at the current
time in the clock update
- mult is controlled by an iterative method
- insufficient resolution of mult
I think the first source can be removed by postpoing the tick length
change as explained above. The second source can be removed by
calculating mult precisely with division instead of an iterative
method. There is probably nothing we can do about the third source
(except switching to 64-bit mult), but it's small, predictable and can
be handled easily and cheaply by the +1/0 mult adjustment.
I'm still not convinced the clock can be controlled quickly and
accurately without information about when will be the next clock
update if the first and possibly second source of ntp error remain
there. As you have probably seen when working on the patches, the
requirements are in conflict and it's difficult or maybe not even
possible to get something working well with all different update
intervals, clock multipliers and frequency changes.
>From my view, as someone involved in development of algorithms
controlling clocks, I'd like the clock to be as deterministic as
possible. When I set the frequency, the kernel shouldn't be correcting
some large unknown phase error behind my back. I still wouldn't know
when exactly was the frequency actually set, but if that information
was exported by adjtimex (I have some ideas how to do that), it would
be perfect for me.
Thanks for still working on this.
--
Miroslav Lichvar