2014-04-24 23:04:50

by John Stultz

[permalink] [raw]
Subject: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering

Continuing the sporadic work on improving the timekeeping
frequency steering logic when NOHZ is enabled, I've made a number
of changes to my re-implementation of Miroslav's patch (most
recently posted here: https://lkml.org/lkml/2014/2/12/401 ),
and I'm getting much closer results in the simulator.

Compared with Miroslav's patch, this avoids doing any extra
divisions, and instead approximates the correction
logarithmically. In addition to trying to lower overhead, this
allows the other accounting bits to be managed in much the
same way we have been doing for awhile.

This patch set contains one portion from Miroslav's patch
(ntp_tick caching) split out into a separate logical patch.

Below are some of the simulator results (BTW Miroslav, w/ 3.15-rc,
you'll need to revert b399fe355b30d01 to get the simulator building)
comparing this patchset to Miroslav's patch to show we're getting in
the right order of magnitude.


Miroslav's patch:
-----------------
$ ./tk_test -t 500 -n 4000
samples: 1-500 reg: 1-500 slope: 1.00 dev: 115.5 max: 300.1 freq: -100.00004
samples: 501-1000 reg: 501-1000 slope: 1.00 dev: 73.7 max: 289.6 freq: -100.00003
samples: 1001-1500 reg: 1001-1500 slope: 1.00 dev: 59.7 max: 283.9 freq: -100.00001
samples: 1501-2000 reg: 1501-2000 slope: 1.00 dev: 50.6 max: 291.6 freq: -100.00000
samples: 2001-2500 reg: 2001-2500 slope: 1.00 dev: 48.6 max: 285.7 freq: -100.00002
samples: 2501-3000 reg: 2501-3000 slope: 1.00 dev: 44.8 max: 293.5 freq: -99.99993
samples: 3001-3500 reg: 3001-3500 slope: 1.00 dev: 40.5 max: 303.8 freq: -99.99996
samples: 3501-4000 reg: 3501-4000 slope: 1.00 dev: 37.6 max: 288.5 freq: -99.99997

$ ./test1.sh
freq10 freq100 dev max
nohz on 0.00601 0.00028 74.0 279.4
nohz off 0.05867 0.00204 0.2 0.6


This patchset:
--------------
$ ./tk_test -t 500 -n 4000
samples: 1-500 reg: 1-500 slope: 1.00 dev: 104.8 max: 278.4 freq: -100.00000
samples: 501-1000 reg: 501-1000 slope: 1.00 dev: 75.9 max: 295.5 freq: -100.00000
samples: 1001-1500 reg: 1001-1500 slope: 1.00 dev: 59.5 max: 284.5 freq: -100.00001
samples: 1501-2000 reg: 1501-2000 slope: 1.00 dev: 52.5 max: 287.3 freq: -99.99998
samples: 2001-2500 reg: 2001-2500 slope: 1.00 dev: 46.2 max: 292.2 freq: -100.00003
samples: 2501-3000 reg: 2501-3000 slope: 1.00 dev: 43.4 max: 294.8 freq: -100.00005
samples: 3001-3500 reg: 3001-3500 slope: 1.00 dev: 39.2 max: 279.5 freq: -99.99999
samples: 3501-4000 reg: 3501-4000 slope: 1.00 dev: 37.6 max: 285.2 freq: -99.99999


$ ./test1.sh
freq10 freq100 dev max
nohz on 0.00748 0.00076 110.8 476.4
nohz off 0.07173 0.03590 0.6 2.1


Again, many many thanks to Miroslav for pointing out this issue, providing
the simulator and initial patches, as well as helping to point out problems
with my rework of his change.

Comments, feedback and testing would be greatly appreciated!

thanks
-john

Cc: Miroslav Lichvar <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Prarit Bhargava <[email protected]>

John Stultz (3):
[RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
[RFC] timekeeping: Use cached ntp_tick_length when accumulating error
[RFC] timekeeping: Loop in the freqadjust logic to speed up
convergence

include/linux/timekeeper_internal.h | 7 ++
kernel/time/timekeeping.c | 201 ++++++++++++++++--------------------
2 files changed, 98 insertions(+), 110 deletions(-)

--
1.8.3.2


2014-04-24 23:04:53

by John Stultz

[permalink] [raw]
Subject: [PATCH 3/3] [RFC] timekeeping: Loop in the freqadjust logic to speed up convergence

I'm not convinced this is necessary, but this patch changes
the freqadjust logic so instead of approximating the correct
frequency adjustment making one logarithmic adjustment per tick,
we loop until the correct adjustment is found on the first call.

Thus, rather then splitting the correction up over a number of ticks
we do it all in the first tick. This is an increase of work, but
should avoid error accumulation which would slow convergence.

Again, I'm not sure the error accumulation is significant enough
to warrant the extra work here, but it improves simulator results
so I'm including this change for review and discussion.

Cc: Miroslav Lichvar <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Signed-off-by: John Stultz <[email protected]>
---
kernel/time/timekeeping.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index a130d00..6e715c3 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1139,6 +1139,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
bool negative;
u32 adj;

+again:
tk->ntp_tick = ntp_tick_length();
/* Calculate current error per tick */
tick_error = ntp_tick_length() >> tk->ntp_error_shift;
@@ -1159,6 +1160,8 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,

/* scale the corrections */
timekeeping_apply_adjustment(tk, offset, negative, adj);
+ if (adj)
+ goto again;
}


--
1.8.3.2

2014-04-24 23:05:46

by John Stultz

[permalink] [raw]
Subject: [PATCH 2/3] [RFC] timekeeping: Use cached ntp_tick_length when accumulating error

By caching the ntp_tick_length() when we correct the frequency error,
and then using that cached value to accumulate error, we avoid large
initial errors when the tick length is changed.

This makes convergence happen much faster in the simulator, since the
initial error doesn't have to be slowly whittled away.

This initially seems like an accounting error, but Miroslav pointed out
that ntp_tick_length() can change mid-tick, so when we apply it in the
error accumulation, we are applying any recent change to the entire tick.

This approach chooses to apply changes in the ntp_tick_length() only to
the next tick, which allows us to calculate the freq correction before
using the new tick length, which avoids accummulating error.

Credit to Miroslav for pointing this out and providing the original patch
this functionality has been pulled out from, along with the rational.

Cc: Miroslav Lichvar <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Reported-by: Miroslav Lichvar <[email protected]>
Signed-off-by: John Stultz <[email protected]>
---
include/linux/timekeeper_internal.h | 7 +++++++
kernel/time/timekeeping.c | 4 +++-
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index c1825eb..83fd40f 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -34,6 +34,13 @@ struct timekeeper {
/* Clock shifted nano seconds */
u64 xtime_nsec;

+ /* The ntp_tick_length() value currently being used.
+ * This cached copy ensures we consistently apply the tick
+ * length for an entire tick, as ntp_tick_length may change
+ * mid-tick, and we don't want to apply that new value to
+ * the tick in progress.
+ */
+ u64 ntp_tick;
/* Difference between accumulated time and NTP time in ntp
* shifted nano seconds. */
s64 ntp_error;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 9aa8ccf..a130d00 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -148,6 +148,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
* to counteract clock drifting.
*/
tk->mult = clock->mult;
+ tk->ntp_tick = ntpinterval << tk->ntp_error_shift;
}

/* Timekeeper helper functions. */
@@ -1138,6 +1139,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
bool negative;
u32 adj;

+ tk->ntp_tick = ntp_tick_length();
/* Calculate current error per tick */
tick_error = ntp_tick_length() >> tk->ntp_error_shift;
tick_error -= (tk->xtime_interval + tk->xtime_remainder);
@@ -1292,7 +1294,7 @@ static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
tk->raw_time.tv_nsec = raw_nsecs;

/* Accumulate error between NTP and clock interval */
- tk->ntp_error += ntp_tick_length() << shift;
+ tk->ntp_error += tk->ntp_tick << shift;
tk->ntp_error -= (tk->xtime_interval + tk->xtime_remainder) <<
(tk->ntp_error_shift + shift);

--
1.8.3.2

2014-04-24 23:06:23

by John Stultz

[permalink] [raw]
Subject: [PATCH 1/3] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

The existing timekeeping_adjust logic has always been complicated
to understand. Further, since it was developed prior to NOHZ becoming
common, its not surprising it performs poorly when NOHZ is enabled.

Since Miroslav pointed out the problematic nature of the existing code
in the NOHZ case, I've tried to refactor the code to perform better.

The problem with the previous approach was that it tried to adjust
for the total cumulative error using a scaled dampening factor. This
resulted in large errors to be corrected slowly, while small errors
were corrected quickly. With NOHZ the timekeeping code doesn't know
how far out the next tick will be, so this results in bad
over-correction to small errors, and insufficient correction to large
errors.

Inspired by Miroslav's patch, I've refactored the code to try to
address the correction in two steps.

1) Check the future freq error for the next tick, and if the frequency
error is large, try to make sure we correct it so it doesn't cause
much accumulated error.

2) Then make a small single unit adjustment to correct any cumulative
error that has collected over time.

This method performs fairly well in the simulator Miroslav created.

Major credit to Miroslav for pointing out the issue, providing the
original patch to resolve this, a simulator for testing, as well as
helping debug and resolve issues in my implementation so that it
performed closer to his original implementation.

I'd be very interested in feedback, thoughts, and testing.

Cc: Miroslav Lichvar <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Reported-by: Miroslav Lichvar <[email protected]>
Signed-off-by: John Stultz <[email protected]>
---
kernel/time/timekeeping.c | 194 ++++++++++++++++++++--------------------------
1 file changed, 85 insertions(+), 109 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index f7df8ea..9aa8ccf 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1052,122 +1052,31 @@ static int __init timekeeping_init_ops(void)

device_initcall(timekeeping_init_ops);

-/*
- * If the error is already larger, we look ahead even further
- * to compensate for late or lost adjustments.
- */
-static __always_inline int timekeeping_bigadjust(struct timekeeper *tk,
- s64 error, s64 *interval,
- s64 *offset)
-{
- s64 tick_error, i;
- u32 look_ahead, adj;
- s32 error2, mult;
-
- /*
- * Use the current error value to determine how much to look ahead.
- * The larger the error the slower we adjust for it to avoid problems
- * with losing too many ticks, otherwise we would overadjust and
- * produce an even larger error. The smaller the adjustment the
- * faster we try to adjust for it, as lost ticks can do less harm
- * here. This is tuned so that an error of about 1 msec is adjusted
- * within about 1 sec (or 2^20 nsec in 2^SHIFT_HZ ticks).
- */
- error2 = tk->ntp_error >> (NTP_SCALE_SHIFT + 22 - 2 * SHIFT_HZ);
- error2 = abs(error2);
- for (look_ahead = 0; error2 > 0; look_ahead++)
- error2 >>= 2;
-
- /*
- * Now calculate the error in (1 << look_ahead) ticks, but first
- * remove the single look ahead already included in the error.
- */
- tick_error = ntp_tick_length() >> (tk->ntp_error_shift + 1);
- tick_error -= tk->xtime_interval >> 1;
- error = ((error - tick_error) >> look_ahead) + tick_error;
-
- /* Finally calculate the adjustment shift value. */
- i = *interval;
- mult = 1;
- if (error < 0) {
- error = -error;
- *interval = -*interval;
- *offset = -*offset;
- mult = -1;
- }
- for (adj = 0; error > i; adj++)
- error >>= 1;

- *interval <<= adj;
- *offset <<= adj;
- return mult << adj;
-}

-/*
- * Adjust the multiplier to reduce the error value,
- * this is optimized for the most common adjustments of -1,0,1,
- * for other values we can do a bit more work.
- */
-static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
+static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
+ s64 offset,
+ bool negative,
+ int adj_scale)
{
- s64 error, interval = tk->cycle_interval;
- int adj;
+ s64 interval = tk->cycle_interval;
+ s32 mult_adj = 1;

- /*
- * The point of this is to check if the error is greater than half
- * an interval.
- *
- * First we shift it down from NTP_SHIFT to clocksource->shifted nsecs.
- *
- * Note we subtract one in the shift, so that error is really error*2.
- * This "saves" dividing(shifting) interval twice, but keeps the
- * (error > interval) comparison as still measuring if error is
- * larger than half an interval.
- *
- * Note: It does not "save" on aggravation when reading the code.
- */
- error = tk->ntp_error >> (tk->ntp_error_shift - 1);
- if (error > interval) {
- /*
- * We now divide error by 4(via shift), which checks if
- * the error is greater than twice the interval.
- * If it is greater, we need a bigadjust, if its smaller,
- * we can adjust by 1.
- */
- error >>= 2;
- if (likely(error <= interval))
- adj = 1;
- else
- adj = timekeeping_bigadjust(tk, error, &interval, &offset);
- } else {
- if (error < -interval) {
- /* See comment above, this is just switched for the negative */
- error >>= 2;
- if (likely(error >= -interval)) {
- adj = -1;
- interval = -interval;
- offset = -offset;
- } else {
- adj = timekeeping_bigadjust(tk, error, &interval, &offset);
- }
- } else {
- goto out_adjust;
- }
+ if (negative) {
+ mult_adj = -mult_adj;
+ interval = -interval;
+ offset = -offset;
}
+ mult_adj <<= adj_scale;
+ interval <<= adj_scale;
+ offset <<= adj_scale;

- if (unlikely(tk->clock->maxadj &&
- (tk->mult + adj > tk->clock->mult + tk->clock->maxadj))) {
- printk_once(KERN_WARNING
- "Adjusting %s more than 11%% (%ld vs %ld)\n",
- tk->clock->name, (long)tk->mult + adj,
- (long)tk->clock->mult + tk->clock->maxadj);
- }
/*
* So the following can be confusing.
*
- * To keep things simple, lets assume adj == 1 for now.
+ * To keep things simple, lets assume mult_adj == 1 for now.
*
- * When adj != 1, remember that the interval and offset values
+ * When mult_adj != 1, remember that the interval and offset values
* have been appropriately scaled so the math is the same.
*
* The basic idea here is that we're increasing the multiplier
@@ -1211,12 +1120,80 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
*
* XXX - TODO: Doc ntp_error calculation.
*/
- tk->mult += adj;
+ tk->mult += mult_adj;
tk->xtime_interval += interval;
tk->xtime_nsec -= offset;
tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
+}
+
+/*
+ * Calculate the future error caused by incorrect freq value
+ * and adjust the timekeeper to correct that.
+ */
+static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
+ s64 interval,
+ s64 offset)
+{
+ s64 tick_error, i;
+ bool negative;
+ u32 adj;
+
+ /* Calculate current error per tick */
+ tick_error = ntp_tick_length() >> tk->ntp_error_shift;
+ tick_error -= (tk->xtime_interval + tk->xtime_remainder);
+
+ /* Don't worry about correcting it if its small */
+ if (likely(abs(tick_error) < interval/2) && (tick_error > 0))
+ return;
+
+ /* preserve the direction of correction */
+ negative = (tick_error < 0);
+
+ /* Sort out the magnitude of the correction */
+ tick_error = abs(tick_error);
+ i = abs(interval);
+ for (adj = 0; tick_error > i; adj++)
+ tick_error >>= 1;
+
+ /* scale the corrections */
+ timekeeping_apply_adjustment(tk, offset, negative, adj);
+}
+
+
+/*
+ * Adjust the multiplier to reduce the error value,
+ * this is optimized for the most common adjustments of -1,0,1,
+ * for other values we can do a bit more work.
+ */
+static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
+{
+
+ s64 interval = tk->cycle_interval;
+ static int tk_erradj;
+
+ /* Undo any existing error adjustment */
+ if (tk_erradj) {
+ timekeeping_apply_adjustment(tk, offset, 1, 0);
+ tk_erradj = 0;
+ }
+
+ /* Correct for the current frequency error */
+ timekeeping_freqadjust(tk, interval, offset);
+
+ /* Next make a small adjustment to fix any cumulative error */
+ if (tk->ntp_error > 0) {
+ tk_erradj = 1;
+ timekeeping_apply_adjustment(tk, offset, 0, 0);
+ }
+
+ if (unlikely(tk->clock->maxadj &&
+ (tk->mult > tk->clock->mult + tk->clock->maxadj))) {
+ printk_once(KERN_WARNING
+ "Adjusting %s more than 11%% (%ld vs %ld)\n",
+ tk->clock->name, (long)tk->mult,
+ (long)tk->clock->mult + tk->clock->maxadj);
+ }

-out_adjust:
/*
* It may be possible that when we entered this function, xtime_nsec
* was very small. Further, if we're slightly speeding the clocksource
@@ -1236,7 +1213,6 @@ out_adjust:
tk->xtime_nsec = 0;
tk->ntp_error += neg << tk->ntp_error_shift;
}
-
}

/**
--
1.8.3.2

2014-04-25 14:04:29

by Miroslav Lichvar

[permalink] [raw]
Subject: Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering

On Thu, Apr 24, 2014 at 04:04:34PM -0700, John Stultz wrote:
> Continuing the sporadic work on improving the timekeeping
> frequency steering logic when NOHZ is enabled, I've made a number
> of changes to my re-implementation of Miroslav's patch (most
> recently posted here: https://lkml.org/lkml/2014/2/12/401 ),
> and I'm getting much closer results in the simulator.

Thanks, in my initial testing it seems to be working well. The results
from the simulator are much better than with the previous patch.

> Compared with Miroslav's patch, this avoids doing any extra
> divisions, and instead approximates the correction
> logarithmically.

Hm, doesn't that basically make the code a software implementation of
division? It seems it needs about 4-8 iterations to get to the final
result.

I didn't measure it, but I think with this change it now may be close
or possibly even slower than my patch. The extra division and
multiplication in my patch are used only when the tick length changes
(normally once per second), otherwise the update is very cheap.

> Miroslav's patch:
> -----------------
> $ ./test1.sh
> freq10 freq100 dev max
> nohz on 0.00601 0.00028 74.0 279.4
> nohz off 0.05867 0.00204 0.2 0.6

> This patchset:
> --------------
> $ ./test1.sh
> freq10 freq100 dev max
> nohz on 0.00748 0.00076 110.8 476.4
> nohz off 0.07173 0.03590 0.6 2.1

This looks pretty good to me. It's interesting that the performance
with nohz off got worse, but when I modify the test to use more points
I can see it does converge to the correct frequency and probably it's
not a big problem.

It seems it still doesn't always switch mult only between the two
closest values, which explains the slightly worse dev and max values.

Thanks,

--
Miroslav Lichvar

2014-04-25 19:20:47

by Prarit Bhargava

[permalink] [raw]
Subject: Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering



On 04/24/2014 07:04 PM, John Stultz wrote:
> Continuing the sporadic work on improving the timekeeping
> frequency steering logic when NOHZ is enabled, I've made a number
> of changes to my re-implementation of Miroslav's patch (most
> recently posted here: https://lkml.org/lkml/2014/2/12/401 ),
> and I'm getting much closer results in the simulator.
>
> Compared with Miroslav's patch, this avoids doing any extra
> divisions, and instead approximates the correction
> logarithmically. In addition to trying to lower overhead, this
> allows the other accounting bits to be managed in much the
> same way we have been doing for awhile.
>
> This patch set contains one portion from Miroslav's patch
> (ntp_tick caching) split out into a separate logical patch.
>
> Below are some of the simulator results (BTW Miroslav, w/ 3.15-rc,
> you'll need to revert b399fe355b30d01 to get the simulator building)
> comparing this patchset to Miroslav's patch to show we're getting in
> the right order of magnitude.
>

John, I'm going to put this into a weekend long test right away (along with
another unrelated cpupower patch). I'll ping back Monday to let you know how it
went.

P.

2014-04-25 21:05:58

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering

On 04/25/2014 07:04 AM, Miroslav Lichvar wrote:
> On Thu, Apr 24, 2014 at 04:04:34PM -0700, John Stultz wrote:
>> Continuing the sporadic work on improving the timekeeping
>> frequency steering logic when NOHZ is enabled, I've made a number
>> of changes to my re-implementation of Miroslav's patch (most
>> recently posted here: https://lkml.org/lkml/2014/2/12/401 ),
>> and I'm getting much closer results in the simulator.
> Thanks, in my initial testing it seems to be working well. The results
> from the simulator are much better than with the previous patch.
>
>> Compared with Miroslav's patch, this avoids doing any extra
>> divisions, and instead approximates the correction
>> logarithmically.
> Hm, doesn't that basically make the code a software implementation of
> division? It seems it needs about 4-8 iterations to get to the final
> result.
>
> I didn't measure it, but I think with this change it now may be close
> or possibly even slower than my patch. The extra division and
> multiplication in my patch are used only when the tick length changes
> (normally once per second), otherwise the update is very cheap.

This point is quite true, though I'm really not convinced the last patch
in the series is realistically needed. It does help the code get close
to your implementation in comparisons w/ the simulator but I haven't had
the chance to sort out what the real world impact is likely to be.

That said, you're points about just setting the freq rather then trying
to converge towards it to eliminate the second layer of correction
oscillation is starting to resonate more with me. Thus I may very well
switch to the division, but if so I also want to get the manipulations
to the supporting accounting variables similarly documented as the
scaled manipulation currently provides.


>
>> Miroslav's patch:
>> -----------------
>> $ ./test1.sh
>> freq10 freq100 dev max
>> nohz on 0.00601 0.00028 74.0 279.4
>> nohz off 0.05867 0.00204 0.2 0.6
>> This patchset:
>> --------------
>> $ ./test1.sh
>> freq10 freq100 dev max
>> nohz on 0.00748 0.00076 110.8 476.4
>> nohz off 0.07173 0.03590 0.6 2.1
> This looks pretty good to me. It's interesting that the performance
> with nohz off got worse, but when I modify the test to use more points
> I can see it does converge to the correct frequency and probably it's
> not a big problem.
>
> It seems it still doesn't always switch mult only between the two
> closest values, which explains the slightly worse dev and max values.
Huh. I don't think I saw that in my testing. I'll look into it again.

I suspect the extra error comes from the occasional underflow handling
(which you avoid w/ the second_overflow_skip stuff which would help but
feels a little clunky to me - but I'm still thinking it over).

thanks
-john

2014-04-29 11:22:31

by Prarit Bhargava

[permalink] [raw]
Subject: Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering



On 04/24/2014 07:04 PM, John Stultz wrote:
> Continuing the sporadic work on improving the timekeeping
> frequency steering logic when NOHZ is enabled, I've made a number
> of changes to my re-implementation of Miroslav's patch (most
> recently posted here: https://lkml.org/lkml/2014/2/12/401 ),
> and I'm getting much closer results in the simulator.
>
> Compared with Miroslav's patch, this avoids doing any extra
> divisions, and instead approximates the correction
> logarithmically. In addition to trying to lower overhead, this
> allows the other accounting bits to be managed in much the
> same way we have been doing for awhile.
>
> This patch set contains one portion from Miroslav's patch
> (ntp_tick caching) split out into a separate logical patch.
>
> Below are some of the simulator results (BTW Miroslav, w/ 3.15-rc,
> you'll need to revert b399fe355b30d01 to get the simulator building)
> comparing this patchset to Miroslav's patch to show we're getting in
> the right order of magnitude.
>
>

Tested-by: Prarit Bhargava <[email protected]>

P.

2014-04-30 14:01:28

by Miroslav Lichvar

[permalink] [raw]
Subject: Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering

On Fri, Apr 25, 2014 at 02:05:49PM -0700, John Stultz wrote:
> On 04/25/2014 07:04 AM, Miroslav Lichvar wrote:
> > It seems it still doesn't always switch mult only between the two
> > closest values, which explains the slightly worse dev and max values.
> Huh. I don't think I saw that in my testing. I'll look into it again.

I can see it with tk_test -o 100000, for instance. It's switching
between 8389446, 8389447 and 8389448.

> I suspect the extra error comes from the occasional underflow handling
> (which you avoid w/ the second_overflow_skip stuff which would help but
> feels a little clunky to me - but I'm still thinking it over).

It seems to be something else as I can see it even when I remove
"advance_ticks(3, 4, 1);" from tk_test.c so clock updates are aligned
exactly with ticks and no underflow can happen (i.e. offset in
timekeeping_apply_adjustment() is zero).

I agree the skip_second_overflow flag in my patch is ugly, but it's
necesssary as the code would otherwise take too long to correct the
underflowed part in ntp error.

Anyway, I did more testing and I think I found a more serious problem.
It seems the loop doesn't handle well tick lengths which happen to be
close to the middle between multipliers. For example:

$ ./tk_test -n 10000 -o 100077
samples: 1-10000 reg: 1-10000 slope: 1.00 dev: 1241.7 max: 3532.3 freq: 100.07717

When I add the following line to the kernel code to see the value of
mult and ntp_error after clock update:

+++ b/kernel/time/timekeeping.c
@@ -1386,6 +1386,7 @@ void update_wall_time(void)
/* correct the clock when NTP error is too big */
timekeeping_adjust(tk, offset);

+ printk("%d %lld\n", tk->mult, tk->ntp_error >> (tk->ntp_error_shift + tk->shift));

I get this:

8389447 -101
8389449 6
8389447 -321
8389448 -198
8389447 -249
...
8389447 -6344
8389448 -6158
8389447 -6223
8389448 -6211
8389447 -6265
8389448 -6029

It looks like the correction is not able to handle the random
cumulation of differences in the lengths between odd and even update
intervals. The overall frequency is accurate, but ntp error is in
microseconds here.

Can you please look into this?

Thanks,

--
Miroslav Lichvar