2017-06-08 16:17:40

by Miroslav Lichvar

Subject: Re: [PATCH RFC 0/3] Improve stability of system clock

On Fri, May 19, 2017 at 05:35:38PM -0700, John Stultz wrote:
> >> On Wed, May 17, 2017 at 10:22 AM, Miroslav Lichvar <[email protected]> wrote:
> >> > Is there a better way to run the timekeeping code in a userspace
> >> > application? I suspect it would need something like the Linux Kernel
> >> > Library project.

> So a few years ago I mentioned this at a testing session at, I think,
> Linux Plumbers', and Rusty (CC'ed) commented that he had some netfilter
> (or iptables?) simulator code that never made it upstream. However,
> now that kselftests are integrated with the kernel, this could change.
> At least that's my memory of the discussion.
>
> Anyway, I still think it's worth trying to submit. Worst case, it's a
> huge pain and we pull it back out?

I've tried something different. I've reimplemented the simulated test
as an ordinary user-space application using the CLOCK_MONOTONIC and
CLOCK_MONOTONIC_RAW clocks. It's not deterministic, it doesn't give
results instantly, and it's not as precise as the original test, but
it can clearly show the difference that this patchset makes.
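For anyone who wants to try the idea, here is a minimal C sketch of the basic measurement. It is not the actual test: it assumes the simplest possible estimator (first vs. last reading), and the names struct clock_pair, read_pair() and offset_ppm() are hypothetical. It reads CLOCK_MONOTONIC_RAW and CLOCK_MONOTONIC back to back and reports the frequency offset of CLOCK_MONOTONIC relative to the raw clock in ppm.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* One paired reading of the two clocks, in nanoseconds. */
struct clock_pair {
	int64_t raw;	/* CLOCK_MONOTONIC_RAW: not adjusted by NTP */
	int64_t mono;	/* CLOCK_MONOTONIC: frequency adjusted by NTP */
};

static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000 + ts->tv_nsec;
}

/* Read the two clocks back to back (close together, not simultaneously). */
static struct clock_pair read_pair(void)
{
	struct timespec r, m;
	struct clock_pair p;

	clock_gettime(CLOCK_MONOTONIC_RAW, &r);
	clock_gettime(CLOCK_MONOTONIC, &m);
	p.raw = ts_to_ns(&r);
	p.mono = ts_to_ns(&m);
	return p;
}

/*
 * Frequency offset of CLOCK_MONOTONIC relative to CLOCK_MONOTONIC_RAW,
 * in ppm, from two readings taken some time apart.
 */
static double offset_ppm(struct clock_pair a, struct clock_pair b)
{
	int64_t d_raw = b.raw - a.raw;
	int64_t d_mono = b.mono - a.mono;

	assert(d_raw > 0);
	return ((double)d_mono / (double)d_raw - 1.0) * 1e6;
}
```

Taking one pair, sleeping for a while and taking another gives a rough offset. The jitter between the two back-to-back reads, divided by the length of the interval, limits how small an offset this can resolve, which is why the numbers are noisy and take a while to appear.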

Before:
Clock precision: 46 ns
CLOCK_MONOTONIC_RAW frequency offset: 0.00 ppm
  base  step   freq   dev    max  freq dev max
 15.24 40960  -0.04   273   1852 +0.01   2   3
228.26 40960  +6.07 33089 225914 -0.09   2   4
 55.42 40960 -11.90 46413 232834 +0.01   1   3
237.65 40960  -4.75 25574 173479 +0.05   1   3
240.63 40960  -0.08   846   5758 +0.05   2   3
205.52   640  -0.11   781   5231 -0.01   2   4
246.81   640  +0.21   809   5486 +0.08   1   3
164.04   640  +0.16   282   1920 +0.11   2   3
171.32   640  -0.08   408   2756 -0.01   2   3
243.04   640  -0.03   349   2377 +0.03   2   3
179.91    10  +0.07    28     62 -0.00   2   6
 45.44    10  +0.10    18    119 +0.00   6  29
204.30    10  -0.00    21    122 -0.00   4   9
 76.18    10  +0.03    39     85 -0.00   3   5
158.18    10  -0.02    26     94 -0.00   4   9

After:
Clock precision: 46 ns
CLOCK_MONOTONIC_RAW frequency offset: -0.00 ppm
  base  step   freq   dev    max  freq dev max
 93.98 40960  +0.00     3      7 +0.00   4   8
117.95 40960  +0.00     3      9 -0.00   3   8
230.44 40960  -0.00     4      9 -0.00   2   5
240.56 40960  +0.00     3      6 -0.00   3   7
228.39 40960  +0.00     3      7 -0.00   3   9
237.85   640  -0.00     4     10 +0.00   3   8
250.74   640  +0.00     4     11 -0.00   3   7
249.06   640  +0.00     4      9 -0.00   3   9
114.98   640  -0.00     3      8 +0.00   3   8
120.59   640  +0.00     3      7 +0.00   3   7
190.66    10  -0.00     3      7 +0.00   3   7
228.83    10  +0.00     3      7 -0.00   3   6
 18.91    10  +0.00     3      8 -0.00   3   8
 12.39    10  +0.00     3      8 +0.00   4   8
 12.01    10  +0.00     4      9 -0.00   4   9

Each line has statistics from 100 samples collected at 0.1-second
intervals.

The frequency error in the second "freq" column, with values up to
0.11 ppm, shows the problem of the clock very slowly correcting a large
NTP error.
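The column meanings are not spelled out above. Assuming "freq" is the fitted frequency offset in ppm and "dev"/"max" are the spread and maximum of the fit residuals in nanoseconds, the statistics for one line could be computed with an ordinary least-squares fit like the sketch below. This is a guess at the idea, not necessarily what the test does; struct fit and fit_samples() are hypothetical names. The square root of var_ns2 would correspond to the assumed "dev" value.

```c
#include <assert.h>

/* Summary of one table line, under the column meanings assumed above. */
struct fit {
	double freq_ppm;	/* fitted slope minus 1, in ppm */
	double max_ns;		/* largest absolute residual, in ns */
	double var_ns2;		/* residual variance, in ns^2 */
};

/*
 * Least-squares fit of y (CLOCK_MONOTONIC) against x (CLOCK_MONOTONIC_RAW).
 * Both are in seconds relative to the first sample, so doubles keep
 * sub-nanosecond resolution over a 10-second run.
 */
static struct fit fit_samples(const double *x, const double *y, int n)
{
	double mx = 0.0, my = 0.0, sxx = 0.0, sxy = 0.0, ss = 0.0, rmax = 0.0;
	double slope, intercept;
	struct fit f;
	int i;

	assert(n >= 3);
	for (i = 0; i < n; i++) {
		mx += x[i];
		my += y[i];
	}
	mx /= n;
	my /= n;
	for (i = 0; i < n; i++) {
		sxx += (x[i] - mx) * (x[i] - mx);
		sxy += (x[i] - mx) * (y[i] - my);
	}
	assert(sxx > 0.0);	/* samples must not all share one x value */
	slope = sxy / sxx;
	intercept = my - slope * mx;

	/* Residuals: how far each sample is from the fitted line. */
	for (i = 0; i < n; i++) {
		double r = y[i] - (intercept + slope * x[i]);
		double a = r < 0.0 ? -r : r;

		ss += r * r;
		if (a > rmax)
			rmax = a;
	}
	f.freq_ppm = (slope - 1.0) * 1e6;
	f.max_ns = rmax * 1e9;
	/* n - 2 degrees of freedom: two parameters were fitted. */
	f.var_ns2 = ss / (n - 2) * 1e18;
	return f;
}
```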

I can add some limits for the measured errors and submit it as a new
kselftest. If the measured precision is too large (e.g. >100 ns), the
test can return "skip" in order to avoid false negatives.
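A minimal sketch of that skip logic, assuming precision is measured as the smallest positive step between back-to-back clock_gettime() calls (the thread does not say how the "Clock precision" line is computed). The exit code values match KSFT_PASS and KSFT_SKIP from the kselftest header; MAX_PRECISION_NS, measure_precision_ns() and precision_verdict() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Same values as in tools/testing/selftests/kselftest.h. */
#define KSFT_PASS 0
#define KSFT_SKIP 4

/* Hypothetical limit, taken from the ">100 ns" example above. */
#define MAX_PRECISION_NS 100

/*
 * Rough clock precision: the smallest positive difference between two
 * consecutive readings, in ns.  Returns INT64_MAX if every pair of
 * readings was identical (a very coarse clock).
 */
static int64_t measure_precision_ns(clockid_t clock, int tries)
{
	struct timespec a, b;
	int64_t best = INT64_MAX;
	int i;

	assert(tries > 0);
	for (i = 0; i < tries; i++) {
		int64_t d;

		clock_gettime(clock, &a);
		clock_gettime(clock, &b);
		d = (int64_t)(b.tv_sec - a.tv_sec) * 1000000000 +
		    (b.tv_nsec - a.tv_nsec);
		if (d > 0 && d < best)
			best = d;
	}
	return best;
}

/* Skip rather than fail when the clock is too coarse to judge. */
static int precision_verdict(int64_t precision_ns)
{
	return precision_ns > MAX_PRECISION_NS ? KSFT_SKIP : KSFT_PASS;
}
```

With the 46 ns precision reported above, precision_verdict() would let the test run; on a machine with a coarse clocksource it would return the skip code instead of producing a spurious failure.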

If adjtimex() had an option to return the NTP error directly, or if it
were possible to read the two clocks at the same time, the test could
be much more sensitive and observe shorter intervals (spanning fewer
clock updates).
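Without a true simultaneous read, one common workaround (an assumption here, not something proposed in this thread) is to bracket one clock between two readings of the other: read RAW, MONO, RAW, pair MONO with the midpoint of the RAW readings, and keep the attempt with the shortest window, which is the one least likely to have been hit by an interrupt. It only narrows the uncertainty to the window width rather than removing it. struct sandwich, now_ns() and read_sandwich() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

static int64_t now_ns(clockid_t clock)
{
	struct timespec ts;

	clock_gettime(clock, &ts);
	return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

struct sandwich {
	int64_t raw_ns;		/* midpoint of the two RAW readings */
	int64_t mono_ns;	/* MONO reading taken between them */
	int64_t window_ns;	/* RAW interval bounding the MONO reading */
};

/* Approximate a simultaneous RAW/MONO reading; keep the tightest try. */
static struct sandwich read_sandwich(int tries)
{
	struct sandwich best = { 0, 0, INT64_MAX };
	int i;

	assert(tries > 0);
	for (i = 0; i < tries; i++) {
		int64_t r1 = now_ns(CLOCK_MONOTONIC_RAW);
		int64_t m = now_ns(CLOCK_MONOTONIC);
		int64_t r2 = now_ns(CLOCK_MONOTONIC_RAW);

		if (r2 - r1 < best.window_ns) {
			best.raw_ns = r1 + (r2 - r1) / 2;
			best.mono_ns = m;
			best.window_ns = r2 - r1;
		}
	}
	return best;
}
```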

--
Miroslav Lichvar


2017-06-08 18:36:16

by John Stultz

Subject: Re: [PATCH RFC 0/3] Improve stability of system clock

On Thu, Jun 8, 2017 at 9:17 AM, Miroslav Lichvar <[email protected]> wrote:
> On Fri, May 19, 2017 at 05:35:38PM -0700, John Stultz wrote:
>> >> On Wed, May 17, 2017 at 10:22 AM, Miroslav Lichvar <[email protected]> wrote:
>> >> > Is there a better way to run the timekeeping code in a userspace
>> >> > application? I suspect it would need something like the Linux Kernel
>> >> > Library project.
>
>> So a few years ago I mentioned this at a testing session at, I think,
>> Linux Plumbers', and Rusty (CC'ed) commented that he had some netfilter
>> (or iptables?) simulator code that never made it upstream. However,
>> now that kselftests are integrated with the kernel, this could change.
>> At least that's my memory of the discussion.
>>
>> Anyway, I still think it's worth trying to submit. Worst case, it's a
>> huge pain and we pull it back out?
>
> I've tried something different. I've reimplemented the simulated test
> as an ordinary user-space application using the CLOCK_MONOTONIC and
> CLOCK_MONOTONIC_RAW clocks. It's not deterministic, it doesn't give
> results instantly, and it's not as precise as the original test, but
> it can clearly show the difference that this patchset makes.
>
> Before:
> Clock precision: 46 ns
> CLOCK_MONOTONIC_RAW frequency offset: 0.00 ppm
>   base  step   freq   dev    max  freq dev max
>  15.24 40960  -0.04   273   1852 +0.01   2   3
> 228.26 40960  +6.07 33089 225914 -0.09   2   4
>  55.42 40960 -11.90 46413 232834 +0.01   1   3
> 237.65 40960  -4.75 25574 173479 +0.05   1   3
> 240.63 40960  -0.08   846   5758 +0.05   2   3
> 205.52   640  -0.11   781   5231 -0.01   2   4
> 246.81   640  +0.21   809   5486 +0.08   1   3
> 164.04   640  +0.16   282   1920 +0.11   2   3
> 171.32   640  -0.08   408   2756 -0.01   2   3
> 243.04   640  -0.03   349   2377 +0.03   2   3
> 179.91    10  +0.07    28     62 -0.00   2   6
>  45.44    10  +0.10    18    119 +0.00   6  29
> 204.30    10  -0.00    21    122 -0.00   4   9
>  76.18    10  +0.03    39     85 -0.00   3   5
> 158.18    10  -0.02    26     94 -0.00   4   9
>
> After:
> Clock precision: 46 ns
> CLOCK_MONOTONIC_RAW frequency offset: -0.00 ppm
>   base  step   freq   dev    max  freq dev max
>  93.98 40960  +0.00     3      7 +0.00   4   8
> 117.95 40960  +0.00     3      9 -0.00   3   8
> 230.44 40960  -0.00     4      9 -0.00   2   5
> 240.56 40960  +0.00     3      6 -0.00   3   7
> 228.39 40960  +0.00     3      7 -0.00   3   9
> 237.85   640  -0.00     4     10 +0.00   3   8
> 250.74   640  +0.00     4     11 -0.00   3   7
> 249.06   640  +0.00     4      9 -0.00   3   9
> 114.98   640  -0.00     3      8 +0.00   3   8
> 120.59   640  +0.00     3      7 +0.00   3   7
> 190.66    10  -0.00     3      7 +0.00   3   7
> 228.83    10  +0.00     3      7 -0.00   3   6
>  18.91    10  +0.00     3      8 -0.00   3   8
>  12.39    10  +0.00     3      8 +0.00   4   8
>  12.01    10  +0.00     4      9 -0.00   4   9
>
> Each line has statistics from 100 samples collected at 0.1-second
> intervals.
>
> The frequency error in the second "freq" column, with values up to
> 0.11 ppm, shows the problem of the clock very slowly correcting a large
> NTP error.

Maybe rename the headers of the second set of columns for clarity?

>
> I can add some limits for the measured errors and submit it as a new
> kselftest. If the measured precision is too large (e.g. >100 ns), the
> test can return "skip" in order to avoid false negatives.

That all sounds great!

(Though I still do really like your simulator! Being able to have
deterministic test results from known inputs is a big plus.)

thanks
-john