Date: Thu, 8 Jun 2017 18:17:33 +0200
From: Miroslav Lichvar <mlichvar@redhat.com>
To: John Stultz <john.stultz@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>,
        lkml <linux-kernel@vger.kernel.org>,
        Prarit Bhargava <prarit@redhat.com>,
        Rusty Russell <rusty@rustcorp.com.au>
Subject: Re: [PATCH RFC 0/3] Improve stability of system clock

On Fri, May 19, 2017 at 05:35:38PM -0700, John Stultz wrote:
> >> On Wed, May 17, 2017 at 10:22 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> >> > Is there a better way to run the timekeeping code in an userspace
> >> > application? I suspect it would need something like the Linux Kernel
> >> > Library project.

> So a few years ago I mentioned this at a testing session at I think
> Linux Plumbers and Rusty (CC'ed) commented that he had some netfilter
> (or iptables?) simulator code that never made it upstream. However,
> now that kselftests are integrated with the kernel this could change.
> At least that's my memory of the discussion.
> 
> Anyway, I still think it's worth trying to submit. Worst case it's a
> huge pain and we pull it back out?

I've tried something different. I've reimplemented the simulated test
as an ordinary user-space application using the CLOCK_MONOTONIC and
CLOCK_MONOTONIC_RAW clocks. It's not deterministic, it doesn't give
results instantly, and it's not as precise as the original test, but
it can clearly show the difference that this patchset makes.

Before:
Clock precision: 46 ns
CLOCK_MONOTONIC_RAW frequency offset: 0.00 ppm
   base   step       freq    dev     max       freq    dev     max
  15.24  40960      -0.04    273    1852      +0.01      2       3
 228.26  40960      +6.07  33089  225914      -0.09      2       4
  55.42  40960     -11.90  46413  232834      +0.01      1       3
 237.65  40960      -4.75  25574  173479      +0.05      1       3
 240.63  40960      -0.08    846    5758      +0.05      2       3
 205.52    640      -0.11    781    5231      -0.01      2       4
 246.81    640      +0.21    809    5486      +0.08      1       3
 164.04    640      +0.16    282    1920      +0.11      2       3
 171.32    640      -0.08    408    2756      -0.01      2       3
 243.04    640      -0.03    349    2377      +0.03      2       3
 179.91     10      +0.07     28      62      -0.00      2       6
  45.44     10      +0.10     18     119      +0.00      6      29
 204.30     10      -0.00     21     122      -0.00      4       9
  76.18     10      +0.03     39      85      -0.00      3       5
 158.18     10      -0.02     26      94      -0.00      4       9

After:
Clock precision: 46 ns
CLOCK_MONOTONIC_RAW frequency offset: -0.00 ppm
   base   step       freq    dev     max       freq    dev     max
  93.98  40960      +0.00      3       7      +0.00      4       8
 117.95  40960      +0.00      3       9      -0.00      3       8
 230.44  40960      -0.00      4       9      -0.00      2       5
 240.56  40960      +0.00      3       6      -0.00      3       7
 228.39  40960      +0.00      3       7      -0.00      3       9
 237.85    640      -0.00      4      10      +0.00      3       8
 250.74    640      +0.00      4      11      -0.00      3       7
 249.06    640      +0.00      4       9      -0.00      3       9
 114.98    640      -0.00      3       8      +0.00      3       8
 120.59    640      +0.00      3       7      +0.00      3       7
 190.66     10      -0.00      3       7      +0.00      3       7
 228.83     10      +0.00      3       7      -0.00      3       6
  18.91     10      +0.00      3       8      -0.00      3       8
  12.39     10      +0.00      3       8      +0.00      4       8
  12.01     10      +0.00      4       9      -0.00      4       9

Each line has statistics from 100 samples collected at 0.1 second
intervals.
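
For readers who want to reproduce this kind of measurement, here is a
minimal sketch of how one line of statistics could be computed. It is
not the code of the test itself. It assumes that "freq" is the slope of
a least-squares line through the offsets between CLOCK_MONOTONIC and
CLOCK_MONOTONIC_RAW, and that "dev" and "max" describe the residuals
around that line. The names struct sample, take_sample() and regress()
are made up for this illustration, and the spread is reported as a mean
absolute residual only to keep the sketch free of libm.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <time.h>

/* One reading: elapsed raw time and CLOCK_MONOTONIC's offset from it (s). */
struct sample {
	double time;
	double offset;
};

/* Difference a - b in seconds. */
static double ts_diff(struct timespec a, struct timespec b)
{
	return (double)(a.tv_sec - b.tv_sec) +
	       (double)(a.tv_nsec - b.tv_nsec) / 1e9;
}

/*
 * Read CLOCK_MONOTONIC between two CLOCK_MONOTONIC_RAW readings and use
 * the midpoint of the raw pair as the time of the sample. raw0 and mono0
 * are the readings taken at the start of the measurement.
 */
static int take_sample(struct timespec raw0, struct timespec mono0,
		       struct sample *s)
{
	struct timespec raw1, mono, raw2;

	if (clock_gettime(CLOCK_MONOTONIC_RAW, &raw1) ||
	    clock_gettime(CLOCK_MONOTONIC, &mono) ||
	    clock_gettime(CLOCK_MONOTONIC_RAW, &raw2))
		return -1;

	s->time = (ts_diff(raw1, raw0) + ts_diff(raw2, raw0)) / 2.0;
	s->offset = ts_diff(mono, mono0) - s->time;
	return 0;
}

/*
 * Least-squares line through the samples. The slope is the frequency
 * error in s/s; dev and max are the mean and the largest absolute
 * residual in seconds. The sample times must not all be equal.
 */
static void regress(const struct sample *s, int n,
		    double *slope, double *dev, double *max)
{
	double mx = 0.0, my = 0.0, sxx = 0.0, sxy = 0.0, sum = 0.0;
	double icpt, r;
	int i;

	assert(n >= 2);

	for (i = 0; i < n; i++) {
		mx += s[i].time / n;
		my += s[i].offset / n;
	}
	for (i = 0; i < n; i++) {
		sxx += (s[i].time - mx) * (s[i].time - mx);
		sxy += (s[i].time - mx) * (s[i].offset - my);
	}

	*slope = sxy / sxx;
	icpt = my - *slope * mx;
	*max = 0.0;

	for (i = 0; i < n; i++) {
		r = s[i].offset - (icpt + *slope * s[i].time);
		if (r < 0.0)
			r = -r;
		sum += r;
		if (r > *max)
			*max = r;
	}
	*dev = sum / n;
}
```

To get numbers in the units of the tables above, the slope would be
multiplied by 1e6 (ppm) and the residuals by 1e9 (ns).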

The frequency error in the second "freq" column, with values up to 0.11
ppm in the "Before" run, shows the problem with the clock very slowly
correcting a large NTP error.

I can add some limits for the measured errors and submit it as a new
kselftest. If the measured precision is too large (e.g. >100ns), the
test can return "skip" in order to avoid false negatives.
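
The precision check could look roughly like the sketch below. It
assumes the precision is taken as the smallest non-zero difference
between consecutive CLOCK_MONOTONIC_RAW readings, which may not be how
the final test measures it. measure_precision_ns() and
precision_verdict() are hypothetical names, and the SKETCH_KSFT_*
constants are local stand-ins for the pass (0) and skip (4) exit codes
that kselftest.h defines.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <time.h>

/* Local stand-ins for the kselftest exit codes KSFT_PASS and KSFT_SKIP. */
#define SKETCH_KSFT_PASS 0
#define SKETCH_KSFT_SKIP 4

/*
 * Smallest non-zero step seen between consecutive readings of
 * CLOCK_MONOTONIC_RAW, in nanoseconds. Returns -1 if the clock cannot
 * be read or no step was observed.
 */
static long long measure_precision_ns(int readings)
{
	struct timespec prev, cur;
	long long best = -1, d;
	int i;

	if (clock_gettime(CLOCK_MONOTONIC_RAW, &prev))
		return -1;

	for (i = 0; i < readings; i++) {
		if (clock_gettime(CLOCK_MONOTONIC_RAW, &cur))
			return -1;
		d = (long long)(cur.tv_sec - prev.tv_sec) * 1000000000LL +
		    (cur.tv_nsec - prev.tv_nsec);
		if (d > 0 && (best < 0 || d < best))
			best = d;
		prev = cur;
	}
	return best;
}

/* Skip when the precision is unknown or worse than the limit. */
static int precision_verdict(long long precision_ns, long long limit_ns)
{
	if (precision_ns < 0 || precision_ns > limit_ns)
		return SKETCH_KSFT_SKIP;
	return SKETCH_KSFT_PASS;
}
```

With the 46 ns measured above and a 100 ns limit, precision_verdict()
would let the test proceed. A machine with a coarser clock would exit
with the skip code instead of reporting a failure.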

If adjtimex() had an option to return the NTP error directly, or if it
were possible to read the two clocks at the same time, the test could
be much more sensitive and observe shorter intervals (spanning fewer
clock updates).
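
For context, this is roughly what a read-only adjtimex() call looks
like today. The sketch only reads the existing state (modes = 0), which
includes the offset and the frequency but not the internal NTP error
discussed here. read_ntp_state() and timex_freq_to_ppm() are
hypothetical helper names.

```c
#include <string.h>
#include <sys/timex.h>

/*
 * Query the kernel's clock discipline state without changing anything.
 * Returns the clock state from adjtimex() (TIME_OK, ...) or -1 on error.
 */
static int read_ntp_state(struct timex *tx)
{
	memset(tx, 0, sizeof(*tx));
	tx->modes = 0;	/* no ADJ_* bits set: read only */
	return adjtimex(tx);
}

/* timex.freq is in ppm with a 16-bit fractional part. */
static double timex_freq_to_ppm(long freq)
{
	return (double)freq / 65536.0;
}
```

After a successful call, tx.offset holds the remaining phase offset (in
microseconds, or nanoseconds when STA_NANO is set in tx.status) and
timex_freq_to_ppm(tx.freq) gives the frequency adjustment in ppm.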

-- 
Miroslav Lichvar