Christopher,
Please Cc LKML on such issues in the future.
On Mon, 15 Oct 2018, Christopher Hall wrote:
Leaving context around for new readers:
> Problem Statement:
>
> The TSC clocksource mult/shift values are derived from CPUID[15H], but the
> monotonic raw clock value is not equal to TSC in nominal nanoseconds, i.e.
> the timekeeping code is not accurately transforming TSC ticks to nominal
> nanoseconds based on CPUID[15H].
>
> The included code calculates the drift between nominal TSC nanoseconds and
> the monotonic raw clock.
>
> Background:
>
> Starting with 6th generation Intel CPUs, the TSC is "phase locked" to the
> Always Running Timer (ART). The relation between TSC and ART is read from
> CPUID[15H]. Details of the TSC-ART relation are in the "Invariant
> Timekeeping" section of the SDM.
>
> CPUID[15H].ECX returns the nominal frequency of ART (or crystal frequency).
> CPU feature TSC_KNOWN_FREQ indicates that tsc_khz (tsc.c) is derived from
> CPUID[15H]. The calculation is in tsc.c:native_calibrate_tsc().
>
> When the TSC clocksource is selected, the timekeeping code uses mult/shift
> values to transform TSC into nanoseconds. The mult/shift value is determined
> using tsc_khz.
>
> Example Output:
>
> Running for 3 seconds trial 1
> Scaled TSC delta: 3000328845
> Monotonic raw delta: 3000329117
> Ran for 3 seconds with 272 ns skew
>
> Running for 3 seconds trial 2
> Scaled TSC delta: 3000295209
> Monotonic raw delta: 3000295482
> Ran for 3 seconds with 273 ns skew
>
> Running for 3 seconds trial 3
> Scaled TSC delta: 3000262870
> Monotonic raw delta: 3000263142
> Ran for 3 seconds with 272 ns skew
>
> Running for 300 seconds trial 4
> Scaled TSC delta: 300000281725
> Monotonic raw delta: 300000308905
> Ran for 300 seconds with 27180 ns skew
>
> The skew between tsc and monotonic raw is about 91 PPB.
>
> System Information:
>
> CPU model string: Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
> Kernel version tested: 4.14.71-rt44
> NOTE: The skew seems to be insensitive to kernel version after the
> introduction of the TSC_KNOWN_FREQ capability
>
> From CPUID[15H]:
> Time Stamp Counter/Core Crystal Clock Information (0x15):
> TSC/clock ratio = 276/2
> nominal core crystal clock = 24000000 Hz (table lookup)
>
> TSC kHz used to calculate mult/shift value: 3312000
Now the most interesting information here would be the resulting mult/shift
value which the kernel uses. Can you please provide that?
Thanks,
Thomas
On Fri, Oct 19, 2018 at 8:25 AM, Thomas Gleixner <[email protected]> wrote:
> Christopher,
>
> Please Cc LKML on such issues in the future.
>
> On Mon, 15 Oct 2018, Christopher Hall wrote:
> 
>> [problem statement snipped]
>> TSC kHz used to calculate mult/shift value: 3312000
So, just to understand, you're saying the problem is that we calculate a
tsc_khz value before calculating the mult/shift, and the intermediate
step is losing some precision?
Or is the cause something else?
thanks
-john
On Fri, 19 Oct 2018, Thomas Gleixner wrote:
> On Mon, 15 Oct 2018, Christopher Hall wrote:
> > TSC kHz used to calculate mult/shift value: 3312000
>
> Now the most interesting information here would be the resulting mult/shift
> value which the kernel uses. Can you please provide that?
But aside of the actual values it's pretty obvious why this happens. It's a
simple rounding error. Minimal, but still. To avoid rounding errors you'd
have to find mult and shift values which exactly result in:
(freq * mult) >> shift = 1e9
which is impossible for freq = 3312 * 1e6.
A possible fix for this is, because the error is in the low PPB range, to
adjust the error once per second or in some other sensible regular
interval.
John?
Thanks,
tglx
On Fri, Oct 19, 2018 at 11:34 AM, John Stultz <[email protected]> wrote:
> On Fri, Oct 19, 2018 at 8:25 AM, Thomas Gleixner <[email protected]> wrote:
>>> [problem statement snipped]
>>> TSC kHz used to calculate mult/shift value: 3312000
>
> So, just to understand, you're saying the problem is that we calculate a
> tsc_khz value before calculating the mult/shift, and the intermediate
> step is losing some precision?
>
> Or is the cause something else?
The other potential cause here might be just that when we calculate
the mult/shift pair, we select a shift small enough to keep the
multiplication from overflowing if we have a long timer interval. So
there is likely always some granularity error in converting to a
mult/shift pair.
thanks
-john
On Fri, Oct 19, 2018 at 11:37 AM, Thomas Gleixner <[email protected]> wrote:
> On Fri, 19 Oct 2018, Thomas Gleixner wrote:
>> On Mon, 15 Oct 2018, Christopher Hall wrote:
>> > TSC kHz used to calculate mult/shift value: 3312000
>>
>> Now the most interesting information here would be the resulting mult/shift
>> value which the kernel uses. Can you please provide that?
>
> But aside of the actual values it's pretty obvious why this happens. It's a
> simple rounding error. Minimal, but still. To avoid rounding errors you'd
> have to find mult and shift values which exactly result in:
>
> (freq * mult) >> shift = 1e9
>
> which is impossible for freq = 3312 * 1e6.
Right.
> A possible fix for this is, because the error is in the low PPB range, to
> adjust the error once per second or in some other sensible regular
> interval.
Ehh.. I mean, this is basically what all the complicated ntp_error
logic does for the long-term accuracy compensation. Part of the
allure of the raw clock is that there are no changes made over time.
It's a very simple, constant calculation.
We could try to do something like pre-seed the ntpinterval value so
the default ntp_tick value corrects for this, but that's only going to
affect the monotonic/realtime clock until ntpd or anyone else sets a
different interval rate.
So I'm not particularly eager to try the type of small
frequency oscillation we do for monotonic/realtime long-term
accuracy on the raw clock, as that feels a bit antithetical to the
point of the raw clock.
I guess I'd want to understand more of the use here and the need to
tie the raw clock back to the hardware counter it abstracts.
thanks
-john
On Fri, 19 Oct 2018, John Stultz wrote:
> On Fri, Oct 19, 2018 at 11:37 AM, Thomas Gleixner <[email protected]> wrote:
> > [...]
> > A possible fix for this is, because the error is in the low PPB range, to
> > adjust the error once per second or in some other sensible regular
> > interval.
>
> Ehh.. I mean, this is basically what all the complicated ntp_error
> logic does for the long-term accuracy compensation. Part of the
> allure of the raw clock is that there are no changes made over time.
> It's a very simple, constant calculation.
>
> We could try to do something like pre-seed the ntpinterval value so
> the default ntp_tick value corrects for this, but that's only going to
> affect the monotonic/realtime clock until ntpd or anyone else sets a
> different interval rate.
>
> So I'm not particularly eager to try the type of small
> frequency oscillation we do for monotonic/realtime long-term
> accuracy on the raw clock, as that feels a bit antithetical to the
> point of the raw clock.
I don't think you need complex oscillation for that. The error is constant
and small enough that it is a fractional nanoseconds thing with an interval
<= 1s. So you can just add that in a regular interval. Due to it being
small you can't observe time jumping I think.
The end result is 'correct', or as correct as it can be in relation to real
nanoseconds. :)
> I guess I'd want to understand more of the use here and the need to
> tie the raw clock back to the hardware counter it abstracts.
The problem there is ART which is distributed to PCIe devices and ART time
stamps are exposed in various ways. ART has a fixed ratio vs. TSC so there
is a reasonable expectation that MONOTONIC_RAW is accurate.
Thanks,
tglx
On Fri, Oct 19, 2018 at 11:57 AM, Thomas Gleixner <[email protected]> wrote:
> On Fri, 19 Oct 2018, John Stultz wrote:
>> [earlier context snipped]
>
> I don't think you need complex oscillation for that. The error is constant
> and small enough that it is a fractional nanoseconds thing with an interval
> <= 1s. So you can just add that in a regular interval. Due to it being
> small you can't observe time jumping I think.
Well, from the examples the trouble is we seem to be a bit fast,
rather than slow.
So we'll have to reduce mult by one, and rework the calculations, but
maybe something like this (correcting the raw_interval value) would
work.
But this also sort of breaks the fundamental argument that the raw clock
is a simple mult/shift transformation of the underlying clocksource
counter. It's not the accuracy of the clock but the consistency that
was key.
The counter-argument is that the raw clock is abstracting the
underlying hardware, so folks who would have used the TSC directly can
now use the raw clock and have a generic abstracted hardware-counter
interface. So userland shouldn't really be worried about the
occasional injections made, since they shouldn't be trying to
re-generate the abstraction from the hardware themselves. <--
Remember this point as we move to the next comment :)
> The end result is 'correct', or as correct as it can be in relation to real
> nanoseconds. :)
>
>> I guess I'd want to understand more of the use here and the need to
>> tie the raw clock back to the hardware counter it abstracts.
>
> The problem there is ART which is distributed to PCIe devices and ART time
> stamps are exposed in various ways. ART has a fixed ratio vs. TSC so there
> is a reasonable expectation that MONOTONIC_RAW is accurate.
Which is maybe sort of my issue here. The raw clock provided an
abstraction away from the hardware for generic usage, but then it's
being re-used with other un-abstracted hardware references. So unless
they use the same method of transformation, there will be problems (of
varying degree).
We might be able to reduce the degree in this case, but I worry the
extra complexity may only cause problems for others.
thanks
-john
John,
On Fri, 19 Oct 2018, John Stultz wrote:
> On Fri, Oct 19, 2018 at 11:57 AM, Thomas Gleixner <[email protected]> wrote:
> > I don't think you need complex oscillation for that. The error is constant
> > and small enough that it is a fractional nanoseconds thing with an interval
> > <= 1s. So you can just add that in a regular interval. Due to it being
> > small you can't observe time jumping I think.
>
> Well, from the examples the trouble is we seem to be a bit fast,
> rather than slow.
> So we'll have to reduce mult by one, and rework the calculations, but
> maybe something like this (correcting the raw_interval value) would
> work.
Shouldn't be rocket science. It's a one-off calculation of the adjustment
value and maybe of the period at which the correction happens.
> [...]
> >
> > The problem there is ART which is distributed to PCIe devices and ART time
> > stamps are exposed in various ways. ART has a fixed ratio vs. TSC so there
> > is a reasonable expectation that MONOTONIC_RAW is accurate.
>
> Which is maybe sort of my issue here. The raw clock provided an
> abstraction away from the hardware for generic usage, but then it's
> being re-used with other un-abstracted hardware references. So unless
> they use the same method of transformation, there will be problems (of
> varying degree).
OTOH. If people use the CPUID provided frequency information and the TSC
from userspace then they get different results which is contrary to the
goal of providing them an abstracted way of doing it.
> We might be able to reduce the degree in this case, but I worry the
> extra complexity may only cause problems for others.
Is it really that complex to add a fixed correction value periodically?
I don't think so and it should just work for any clocksource which is
exposed this way. Famous last words .....
Thanks,
tglx
On Fri, Oct 19, 2018 at 1:50 PM, Thomas Gleixner <[email protected]> wrote:
> John,
>
> On Fri, 19 Oct 2018, John Stultz wrote:
>> [earlier context snipped]
>> Which is maybe sort of my issue here. The raw clock provided an
>> abstraction away from the hardware for generic usage, but then it's
>> being re-used with other un-abstracted hardware references. So unless
>> they use the same method of transformation, there will be problems (of
>> varying degree).
>
> OTOH. If people use the CPUID provided frequency information and the TSC
> from userspace then they get different results which is contrary to the
> goal of providing them an abstracted way of doing it.
But that's my point. If they are pulling time values from the hardware
directly, that's unabstracted. I'm not sure it's smart to be comparing
the abstracted and unabstracted time stamps if you're worried about
precision. They are sort of two separate (though similar) time
domains.
>> We might be able to reduce the degree in this case, but I worry the
>> extra complexity may only cause problems for others.
>
> Is it really that complex to add a fixed correction value periodically?
>
> I don't think so and it should just work for any clocksource which is
> exposed this way. Famous last words .....
I'm not saying that the code is overly complex (at least compared to
the rest of the timekeeping code :), but just how the accumulation is
done is less trivial. So if someone else is trying to mimic the
abstracted time with unabstracted hardware values (again, not
something I recommend, but that's sort of the use case pushing
this), they need to use a similar method that is slightly more
complicated (or use slower math). It's all subtle stuff, but this makes
something that was relatively very simple (by design) a bit harder to
explain.
So I'm not outright objecting, I'm more wringing my hands at the
potential edge cases that improving the accuracy of the un-adjusted
clock raises. :)
thanks
-john
On Fri, Oct 19, 2018 at 3:36 PM, John Stultz <[email protected]> wrote:
> [earlier context snipped]
> I'm not saying that the code is overly complex (at least compared to
> the rest of the timekeeping code :), but just how the accumulation is
> done is less trivial. So if someone else is trying to mimic the
> abstracted time with unabstracted hardware values (again, not
> something I recommend, but that's sort of the use case pushing
> this), they need to use a similar method that is slightly more
> complicated (or use slower math). It's all subtle stuff, but this makes
> something that was relatively very simple (by design) a bit harder to
> explain.
Adding Miroslav, as he's always thoughtful on these sorts of issues.
I spent a little bit of time thinking this out. Unfortunately I don't
think it's a simple matter of calculating the granularity error on the
raw clock and adding it in each interval. The other trouble spot is
that the adjusted clocks (monotonic/realtime) are adjusted off of that
raw clock. So they would need to have that error added as well,
otherwise the raw clock and an otherwise non-adjusted monotonic clock
would drift.
However, to be correct, the ntp adjustments would then have to be made
to the base interval + error, which mucks the math up a fair bit.
Maybe Miroslav will have a better idea here, but otherwise I'll stew
on this a bit more and see what I can come up with.
thanks
-john
On Tue, Oct 23, 2018 at 11:31:00AM -0700, John Stultz wrote:
> On Fri, Oct 19, 2018 at 3:36 PM, John Stultz <[email protected]> wrote:
> > On Fri, Oct 19, 2018 at 1:50 PM, Thomas Gleixner <[email protected]> wrote:
> >> On Fri, 19 Oct 2018, John Stultz wrote:
> >>> We might be able to reduce the degree in this case, but I worry the
> >>> extra complexity may only cause problems for others.
> >>
> >> Is it really that complex to add a fixed correction value periodically?
The error is too large to be corrected by stepping on clock updates.
For a typical TSC frequency we have a multiplier in the range of a few
million, so that's a frequency error of up to a few hundred ppb. In the
old days when the clock was updated 1000 times per second that would
be hidden in the resolution of the clock, but now with tickless
kernels those steps would be very noticeable.
If the multiplier was adjusted in the same way as the non-raw clock,
there wouldn't be any steps in time, but there would be steps in
frequency and the time error would be proportional to the update
interval. For a clock updated once per second that's an error of up to
a few hundred nanoseconds.
I agree with John. I think the raw monotonic clock should be stable.
It would help if we better understood the use case. If I needed a
clock that ticks in an exact proportion to the tsc, why wouldn't I use
the tsc directly? Is this about having a fall back in case the tsc
cannot be used (e.g. due to unsynchronized CPUs)?
If the frequency error was exported, it could be compensated where
necessary. Maybe that would work for the original poster?
A better fix might be to modify the calculation of time to use a
second multiplier, effectively increasing its resolution. However,
that would slow down all users of the clock.
--
Miroslav Lichvar
On Wed, Oct 24, 2018 at 04:51:13PM +0200, Miroslav Lichvar wrote:
> On Tue, Oct 23, 2018 at 11:31:00AM -0700, John Stultz wrote:
> > On Fri, Oct 19, 2018 at 3:36 PM, John Stultz <[email protected]> wrote:
> > > On Fri, Oct 19, 2018 at 1:50 PM, Thomas Gleixner <[email protected]> wrote:
> > >> On Fri, 19 Oct 2018, John Stultz wrote:
> > >>> We might be able to reduce the degree in this case, but I worry the
> > >>> extra complexity may only cause problems for others.
> > >>
> > >> Is it really that complex to add a fixed correction value periodically?
>
> The error is too large to be corrected by stepping on clock updates.
> For a typical TSC frequency we have a multiplier in the range of a few
> million, so that's a frequency error of up to a few hundred ppb. In the
For this example, an i5-6600 CPU @ 3.30GHz, I measured 90 ppb. That is the
largest I've seen while measuring this on a few platforms. 20-40 PPB seems
more typical.
> old days when the clock was updated 1000 times per second that would
> be hidden in the resolution of the clock, but now with tickless
> kernels those steps would be very noticeable.
>
> If the multiplier was adjusted in the same way as the non-raw clock,
> there wouldn't be any steps in time, but there would be steps in
> frequency and the time error would be proportional to the update
> interval. For a clock updated once per second that's an error of up to
> a few hundreds of nanoseconds.
>
> I agree with John. I think the raw monotonic clock should be stable.
>
> It would help if we better understood the use case. If I needed a
There may be other future use cases, but the most immediate use is timed
GPIO. Current shipping hardware has a timed GPIO device that uses ART/TSC
directly to schedule output events and time-stamp input events. There isn't
a device clock, like a PHC, for example.
An example application is synchronizing TGPIO across two networked devices
sync'd with PTP. It's possible to use the existing PHC interface to
compute the network time in terms of raw monotonic and vice versa.
Now we need to add a TSC-to-monotonic-raw relation in order to interact
with the TGPIO hardware. To me this doesn't make sense, because they use the
same hardware clock.
> clock that ticks in an exact proportion to the tsc, why wouldn't I use
> the tsc directly? Is this about having a fall back in case the tsc
I think this would work OK. This would need some additional API to expose the
relation between monotonic raw and TSC. In the kernel that would probably mean
plumbing ktime_get_snapshot() to the application. Or, another option:
add TSC as another clock to the get_device_system_crosststamp() interface.
The second option would be more precise, but for the proposed application
(or similar) would require changes to the supporting stacks like PTP.
> cannot be used (e.g. due to unsynchronized CPUs)?
No.
>
> If the frequency error was exported, it could be compensated where
> necessary. Maybe that would work for the original poster?
This would be the same as plumbing ktime_get_snapshot() except that the
math would be done in the kernel.
>
> A better fix might be to modify the calculation of time to use a
> second multiplier, effectively increasing its resolution. However,
I'm not sure I'm understanding. Like cascading transforms? While this
would increase the precision, I think it would still drift over days. We
could probably fix it up every second, though.
> that would slow down all users of the clock.
Couldn't some clocksources specify an additional multiplier/precision and
others use lower precision?
>
> --
> Miroslav Lichvar
On Wed, Oct 24, 2018 at 01:32:48PM -0400, Christopher Hall wrote:
> On Wed, Oct 24, 2018 at 04:51:13PM +0200, Miroslav Lichvar wrote:
> > The error is too large to be corrected by stepping on clock updates.
> > For a typical TSC frequency we have a multiplier in the range of a few
> > million, so that's a frequency error of up to a few hundred ppb. In the
>
> For this example, an i5-6600 CPU @ 3.30GHz, I measured 90 ppb. That is the
> largest I've seen while measuring this on a few platforms; 20-40 ppb seems
> more typical.
It looks like there is rounding in the calculation of the multiplier,
so for frequencies in the range of 3-5 GHz the error will be up to
about 149 ppb. Without rounding it would be up to twice as much, but
always in one direction (fast or slow).
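To illustrate the numbers, here is a rough sketch (plain Python, not the
kernel's actual mult/shift code; the shift value of 24 is an assumption
based on the constants quoted later in this thread):

```python
# Rough estimate of the frequency error caused by rounding the clocksource
# multiplier to an integer, for a given TSC frequency. Illustrative only.
def mult_rounding_error_ppb(freq_hz, shift=24):
    exact = (10**9 << shift) / freq_hz   # ideal real-valued multiplier
    mult = round(exact)                  # an integer multiplier is stored
    return (mult - exact) / exact * 1e9  # relative frequency error in ppb

# The 3312 MHz TSC discussed in this thread comes out close to the
# measured ~90 ppb; the worst case for a rounded multiplier is about
# 0.5/mult, which is largest (~149 ppb) at 5 GHz where mult is smallest.
err_3312mhz = mult_rounding_error_ppb(3_312_000_000)
```

This matches both the ~90 ppb measured on the i5-6600 and the ~149 ppb
worst case mentioned above for the 3-5 GHz range.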
> > A better fix might be to modify the calculation of time to use a
> > second multiplier, effectively increasing its resolution. However,
>
> I'm not sure I'm understanding. Like cascading transforms? While this
> would increase the precision, I think it would still drift over days. We
> could probably fix it up every second, though.
I was thinking about using a wider multiplier, but avoiding a full
64x64 bit multiplication.
For example, in your case with the 3312MHz clock
nsec = (cycles * 5065585) >> 24
could be replaced with
nsec = ((cycles * 5065584) >> 24) + ((cycles * 4538763) >> 47)
This would take a few hours to drift by one nanosecond. If something
like this was implemented for the raw clock, it would probably make
sense to switch the other clocks too.
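To check the arithmetic in the example above (a standalone Python sketch
using the constants for the 3312 MHz TSC; not kernel code):

```python
# Single rounded multiplier vs. the proposed two-part multiplier, using the
# constants from the example above for a 3312 MHz TSC.
def ns_single(cycles):
    return (cycles * 5065585) >> 24

def ns_two_part(cycles):
    # The second term supplies the low-order bits of a wider multiplier
    # without requiring a full 64x64 bit multiplication in one step.
    return ((cycles * 5065584) >> 24) + ((cycles * 4538763) >> 47)

one_second = 3_312_000_000                        # cycles in 1 s at 3312 MHz
skew_single = ns_single(one_second) - 10**9       # ~90 ns fast per second
skew_two_part = ns_two_part(one_second) - 10**9   # within a couple of ns
```

Over one simulated second the single multiplier runs ~90 ns fast (the
~90 ppb skew reported earlier), while the two-part form stays within a
nanosecond or two of nominal.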
> > that would slow down all users of the clock.
>
> Couldn't some clocksources specify an additional multiplier/precision and
> others use lower precision?
I guess it could, but I'm not sure if people here would be happy with
the extra complexity.
--
Miroslav Lichvar
On Tue, 23 Oct 2018, John Stultz wrote:
> On Fri, Oct 19, 2018 at 3:36 PM, John Stultz <[email protected]> wrote:
> I spent a little bit of time thinking this out. Unfortunately I don't
> think it's a simple matter of calculating the granularity error on the
> raw clock and adding it in each interval. The other trouble spot is
> that the adjusted clocks (monotonic/realtime) are adjusted off of that
> raw clock. So they would need to have that error added as well,
> otherwise the raw and an otherwise non-adjusted monotonic clock would
> drift.
>
> However, to be correct, the ntp adjustments made would have to be made
> to both the base interval + error, which mucks the math up a fair bit.
Hmm, confused as usual. Why would you need to do anything like that?
Thanks,
tglx
On Thu, Nov 1, 2018 at 10:44 AM, Thomas Gleixner <[email protected]> wrote:
> On Tue, 23 Oct 2018, John Stultz wrote:
>> On Fri, Oct 19, 2018 at 3:36 PM, John Stultz <[email protected]> wrote:
>> I spent a little bit of time thinking this out. Unfortunately I don't
>> think it's a simple matter of calculating the granularity error on the
>> raw clock and adding it in each interval. The other trouble spot is
>> that the adjusted clocks (monotonic/realtime) are adjusted off of that
>> raw clock. So they would need to have that error added as well,
>> otherwise the raw and an otherwise non-adjusted monotonic clock would
>> drift.
>>
>> However, to be correct, the ntp adjustments made would have to be made
>> to both the base interval + error, which mucks the math up a fair bit.
>
> Hmm, confused as usual. Why would you need to do anything like that?
Because the NTP adjustment is done off of what is now the raw clock.
If the raw clock is "corrected" the ppb adjustment has to be done off
of that corrected rate.
Otherwise with no correction, the raw clock and the monotonic clock
would drift apart.
thanks
-john
Miroslav,
On Wed, 24 Oct 2018, Miroslav Lichvar wrote:
> On Tue, Oct 23, 2018 at 11:31:00AM -0700, John Stultz wrote:
> > On Fri, Oct 19, 2018 at 3:36 PM, John Stultz <[email protected]> wrote:
> > > On Fri, Oct 19, 2018 at 1:50 PM, Thomas Gleixner <[email protected]> wrote:
> > >> On Fri, 19 Oct 2018, John Stultz wrote:
> > >>> We might be able to reduce the degree in this case, but I worry the
> > >>> extra complexity may only cause problems for others.
> > >>
> > >> Is it really that complex to add a fixed correction value periodically?
>
> The error is too large to be corrected by stepping on clock updates.
> For a typical TSC frequency we have a multiplier in the range of a few
> million, so that's a frequency error of up to a few hundred ppb. In the
> old days when the clock was updated 1000 times per second that would
> be hidden in the resolution of the clock, but now with tickless
> kernels those steps would be very noticeable.
>
> If the multiplier was adjusted in the same way as the non-raw clock,
> there wouldn't be any steps in time, but there would be steps in
> frequency and the time error would be proportional to the update
> interval. For a clock updated once per second that's an error of up to
> a few hundred nanoseconds.
That only happens when the system was completely idle for a second, and in
that case it's a non-issue because the clock is updated before it's used.
So nothing will be able to observe the time jumping forward by a few or
even a few hundred nanoseconds. For the regular case, where CPUs are busy
and the update happens 100/250/1000 times per second, the jump forward
will not be noticeable at all.
> I agree with John. I think the raw monotonic clock should be stable.
It is stable. It's still monotonically increasing.
> It would help if we better understood the use case. If I needed a
> clock that ticks in an exact proportion to the tsc, why wouldn't I use
> the tsc directly? Is this about having a fall back in case the tsc
> cannot be used (e.g. due to unsynchronized CPUs)?
>
> If the frequency error was exported, it could be compensated where
> necessary. Maybe that would work for the original poster?
I'd rather not go there and have yet another magic knob to deal with, and
we'd need to have the same thing in the kernel as well.
> A better fix might be to modify the calculation of time to use a
> second multiplier, effectively increasing its resolution. However,
> that would slow down all users of the clock.
Right, and we carefully try to avoid that.
I really would like to give the adjustment a try. It should solve the
problem Christopher cares about and not have any side effects on other
users of monotonic raw including NTP/PTP etc. Famous last words ....
Thanks,
tglx
On Thu, 1 Nov 2018, John Stultz wrote:
> On Thu, Nov 1, 2018 at 10:44 AM, Thomas Gleixner <[email protected]> wrote:
> > On Tue, 23 Oct 2018, John Stultz wrote:
> >> On Fri, Oct 19, 2018 at 3:36 PM, John Stultz <[email protected]> wrote:
> >> I spent a little bit of time thinking this out. Unfortunately I don't
> >> think it's a simple matter of calculating the granularity error on the
> >> raw clock and adding it in each interval. The other trouble spot is
> >> that the adjusted clocks (monotonic/realtime) are adjusted off of that
> >> raw clock. So they would need to have that error added as well,
> >> otherwise the raw and an otherwise non-adjusted monotonic clock would
> >> drift.
> >>
> >> However, to be correct, the ntp adjustments made would have to be made
> >> to both the base interval + error, which mucks the math up a fair bit.
> >
> > Hmm, confused as usual. Why would you need to do anything like that?
>
> Because the NTP adjustment is done off of what is now the raw clock.
> If the raw clock is "corrected" the ppb adjustment has to be done off
> of that corrected rate.
Sure, but why would that require any change? Right now the raw clock is
slightly off and you correct clock monotonic against NTP. So with that
extra correction you just see a slightly different raw clock slew and work
from there.
> Otherwise with no correction, the raw clock and the monotonic clock
> would drift apart.
They drift apart today already if you do NTP correction. They stay in sync
w/o NTP/PTP/PPS.
But maybe I confused myself completely by now.
Thanks,
tglx
On Thu, Nov 01, 2018 at 06:41:00PM +0100, Thomas Gleixner wrote:
> On Wed, 24 Oct 2018, Miroslav Lichvar wrote:
> > The error is too large to be corrected by stepping on clock updates.
> > For a typical TSC frequency we have a multiplier in the range of a few
> > million, so that's a frequency error of up to a few hundred ppb. In the
> > old days when the clock was updated 1000 times per second that would
> > be hidden in the resolution of the clock, but now with tickless
> > kernels those steps would be very noticeable.
> That only happens when the system was completely idle for a second and in
> that case it's a non issue because the clock is updated before it's
> used. So nothing will be able to observe the time jumping forward by a few
> or even a few hundred nanoseconds.
That's great news (to me). I think we should do the same with the
mono/real clock. A periodic 4ns step would be better than a slew
correcting tens or hundreds of nanoseconds. This would be a
significant improvement in accuracy on idle systems, in theory
identical to running with nohz=off.
Maybe I am missing some important detail, but I think we can just drop
the +1 mult adjustment and step on each update by the (truncated)
amount that has accumulated in the NTP error register. With the
changes that have been made earlier this year the clock should never
be ahead, so the step would always be forward.
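A toy model of that bookkeeping (hypothetical Python just to illustrate
the idea; the kernel's actual NTP error register uses shifted-nanosecond
units and the names here are made up):

```python
# Toy model: instead of slewing with a +1 multiplier bump, accumulate the
# per-update error in a register and apply the truncated amount as a small
# forward step at each clock update. Illustrative only, not kernel code.
def run_updates(error_per_update_ns, updates):
    error_reg = 0.0   # accumulated error vs. the ideal clock, in ns
    steps = []
    for _ in range(updates):
        error_reg += error_per_update_ns
        step = int(error_reg)      # truncate: never step backward
        if step:
            steps.append(step)
            error_reg -= step
    return steps, error_reg

# With 1.25 ns of error per update, every applied step is a small forward
# jump and the residual in the register stays below one nanosecond.
steps, residual = run_updates(1.25, 4)
```

Since the truncation only ever leaves a positive residual behind, every
step is forward, which matches the "clock should never be ahead" property
described above.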
> For the regular case, where CPUs are
> busy and the update happens 100/250/1000 times per second the jump forward
> will not be noticeable at all.
I think a 4ns jump at 100 Hz might be noticeable with a good reference
clock and large number of measurements, but so would be the current +1
mult adjustment.
--
Miroslav Lichvar
On Thu, Nov 01, 2018 at 07:03:37PM +0100, Thomas Gleixner wrote:
> On Thu, 1 Nov 2018, John Stultz wrote:
> > On Thu, Nov 1, 2018 at 10:44 AM, Thomas Gleixner <[email protected]> wrote:
> > > On Tue, 23 Oct 2018, John Stultz wrote:
> > >> However, to be correct, the ntp adjustments made would have to be made
> > >> to both the base interval + error, which mucks the math up a fair bit.
> > >
> > > Hmm, confused as usual. Why would you need to do anything like that?
> >
> > Because the NTP adjustment is done off of what is now the raw clock.
> > If the raw clock is "corrected" the ppb adjustment has to be done off
> > of that corrected rate.
>
> Sure, but why would that require any change? Right now the raw clock is
> slightly off and you correct clock monotonic against NTP. So with that
> extra correction you just see a slightly different raw clock slew and work
> from there.
It makes sense to me.
I think there are basically two different ways it could be done.
One is to correct the frequency of the raw clock, on which sits the
mono/real clock. The other is to create a new raw clock which is
separate from the mono/real clock, and add an offset to the NTP
frequency to match the frequencies of the two clocks when not
synchronized by NTP/PTP. The latter would provide a more stable
mono/real clock.
clocksource -> MONOTONIC_RAW -> MONOTONIC/REALTIME
or
clocksource -> ? -> MONOTONIC_RAW
                 -> MONOTONIC/REALTIME
--
Miroslav Lichvar
On Fri, 2 Nov 2018, Miroslav Lichvar wrote:
> On Thu, Nov 01, 2018 at 07:03:37PM +0100, Thomas Gleixner wrote:
> > On Thu, 1 Nov 2018, John Stultz wrote:
> > > On Thu, Nov 1, 2018 at 10:44 AM, Thomas Gleixner <[email protected]> wrote:
> > > > On Tue, 23 Oct 2018, John Stultz wrote:
> > > >> However, to be correct, the ntp adjustments made would have to be made
> > > >> to both the base interval + error, which mucks the math up a fair bit.
> > > >
> > > > Hmm, confused as usual. Why would you need to do anything like that?
> > >
> > > Because the NTP adjustment is done off of what is now the raw clock.
> > > If the raw clock is "corrected" the ppb adjustment has to be done off
> > > of that corrected rate.
> >
> > Sure, but why would that require any change? Right now the raw clock is
> > slightly off and you correct clock monotonic against NTP. So with that
> > extra correction you just see a slightly different raw clock slew and work
> > from there.
>
> It makes sense to me.
>
> I think there are basically two different ways it could be done.
> One is to correct the frequency of the raw clock, on which sits the
> mono/real clock. The other is to create a new raw clock which is
> separate from the mono/real clock, and add an offset to the NTP
> frequency to match the frequencies of the two clocks when not
> synchronized by NTP/PTP. The latter would provide a more stable
> mono/real clock.
>
> clocksource -> MONOTONIC_RAW -> MONOTONIC/REALTIME
>
> or
>
> clocksource -> ? -> MONOTONIC_RAW
>                  -> MONOTONIC/REALTIME
That's what we have now. At least I don't see how the raw thing is coupled
in NTP.
Thanks,
tglx
Miroslav,
On Fri, 2 Nov 2018, Miroslav Lichvar wrote:
> On Thu, Nov 01, 2018 at 06:41:00PM +0100, Thomas Gleixner wrote:
> > On Wed, 24 Oct 2018, Miroslav Lichvar wrote:
> > > The error is too large to be corrected by stepping on clock updates.
> > > For a typical TSC frequency we have a multiplier in the range of a few
> > > million, so that's a frequency error of up to a few hundred ppb. In the
> > > old days when the clock was updated 1000 times per second that would
> > > be hidden in the resolution of the clock, but now with tickless
> > > kernels those steps would be very noticeable.
>
> > That only happens when the system was completely idle for a second and in
> > that case it's a non issue because the clock is updated before it's
> > used. So nothing will be able to observe the time jumping forward by a few
> or even a few hundred nanoseconds.
>
> That's great news (to me). I think we should do the same with the
> mono/real clock. A periodic 4ns step would be better than a slew
> correcting tens or hundreds of nanoseconds. This would be a
> significant improvement in accuracy on idle systems, in theory
> identical to running with nohz=off.
>
> Maybe I am missing some important detail, but I think we can just drop
> the +1 mult adjustment and step on each update by the (truncated)
> amount that has accumulated in the NTP error register. With the
> changes that have been made earlier this year the clock should never
> be ahead, so the step would always be forward.
That sounds reasonable.
> > For the regular case, where CPUs are
> > busy and the update happens 100/250/1000 times per second the jump forward
> will not be noticeable at all.
>
> I think a 4ns jump at 100 Hz might be noticeable with a good reference
> clock and large number of measurements, but so would be the current +1
> mult adjustment.
Indeed.
Thanks,
tglx
On Fri, Nov 02, 2018 at 12:25:56PM +0100, Thomas Gleixner wrote:
> On Fri, 2 Nov 2018, Miroslav Lichvar wrote:
> > clocksource -> MONOTONIC_RAW -> MONOTONIC/REALTIME
> >
> > or
> >
> > clocksource -> ? -> MONOTONIC_RAW
> >                  -> MONOTONIC/REALTIME
>
> That's what we have now. At least I don't see how the raw thing is coupled
> in NTP.
Oh, right.
So, for the first approach, when the raw clock is stepped, the same step
needs to be applied to the mono clock; otherwise the two would drift from
each other when not synchronized by NTP/PTP. That means the mono clock
will have two large independent corrections applied to it, which will
make it less stable, e.g. steps of up to 8 ns at 100 Hz instead of 4 ns.
In the second approach, the corrections of the two clocks are
independent, but the NTP tick length needs to be adjusted before the
multiplier is calculated to match the frequency of the raw clock. If
the correction was applied later, I suspect it would require a +2 mult
adjustment or larger steps as in the first approach.
That tick adjustment generally won't be perfectly accurate, so an
extra small correction would be needed to keep the clocks in sync over
long intervals. This could be a +1 adjustment of the NTP tick length,
or it could be a small forward step. I'm not sure what would be easier
to implement.
Does that make sense?
--
Miroslav Lichvar