Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752340AbdI0Ixl (ORCPT ); Wed, 27 Sep 2017 04:53:41 -0400 Received: from Galois.linutronix.de ([146.0.238.70]:51429 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751730AbdI0Ixg (ORCPT ); Wed, 27 Sep 2017 04:53:36 -0400 Date: Wed, 27 Sep 2017 10:52:29 +0200 (CEST) From: Thomas Gleixner To: Paolo Bonzini cc: Denis Plotnikov , rkrcmar@redhat.com, kvm@vger.kernel.org, john.stultz@linaro.org, mingo@redhat.com, hpa@zytor.com, linux-kernel@vger.kernel.org, x86@kernel.org, rkagan@virtuozzo.com, den@virtuozzo.com Subject: Re: [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback In-Reply-To: <7b886c9f-70a7-951b-ff6f-f77f263754e7@redhat.com> Message-ID: References: <1504106628-172372-1-git-send-email-dplotnikov@virtuozzo.com> <1504106628-172372-2-git-send-email-dplotnikov@virtuozzo.com> <7b886c9f-70a7-951b-ff6f-f77f263754e7@redhat.com> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5170 Lines: 123 On Tue, 26 Sep 2017, Paolo Bonzini wrote: > On 26/09/2017 00:00, Thomas Gleixner wrote: > >> + * @read_with_stamp: saves cycles value (got from timekeeper) and cycles > >> + * stamp (got from hardware counter value and used by > >> + * timekeeper to calculate the cycles value) to > >> + * corresponding input pointers return true if cycles > >> + * stamp holds real cycles and false if it holds some > >> + * time derivative value > > > > Neither the changelog nor this comment make any sense. Not to talk about > > the actual TSC side implementation which does the same as tsc_read() - it > > actually uses tsc_read() - but stores the same value twice and > > unconditionally returns true. > > > > Unless I'm missing something important all of this can be achieved with a > > minimal and actually understandable patch. See below. > > If that is acceptable, that's certainly okay for us too as a start, in > order to clean up the KVM code. > > *However*, I must once more ask for help understanding > ktime_get_snapshot, otherwise there is no way that I can start using it > in kvmclock. > > In particular I'd like to understand the contract of ktime_get_snapshot, > namely the meaning of the "cycles" field in struct system_time_snapshot. > > Does it have to be system clock cycles, like TSC, or can it be just the > value of the clock source? And if the latter, are the users broken, > because they can receive any random clocksource output in ->cycles? It > doesn't help that, for all of the users of ->cycles (which are all > reached from get_device_system_crosststamp), the code that actually uses > the member is never called in the current git master. stamp->cycles is always the raw counter value retrieved via clock->read(). clock is the current clock source. > I asked these questions to John in the previous review, but got no > answer. My understanding is that get_device_system_crosststamp is > broken for !tsc clocksource. > > Because struct system_counterval_t must have a TSC value, history_begin > needs to contain a cross-timestamp of the TSC (in ->cycles) and the clock > (in ->real and ->raw). And if this cross-timestamp is not available, > get_device_system_crosststamp needs to know, so that it doesn't use > history_begin at all. Forget about TSC. TSC is one particular use case. What's required here are two clocks which are strictly coupled, i.e. have a constant frequency ratio and a constant offset. And that is enforced: /* * Verify that the clocksource associated with the captured * system counter value is the same as the currently installed * timekeeper clocksource */ if (tk->tkr_mono.clock != system_counterval.cs) return -ENODEV; And that's not broken, that merily makes sure that the function CAN provide valid data and is not trying to correlate stuff which is not coupled. > Now, such cross-timestamp functionality is exactly what > "read_with_stamp" provides. No, it's a hack. > So, if history_begin _is_ broken, after fixing it (with > "read_with_stamp" or otherwise) ktime_get_snapshot would provide KVM > with everything it needs. If not, I'd be completely off base, and I'd > have to apologize to Denis for screwing up. I told you folks before that you are trying to abuse infrastructure which was designed for a different purpose. Now you're claiming it's broken. It's not broken, it works perfectly fine for the intended use cases. It correlates hardware captured timestamps with system time. One particular use case is high precision PTP. The network card can capture both PTP time and ART time atomically. ART and TSC have a constant ratio and constant offset. The cross time stamp machinery relates that values, which might be outside of the current timekeeping window, back to CLOCK_MONOTONIC and CLOCK_REALTIME. But there is no requirement that current clocksource is TSC. The requirement is: The hardware reference clock, the one which can be captured atomically with device clock (PTP, audio whatever), is the coupled clock of the current timekeeping clocksource. Both have a fixed ratio and offset. That's completely independend of TSC. TSC/ART are a particular hardware implementation which can use that infrastructure because they fulfil the requirement. So please stop these uninformed claims about brokeness and TSC requirements. Instead please sit down and figure out whether your particular use case of kvmclock/hyperv clock actually fit into that functionality, i.e. whether you have 1) 'device time' 2) 'system reference time' 3) 'system time' where #1 and #2 can be captured atomically #2 and #3 are coupled clocks with a fixed ratio and offset If those requirements are fulfilled then you can simply use it as is and it will give you CLOCK_MONOTONIC and CLOCK_REALTIME for the captured device time. If your use case is different and does not fit into that picture then please write it up clearly what you are trying to achieve and we can discuss how to make it work w/o adding duct tape hackery. Thanks, tglx