Subject: Re: [PATCH v4 00/10] make L2's kvm-clock stable, get rid of pvclock_gtod_copy in KVM
From: Paolo Bonzini
To: John Stultz, Denis Plotnikov
Cc: Radim Krcmar, kvm list, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", lkml, "x86@kernel.org", rkagan@virtuozzo.com, den@virtuozzo.com, Marcelo Tosatti
Date: Wed, 2 Aug 2017 19:11:39 +0200
References: <1501684690-211093-1-git-send-email-dplotnikov@virtuozzo.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/08/2017 18:49, John Stultz wrote:
> On Wed, Aug 2, 2017 at 7:38 AM, Denis Plotnikov wrote:
>> V4:
>> * removed "is stable" function with vague definition of stability
>>   there is the only function which does time with cycle stamp getting
>> * some variables renamed
>> * some patches split into smaller ones
>> * atomic64_t usage is replaced with atomic_t
>>
>> V3:
>> Changing the timekeeper interface for clocksource reading looks like
>> an overkill to achieve the goal of getting cycles stamp for KVM.
>> Instead extend the timekeeping interface and add functions which provide
>> necessary data: read clocksource with cycles stamp, check whether the
>> clock source is stable.
>>
>> Use those functions and improve existing timekeeper functionality to
>> replace pvclock_gtod_copy scheme in masterclock data calculation.
>>
>> V2:
>> The main goal is to make L2 kvm-clock be stable when it's running over L1
>> with stable kvm-clock.
>>
>> The patch series is for x86 architecture only. If the series is approved
>> I'll do changes for other architectures but I don't have an ability to
>> compile and check for every single one (help needed)
>>
>> The patch series do the following:
>>
>> * change timekeeper interface to get cycles stamp value from
>>   the timekeeper
>> * get rid of pvclock copy in KVM by using the changed timekeeper
>>   interface: get time and cycles right from the timekeeper
>> * make KVM recognize a stable kvm-clock as stable clocksource
>>   and use the KVM masterclock in this case, which means making
>>   L2 stable when running over stable L1 kvm-clock
>
> So, from a brief skim, I'm not a big fan of this patchset. Though this
> is likely in part due to that I haven't seen anything about *why*
> these changes are needed.

From my selfish KVM maintainer point of view, one advantage is that it
drops knowledge of timekeeping internals from KVM, using
ktime_get_snapshot instead. These are patches 1-5. Structuring the
series like this was my idea so I take the blame.

As to patches 6-10, KVM is currently only able to provide vsyscalls if
the host is using the TSC. However, when using nested virtualization
you have

    L0: bare-metal hypervisor (uses TSC)
    L1: nested hypervisor (uses kvmclock, can use vsyscall)
    L2: nested guest

and L2 cannot use vsyscall because it is not using the TSC. This series
lets you use the vsyscall in L2 as long as L1 can.
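
For illustration only, the before/after rule described above can be
sketched as a small predicate. This is a toy model, not code from the
series: the enum, the function name and the series_applied flag are all
hypothetical, and "stable" here just stands for whatever condition the
series ends up using to accept a kvmclock.

```c
#include <stdbool.h>

/* Hypothetical labels for the clocksource a hypervisor level runs on. */
enum clock_kind {
    CLOCK_TSC,
    CLOCK_KVMCLOCK_STABLE,    /* kvmclock that the level below keeps stable */
    CLOCK_KVMCLOCK_UNSTABLE,
    CLOCK_OTHER,
};

/*
 * Can a hypervisor whose own clocksource is `host` let its guest use
 * the vsyscall?  `series_applied` models the behaviour before (false)
 * and after (true) the patch set, as described in the mail.
 */
bool can_provide_vsyscall(enum clock_kind host, bool series_applied)
{
    if (host == CLOCK_TSC)
        return true;    /* the only accepted case today */

    /* With the series, a stable kvmclock is accepted as well. */
    return series_applied && host == CLOCK_KVMCLOCK_STABLE;
}
```

In this model L0 (TSC) can always serve L1, while L1 (kvmclock) can
serve L2 only when the flag is set and its kvmclock is stable.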

There is one point where I couldn't help Denis as much as I wanted.
That is the definition of what makes a "good" clocksource that can be
used by KVM to provide the vsyscall. I know why the patch is correct,
but I couldn't really define the concept.

The users of ktime_get_snapshot and struct system_counterval_t seem to
use "cycles" to map from TSC to ART; this is not unlike kvmclock's use
of "cycles" to map from TSC to nanoseconds at an origin point. However,
it's not clear to me whether "cycles" may be used by
adjust_historical_crosststamp even for non-TSC clocksources (or
non-kvmclock after this series). It doesn't help that
adjust_historical_crosststamp is essentially dead code, since
get_device_system_crosststamp is always called with a NULL history
argument.

I'm also CCing Marcelo who wrote the KVM vsyscall code.

Paolo

> Can you briefly explain the issue you're trying to solve, and why you
> think this approach is the way to go?
> (It's usually a good idea to have such rationale included in the patchset)
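
For illustration only, the idea of mapping "cycles" to nanoseconds at an
origin point, mentioned in the reply above, can be sketched as follows.
This is a minimal model using the usual mult/shift scaling; the struct
and function names are hypothetical and are not the kernel's or
kvmclock's actual definitions.

```c
#include <stdint.h>

/* Hypothetical origin point: a counter value paired with the time it maps to. */
struct origin_point {
    uint64_t cycles;   /* counter reading taken at the origin */
    uint64_t ns;       /* nanoseconds corresponding to that reading */
    uint32_t mult;     /* scale: ns per cycle is mult / 2^shift */
    uint32_t shift;
};

/*
 * Convert a later counter reading to nanoseconds by scaling the cycle
 * delta since the origin.  The multiplication can overflow for very
 * large deltas; a real implementation has to bound the delta or use a
 * wider intermediate type.
 */
uint64_t cycles_to_ns(const struct origin_point *o, uint64_t now_cycles)
{
    uint64_t delta = now_cycles - o->cycles;

    return o->ns + ((delta * o->mult) >> o->shift);
}
```

With mult = 512 and shift = 10 (0.5 ns per cycle), an origin of
cycles = 1000, ns = 5000 and a reading of 3000 cycles gives 6000 ns.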