Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935280AbcCPXH0 (ORCPT ); Wed, 16 Mar 2016 19:07:26 -0400 Received: from mail-oi0-f50.google.com ([209.85.218.50]:33164 "EHLO mail-oi0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932632AbcCPXHX (ORCPT ); Wed, 16 Mar 2016 19:07:23 -0400 MIME-Version: 1.0 In-Reply-To: <20160316225936.GE20310@potion.brq.redhat.com> References: <861716d768a1da6d1fd257b7972f8df13baf7f85.1449702533.git.luto@kernel.org> <20160316220502.GA7040@potion.brq.redhat.com> <20160316225936.GE20310@potion.brq.redhat.com> From: Andy Lutomirski Date: Wed, 16 Mar 2016 16:07:03 -0700 Message-ID: Subject: Re: [PATCH 1/5] x86/kvm: On KVM re-enable (e.g. after suspend), update clocks To: Radim Krcmar Cc: Andy Lutomirski , X86 ML , Marcelo Tosatti , Paolo Bonzini , "linux-kernel@vger.kernel.org" , kvm list , Alexander Graf Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3150 Lines: 70 On Wed, Mar 16, 2016 at 3:59 PM, Radim Krcmar wrote: > 2016-03-16 15:15-0700, Andy Lutomirski: >> On Wed, Mar 16, 2016 at 3:06 PM, Radim Krcmar wrote: >>> Guest TSC is going to jump backward with this patch, which would make >>> the guest think that a lot of cycles passed. This has no bearing on >>> guest timekeeping, because the guest shouldn't be using raw TSC. >>> If we wanted to do something though, there are at least two options: >>> 1) Fake that TSC continued at roughly its specified rate: compute how >>> many cycles could have elapsed while the CPU was suspended (using >>> host time before/after suspend and guest TSC frequency) and adjust >>> guest TSC. >>> 2) Resume guest TSC at its last cycle before suspend. >>> (Roughly what KVM does now.) >>> >>> What are your opinions on TSC faking? >> >> I'd suggest restarting it wherever it left off, because it's simpler. >> If there was a CLOCK_BOOT_RAW, you could try to track it, but I'm not >> sure that such a thing exists. > > CLOCK_MONOTONIC_RAW can count in suspend, so CLOCK_BOOT_RAW would be a > conditional alias and it probably doesn't exist because of that. > >> FWIW, if you ever intend to support ART ("always running timer") >> passthrough, this is going to be a giant clusterfsck. Good luck. I >> haven't gotten a straight answer as to what hardware actually supports >> that thing, so even testing isn't no easy. > > Hm, AR TSC would be best handled by doing nothing ... dropping the > faking logic just became tempting. As it stands, ART is screwed if you adjust the VMCS's tsc offset. But I think it's also screwed if you migrate to a machine with a different ratio of guest TSC ticks to host ART ticks or a different offset, because the host isn't going to do the rdmsr every time it tries to access the ART, so passing it through might require a paravirt mechanism no matter what. ISTM that, if KVM tries to keep the guest TSC monotonic across migration, it should probably also keep it monotonic across host suspend/resume. After all, host suspend/resume is kind of like migrating from the pre-suspend host to the post-resume host. Maybe it could even share code. > >>> --- >>> Btw. I'll be spending some days to decipher kvmclock, so I'd also fix >>> the masterclock+suspend issue, if you don't mind ... So far, I don't >>> even see a reason to update kvmclock on kvm_arch_hardware_enable(). >>> Suspend is a condition that we want to handle, so kvm_resume would be a >>> better place, but we handle suspend only because TSC and timekeeping has >>> changed, so I think that the right place is in their event notifiers. >> >> I'd be glad to try to review things. Please cc me. > > Ok. > >> One of the Xen people pointed me at the MS Viridian spec for handling >> TSC rate changes on migration to or from hosts that don't support TSC >> scaling. I wonder if KVM could use the same technique or even the >> same API. > > The TSC frequency MSR is read-only in Xen, so I guess it's equivalent to > pvclock. I'll take a deeper look, thanks for pointers. -- Andy Lutomirski AMA Capital Management, LLC