Date: Fri, 22 Feb 2019 12:44:39 +0100 (CET)
From: Thomas Gleixner
To: Olaf Hering
cc: John Stultz, Stephen Boyd, LKML, x86@kernel.org, Paolo Bonzini
Subject: Re: recalibrating x86 TSC during suspend/resume
In-Reply-To: <20190222105302.GA26398@aepfle.de>
References: <20190222105302.GA26398@aepfle.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 22 Feb 2019, Olaf Hering wrote:

> Is there a way to recalibrate the x86 TSC during a suspend/resume cycle?

No.

> While the frequency will remain the same on a Laptop, it may (or rather:
> it definitely will) differ if a VM is migrated from one host to another.
> The hypervisor may choose to emulate the expected TSC frequency on the
> destination host, but this emulation comes with a significant
> performance cost. Therefore it would be good if the kernel evaluates the
> environment during resume.
>
> The specific use case I have is a workload within VMs that makes heavy
> use of TSC. The kernel is booted with 'clocksource=tsc highres=off nohz=off'
> because only this clocksource gives enough granularity. The default
> paravirtualized clock will return the same values via
> clock_gettime(CLOCK_MONOTONIC) if the timespan between two calls is too
> short. This does not happen with 'clocksource=tsc'.
>
> Right now it is not possible to migrate VMs to hosts with different CPU
> speeds. This leads to "islands" of identical hardware, and makes
> maintenance of hosts harder than it needs to be. If the VM kernel would
> be able to cope with CPU/TSC frequency changes, the pool of potential
> destination hosts will become significantly larger.

The problem with recalibrating TSC on resume is that it would have to be

  1) quick

  2) accurate, so NTP does not get utterly unhappy.

Newer Intels support TSC scaling for VMX, which could solve the problem. It
affects TSC readout by:

  TSC = (read(HWTSC) * multiplier) >> 48

So you can standardize on a TSC frequency across a fleet. Not sure when that
was introduced and no idea whether it's available on AMD.

For a software solution we could try the following:

  1) Provide the raw TSC frequency of the host to the guest in some magic
     software defined MSR or CPUID. If there is an existing mechanism, use
     that.

  2) On resume check whether the MSR/CPUID is available and if so read out
     that information and check whether the frequency is the same as
     before. If not it is trivial enough to adjust the guest mult/shift
     values for both raw and NTP adjusted clocks before they are used
     again, i.e. before timekeeping_resume(). Need to look what's the best
     place, but probably the clocksource resume callback.
     Plus if TSC deadline timer is used, we'd need the same adjustment
     there.

That's backward compatible, because if the MSR/CPUID is not there, then the
recalibration is not tried.

Whether that is accurate enough or not to make NTP happy, I can't tell, but
it's definitely worth a try.

Thanks,

	tglx
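
The scaling formula and the resume-time mult/shift adjustment described
above can be sketched in plain userspace C. This is only an illustration
under stated assumptions, not kernel code: all names (scale_tsc,
tsc_multiplier, hz_to_mult, cycles_to_ns, struct guest_tsc_clock,
tsc_resume_check) are hypothetical, the host frequency is simply passed in
as a parameter because the MSR/CPUID that would provide it does not exist
yet, the shift is kept fixed rather than chosen dynamically, and the
128-bit arithmetic relies on the GCC/Clang unsigned __int128 extension.

```c
#include <assert.h>
#include <stdint.h>

#define TSC_SCALE_FRAC_BITS 48      /* the ">> 48" from the formula above */
#define NSEC_PER_SEC 1000000000ULL

/* TSC = (read(HWTSC) * multiplier) >> 48, with a 128-bit intermediate. */
static inline uint64_t scale_tsc(uint64_t hw_tsc, uint64_t multiplier)
{
    unsigned __int128 prod = (unsigned __int128)hw_tsc * multiplier;

    return (uint64_t)(prod >> TSC_SCALE_FRAC_BITS);
}

/*
 * Fixed-point ratio guest_hz / host_hz with 48 fractional bits, i.e. the
 * multiplier that makes a host running at host_hz look like guest_hz.
 */
static inline uint64_t tsc_multiplier(uint64_t guest_hz, uint64_t host_hz)
{
    assert(guest_hz > 0 && host_hz > 0);
    return (uint64_t)(((unsigned __int128)guest_hz << TSC_SCALE_FRAC_BITS)
                      / host_hz);
}

/*
 * Software side: ns = (cycles * mult) >> shift.
 * Derive mult for a given frequency and fixed shift, rounding to nearest.
 * Assumes hz is large enough for the result to fit in 32 bits.
 */
static inline uint32_t hz_to_mult(uint64_t hz, uint32_t shift)
{
    assert(hz > 0 && shift <= 32);
    return (uint32_t)(((NSEC_PER_SEC << shift) + hz / 2) / hz);
}

static inline uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult,
                                    uint32_t shift)
{
    return (uint64_t)(((unsigned __int128)cycles * mult) >> shift);
}

/* Hypothetical per-guest conversion state. */
struct guest_tsc_clock {
    uint64_t freq_hz;
    uint32_t mult;
    uint32_t shift;
};

/*
 * Step 2 of the proposal: on resume, compare the host-provided frequency
 * with the one in use and recompute mult if it changed. host_hz == 0
 * stands for "MSR/CPUID not available", in which case nothing is touched.
 * Returns 1 if the conversion factors were updated, 0 otherwise.
 */
static inline int tsc_resume_check(struct guest_tsc_clock *clk,
                                   uint64_t host_hz)
{
    if (host_hz == 0 || host_hz == clk->freq_hz)
        return 0;

    clk->freq_hz = host_hz;
    clk->mult = hz_to_mult(host_hz, clk->shift);
    return 1;
}
```

With a fixed shift of 24, a guest moving from a 2 GHz to a 1 GHz host would
see mult change from 8388608 to 16777216, so 1000 cycles map to 1000 ns
after the move instead of 500 ns. A real implementation would also have to
cover the NTP adjusted clock and the TSC deadline timer, as noted above.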