Date: Fri, 19 Nov 2021 13:31:11 +0000
Message-ID: <87czmw6zmo.wl-maz@kernel.org>
From: Marc Zyngier
To: Marcelo Tosatti
Cc: Nicolas Saenz Julienne, linux-arm-kernel@lists.infradead.org, rostedt@goodmis.org, james.morse@arm.com, alexandru.elisei@arm.com, suzuki.poulose@arm.com, catalin.marinas@arm.com, will@kernel.org, linux-kernel@vger.kernel.org, kvmarm@lists.cs.columbia.edu, mingo@redhat.com, nilal@redhat.com
Subject: Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs
In-Reply-To: <20211119125946.GA57544@fuller.cnet>

On Fri, 19 Nov 2021 12:59:46 +0000,
Marcelo Tosatti wrote:
>
> On Fri, Nov 19, 2021 at 12:17:00PM +0000, Marc Zyngier wrote:
> > On Fri, 19 Nov 2021 10:21:18 +0000,
> > Nicolas Saenz Julienne wrote:
> > >
> > > While using cntvct as the raw clock for tracing, it's possible to
> > > synchronize host/guest traces just by knowing the virtual offset applied
> > > to the guest's virtual counter.
> > >
> > > This is also the case on x86 when TSC is available. The offset is
> > > exposed in debugfs as 'tsc-offset' on a per vcpu basis. So let's
> > > implement the same for arm64.
> >
> > How does this work with NV, where the guest hypervisor is in control
> > of the virtual offset? How does userspace know which vcpu to pick so
> > that it gets the right offset?
>
> On x86, the offsets for different vcpus are the same due to the logic in
> the kvm_synchronize_tsc() function:
>
> During guest vcpu creation, when the TSC-clock values are written
> within a short window of time (or the clock value is zero), the code uses
> the same TSC offset.
>
> This logic is problematic (since "short window of time" is a heuristic
> which can fail), and is being replaced by writing the same offset
> for each vCPU:
>
> commit 828ca89628bfcb1b8f27535025f69dd00eb55207
> Author: Oliver Upton
> Date:   Thu Sep 16 18:15:38 2021 +0000
>
>     KVM: x86: Expose TSC offset controls to userspace
>
>     To date, VMM-directed TSC synchronization and migration has been a bit
>     messy. KVM has some baked-in heuristics around TSC writes to infer if
>     the VMM is attempting to synchronize. This is problematic, as it depends
>     on host userspace writing to the guest's TSC within 1 second of the last
>     write.
>
>     A much cleaner approach to configuring the guest's views of the TSC is to
>     simply migrate the TSC offset for every vCPU. Offsets are idempotent,
>     and thus not subject to change depending on when the VMM actually
>     reads/writes values from/to KVM. The VMM can then read the TSC once with
>     KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
>     the guest is paused.
>
> So with that in place, the answer to
>
> How does userspace know which vcpu to pick so
> that it gets the right offset?
>
> is any vcpu, since the offsets are the same.

As I just said above, this assertion doesn't hold true once you have
nested virt, because the offset is per-cpu, and is adjusted to mean
different things on different hypervisors (some hypervisors expose
stolen time through it, for example).

What this patch does is expose a Linux-specific behaviour and try to
derive properties from it. It really doesn't work in general.

>
> > I also wonder why we need this when userspace already has direct
> > access to that information without any extra kernel support (read the
> > CNTVCT view of the vcpu using the ONE_REG API, subtract it from the
> > host view of the counter, job done).
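The "any vcpu" answer only works if every vCPU really carries the same offset, which is exactly what Marc says cannot be assumed under nested virt or when a hypervisor folds stolen time into the offset. A trace-merging tool could check that assumption before collapsing per-vCPU offsets into a single value. The following is a minimal sketch, not an existing kernel or tool API: the helper `common_offset()` and its `tolerance` parameter are hypothetical. The tolerance is there because offsets derived from paired counter reads (as in the ONE_REG approach just quoted) carry some sampling jitter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether one offset can represent every vCPU.
 * Returns true, and stores offsets[0] in *common, only if every offset lies
 * within `tolerance` counter ticks of offsets[0]. Pass 0 to require an exact
 * match; a small non-zero value absorbs jitter when the offsets were derived
 * from paired host/guest counter reads.
 */
static bool common_offset(const uint64_t *offsets, size_t n,
                          uint64_t tolerance, uint64_t *common)
{
    if (n == 0)
        return false;                   /* no vCPUs, nothing to agree on */

    for (size_t i = 1; i < n; i++) {
        uint64_t a = offsets[i], b = offsets[0];
        uint64_t delta = a > b ? a - b : b - a;
        if (delta > tolerance)
            return false;               /* vCPUs disagree: keep per-vCPU offsets */
    }
    *common = offsets[0];
    return true;
}
```

When this returns false, as Marc expects with NV, guest trace events have to be mapped onto the host timeline using each vCPU's own offset.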
>
> If the guest has access to the clock offset (between guest and host), then
> in the guest:
>
> clockval = hostclockval - clockoffset
>
> Adding "clockoffset" to that retrieves the host clock.
>
> Is that what you mean?

No. The *VMM* (qemu, kvmtool, crosvm, insertyourfavouriteonehere)
already has access to it. Why do we need an extra interface?

M.

-- 
Without deviation from the norm, progress is not possible.
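Marc's point is that the VMM can derive the offset itself without any new kernel interface. A minimal sketch of that arithmetic in C, assuming the VMM has sampled the vcpu's virtual counter through the ONE_REG API (presumably the KVM_REG_ARM_TIMER_CNT register on arm64) and the host's view of the counter at nearly the same instant. The helper names `derive_offset()`, `guest_to_host()` and `host_to_guest()` are hypothetical. The sign convention follows Marcelo's formula above, which also matches arm64, where the virtual count is the physical count minus CNTVOFF.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers. Assumption: host_cnt and guest_cnt were sampled at
 * (nearly) the same instant, guest_cnt being the vcpu's virtual counter as
 * read through ONE_REG and host_cnt the host's view of the same counter.
 * Unsigned arithmetic wraps modulo 2^64, like a free-running 64-bit counter.
 */

/* guest = host - offset, hence offset = host - guest. */
static uint64_t derive_offset(uint64_t host_cnt, uint64_t guest_cnt)
{
    return host_cnt - guest_cnt;
}

/* Map a guest trace timestamp onto the host timeline: host = guest + offset. */
static uint64_t guest_to_host(uint64_t guest_ts, uint64_t offset)
{
    return guest_ts + offset;
}

/* The inverse mapping: guest = host - offset. */
static uint64_t host_to_guest(uint64_t host_ts, uint64_t offset)
{
    return host_ts - offset;
}
```

Any delay between the two reads ends up as an error in the derived offset, so the samples should be taken as close together as possible.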