Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
    aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
    by smtp.lore.kernel.org (Postfix) with ESMTP id 5C12EC433EF
    for ; Tue, 23 Nov 2021 11:09:24 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S235703AbhKWLMb (ORCPT ); Tue, 23 Nov 2021 06:12:31 -0500
Received: from mail.kernel.org ([198.145.29.99]:53946 "EHLO mail.kernel.org"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S235114AbhKWLM3 (ORCPT ); Tue, 23 Nov 2021 06:12:29 -0500
Received: from disco-boy.misterjones.org (disco-boy.misterjones.org [51.254.78.96])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by mail.kernel.org (Postfix) with ESMTPSA id 8987261059;
    Tue, 23 Nov 2021 11:09:21 +0000 (UTC)
Received: from sofa.misterjones.org ([185.219.108.64] helo=why.misterjones.org)
    by disco-boy.misterjones.org with esmtpsa (TLS1.3)
    tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    (Exim 4.94.2)
    (envelope-from )
    id 1mpTfr-007GDl-D2; Tue, 23 Nov 2021 11:09:19 +0000
Date: Tue, 23 Nov 2021 11:09:18 +0000
Message-ID: <87lf1fcen5.wl-maz@kernel.org>
From: Marc Zyngier
To: Nicolas Saenz Julienne
Cc: linux-arm-kernel@lists.infradead.org, rostedt@goodmis.org,
    james.morse@arm.com, alexandru.elisei@arm.com, suzuki.poulose@arm.com,
    catalin.marinas@arm.com, will@kernel.org, linux-kernel@vger.kernel.org,
    kvmarm@lists.cs.columbia.edu, mingo@redhat.com, mtosatti@redhat.com,
    nilal@redhat.com
Subject: Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs
In-Reply-To: <0e948a211bd8d63ba05594fb8c03bf3a77a227a0.camel@redhat.com>
References: <20211119102117.22304-1-nsaenzju@redhat.com>
    <20211119102117.22304-3-nsaenzju@redhat.com>
    <87fsrs732b.wl-maz@kernel.org>
    <0e948a211bd8d63ba05594fb8c03bf3a77a227a0.camel@redhat.com>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9
    (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/27.1
    (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
Content-Type: text/plain; charset=US-ASCII
X-SA-Exim-Connect-IP: 185.219.108.64
X-SA-Exim-Rcpt-To: nsaenzju@redhat.com, linux-arm-kernel@lists.infradead.org,
    rostedt@goodmis.org, james.morse@arm.com, alexandru.elisei@arm.com,
    suzuki.poulose@arm.com, catalin.marinas@arm.com, will@kernel.org,
    linux-kernel@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
    mingo@redhat.com, mtosatti@redhat.com, nilal@redhat.com
X-SA-Exim-Mail-From: maz@kernel.org
X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 22 Nov 2021 20:40:52 +0000,
Nicolas Saenz Julienne wrote:
>
> Hi Marc, thanks for the review.
>
> On Fri, 2021-11-19 at 12:17 +0000, Marc Zyngier wrote:
> > On Fri, 19 Nov 2021 10:21:18 +0000,
> > Nicolas Saenz Julienne wrote:
> > >
> > > While using cntvct as the raw clock for tracing, it's possible to
> > > synchronize host/guest traces just by knowing the virtual offset applied
> > > to the guest's virtual counter.
> > >
> > > This is also the case on x86 when TSC is available. The offset is
> > > exposed in debugfs as 'tsc-offset' on a per vcpu basis. So let's
> > > implement the same for arm64.
> >
> > How does this work with NV, where the guest hypervisor is in control
> > of the virtual offset?
>
> TBH I hadn't thought about NV. Looking at it from that angle, I now see my
> approach doesn't work on hosts that use CNTVCT (regardless of NV). Upon
> entering into a guest, we change CNTVOFF before the host is done with tracing,
> so traces like 'kvm_entry' will have weird timestamps. I was just lucky that
> the hosts I was testing with use CNTPCT.

There are multiple things at play here:

- if the system is a host, the kernel will use CNTPCT.
  Userspace will still use CNTVCT, and the offset is guaranteed to be 0
  *when running userspace*.

- if the system isn't a host (which doesn't necessarily mean a guest),
  CNTVCT is the only thing that is being used, and the offset is
  unknown (Linux requires it to be constant across vcpus though).

So I doubt you'd get a bad timestamp on the host. It is just that you
have named your trace clock incorrectly (and Steven's idea of an
indirected clock could help here).

> I believe the solution would be to be able to force a 0 offset between
> guest/host. With that in mind, is there a reason why kvm_timer_vcpu_init()
> imposes a non-zero one by default? I checked out the commits that introduced
> that code, but couldn't find a compelling reason. VMMs can always change it
> through KVM_REG_ARM_TIMER_CNT afterwards.

We want to minimise the chance for an observable rollover of the
virtual counter, so time starts at 0 *in the guest*. The VMM can
change the view of that time for the purpose of migration.

If you want a 0 offset, set the counter to the physical value in the
VMM (imprecise) or have a look at Oliver Upton's patches that were
allowing an offset to be specified directly. But migration, by
definition, breaks this.

>
> > I also wonder why we need this when userspace already has direct access to
> > that information without any extra kernel support (read the CNTVCT view of
> > the vcpu using the ONEREG API, subtract it from the host view of the counter,
> > job done).
>
> Well IIUC, you're at the mercy of how long it takes to return from the ONEREG
> ioctl. The results will be skewed. For some workloads, where low latency is
> key, we really need high precision traces in the order of single digit us or
> even 100s of ns. I'm not sure you'll be able to get there with that approach.

The PTP clock does exactly that from the guest PoV, with a lot more
overhead, and this results in single digit ns precision. Why isn't
that possible from userspace?

Thanks,

	M.
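
To make the userspace arithmetic concrete, here is a rough sketch of
the "read the guest view, subtract it from the host view" scheme. It
only models the maths, using the architectural relation
CNTVCT = CNTPCT - CNTVOFF. The Sample class, best_offset(),
host_to_guest() and all the numbers are made up for illustration, and
nothing here talks to KVM: in a real VMM, the guest field would come
from a KVM_GET_ONE_REG read of KVM_REG_ARM_TIMER_CNT, bracketed by two
reads of the host counter. Keeping the sample with the narrowest
bracket is one possible way to bound the ioctl skew, not something
taken from an existing implementation.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # the counters are 64-bit: model unsigned wraparound


@dataclass(frozen=True)
class Sample:
    """One bracketed read (hypothetical helper, not a KVM structure)."""
    host_before: int  # host counter just before reading the guest view
    guest: int        # guest's CNTVCT view, e.g. KVM_REG_ARM_TIMER_CNT
    host_after: int   # host counter just after reading the guest view

    @property
    def window(self) -> int:
        # Width of the bracket; the offset estimate is wrong by at most
        # about half of this, wherever the guest read actually landed.
        return (self.host_after - self.host_before) & MASK64

    @property
    def offset(self) -> int:
        # Assume the guest view was sampled at the middle of the bracket.
        mid = (self.host_before + self.window // 2) & MASK64
        return (mid - self.guest) & MASK64


def best_offset(samples):
    """Return (offset, window) of the tightest sample; needs >= 1 sample."""
    best = min(samples, key=lambda s: s.window)
    return best.offset, best.window


def host_to_guest(host_ts, offset):
    """Translate a host counter timestamp into the guest's view."""
    return (host_ts - offset) & MASK64


# Made-up samples: the "true" offset is 1000000 ticks, and the second
# read happened to be much faster than the other two.
SAMPLES = [
    Sample(host_before=1000400, guest=500, host_after=1000900),
    Sample(host_before=2000000, guest=1000020, host_after=2000040),
    Sample(host_before=3000000, guest=2000100, host_after=3000300),
]

OFFSET, WINDOW = best_offset(SAMPLES)
print(f"offset={OFFSET} ticks, window={WINDOW} ticks")
print(f"host 5000000 -> guest {host_to_guest(5000000, OFFSET)}")
```

Repeating the read and keeping the tightest bracket means a single
slow ioctl only costs a retry, and the window width tells you how far
the estimate can be trusted.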

-- 
Without deviation from the norm, progress is not possible.