Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp1969980rwb; Mon, 7 Nov 2022 07:44:56 -0800 (PST) X-Google-Smtp-Source: AMsMyM5k7xQKpKyRpBjkK5DvIzwtQsHggoL3vaPIq9qiqdx+etsn/c4rIr3u16wzSSKsaIOCohDQ X-Received: by 2002:a17:907:9809:b0:7ad:d945:554f with SMTP id ji9-20020a170907980900b007add945554fmr38957031ejc.552.1667835896110; Mon, 07 Nov 2022 07:44:56 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1667835896; cv=none; d=google.com; s=arc-20160816; b=BnKo/50otvitZ9crw23csJS0w0j/5aj7obZFk+Y8Jr3oBtQlIbRLreNQ6Vjf6jT2BB AsZjZEEADHomCIcZuXsenJcqIoPYW9Ti7KYlXj+RZ8LrlMBki7omLCGwlre1Hg+d+EDu Ra/QgMG5H+Qv5hjEJUn/Sj4soVmXLAR60iMh7SjidPOne8QKrj6zBnA+IZtfI3e8GAA9 u/zK1XtvYHO8P7BjAhG3AamnsPRqFXTgbUCq+j4MjLDsbkkUT7BozAewl7IE53DBnEZz hvr5sx8LJb6c/orevjZt86cCss+bn+J2u9qhQpRgawxMTLaUiI+tccbUnfzt62v5mkI7 43NQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent:references:in-reply-to :subject:cc:to:from:message-id:date:dkim-signature; bh=9Lw/9ncBLDm0JdMxHazEOVUv0i5dtfaD6HspUt1uESQ=; b=EXfAEZXUBadspXWC3CUDCcnEaUnbWt3MvD7cf21fowW0dxkMLNz0FDg+W33MA+28xE XppeJappBHX4mxi8vX1AS4Z+/kebiVwEZb7qLoqZCbQ3W3ogJ2P8Chk8Ts2ns5mHaSB4 R2zbEiinic6gwvymOxItuUPOjvNHJTkk93orx/8ioJ+/tDa2WUiacDnDxwbEzRGLJACu 7qBrhdDteqIslOrjXKQz80sRrQ7jz2Yb6OMf5DEhajFAg9vyiYMwvPmKBkiY0aOjESHt Bw8ICzv6QZ5CpBiUevKPKXHXUojQIl7KtFmspG+rEH37679i5w52+HkmTcYaf+V1NCk0 ElOw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=nsdFJc9T; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b5-20020a056402084500b004623028c586si10020281edz.141.2022.11.07.07.44.33; Mon, 07 Nov 2022 07:44:56 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=nsdFJc9T; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232284AbiKGPjQ (ORCPT + 93 others); Mon, 7 Nov 2022 10:39:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42530 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232912AbiKGPjO (ORCPT ); Mon, 7 Nov 2022 10:39:14 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 112CF1F9F5; Mon, 7 Nov 2022 07:39:11 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id A275060EC9; Mon, 7 Nov 2022 15:39:10 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id DF7BEC433D6; Mon, 7 Nov 2022 15:39:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667835550; bh=LJTIX6EOyaGeco5U/ATP5mMAgf2Y3X8oNl0EWEryn5o=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=nsdFJc9TqagjKvmFhMqOxOGFd9zPtEIUDgrKBq1eaNwvFUXm/2MT2aEvrvPfmz7J7 JIw+bt+4p8KLa/zn3pkuPjPS8R0Bh1J+xpk+eX/gVzh8hVLMkb52DFNTZXSq14utSE rzVVJDENJRiZLmhavv6oR67vreSXOvHPNGz4XhvbYqOCH8AIs9rXMVSsclitgNiG5x ZNL1FBuqhtZnRUidWMLLrEGZCt5miwTj+H9/522i6VgM2oCNT94tTEvR4849xFqzdm vUxQ0Vvsh/UD0V7Kfavqdl7K4xON3elnskuxhxpest11c2Ytv1bhlBIsg/gC2bctp8 7upVCChMDUuLA== Received: from sofa.misterjones.org ([185.219.108.64] helo=goblin-girl.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1os4DL-004Qsn-Ss; Mon, 07 Nov 2022 15:39:08 +0000 Date: Mon, 07 Nov 2022 15:39:07 +0000 Message-ID: <86y1smpyec.wl-maz@kernel.org> From: Marc Zyngier To: Leo Yan Cc: James Morse , Alexandru Elisei , Suzuki K Poulose , Oliver Upton , Catalin Marinas , Will Deacon , Arnaldo Carvalho de Melo , John Garry , James Clark , Mike Leach , Peter Zijlstra , Ingo Molnar , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Subject: Re: [PATCH v1 3/3] perf arm64: Support virtual CPU ID for kvm-stat In-Reply-To: References: <20221105072311.8214-1-leo.yan@linaro.org> <20221105072311.8214-4-leo.yan@linaro.org> <868rkpr0mv.wl-maz@kernel.org> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/27.1 (aarch64-unknown-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.219.108.64 X-SA-Exim-Rcpt-To: leo.yan@linaro.org, james.morse@arm.com, alexandru.elisei@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, catalin.marinas@arm.com, will@kernel.org, acme@kernel.org, john.garry@huawei.com, james.clark@arm.com, mike.leach@linaro.org, peterz@infradead.org, mingo@redhat.com, mark.rutland@arm.com, alexander.shishkin@linux.intel.com, jolsa@kernel.org, namhyung@kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 07 Nov 2022 14:47:10 +0000, Leo Yan wrote: > > On Sat, Nov 05, 2022 at 01:28:40PM +0000, Marc Zyngier wrote: > > [...] > > > > Before: > > > > > > # perf kvm stat report --vcpu 27 > > > > > > Analyze events for all VMs, VCPU 27: > > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time > > > > > > Total Samples:0, Total events handled time:0.00us. > > > > > > After: > > > > > > # perf kvm stat report --vcpu 27 > > > > > > Analyze events for all VMs, VCPU 27: > > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time > > > > > > SYS64 808 98.54% 91.24% 0.00us 303.76us 3.46us ( +- 13.54% ) > > > WFx 10 1.22% 7.79% 0.00us 69.48us 23.91us ( +- 25.91% ) > > > IRQ 2 0.24% 0.97% 0.00us 22.64us 14.82us ( +- 52.77% ) > > > > > > Total Samples:820, Total events handled time:3068.28us. > > > > Please educate me: how useful is it to filter on a vcpu number across > > all VMs? What sense does it even make? > > Now "perf kvm" tool is not sophisticated since it doesn't capture VMID > and virtual CPU ID together. VMID is not a relevant indicator anyway, as it can change at any point. But that's only to show that everybody has a different view on what they need to collect. At which point, we need to provide an infrastructure for data extraction, and not the data itself. > I think a case is we can spin a program on a specific virtual CPU with > taskset in VM, in this way we can check if any bottleneck is caused by > VM entry/exit, but I have to say that it's inaccurate if we only filter > on VCPU ID, we should consider tracing VMID and VCPU ID together in > later's enhancement. > > > Conversely, what would be the purpose of filtering on a 5th thread of > > any process irrespective of what the process does? To me, this is the > > same level of non-sense. > > I agree. > > > AFAICT, this is just piling more arbitrary data extraction for no > > particular reason other than "just because we can", and there is > > absolutely no guarantee that this is fit for anyone else's purpose. > > > > I'd rather you have a generic tracepoint taking the vcpu as a context > > and a BPF program that spits out the information people actually need, > > keeping things out of the kernel. Or even a tracehook (like the > > scheduler does), and let people load a module to dump whatever > > information they please. > > Actually I considered three options: > > Option 1: Simply add new version's trace events for recording more info. > This is not flexible and we even have risk to add more version's trace > event if later we might find that more data should traced. > > This approach is straightforward and the implementation would be > simple. This is main reason why finally I choosed to add new trace > events. But that doesn't scale at all. > > Option 2: use Kprobe to dynamically insert tracepoints; but this means > the user must have the corresponding vmlinux file, otherwise, perf > tool might inject tracepoint at an incorrect address. This is the > main reason I didn't use Kprobe to add dynamic tracepoints. > > Option 3: As you suggested, I can bind KVM tracepoints with a eBPF > program and the eBPF program records perf events. > > When I reviewed Arm64's kvm_entry / kvm_exit trace events, they don't > have vcpu context in the arguments, this means I need to add new trace > events for accessing "vcpu" context. I'm not opposed to adding new trace{point,hook}s if you demonstrate that they are generic enough or will never need to evolve. > > Option 1 and 3 both need to add trace events; option 1 is more > straightforward solution and this is why it was choosed in current patch > set. > > I recognized that I made a mistake, actually we can modify the trace > event's definition for kvm_entry / kvm_exit, note we only modify the > trace event's arguments, this will change the trace function's > definition but it will not break ABI (the format is exactly same for > the user space). Below changes demonstrate what's my proposing: > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > index 94d33e296e10..16f6b61abfec 100644 > --- a/arch/arm64/kvm/arm.c > +++ b/arch/arm64/kvm/arm.c > @@ -917,7 +917,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) > /************************************************************** > * Enter the guest > */ > - trace_kvm_entry(*vcpu_pc(vcpu)); > + trace_kvm_entry(vcpu); > guest_timing_enter_irqoff(); > > ret = kvm_arm_vcpu_enter_exit(vcpu); > diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h > index 33e4e7dd2719..9df4fd30093c 100644 > --- a/arch/arm64/kvm/trace_arm.h > +++ b/arch/arm64/kvm/trace_arm.h > @@ -12,15 +12,15 @@ > * Tracepoints for entry/exit to guest > */ > TRACE_EVENT(kvm_entry, > - TP_PROTO(unsigned long vcpu_pc), > - TP_ARGS(vcpu_pc), > + TP_PROTO(struct kvm_vcpu *vcpu), > + TP_ARGS(vcpu), > > TP_STRUCT__entry( > __field( unsigned long, vcpu_pc ) > ), > > TP_fast_assign( > - __entry->vcpu_pc = vcpu_pc; > + __entry->vcpu_pc = *vcpu_pc(vcpu); > ), > > TP_printk("PC: 0x%016lx", __entry->vcpu_pc) > > Please let me know your opinion, if you don't object, I can move > forward with this approach. I have no issue with this if this doesn't change anything else. And if you can make use of this with a BPF program and get to the same result as your initial patch, then please submit it for inclusion in the kernel as an example. We can then point people to it next time this crop up (probably before Xmas). Thanks, M. -- Without deviation from the norm, progress is not possible.