Date: Fri, 13 Oct 2023 15:01:22 +0100
From: Mark Rutland
To: Tianyi Liu
Cc: maz@kernel.org, acme@kernel.org, adrian.hunter@intel.com,
    alexander.shishkin@linux.intel.com, irogers@google.com, jolsa@kernel.org,
    kvm@vger.kernel.org, kvmarm@lists.linux.dev,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-perf-users@vger.kernel.org, mingo@redhat.com, namhyung@kernel.org,
    pbonzini@redhat.com, peterz@infradead.org, seanjc@google.com,
    x86@kernel.org
Subject: Re: [PATCH v2 0/5] perf: KVM: Enable callchains for guests

On Thu, Oct 12, 2023 at 02:35:42PM +0800, Tianyi Liu wrote:
> Hi Marc,
>
> On Sun, 11 Oct 2023 16:45:17 +0000, Marc Zyngier wrote:
> > > The event processing flow is as follows (shown as backtrace):
> > >   #0 kvm_arch_vcpu_get_frame_pointer / kvm_arch_vcpu_read_virt (per arch)
> > >   #1 kvm_guest_get_frame_pointer / kvm_guest_read_virt
> > >
> > >   #2 perf_guest_get_frame_pointer / perf_guest_read_virt
> > >   #3 perf_callchain_guest
> > >   #4 get_perf_callchain
> > >   #5 perf_callchain
> > >
> > > Between #0 and #1 is the interface between KVM and the arch-specific
> > > impl, while between #1 and #2 is the interface between Perf and KVM.
> > > The 1st patch implements #0. The 2nd patch extends interfaces between #1
> > > and #2, while the 3rd patch implements #1. The 4th patch implements #3
> > > and modifies #4 #5. The last patch is for userspace utils.
> > >
> > > Since arm64 hasn't provided some foundational infrastructure (interface
> > > for reading from a virtual address of guest), the arm64 implementation
> > > is stubbed for now because it's a bit complex, and will be implemented
> > > later.
> >
> > I hope you realise that such an "interface" would be, by definition,
> > fragile and very likely to break in a subtle way. The only existing
> > case where we walk the guest's page tables is for NV, and even that is
> > extremely fragile.
>
> For walking the guest's page tables, yes, there're only very few
> use cases. Most of them are used in nested virtualization and XEN.

The key point isn't the lack of use cases; the key point is that *this is
fragile*.

Consider that walking guest page tables is only safe because:

(a) The walks happen in the guest-physical / intermediate-physical address
    space of the guest, and so are not themselves subject to translation via
    the guest's page tables.

(b) Special traps were added to the architecture (e.g. for TLB invalidation)
    which allow the host to avoid race conditions when the guest modifies page
    tables.

For unwind we'd have to walk structures in the guest's virtual address space,
which can change under our feet at any time the guest is running, and handling
that requires much more care.

I think this needs a stronger justification, and an explanation of how you
handle such races.

Mark.

> > Given that, I really wonder why this needs to happen in the kernel.
> > Userspace has all the required information to interrupt a vcpu and
> > walk its current context, without any additional kernel support. What
> > are the bits here that cannot be implemented anywhere else?
>
> Thanks for pointing this out, I agree with your opinion.
> Whether it's walking guest's contexts or performing an unwind,
> user space can indeed accomplish these tasks.
> The only reasons I see for implementing them in the kernel are performance
> and the access to a broader range of PMU events.
>
> Consider if I were to implement these functionalities in userspace:
> I could have `perf kvm` periodically access the guest through the KVM API
> to retrieve the necessary information. However, interrupting a VCPU
> through the KVM API from user space might introduce higher latency
> (not tested specifically), and the overhead of syscalls could also
> limit the sampling frequency.
>
> Additionally, it seems that user space can only interrupt the VCPU
> at a certain frequency, without harnessing the richness of the PMU's
> performance events. And if we incorporate the logic into the kernel,
> `perf kvm` can bind to various PMU events and sample with a faster
> performance in PMU interrupts.
>
> So, it appears to be a tradeoff -- whether it's necessary to introduce
> more complexity in the kernel to gain access to a broader range and more
> precise performance data with less overhead. In my current use case,
> I just require simple periodic sampling, which is sufficient for me,
> so I'm open to both approaches.
>
> > > Tianyi Liu (5):
> > >   KVM: Add arch specific interfaces for sampling guest callchains
> > >   perf kvm: Introduce guest interfaces for sampling callchains
> > >   KVM: implement new perf interfaces
> > >   perf kvm: Support sampling guest callchains
> > >   perf tools: Support PERF_CONTEXT_GUEST_* flags
> > >
> > >  arch/arm64/kvm/arm.c | 17 +++++++++
> >
> > Given that there is more to KVM than just arm64 and x86, I suggest
> > that you move the lack of support for this feature into the main KVM
> > code.
>
> Currently, sampling for KVM guests is only available for the guest's
> instruction pointer, and even the support is limited, it is available
> on only two architectures (x86 and arm64). This functionality relies on
> a kernel configuration option called `CONFIG_GUEST_PERF_EVENTS`,
> which will only be enabled on x86 and arm64.
> Within the main KVM code, these interfaces are enclosed within
> `#ifdef CONFIG_GUEST_PERF_EVENTS`. Do you think these are enough?
>
> Best regards,
> Tianyi Liu
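
As a rough illustration of the race concern raised in the reply above, a
frame-pointer unwind over guest virtual memory might look like the sketch
below. It is not taken from the patch series: `read_virt_fn`, `unwind_guest`,
`struct fake_guest` and `fake_read_virt` are hypothetical names, the guest is
simulated by a byte array, and the frame record layout (next frame pointer
followed by return address) assumes an arm64-style frame. Every read may fail
or return stale data, so the walk treats each record as untrusted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback: copy len bytes from guest virtual address gva.
 * Returns false when the address is not (or no longer) mapped. */
typedef bool (*read_virt_fn)(void *ctx, uint64_t gva, void *dst, size_t len);

/* Assumed arm64-style frame record: caller's fp, then return address. */
struct frame_record {
	uint64_t next_fp;
	uint64_t ret_addr;
};

/* Walk the frame chain starting at fp, storing return addresses in entries.
 * Stops on a failed read, a misaligned fp, a non-increasing fp, or when
 * max_entries is reached. Returns the number of entries stored. */
size_t unwind_guest(read_virt_fn read_virt, void *ctx, uint64_t fp,
		    uint64_t *entries, size_t max_entries)
{
	size_t n = 0;

	while (n < max_entries && fp != 0) {
		struct frame_record rec;

		if (fp & 0x7)		/* sanity check: 8-byte alignment */
			break;
		if (!read_virt(ctx, fp, &rec, sizeof(rec)))
			break;		/* mapping changed or never existed */

		entries[n++] = rec.ret_addr;

		/* Frames must move strictly up the stack; this also
		 * guarantees termination if the guest builds a loop. */
		if (rec.next_fp <= fp)
			break;
		fp = rec.next_fp;
	}
	return n;
}

/* Simulated guest: mem[] is visible at [base, base + mapped_len).
 * mapped_len must not exceed sizeof(mem). */
struct fake_guest {
	uint64_t base;
	size_t mapped_len;
	uint8_t mem[256];
};

bool fake_read_virt(void *ctx, uint64_t gva, void *dst, size_t len)
{
	struct fake_guest *g = ctx;

	if (gva < g->base || len > g->mapped_len ||
	    gva - g->base > g->mapped_len - len)
		return false;
	memcpy(dst, g->mem + (gva - g->base), len);
	return true;
}
```

The checks only bound the damage when the guest's mappings or stack contents
change mid-walk (a truncated or bogus callchain instead of an unbounded loop
or a bad access); they do not make the sample correct, which is the part that
still needs the justification asked for above.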