From: Sean Christopherson
To: Ingo Molnar
Cc: Peter Zijlstra, Dapeng Mi, Paolo Bonzini, Arnaldo Carvalho de Melo,
    Kan Liang, Like Xu, Mark Rutland, Alexander Shishkin, Jiri Olsa,
    Namhyung Kim, Ian Rogers, Adrian Hunter, kvm@vger.kernel.org,
    linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
    Zhenyu Wang, Zhang Xiong, Lv Zhiyuan, Yang Weijiang, Dapeng Mi,
    Jim Mattson, David Dunn, Mingwei Zhang, Thomas Gleixner
Subject: Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event
Date: Mon, 2 Oct 2023 08:56:50 -0700
References: <20230927033124.1226509-1-dapeng1.mi@linux.intel.com>
    <20230927033124.1226509-8-dapeng1.mi@linux.intel.com>
    <20230927113312.GD21810@noisy.programming.kicks-ass.net>
    <20230929115344.GE6282@noisy.programming.kicks-ass.net>
    <20231002115718.GB13957@noisy.programming.kicks-ass.net>

On Mon, Oct 02, 2023, Ingo Molnar wrote:
>
> * Peter Zijlstra wrote:
>
> > On Fri, Sep 29, 2023 at 03:46:55PM +0000, Sean Christopherson wrote:
> >
> > > > I will firmly reject anything that takes the PMU away from the host
> > > > entirely through.
> > >
> > > Why? What is so wrong with supporting use cases where the platform owner *wants*
> > > to give up host PMU and NMI watchdog functionality? If disabling host PMU usage
> > > were complex, highly invasive, and/or difficult to maintain, then I can understand
> > > the pushback.
> >
> > Because it sucks.
>
> > You're forcing people to choose between no host PMU or a slow guest PMU.

Nowhere did I say that we wouldn't take patches to improve the existing vPMU
support. But that's largely a moot point because I don't think it's possible to
improve the current approach to the point where it would provide a performant,
functional guest PMU.

> > And that's simply not a sane choice for most people --

It's better than the status quo, which is that no one gets to choose, everyone
gets a slow guest PMU.

> > worse it's not a choice based in technical reality.

The technical reality is that context switching the PMU between host and guest
requires reading and writing far too many MSRs for KVM to be able to context
switch at every VM-Enter and every VM-Exit. And PMIs skidding past VM-Exit adds
another layer of complexity to deal with.

> > It's a choice out of lazyness, disabling host PMU is not a requirement
> > for pass-through.

The requirement isn't passthrough access; the requirements are that the guest's
PMU has accuracy that is on par with bare metal, and that exposing a PMU to the
guest doesn't have a meaningful impact on guest performance.

> Not just a choice of laziness, but it will clearly be forced upon users
> by external entities:
>
>    "Pass ownership of the PMU to the guest and have no host PMU, or you
>     won't have sane guest PMU support at all. If you disagree, please open
>     a support ticket, which we'll ignore."

We don't have sane guest PMU support today. In the 12+ years since commit
f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests"), KVM has
never provided anything remotely close to a sane vPMU. It *mostly* works if host
perf is quiesced, but that "good enough" approach doesn't suffice for any form of
PMU usage that requires a high level of accuracy and precision.

> The host OS shouldn't offer facilities that severely limit its own capabilities,
> when there's a better solution. We don't give the FPU to apps exclusively either,
> it would be insanely stupid for a platform to do that.

The FPU can be efficiently context switched, guest state remains resident in
hardware so long as the vCPU task is scheduled in (ignoring infrequent FPU usage
from IRQ context), and guest usage of the FPU doesn't require trap-and-emulate
behavior in KVM.

As David said, ceding the hardware PMU for all of kvm_arch_vcpu_ioctl_run()
(modulo the vCPU task being scheduled out) is likely a viable alternative.

 : But it does mean that when entering the KVM run loop, the host perf system
 : needs to context switch away the host PMU state and allow KVM to load the guest
 : PMU state. And much like the FPU situation, the portion of the host kernel
 : that runs between the context switch to the KVM thread and VMENTER to the guest
 : cannot use the PMU.

If y'all are willing to let KVM redefine exclude_guest to be KVM's outer run
loop, then I'm all for exploring that option. But that idea got shot down over
a year ago[*]. Or at least, that was my reading of things. Maybe it was just a
misunderstanding because we didn't do a good job of defining the behavior.

I am completely ok with either approach, but I am not ok with being nak'd on
both. Because unless there's a magical third option lurking, those two options
are the only ways for KVM to provide a vPMU that meets the requirements for
slice-of-hardware use cases.

[*] https://lore.kernel.org/all/YgPCm1WIt9dHuoEo@hirez.programming.kicks-ass.net
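
For a rough sense of why the switching point matters, here is a toy model (not kernel code) that counts PMU MSR accesses when the PMU is switched at every VM-Enter/VM-Exit versus once per pass through the outer run loop. Everything in it is an assumption made for illustration: the names `PmuModel`, `n_msrs`, `per_entry_exit` and `per_run_loop` are hypothetical, the MSR count and exit rate are made-up inputs rather than measurements, and the model ignores the vCPU task being scheduled out.

```python
from dataclasses import dataclass


@dataclass
class PmuModel:
    # Hypothetical number of PMU MSRs that make up one context's state.
    n_msrs: int

    def switch_cost(self) -> int:
        # One switch reads every outgoing MSR and writes every incoming one.
        return 2 * self.n_msrs


def per_entry_exit(model: PmuModel, run_calls: int, exits_per_run: int) -> int:
    # Switch host->guest at each VM-Enter and guest->host at each VM-Exit.
    switches = run_calls * exits_per_run * 2
    return switches * model.switch_cost()


def per_run_loop(model: PmuModel, run_calls: int, exits_per_run: int) -> int:
    # Switch host->guest once when entering the run loop and guest->host once
    # when leaving it; VM-Exits handled inside the loop cost nothing extra.
    switches = run_calls * 2
    return switches * model.switch_cost()


if __name__ == "__main__":
    model = PmuModel(n_msrs=20)          # assumed value, not a real count
    runs, exits = 10, 1000               # assumed workload shape
    print("per VM-Enter/VM-Exit:", per_entry_exit(model, runs, exits))
    print("per run loop:        ", per_run_loop(model, runs, exits))
```

In this model the ratio between the two totals is simply the number of VM-Exits handled per pass through the run loop, which is why moving the switch outward helps most for exit-heavy guests. The price is the one David describes: host code running inside the loop cannot use the PMU.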