Date: Wed, 27 Sep 2023 10:27:07 -0700
In-Reply-To: <20230927113312.GD21810@noisy.programming.kicks-ass.net>
References: <20230927033124.1226509-1-dapeng1.mi@linux.intel.com>
 <20230927033124.1226509-8-dapeng1.mi@linux.intel.com>
 <20230927113312.GD21810@noisy.programming.kicks-ass.net>
Subject: Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event
From: Sean Christopherson
To: Peter Zijlstra
Cc: Dapeng Mi, Paolo Bonzini, Arnaldo Carvalho de Melo, Kan Liang, Like Xu,
 Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
 Adrian Hunter, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
 linux-kernel@vger.kernel.org, Zhenyu Wang, Zhang Xiong, Lv Zhiyuan,
 Yang Weijiang, Dapeng Mi, Jim Mattson, David Dunn, Mingwei Zhang

+Jim, David, and Mingwei

On Wed, Sep 27, 2023, Peter Zijlstra wrote:
> On Wed, Sep 27, 2023 at 11:31:18AM +0800, Dapeng Mi wrote:
> > When guest wants to use PERF_METRICS MSR, a virtual metrics event needs
> > to be created in the perf subsystem so that the guest can have exclusive
> > ownership of the PERF_METRICS MSR.
>
> Urgh, can someone please remind me how all that is supposed to work
> again? The guest is just a task that wants the event. If the
> host creates a CPU event, then that gets scheduled with higher priority
> and the task looses out, no joy.
>
> So you cannot guarantee the guest gets anything.
>
> That is, I remember we've had this exact problem before, but I keep
> forgetting how this all is supposed to work. I don't use this virt stuff
> (and every time I try qemu arguments defeat me and I give up in
> disgust).

I don't think it does work, at least not without a very, very carefully
crafted setup and a host userspace that knows it must not use certain aspects
of perf. E.g. for PEBS, if the guest virtual counters don't map 1:1 to the
"real" counters in hardware, KVM+perf simply disables the counter.
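
To make the PEBS case concrete, here is a rough user-space sketch (not kernel
code) of the rule described above: a guest counter stays enabled only if perf
happened to back it with the hardware counter of the same index. The struct,
the function name, the counter limit, and the "-1 means unbacked" convention
are all made up for illustration; the real logic in KVM and perf is
considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_GUEST_COUNTERS 8	/* hypothetical; real counts vary by CPU */

/* Hypothetical model of one guest PEBS counter. */
struct guest_pebs_counter {
	int guest_idx;	/* counter index the guest programmed */
	int host_idx;	/* hardware counter perf assigned, -1 if none */
	bool enabled;
};

/*
 * Disable every enabled guest PEBS counter that is not backed by the
 * hardware counter with the same index.  Returns how many counters were
 * disabled by this call.
 */
static size_t pebs_disable_unmapped(struct guest_pebs_counter *c, size_t n)
{
	size_t i, disabled = 0;

	assert(n <= NR_GUEST_COUNTERS);

	for (i = 0; i < n; i++) {
		if (c[i].enabled && c[i].host_idx != c[i].guest_idx) {
			c[i].enabled = false;
			disabled++;
		}
	}
	return disabled;
}
```

The point of the sketch is that the guest has no say in the outcome: whether
host_idx matches guest_idx depends entirely on what else the host scheduled
onto the PMU.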

And for top-down slots, getting anything remotely accurate requires pinning
vCPUs 1:1 with pCPUs and enumerating an accurate topology to the guest:

  The count is distributed among unhalted logical processors (hyper-threads) who
  share the same physical core, in processors that support Intel Hyper-Threading
  Technology.

Jumping the gun a bit (we're in the *super* early stages of scraping together a
rough PoC), but I think we should effectively put KVM's current vPMU support
into maintenance-only mode, i.e. stop adding new features unless they are *very*
simple to enable, and instead pursue an implementation that (a) lets userspace
(and/or the kernel builder) completely disable host perf (or possibly just host
perf usage of the hardware PMU) and (b) lets KVM pass through the entire
hardware PMU when it has been turned off in the host.

I.e. keep KVM's existing best-effort vPMU support, e.g. for setups where the
platform owner is also the VM user (running a Windows VM on a Linux box, hosting
a Linux VM in ChromeOS, etc.). But for anything advanced and for hard
guarantees, e.g. cloud providers that want to expose fully featured vPMU to
customers, force the platform owner to choose between using perf (or again, perf
with hardware PMU) in the host, and exposing the hardware PMU to the guest.

Hardware vendors are pushing us in this direction whether we like it or not,
e.g. SNP and TDX want to disallow profiling the guest from the host, ARM has an
upcoming PMU model where (IIUC) it can't be virtualized without a passthrough
approach, Intel's hybrid CPUs are a complete trainwreck unless vCPUs are pinned,
and virtualizing things like top-down slots, PEBS, and LBRs in the shared model
requires an absurd amount of complexity throughout the kernel and userspace.

Note, a similar idea was floated and rejected in the past[*], but that failed
proposal tried to retain host perf+PMU functionality by making the behavior
dynamic, which I agree would create an awful ABI for the host. If we make the
"knob" a Kconfig or kernel param, i.e. require the platform owner to opt out of
using perf no later than at boot time, then I think we can provide a sane ABI
and keep the implementation simple, all without breaking existing users that
utilize perf in the host to profile guests.

[*] https://lore.kernel.org/all/CALMp9eRBOmwz=mspp0m5Q093K3rMUeAsF3vEL39MGV5Br9wEQQ@mail.gmail.com
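
As a very rough illustration of the boot-time "knob" idea above (user-space C,
not a patch): the owner of the hardware PMU is picked once, before anything
uses it, and both host perf and KVM only ever query that fixed choice. The
parameter value "guest", the enum, and all function names are hypothetical; no
such kernel parameter exists today, and the default keeps today's behavior of
host perf owning the PMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical owner of the hardware PMU, fixed for the lifetime of the boot. */
enum pmu_owner {
	PMU_OWNER_HOST_PERF,	/* today's model: perf owns the PMU */
	PMU_OWNER_KVM_GUEST,	/* host perf off, PMU passed through */
};

/* Parse a made-up boot parameter value; anything else means host perf. */
static enum pmu_owner parse_pmu_owner(const char *val)
{
	if (val && !strcmp(val, "guest"))
		return PMU_OWNER_KVM_GUEST;
	return PMU_OWNER_HOST_PERF;
}

static bool host_perf_may_use_pmu(enum pmu_owner owner)
{
	return owner == PMU_OWNER_HOST_PERF;
}

static bool kvm_may_passthrough_pmu(enum pmu_owner owner)
{
	return owner == PMU_OWNER_KVM_GUEST;
}

/* Exactly one side may touch the hardware PMU for any given choice. */
static void check_exclusive(enum pmu_owner owner)
{
	assert(host_perf_may_use_pmu(owner) != kvm_may_passthrough_pmu(owner));
}
```

Because the choice can't change after boot, neither side needs to handle the
PMU being taken away at runtime, which is what made the earlier dynamic
proposal unattractive.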