Message-ID: <6601b6f9-b3d2-4da8-a07b-a07ef9fe96e1@linux.intel.com>
Date: Thu, 28 Sep 2023 17:24:37 +0800
Subject: Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event
From: "Mi, Dapeng"
To: Sean Christopherson, Peter Zijlstra
Cc: Paolo Bonzini, Arnaldo Carvalho de Melo, Kan Liang, Like Xu,
 Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
 Adrian Hunter, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
 linux-kernel@vger.kernel.org, Zhenyu Wang, Zhang Xiong, Lv Zhiyuan,
 Yang Weijiang, Dapeng Mi, Jim Mattson, David Dunn, Mingwei Zhang
References: <20230927033124.1226509-1-dapeng1.mi@linux.intel.com>
 <20230927033124.1226509-8-dapeng1.mi@linux.intel.com>
 <20230927113312.GD21810@noisy.programming.kicks-ass.net>

On 9/28/2023 1:27 AM, Sean Christopherson wrote:
> +Jim, David, and Mingwei
>
> On Wed, Sep 27, 2023, Peter Zijlstra wrote:
>> On Wed, Sep 27, 2023 at 11:31:18AM +0800, Dapeng Mi wrote:
>>> When guest wants to use PERF_METRICS MSR, a virtual metrics event needs
>>> to be created in the perf subsystem so that the guest can have exclusive
>>> ownership of the PERF_METRICS MSR.
>> Urgh, can someone please remind me how all that is supposed to work
>> again? The guest is just a task that wants the event. If the
>> host creates a CPU event, then that gets scheduled with higher priority
>> and the task loses out, no joy.

It looks like I used inaccurate wording in the comments. Yes, it's not
*exclusive* from the host's point of view. Currently the perf events
created by KVM are task-pinned events, so they can indeed be preempted by
CPU-pinned host events, which have higher priority. This is a
long-standing issue for vPMU. We have had some internal discussions about
it, but there seems to be no good way to solve it thoroughly in the
current vPMU framework. But if there are no such CPU-pinned events, which
have the highest priority on the host, KVM perf events can share the HW
resource with other host events by time-multiplexing.

>> So you cannot guarantee the guest gets anything.
>>
>> That is, I remember we've had this exact problem before, but I keep
>> forgetting how this all is supposed to work. I don't use this virt stuff
>> (and every time I try qemu arguments defeat me and I give up in
>> disgust).
> I don't think it does work, at least not without a very, very carefully crafted
> setup and a host userspace that knows it must not use certain aspects of perf.
> E.g.
> for PEBS, if the guest virtual counters don't map 1:1 to the "real" counters
> in hardware, KVM+perf simply disables the counter.
>
> And for top-down slots, getting anything remotely accurate requires pinning vCPUs
> 1:1 with pCPUs and enumerating an accurate topology to the guest:
>
>   The count is distributed among unhalted logical processors (hyper-threads) who
>   share the same physical core, in processors that support Intel Hyper-Threading
>   Technology.
>
> Jumping the gun a bit (we're in the *super* early stages of scraping together a
> rough PoC), but I think we should effectively put KVM's current vPMU support into
> maintenance-only mode, i.e. stop adding new features unless they are *very* simple
> to enable, and instead pursue an implementation that (a) lets userspace (and/or
> the kernel builder) completely disable host perf (or possibly just host perf usage
> of the hardware PMU) and (b) lets KVM pass through the entire hardware PMU when it
> has been turned off in the host.
>
> I.e. keep KVM's existing best-effort vPMU support, e.g. for setups where the
> platform owner is also the VM user (running a Windows VM on a Linux box, hosting
> a Linux VM in ChromeOS, etc...). But for anything advanced and for hard guarantees,
> e.g. cloud providers that want to expose fully featured vPMU to customers, force
> the platform owner to choose between using perf (or again, perf with hardware PMU)
> in the host, and exposing the hardware PMU to the guest.
>
> Hardware vendors are pushing us in this direction whether we like it or not, e.g.
> SNP and TDX want to disallow profiling the guest from the host, ARM has an
> upcoming PMU model where (IIUC) it can't be virtualized without a passthrough
> approach, Intel's hybrid CPUs are a complete trainwreck unless vCPUs are pinned,
> and virtualizing things like top-down slots, PEBS, and LBRs in the shared model
> requires an absurd amount of complexity throughout the kernel and userspace.
>
> Note, a similar idea was floated and rejected in the past[*], but that failed
> proposal tried to retain host perf+PMU functionality by making the behavior dynamic,
> which I agree would create an awful ABI for the host. If we make the "knob" a
> Kconfig or kernel param, i.e. require the platform owner to opt-out of using perf
> no later than at boot time, then I think we can provide a sane ABI, keep the
> implementation simple, all without breaking existing users that utilize perf in
> the host to profile guests.
>
> [*] https://lore.kernel.org/all/CALMp9eRBOmwz=mspp0m5Q093K3rMUeAsF3vEL39MGV5Br9wEQQ@mail.gmail.com
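
As a rough illustration of the priority problem discussed in this thread,
here is a toy Python model (not kernel code). All names in it are
hypothetical, and the four-class visiting order is an assumption of the
sketch; the thread itself only establishes that CPU-pinned host events win
over KVM's task-pinned events. A single slot stands in for a resource such
as PERF_METRICS that two events cannot hold at the same instant.

```python
from dataclasses import dataclass
from enum import IntEnum


class EvClass(IntEnum):
    # Assumed visiting order for this toy: lower value is served first.
    CPU_PINNED = 0
    TASK_PINNED = 1
    CPU_FLEXIBLE = 2
    TASK_FLEXIBLE = 3


@dataclass
class ToyEvent:
    name: str            # hypothetical label, for printing only
    cls: EvClass
    scheduled: bool = False


def toy_sched_in(events, nr_slots):
    """Hand out nr_slots hardware slots in class-priority order.

    Returns the names of the events that got a slot, in input order.
    """
    free = nr_slots
    # sorted() is stable, so events of the same class keep their order.
    for ev in sorted(events, key=lambda e: e.cls):
        ev.scheduled = free > 0
        if ev.scheduled:
            free -= 1
    return [ev.name for ev in events if ev.scheduled]


guest = ToyEvent("kvm-guest-metrics", EvClass.TASK_PINNED)
host = ToyEvent("host-cpu-wide", EvClass.CPU_PINNED)

# One slot: the CPU-pinned host event is served first, the guest loses out.
winners = toy_sched_in([guest, host], nr_slots=1)
print(winners)  # ['host-cpu-wide']
```

With no CPU-pinned host event in the list, the guest event gets the slot.
The sketch does not model time-multiplexing between events.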