Date: Tue, 26 Sep 2023 10:43:24 -0400
Subject: Re: [RFC PATCH 00/25] Perf stat metric grouping with hardware information
To: weilin.wang@intel.com, Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Adrian Hunter
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Perry Taylor, Samantha Alt, Caleb Biggers, Mark Rutland
References: <20230925061824.3818631-1-weilin.wang@intel.com>
From: "Liang, Kan"
In-Reply-To: <20230925061824.3818631-1-weilin.wang@intel.com>
On 2023-09-25 2:17 a.m., weilin.wang@intel.com wrote:
> From: Weilin Wang
>
> Perf stat metric grouping generates event groups that are provided to the
> kernel for data collection using the hardware counters. Sometimes the
> grouping might fail and the kernel has to retry the groups because the
> generated groups do not fit into the hardware counters correctly. In other
> cases the groups are collected correctly, but they leave some hardware
> counters unused.
>
> To improve on these inefficiencies, we would like to propose a
> hardware-aware grouping method that does metric/event grouping based on
> event counter restriction rules and the availability of hardware counters
> in the system. This method is generic as long as all the restriction rules
> can be provided by the pmu-event JSON files.

This method assumes that it's the only user (except the NMI watchdog) and
that all the HW resources are available. Right?

> This patch set includes code that does hardware-aware grouping and updated
> pmu-event JSON files for four platforms (SapphireRapids, Icelakex,
> Cascadelakex, and Tigerlake) for your testing and experimenting. We've
> successfully tested these patches on three platforms (SapphireRapids,
> Icelakex, and Cascadelakex) with topdown metrics from TopdownL1 to
> TopdownL6.
>
> There are some optimization opportunities that we might implement in the
> future:
> 1) Better NMI handling: when the NMI watchdog is enabled, we reduce the
> default_core total counter size by one. This could be improved to better
> utilize the counter.
> 2) Fill important events into unused counters for better counter
> utilization: there might be some unused counters scattered across the
> groups. We could consider adding important events to these slots if
> necessary. This could help increase the multiplexing percentage and help
> improve accuracy if the event is critical.
>
> Remaining questions for discussion:
> 3) Where to start grouping from? The current implementation starts
> grouping by combining all the events into a single list. This step
> deduplicates events, but it does not maintain the relationship of events
> according to the metrics, i.e. events required by one metric may not be
> collected at the same time. Another possible starting point would be to
> group each individual metric first and then try to merge the groups.

Maybe you can add a new flag to tag the metrics whose events should be
scheduled together, e.g., IPC.

Thanks,
Kan

> 4) Any comments, suggestions, or new ideas?
> 5) If you are interested in testing the patches and the pmu-event JSON
> files for your testing platform are not provided here, please let me know
> so that I can provide you the files.
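For readers trying to picture the proposed scheme, the core idea — pack events
into groups sized to the available counters, honoring per-event counter
restrictions, and shrinking the group when the NMI watchdog holds a counter —
can be sketched as a greedy bin-packing pass. This is only an illustration:
the names (`Event`, `Group`, `group_events`) are hypothetical, and the actual
patch set implements this in C in tools/perf/util/metricgroup.c, driven by the
counter data in the new counter.json files.

```python
# Illustrative sketch of hardware-aware grouping as greedy bin packing.
# All names here are hypothetical; the real patch set handles many more
# cases (fixed counters, taken-alone events, fallback on failure).
from dataclasses import dataclass, field

@dataclass
class Event:
    name: str
    valid_counters: set   # generic counter indices this event may use

@dataclass
class Group:
    size: int             # usable generic counters in this group
    assigned: dict = field(default_factory=dict)  # counter index -> event name

    def try_add(self, ev: Event) -> bool:
        """Place the event on any free counter it is allowed to use."""
        for c in sorted(ev.valid_counters):
            if c < self.size and c not in self.assigned:
                self.assigned[c] = ev.name
                return True
        return False

def group_events(events, n_counters, nmi_watchdog=False):
    # Per the cover letter, an enabled NMI watchdog consumes one generic
    # counter, so the usable group size shrinks by one.
    size = n_counters - 1 if nmi_watchdog else n_counters
    groups = [Group(size)]
    # Place the most restricted events (fewest valid counters) first,
    # opening a new group whenever an event fits nowhere.
    for ev in sorted(events, key=lambda e: len(e.valid_counters)):
        if not any(g.try_add(ev) for g in groups):
            groups.append(Group(size))
            groups[-1].try_add(ev)
    return groups
```

With two events restricted to counter 0, the second one forces a second
group, while an unrestricted event fills a free counter in the first group —
which is the kind of counter-aware placement the cover letter describes.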
>
> Weilin Wang (25):
>   perf stat: Add hardware-grouping cmd option to perf stat
>   perf stat: Add basic functions for the hardware-grouping stat cmd option
>   perf pmu-events: Add functions in jevent.py
>   perf pmu-events: Add counter info into JSON files for SapphireRapids
>   perf pmu-events: Add event counter data for Cascadelakex
>   perf pmu-events: Add event counter data for Icelakex
>   perf stat: Add helper functions for hardware-grouping method
>   perf stat: Add functions to get counter info
>   perf stat: Add helper functions for hardware-grouping method
>   perf stat: Add helper functions to hardware-grouping method
>   perf stat: Add utility functions to hardware-grouping method
>   perf stat: Add more functions for hardware-grouping method
>   perf stat: Add functions to hardware-grouping method
>   perf stat: Add build string function and topdown events handling in hardware-grouping
>   perf stat: Add function to combine metrics for hardware-grouping
>   perf stat: Update keyword core to default_core to adjust to the changes for events with no unit
>   perf stat: Handle taken alone in hardware-grouping
>   perf stat: Handle NMI in hardware-grouping
>   perf stat: Handle grouping method fall back in hardware-grouping
>   perf stat: Code refactoring in hardware-grouping
>   perf stat: Add tool events support in hardware-grouping
>   perf stat: Add TSC support in hardware-grouping
>   perf stat: Fix a return error issue in hardware-grouping
>   perf stat: Add check to ensure correctness in platform that does not support hardware-grouping
>   perf pmu-events: Add event counter data for Tigerlake
>
>  tools/lib/bitmap.c | 20 +
>  tools/perf/builtin-stat.c | 7 +
>  .../arch/x86/cascadelakex/cache.json | 1237 ++++++++++++
>  .../arch/x86/cascadelakex/counter.json | 17 +
>  .../arch/x86/cascadelakex/floating-point.json | 16 +
>  .../arch/x86/cascadelakex/frontend.json | 68 +
>  .../arch/x86/cascadelakex/memory.json | 751 ++++++++
>  .../arch/x86/cascadelakex/other.json | 168 ++
>  .../arch/x86/cascadelakex/pipeline.json | 102 +
>  .../arch/x86/cascadelakex/uncore-cache.json | 1138 +++++++++++
>  .../x86/cascadelakex/uncore-interconnect.json | 1272 +++++++++++++
>  .../arch/x86/cascadelakex/uncore-io.json | 394 ++++
>  .../arch/x86/cascadelakex/uncore-memory.json | 509 +++++
>  .../arch/x86/cascadelakex/uncore-power.json | 25 +
>  .../arch/x86/cascadelakex/virtual-memory.json | 28 +
>  .../pmu-events/arch/x86/icelakex/cache.json | 98 +
>  .../pmu-events/arch/x86/icelakex/counter.json | 17 +
>  .../arch/x86/icelakex/floating-point.json | 13 +
>  .../arch/x86/icelakex/frontend.json | 55 +
>  .../pmu-events/arch/x86/icelakex/memory.json | 53 +
>  .../pmu-events/arch/x86/icelakex/other.json | 52 +
>  .../arch/x86/icelakex/pipeline.json | 92 +
>  .../arch/x86/icelakex/uncore-cache.json | 965 ++++++++++
>  .../x86/icelakex/uncore-interconnect.json | 1667 +++++++++++++++++
>  .../arch/x86/icelakex/uncore-io.json | 966 ++++++++++
>  .../arch/x86/icelakex/uncore-memory.json | 186 ++
>  .../arch/x86/icelakex/uncore-power.json | 26 +
>  .../arch/x86/icelakex/virtual-memory.json | 22 +
>  .../arch/x86/sapphirerapids/cache.json | 104 +
>  .../arch/x86/sapphirerapids/counter.json | 17 +
>  .../x86/sapphirerapids/floating-point.json | 25 +
>  .../arch/x86/sapphirerapids/frontend.json | 98 +-
>  .../arch/x86/sapphirerapids/memory.json | 44 +
>  .../arch/x86/sapphirerapids/other.json | 40 +
>  .../arch/x86/sapphirerapids/pipeline.json | 118 ++
>  .../arch/x86/sapphirerapids/uncore-cache.json | 534 +++++-
>  .../arch/x86/sapphirerapids/uncore-cxl.json | 56 +
>  .../sapphirerapids/uncore-interconnect.json | 476 +++++
>  .../arch/x86/sapphirerapids/uncore-io.json | 373 ++++
>  .../x86/sapphirerapids/uncore-memory.json | 391 ++++
>  .../arch/x86/sapphirerapids/uncore-power.json | 24 +
>  .../x86/sapphirerapids/virtual-memory.json | 20 +
>  .../pmu-events/arch/x86/tigerlake/cache.json | 65 +
>  .../arch/x86/tigerlake/counter.json | 7 +
>  .../arch/x86/tigerlake/floating-point.json | 13 +
>  .../arch/x86/tigerlake/frontend.json | 56 +
>  .../pmu-events/arch/x86/tigerlake/memory.json | 31 +
>  .../pmu-events/arch/x86/tigerlake/other.json | 4 +
>  .../arch/x86/tigerlake/pipeline.json | 96 +
>  .../x86/tigerlake/uncore-interconnect.json | 11 +
>  .../arch/x86/tigerlake/uncore-memory.json | 6 +
>  .../arch/x86/tigerlake/uncore-other.json | 1 +
>  .../arch/x86/tigerlake/virtual-memory.json | 20 +
>  tools/perf/pmu-events/jevents.py | 179 +-
>  tools/perf/pmu-events/pmu-events.h | 26 +-
>  tools/perf/util/metricgroup.c | 927 +++++++++
>  tools/perf/util/metricgroup.h | 82 +
>  tools/perf/util/pmu.c | 5 +
>  tools/perf/util/pmu.h | 1 +
>  tools/perf/util/stat.h | 1 +
>  60 files changed, 13790 insertions(+), 25 deletions(-)
>  create mode 100644 tools/perf/pmu-events/arch/x86/cascadelakex/counter.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/icelakex/counter.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/sapphirerapids/counter.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/tigerlake/counter.json
>
> --
> 2.39.3
>