From: weilin.wang@intel.com
To: weilin.wang@intel.com, Namhyung Kim, Ian Rogers,
    Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
    Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
    Perry Taylor, Samantha Alt, Caleb Biggers
Subject: [RFC PATCH v11 0/8] TPEBS counting mode support
Date: Wed, 5 Jun 2024 01:21:41 -0400
Message-ID: <20240605052200.4143205-1-weilin.wang@intel.com>

From: Weilin Wang

Changes in v11:
 - Do not open retire_latency evsels for counting. This now works
   correctly with the code Namhyung suggested, which adjusts the group
   read size when retire_latency evsels are included in the group.
 - Assign the retire_latency value using rint() to reduce precision loss
   while keeping the code generic.
 - Fix the build error caused by an unused variable in the test.

Other changes in v10:
 - Move the perf record fork from perf stat to evsel. All major
   operations, such as tpebs start, stop and read_evsel, should work
   directly through evsel.
 - Make intel-tpebs x86_64 only. This change was cross-compiled for
   arm64.
 - Move the tpebs code into intel-tpebs and reduce the intel-tpebs API to
   a minimal set of functions and variables. Update function and
   variable names to use a consistent prefix. Also improve error
   handling.
 - Integrate the code patch from Ian for the :R parser.
 - Update MTL metrics to TMA 4.8.

V9: https://lore.kernel.org/all/20240521173952.3397644-1-weilin.wang@intel.com/

Changes in v9:
 - Update the retire_latency result print and metric calculation method.
   Plug the value into evsel so that no special code is required.
 - Update --control:fifo to use a pipe instead of a named pipe.
 - Add a test for TPEBS counting mode.
 - Update the Documentation with more details.

Changes in v8:
 - In this revision, the code is rebased on Ian's patch for the R
   modifier parser:
   https://lore.kernel.org/lkml/20240428053616.1125891-3-irogers@google.com/
   After this change, no special code is required for the R modifier in
   metricgroup.c and metricgroup.h.

   Caveat of this change:
     Ideally, we would add special handling to skip counting events with
     the R modifier in evsel. This is not implemented yet, so an event
     with :R is both counted and sampled. A metric formula that uses
     retire_latency usually already needs to count the event, so we end
     up counting the same event twice. This should be handled properly
     once we finalize the design of evsel R modifier support.
 - Move TPEBS-specific code out of the main perf stat code into separate
   files, util/intel-tpebs.c and util/intel-tpebs.h. [Namhyung]
 - Use --control:fifo to ack perf stat from the forked perf record
   instead of sleep(2). [Namhyung]
 - Add introductions to TPEBS and the R modifier in the Documentation.
   [Namhyung]

Changes in v7:
 - Update code and comments for better code quality [Namhyung]
 - Add a separate commit for perf data [Namhyung]
 - Update the retire latency print function to improve alignment
   [Namhyung]

Changes in v6:
 - Update code and add comments for better code quality [Namhyung]
 - Remove the added fd var and directly pass the opened fd to
   data.file.fd [Namhyung]
 - Add kill() to stop perf record when perf stat exits early [Namhyung]
 - Add a command option check to ensure perf record is only started when
   -a/-C is given [Namhyung]
 - Squash commits [Namhyung]

Changes in v5:
 - Update code and add comments for better code quality [Ian]

Changes in v4:
 - Remove unnecessary debug prints and update code and comments for
   better readability and quality [Namhyung]
 - Update the MTL metric JSON file with consistent TmaL1 and TopdownL1
   metricgroups

Changes in v3:
 - Remove ':' when the event name has '@' [Ian]
 - Use 'R' as the modifier instead of "retire_latency" [Ian]

Changes in v2:
 - Add MTL metric file
 - Add more descriptions and an example to the patch [Arnaldo]

Here is an example of running perf stat to collect a metric that uses
the retire_latency value of event MEM_INST_RETIRED.STLB_HIT_STORES on an
MTL system.

In this simple example, there is no MEM_INST_RETIRED.STLB_HIT_STORES
sample. Therefore, the MEM_INST_RETIRED.STLB_HIT_STORES:p count and
retire_latency value are both 0.

./perf stat -M tma_dtlb_store -a -- sleep 1

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]

 Performance counter stats for 'system wide':

       181,047,168      cpu_core/TOPDOWN.SLOTS/          #      0.6 %  tma_dtlb_store
         3,195,608      cpu_core/topdown-retiring/
        40,156,649      cpu_core/topdown-mem-bound/
         3,550,925      cpu_core/topdown-bad-spec/
       117,571,818      cpu_core/topdown-fe-bound/
        57,118,087      cpu_core/topdown-be-bound/
            69,179      cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/
             4,582      cpu_core/MEM_INST_RETIRED.STLB_HIT_STORES/
        30,183,104      cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/
        30,556,790      cpu_core/CPU_CLK_UNHALTED.THREAD/
           168,486      cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/
              0.00 MEM_INST_RETIRED.STLB_HIT_STORES:p       0        0

       1.003105924 seconds time elapsed

v1:
TPEBS is one of the features provided by the next generation of Intel
PMU. Please refer to Section 8.4.1 of "Intel® Architecture Instruction
Set Extensions Programming Reference" [1] for more details about this
feature.

This set of patches supports TPEBS in counting mode. The code works in
the following way: it forks a perf record process from perf stat when
the retire_latency of one or more events is used in a metric formula.
Perf stat sends a SIGTERM signal to perf record before it needs the
retire latency value for metric calculation. Perf stat then processes
the sample data to extract the retire latency data for metric
calculations. Currently, the code uses the arithmetic average of the
retire latency values.

[1] https://www.intel.com/content/www/us/en/content-details/812218/intel-architecture-instruction-set-extensions-programming-reference.html?wapkw=future%20features

Ian Rogers (1):
  perf parse-events: Add a retirement latency modifier

Weilin Wang (7):
  perf data: Allow to use given fd in data->file.fd
  perf stat: Fork and launch perf record when perf stat needs to get
    retire latency value for a metric.
  perf stat: Plugin retire_lat value from sampled data to evsel
  perf vendor events intel: Add MTL metric json files
  perf stat: Add command line option for enabling tpebs recording
  perf Document: Add TPEBS to Documents
  perf test: Add test for Intel TPEBS counting mode

 tools/perf/Documentation/perf-list.txt        |    1 +
 tools/perf/Documentation/topdown.txt          |   30 +
 tools/perf/arch/x86/util/evlist.c             |    6 +
 tools/perf/builtin-stat.c                     |    8 +
 .../arch/x86/meteorlake/metricgroups.json     |  140 +
 .../arch/x86/meteorlake/mtl-metrics.json      | 2595 +++++++++++++++++
 .../perf/tests/shell/test_stat_intel_tpebs.sh |   18 +
 tools/perf/util/Build                         |    1 +
 tools/perf/util/data.c                        |    7 +-
 tools/perf/util/evsel.c                       |   81 +-
 tools/perf/util/evsel.h                       |    6 +
 tools/perf/util/intel-tpebs.c                 |  397 +++
 tools/perf/util/intel-tpebs.h                 |   48 +
 tools/perf/util/parse-events.c                |    2 +
 tools/perf/util/parse-events.h                |    1 +
 tools/perf/util/parse-events.l                |    3 +-
 16 files changed, 3340 insertions(+), 4 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/metricgroups.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/mtl-metrics.json
 create mode 100755 tools/perf/tests/shell/test_stat_intel_tpebs.sh
 create mode 100644 tools/perf/util/intel-tpebs.c
 create mode 100644 tools/perf/util/intel-tpebs.h

--
2.43.0
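
Illustration only, not part of the series: the cover letter says perf stat
uses the arithmetic average of the sampled retire latency values, and that
v11 assigns the result with rint(). A minimal Python sketch of that
arithmetic follows. The function name and the plain list of samples are
assumptions made for this example and do not mirror the code in
util/intel-tpebs.c. Python's round() rounds halves to even, which matches
rint() under the default C rounding mode.

```python
def average_retire_latency(samples):
    """Return the arithmetic mean of retire latency samples, rounded to
    the nearest integer (ties go to the even neighbour).

    `samples` is a hypothetical list of per-sample retire latency values;
    an empty list yields 0, as in the cover letter's example where no
    sample was captured.
    """
    if not samples:
        return 0
    mean = sum(samples) / len(samples)
    # round() on a float uses round-half-to-even, like rint() in the
    # default floating-point rounding mode.
    return round(mean)


if __name__ == "__main__":
    print(average_retire_latency([]))         # no samples captured
    print(average_retire_latency([3, 4, 4]))  # mean is about 3.67
```

Rounding instead of truncating keeps a mean such as 3.67 from being
reported as 3, which is the precision loss the v11 note refers to.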