Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp3033979rwr; Fri, 28 Apr 2023 22:37:34 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ751ML01Be5OofEg3Mwov5FpzwORolRpSR+wEUQskR19h9xkTk2/QUNlDQQiXsvflLDTqMl X-Received: by 2002:a17:903:2305:b0:1a9:4167:5db7 with SMTP id d5-20020a170903230500b001a941675db7mr5830802plh.4.1682746653901; Fri, 28 Apr 2023 22:37:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1682746653; cv=none; d=google.com; s=arc-20160816; b=GzVYi49b64ithbwhWabVyPqWRvWKn6nAf1wf7u6oG/aWMOmkGBCtEJiGqRMiVTd0Du L14b/rKj7rmJ5ZUElOJNsOVba29JM3Bx/dqXA8r2vH0ROcgp99K3IObyQp4PGBwDWmSV LzFoAwJR4PGg/n2P5IhKimRGMcMTtp406smzrsKwgbyghDfsNGRV9nNJ3PJvrpF65pjf v2AC5a3oyYHl340ovh6GcD6xFPSGgPZ6zFOvuoLmhk9CEqZKmfrHNlaH9jODlxi+nL9P Dad3A8+ICnyuiClM+tCDiXlP7YtqJjvWmVsMDwwfzewzZ+8j2ZyjY8N/ig9SpotPrQzJ sjzg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:from:subject :references:mime-version:message-id:in-reply-to:date:dkim-signature; bh=d8stMu3pEgwI+YgPkaK5pHnIUWo7htcW/4QFR//BHJg=; b=utklUr6xvuUg37FGDXKzxeVmIbLHQVQJCxqWmwXCARAzrVjQc83tac3juulW6YXXMO VBM1o75lLoRJ7FTe92CD6ZpHtWX6xmoKdEedSFGqAYqyAfmCXkah9wowScITJVbkbdat NJdMaYenxpn4oTwxMe+C8NZNY8quggV6SxU6dTkuKICJYtY/OL1d+4s+eblas4L2FmXw CBHvJnHPdzbDNxEW+3b5AnPiMFjgH1hbLI87oDCjZcqYqXr428Ne6L6Qa/dgWuWccGPZ ydkW7VvOz0mLPZ14RXVHkWZdMfdrRG/q9upe0fUh2No6yCLK05GsmnzVZJZ6RJabieyC c7OQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=5L352oY1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id p10-20020a170902780a00b001a8143865f4si21330518pll.613.2023.04.28.22.37.20; Fri, 28 Apr 2023 22:37:33 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=5L352oY1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230317AbjD2FgQ (ORCPT + 99 others); Sat, 29 Apr 2023 01:36:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37724 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230297AbjD2FgO (ORCPT ); Sat, 29 Apr 2023 01:36:14 -0400 Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com [IPv6:2607:f8b0:4864:20::549]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 55E9F40C8 for ; Fri, 28 Apr 2023 22:35:52 -0700 (PDT) Received: by mail-pg1-x549.google.com with SMTP id 41be03b00d2f7-51bacd63947so511668a12.0 for ; Fri, 28 Apr 2023 22:35:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1682746551; x=1685338551; h=content-transfer-encoding:cc:to:from:subject:references :mime-version:message-id:in-reply-to:date:from:to:cc:subject:date :message-id:reply-to; bh=d8stMu3pEgwI+YgPkaK5pHnIUWo7htcW/4QFR//BHJg=; b=5L352oY1LupJxDQ4HdKfK59dplqqZ8BiReqfeweCxGpEZhVwWJf5Vi/InVzSRQPxHV 6uJdtSTD5yktONAcaGITixehgKoNd3kYSVPCQneun7X0wLgV1eLLFO+g7/WaLq/Nr92N Alc8q1m/Bsyrsaqgx9081ZJu1oePEFE4/0K1IMmt+7/UZLGLbNX1c5h95SK8ipvnXpQ6 r+C71J/ZaTzfg0MFR1grmd+rE7bHp9tTknb1g3LM2YP3/5R/s5EJDuJD2PZXpYhpATW2 21iZDnAwA6jvVnF1hgFAeKG5gl6/97vsdoyYVskdfAW5Y0Mi7gxjE+mPdOeAillDz8bK FYqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682746551; x=1685338551; h=content-transfer-encoding:cc:to:from:subject:references :mime-version:message-id:in-reply-to:date:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=d8stMu3pEgwI+YgPkaK5pHnIUWo7htcW/4QFR//BHJg=; b=coM0wNOdygeTIek/yW0Vs2ZdLaddNfFjVq+nliPWEULY1I510RlsBxYeUKU/N8nuvA UrdFXQFJkJteLFpDCveNR5Fix8o2nTtx6RQgHNdbI4XA+AxZZppXVtNkoTnMvaArEiJF R2NX1Ra/YsGFE0dv5EZhMY56aVRH9lo7dt6dJphjd+5hBeLT2uWKtMlzDZLkBHLJwgST UuZL5nw0a9cHSV9jJIrL6CQ1oIjA6nJR86XbziEVyhB3XklmCykgme5ZoCfmaYBKuIHW iDYbDpXrzu9p/QWEDhkSWw+ciYjQ2j39tmm2WfjGgLAtcaSA5bZAWQd8fe5+K1sY661A y9bQ== X-Gm-Message-State: AC+VfDyR9Dl+tZwHBAzQXFZfHDiG9RJC7dSmv5LKcXyKKAlZsUogiNbC ybjDn2Xz474+9KikjROgVge+Ra8s46+f X-Received: from irogers.svl.corp.google.com ([2620:15c:2d4:203:c563:7e28:fb7c:bce3]) (user=irogers job=sendgmr) by 2002:a63:24f:0:b0:528:95c2:276f with SMTP id 76-20020a63024f000000b0052895c2276fmr1933743pgc.11.1682746551608; Fri, 28 Apr 2023 22:35:51 -0700 (PDT) Date: Fri, 28 Apr 2023 22:34:24 -0700 In-Reply-To: <20230429053506.1962559-1-irogers@google.com> Message-Id: <20230429053506.1962559-5-irogers@google.com> Mime-Version: 1.0 References: <20230429053506.1962559-1-irogers@google.com> X-Mailer: git-send-email 2.40.1.495.gc816e09b53d-goog Subject: [PATCH v3 04/46] perf metric: Json flag to not group events if gathering a metric group From: Ian Rogers To: Arnaldo Carvalho de Melo , Kan Liang , Ahmad Yasin , Peter Zijlstra , Ingo Molnar , Stephane Eranian , Andi Kleen , Perry Taylor , Samantha Alt , Caleb Biggers , Weilin Wang , Edward Baker , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Adrian Hunter , Florian Fischer , Rob Herring , Zhengjun Xing , John Garry , Kajol Jain , Sumanth Korikkar , Thomas Richter , Tiezhu Yang , Ravi Bangoria , Leo Yan , Yang Jihong , James Clark , Suzuki Poulouse , Kang Minchul , Athira Rajeev , linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Ian Rogers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED, USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Some metric groups have metrics that don't have fully overlapping events, meaning that the group's events become unique event groups that may need to multiplex with each other. This can be particularly unfortunate when the groups wouldn't need to multiplex because there are sufficient hardware counters. Add a flag so that if recording a metric group then the metrics within the group needn't use groups for their events. The flag is added to Intel TopdownL1 and TopdownL2 metrics. Signed-off-by: Ian Rogers --- .../arch/x86/alderlake/adl-metrics.json | 26 +++++++++++++++++++ .../arch/x86/alderlaken/adln-metrics.json | 14 ++++++++++ .../arch/x86/broadwell/bdw-metrics.json | 12 +++++++++ .../arch/x86/broadwellde/bdwde-metrics.json | 12 +++++++++ .../arch/x86/broadwellx/bdx-metrics.json | 12 +++++++++ .../arch/x86/cascadelakex/clx-metrics.json | 12 +++++++++ .../arch/x86/haswell/hsw-metrics.json | 12 +++++++++ .../arch/x86/haswellx/hsx-metrics.json | 12 +++++++++ .../arch/x86/icelake/icl-metrics.json | 12 +++++++++ .../arch/x86/icelakex/icx-metrics.json | 12 +++++++++ .../arch/x86/ivybridge/ivb-metrics.json | 12 +++++++++ .../arch/x86/ivytown/ivt-metrics.json | 12 +++++++++ .../arch/x86/jaketown/jkt-metrics.json | 12 +++++++++ .../arch/x86/sandybridge/snb-metrics.json | 12 +++++++++ .../arch/x86/sapphirerapids/spr-metrics.json | 12 +++++++++ .../arch/x86/skylake/skl-metrics.json | 12 +++++++++ .../arch/x86/skylakex/skx-metrics.json | 12 +++++++++ .../arch/x86/tigerlake/tgl-metrics.json | 12 +++++++++ tools/perf/pmu-events/jevents.py | 4 ++- tools/perf/pmu-events/pmu-events.h | 1 + tools/perf/util/metricgroup.c | 8 +++--- 21 files changed, 240 insertions(+), 5 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json b/to= ols/perf/pmu-events/arch/x86/alderlake/adl-metrics.json index 75d80e70e5cd..1f9047553942 100644 --- a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json @@ -133,6 +133,7 @@ "MetricGroup": "TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "Counts the total number of issue slots that= were not consumed by the backend due to backend stalls. Note that uops mu= st be available for consumption in order for this event to count. If a uop= is not available (IQ is empty), this event will not count. The rest of t= hese subevents count backend stalls, in cycles, due to an outstanding reque= st which is memory bound vs core bound. The subevents are not slot based = events and therefore can not be precisely added or subtracted from the Back= end_Bound_Aux subevents which are slot based.", "ScaleUnit": "100%", "Unit": "cpu_atom" @@ -143,6 +144,7 @@ "MetricGroup": "TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound_aux", "MetricThreshold": "tma_backend_bound_aux > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "Counts the total number of issue slots that= were not consumed by the backend due to backend stalls. Note that UOPS mu= st be available for consumption in order for this event to count. If a uop= is not available (IQ is empty), this event will not count. All of these s= ubevents count backend stalls, in slots, due to a resource limitation. Th= ese are not cycle based events and therefore can not be precisely added or = subtracted from the Backend_Bound subevents which are cycle based. These s= ubevents are supplementary to Backend_Bound and can be used to analyze resu= lts from a resource perspective at allocation.", "ScaleUnit": "100%", "Unit": "cpu_atom" @@ -153,6 +155,7 @@ "MetricGroup": "TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "Counts the total number of issue slots that = were not consumed by the backend because allocation is stalled due to a mis= predicted jump or a machine clear. Only issue slots wasted due to fast nuke= s such as memory ordering nukes are counted. Other nukes are not accounted = for. Counts all issue slots blocked during this recovery window including r= elevant microcode flows and while uops are not yet available in the instruc= tion queue (IQ). Also includes the issue slots that were consumed by the ba= ckend but were thrown away because they were younger than the mispredict or= machine clear.", "ScaleUnit": "100%", "Unit": "cpu_atom" @@ -163,6 +166,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group", "MetricName": "tma_base", "MetricThreshold": "tma_base > 0.6", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%", "Unit": "cpu_atom" }, @@ -182,6 +186,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.05", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%", "Unit": "cpu_atom" }, @@ -209,6 +214,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%", "Unit": "cpu_atom" }, @@ -255,6 +261,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%", "Unit": "cpu_atom" }, @@ -264,6 +271,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.15", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%", "Unit": "cpu_atom" }, @@ -291,6 +299,7 @@ "MetricGroup": "TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "ScaleUnit": "100%", "Unit": "cpu_atom" }, @@ -593,6 +602,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.05", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%", "Unit": "cpu_atom" }, @@ -611,6 +621,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%", "Unit": "cpu_atom" }, @@ -629,6 +640,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group", "MetricName": "tma_ms_uops", "MetricThreshold": "tma_ms_uops > 0.05", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "Counts the number of uops that are from the = complex flows issued by the micro-sequencer (MS). This includes uops from = flows due to complex instructions, faults, assists, and inserted flows.", "ScaleUnit": "100%", "Unit": "cpu_atom" @@ -729,6 +741,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_aux_group= ", "MetricName": "tma_resource_bound", "MetricThreshold": "tma_resource_bound > 0.2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "Counts the total number of issue slots that= were not consumed by the backend due to backend stalls. Note that uops mu= st be available for consumption in order for this event to count. If a uop= is not available (IQ is empty), this event will not count.", "ScaleUnit": "100%", "Unit": "cpu_atom" @@ -739,6 +752,7 @@ "MetricGroup": "TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.75", + "MetricgroupNoGroup": "TopdownL1", "ScaleUnit": "100%", "Unit": "cpu_atom" }, @@ -848,6 +862,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", "ScaleUnit": "100%", "Unit": "cpu_core" @@ -858,6 +873,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%", "Unit": "cpu_core" @@ -868,6 +884,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: = tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredict= s_resteers", "ScaleUnit": "100%", "Unit": "cpu_core" @@ -919,6 +936,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%", "Unit": "cpu_core" @@ -1031,6 +1049,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 6 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Sam= ple with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LA= TENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_sw= itches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp"= , "ScaleUnit": "100%", "Unit": "cpu_core" @@ -1041,6 +1060,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_= 16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", "ScaleUnit": "100%", "Unit": "cpu_core" @@ -1121,6 +1141,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%", "Unit": "cpu_core" @@ -1141,6 +1162,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences. Sample with: UOPS_RETIRED.HEA= VY", "ScaleUnit": "100%", "Unit": "cpu_core" @@ -2023,6 +2045,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%", "Unit": "cpu_core" @@ -2082,6 +2105,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%", "Unit": "cpu_core" @@ -2112,6 +2136,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%", "Unit": "cpu_core" @@ -2310,6 +2335,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .SLOTS", "ScaleUnit": "100%", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json b/= tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json index 1a85d935c733..0402adbf7d92 100644 --- a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json +++ b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json @@ -98,6 +98,7 @@ "MetricGroup": "TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "Counts the total number of issue slots that= were not consumed by the backend due to backend stalls. Note that uops mu= st be available for consumption in order for this event to count. If a uop= is not available (IQ is empty), this event will not count. The rest of t= hese subevents count backend stalls, in cycles, due to an outstanding reque= st which is memory bound vs core bound. The subevents are not slot based = events and therefore can not be precisely added or subtracted from the Back= end_Bound_Aux subevents which are slot based.", "ScaleUnit": "100%" }, @@ -107,6 +108,7 @@ "MetricGroup": "TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound_aux", "MetricThreshold": "tma_backend_bound_aux > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "Counts the total number of issue slots that= were not consumed by the backend due to backend stalls. Note that UOPS mu= st be available for consumption in order for this event to count. If a uop= is not available (IQ is empty), this event will not count. All of these s= ubevents count backend stalls, in slots, due to a resource limitation. Th= ese are not cycle based events and therefore can not be precisely added or = subtracted from the Backend_Bound subevents which are cycle based. These s= ubevents are supplementary to Backend_Bound and can be used to analyze resu= lts from a resource perspective at allocation.", "ScaleUnit": "100%" }, @@ -116,6 +118,7 @@ "MetricGroup": "TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "Counts the total number of issue slots that = were not consumed by the backend because allocation is stalled due to a mis= predicted jump or a machine clear. Only issue slots wasted due to fast nuke= s such as memory ordering nukes are counted. Other nukes are not accounted = for. Counts all issue slots blocked during this recovery window including r= elevant microcode flows and while uops are not yet available in the instruc= tion queue (IQ). Also includes the issue slots that were consumed by the ba= ckend but were thrown away because they were younger than the mispredict or= machine clear.", "ScaleUnit": "100%" }, @@ -125,6 +128,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group", "MetricName": "tma_base", "MetricThreshold": "tma_base > 0.6", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%" }, { @@ -142,6 +146,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.05", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%" }, { @@ -166,6 +171,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%" }, { @@ -207,6 +213,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%" }, { @@ -215,6 +222,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.15", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%" }, { @@ -239,6 +247,7 @@ "MetricGroup": "TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "ScaleUnit": "100%" }, { @@ -499,6 +508,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.05", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%" }, { @@ -515,6 +525,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2", + "MetricgroupNoGroup": "TopdownL2", "ScaleUnit": "100%" }, { @@ -531,6 +542,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group", "MetricName": "tma_ms_uops", "MetricThreshold": "tma_ms_uops > 0.05", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "Counts the number of uops that are from the = complex flows issued by the micro-sequencer (MS). This includes uops from = flows due to complex instructions, faults, assists, and inserted flows.", "ScaleUnit": "100%" }, @@ -620,6 +632,7 @@ "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_aux_group= ", "MetricName": "tma_resource_bound", "MetricThreshold": "tma_resource_bound > 0.2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "Counts the total number of issue slots that= were not consumed by the backend due to backend stalls. Note that uops mu= st be available for consumption in order for this event to count. If a uop= is not available (IQ is empty), this event will not count.", "ScaleUnit": "100%" }, @@ -629,6 +642,7 @@ "MetricGroup": "TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.75", + "MetricgroupNoGroup": "TopdownL1", "ScaleUnit": "100%" }, { diff --git a/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json b/to= ols/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json index 51cf8560a8d3..f9e2316601e1 100644 --- a/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json +++ b/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json @@ -103,6 +103,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound.", "ScaleUnit": "100%" }, @@ -112,6 +113,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -122,6 +124,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_mispredicts_resteers", "ScaleUnit": "100%" }, @@ -170,6 +173,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -263,6 +267,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 4 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Rel= ated metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_l= cp", "ScaleUnit": "100%" }, @@ -272,6 +277,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END", "ScaleUnit": "100%" }, @@ -326,6 +332,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound.", "ScaleUnit": "100%" }, @@ -335,6 +342,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -828,6 +836,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -858,6 +867,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -886,6 +896,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -1048,6 +1059,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .RETIRE_SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json = b/tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json index fb57c7382408..e9c46d336a8e 100644 --- a/tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json +++ b/tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json @@ -97,6 +97,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", "ScaleUnit": "100%" }, @@ -106,6 +107,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -116,6 +118,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: = tma_info_branch_misprediction_cost, tma_mispredicts_resteers", "ScaleUnit": "100%" }, @@ -164,6 +167,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -248,6 +252,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 4 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Sam= ple with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LA= TENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_sw= itches, tma_info_dsb_coverage, tma_info_iptb, tma_lcp", "ScaleUnit": "100%" }, @@ -257,6 +262,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_= 16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", "ScaleUnit": "100%" }, @@ -311,6 +317,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -320,6 +327,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -795,6 +803,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -825,6 +834,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -853,6 +863,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -1013,6 +1024,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json b/t= ools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json index 65ec0c9e55d1..437b9867acb9 100644 --- a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json @@ -103,6 +103,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound.", "ScaleUnit": "100%" }, @@ -112,6 +113,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -122,6 +124,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_mispredicts_resteers", "ScaleUnit": "100%" }, @@ -170,6 +173,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -263,6 +267,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 4 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Rel= ated metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_l= cp", "ScaleUnit": "100%" }, @@ -272,6 +277,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END", "ScaleUnit": "100%" }, @@ -326,6 +332,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound.", "ScaleUnit": "100%" }, @@ -335,6 +342,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -829,6 +837,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -869,6 +878,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -897,6 +907,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -1079,6 +1090,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .RETIRE_SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json b= /tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json index 8f7dc72accd0..875c766222e3 100644 --- a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json @@ -101,6 +101,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound.", "ScaleUnit": "100%" }, @@ -110,6 +111,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -120,6 +122,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredic= ts_resteers", "ScaleUnit": "100%" }, @@ -167,6 +170,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -271,6 +275,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 4 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Sam= ple with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LA= TENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_sw= itches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp"= , "ScaleUnit": "100%" }, @@ -280,6 +285,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_= 16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", "ScaleUnit": "100%" }, @@ -354,6 +360,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -372,6 +379,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -1142,6 +1150,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -1196,6 +1205,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -1224,6 +1234,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -1458,6 +1469,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .RETIRE_SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json b/tool= s/perf/pmu-events/arch/x86/haswell/hsw-metrics.json index 2528418200bb..9570a88d6d1c 100644 --- a/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json +++ b/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json @@ -103,6 +103,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound.", "ScaleUnit": "100%" }, @@ -112,6 +113,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -122,6 +124,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_mispredicts_resteers", "ScaleUnit": "100%" }, @@ -161,6 +164,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -254,6 +258,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 4 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Rel= ated metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_l= cp", "ScaleUnit": "100%" }, @@ -263,6 +268,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END", "ScaleUnit": "100%" }, @@ -272,6 +278,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound.", "ScaleUnit": "100%" }, @@ -281,6 +288,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -663,6 +671,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -693,6 +702,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -721,6 +731,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -874,6 +885,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .RETIRE_SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json b/too= ls/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json index 11f152c346eb..a522202cf684 100644 --- a/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json @@ -103,6 +103,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound.", "ScaleUnit": "100%" }, @@ -112,6 +113,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -122,6 +124,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_mispredicts_resteers", "ScaleUnit": "100%" }, @@ -161,6 +164,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -254,6 +258,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 4 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Rel= ated metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_l= cp", "ScaleUnit": "100%" }, @@ -263,6 +268,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END", "ScaleUnit": "100%" }, @@ -272,6 +278,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound.", "ScaleUnit": "100%" }, @@ -281,6 +288,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -664,6 +672,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -704,6 +713,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -732,6 +742,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -905,6 +916,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .RETIRE_SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json b/tool= s/perf/pmu-events/arch/x86/icelake/icl-metrics.json index f45ae3483df4..1a2154f28b7b 100644 --- a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json @@ -115,6 +115,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", "ScaleUnit": "100%" }, @@ -124,6 +125,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -141,6 +143,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredic= ts_resteers", "ScaleUnit": "100%" }, @@ -187,6 +190,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -288,6 +292,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 5 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Sam= ple with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LA= TENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_sw= itches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp"= , "ScaleUnit": "100%" }, @@ -297,6 +302,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_= 16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", "ScaleUnit": "100%" }, @@ -369,6 +375,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -378,6 +385,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -1111,6 +1119,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -1164,6 +1173,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -1191,6 +1201,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -1360,6 +1371,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json b/too= ls/perf/pmu-events/arch/x86/icelakex/icx-metrics.json index 0f9b174dfc22..1ef772b40e04 100644 --- a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json @@ -80,6 +80,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", "ScaleUnit": "100%" }, @@ -89,6 +90,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -106,6 +108,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredic= ts_resteers", "ScaleUnit": "100%" }, @@ -152,6 +155,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -253,6 +257,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 5 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Sam= ple with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LA= TENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_sw= itches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp"= , "ScaleUnit": "100%" }, @@ -262,6 +267,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_= 16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", "ScaleUnit": "100%" }, @@ -334,6 +340,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -343,6 +350,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -1134,6 +1142,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -1187,6 +1196,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -1214,6 +1224,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -1410,6 +1421,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json b/to= ols/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json index 5247f69c13b6..11080ccffd51 100644 --- a/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json +++ b/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json @@ -103,6 +103,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound.", "ScaleUnit": "100%" }, @@ -112,6 +113,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -122,6 +124,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_mispredicts_resteers", "ScaleUnit": "100%" }, @@ -161,6 +164,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -254,6 +258,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 4 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Rel= ated metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_l= cp", "ScaleUnit": "100%" }, @@ -263,6 +268,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END", "ScaleUnit": "100%" }, @@ -299,6 +305,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound.", "ScaleUnit": "100%" }, @@ -308,6 +315,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -724,6 +732,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -754,6 +763,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -782,6 +792,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -917,6 +928,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .RETIRE_SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json b/tool= s/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json index 89469b10fa30..65a46d659c0a 100644 --- a/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json +++ b/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json @@ -103,6 +103,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound.", "ScaleUnit": "100%" }, @@ -112,6 +113,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -122,6 +124,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_mispredicts_resteers", "ScaleUnit": "100%" }, @@ -161,6 +164,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -254,6 +258,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 4 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Rel= ated metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_info_iptb, tma_l= cp", "ScaleUnit": "100%" }, @@ -263,6 +268,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END", "ScaleUnit": "100%" }, @@ -299,6 +305,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound.", "ScaleUnit": "100%" }, @@ -308,6 +315,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -725,6 +733,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -765,6 +774,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -793,6 +803,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -948,6 +959,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .RETIRE_SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json b/too= ls/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json index e8f4e5c01c9f..66a6f657bd6f 100644 --- a/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json +++ b/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json @@ -76,6 +76,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound.", "ScaleUnit": "100%" }, @@ -85,6 +86,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -95,6 +97,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_mispredicts_resteers", "ScaleUnit": "100%" }, @@ -114,6 +117,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -160,6 +164,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 4 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Rel= ated metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_lcp", "ScaleUnit": "100%" }, @@ -169,6 +174,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END", "ScaleUnit": "100%" }, @@ -205,6 +211,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound.", "ScaleUnit": "100%" }, @@ -214,6 +221,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -412,6 +420,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -422,6 +431,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches= , tma_remote_cache", "ScaleUnit": "100%" }, @@ -450,6 +460,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -487,6 +498,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .RETIRE_SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json b/= tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json index 4a99fe515f4b..4b8bc19392a4 100644 --- a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json +++ b/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json @@ -76,6 +76,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound.", "ScaleUnit": "100%" }, @@ -85,6 +86,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -95,6 +97,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_mispredicts_resteers", "ScaleUnit": "100%" }, @@ -114,6 +117,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -160,6 +164,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 4 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Rel= ated metrics: tma_dsb_switches, tma_info_dsb_coverage, tma_lcp", "ScaleUnit": "100%" }, @@ -169,6 +174,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END", "ScaleUnit": "100%" }, @@ -205,6 +211,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound.", "ScaleUnit": "100%" }, @@ -214,6 +221,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -411,6 +419,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -421,6 +430,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches= , tma_remote_cache", "ScaleUnit": "100%" }, @@ -449,6 +459,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -486,6 +497,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .RETIRE_SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json= b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json index 126300b7ae77..620fc5bd2217 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json @@ -87,6 +87,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", "ScaleUnit": "100%" }, @@ -96,6 +97,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -105,6 +107,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: = tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredict= s_resteers", "ScaleUnit": "100%" }, @@ -151,6 +154,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -252,6 +256,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 6 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Sam= ple with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LA= TENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_sw= itches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp"= , "ScaleUnit": "100%" }, @@ -261,6 +266,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_= 16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", "ScaleUnit": "100%" }, @@ -351,6 +357,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -369,6 +376,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences. Sample with: UOPS_RETIRED.HEA= VY", "ScaleUnit": "100%" }, @@ -1216,6 +1224,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -1269,6 +1278,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -1304,6 +1314,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -1509,6 +1520,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json b/tool= s/perf/pmu-events/arch/x86/skylake/skl-metrics.json index a6d212b349f5..21ef6c9be816 100644 --- a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json @@ -101,6 +101,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound.", "ScaleUnit": "100%" }, @@ -110,6 +111,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -120,6 +122,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredic= ts_resteers", "ScaleUnit": "100%" }, @@ -167,6 +170,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -271,6 +275,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 4 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Sam= ple with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LA= TENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_sw= itches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp"= , "ScaleUnit": "100%" }, @@ -280,6 +285,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_= 16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", "ScaleUnit": "100%" }, @@ -345,6 +351,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -363,6 +370,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -1065,6 +1073,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -1110,6 +1119,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -1138,6 +1148,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -1343,6 +1354,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .RETIRE_SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json b/too= ls/perf/pmu-events/arch/x86/skylakex/skx-metrics.json index fa2f7f126a30..eb6f12c0343d 100644 --- a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json @@ -101,6 +101,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound.", "ScaleUnit": "100%" }, @@ -110,6 +111,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -120,6 +122,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredic= ts_resteers", "ScaleUnit": "100%" }, @@ -167,6 +170,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -271,6 +275,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 4 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Sam= ple with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LA= TENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_sw= itches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp"= , "ScaleUnit": "100%" }, @@ -280,6 +285,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_= 16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", "ScaleUnit": "100%" }, @@ -354,6 +360,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -372,6 +379,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -1123,6 +1131,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -1177,6 +1186,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -1205,6 +1215,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -1429,6 +1440,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .RETIRE_SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json b/to= ols/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json index 4c80d6be6cf1..b442ed4acfbb 100644 --- a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json @@ -109,6 +109,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", "ScaleUnit": "100%" }, @@ -118,6 +119,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -135,6 +137,7 @@ "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_branch_misprediction_cost, tma_info_mispredictions, tma_mispredic= ts_resteers", "ScaleUnit": "100%" }, @@ -181,6 +184,7 @@ "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -282,6 +286,7 @@ "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_ipc / 5 > 0.35", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Sam= ple with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LA= TENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_sw= itches, tma_info_dsb_coverage, tma_info_dsb_misses, tma_info_iptb, tma_lcp"= , "ScaleUnit": "100%" }, @@ -291,6 +296,7 @@ "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_= 16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", "ScaleUnit": "100%" }, @@ -363,6 +369,7 @@ "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -372,6 +379,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", "ScaleUnit": "100%" }, @@ -1125,6 +1133,7 @@ "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -1178,6 +1187,7 @@ "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -1205,6 +1215,7 @@ "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -1374,6 +1385,7 @@ "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jeven= ts.py index ca99b9cfe4ad..f57a8f274025 100755 --- a/tools/perf/pmu-events/jevents.py +++ b/tools/perf/pmu-events/jevents.py @@ -52,7 +52,8 @@ _json_event_attributes =3D [ # Attributes that are in pmu_metric rather than pmu_event. _json_metric_attributes =3D [ 'metric_name', 'metric_group', 'metric_expr', 'metric_threshold', 'des= c', - 'long_desc', 'unit', 'compat', 'aggr_mode', 'event_grouping' + 'long_desc', 'unit', 'compat', 'metricgroup_no_group', 'aggr_mode', + 'event_grouping' ] # Attributes that are bools or enum int values, encoded as '0', '1',... _json_enum_attributes =3D ['aggr_mode', 'deprecated', 'event_grouping', 'p= erpkg'] @@ -303,6 +304,7 @@ class JsonEvent: self.deprecated =3D jd.get('Deprecated') self.metric_name =3D jd.get('MetricName') self.metric_group =3D jd.get('MetricGroup') + self.metricgroup_no_group =3D jd.get('MetricgroupNoGroup') self.event_grouping =3D convert_metric_constraint(jd.get('MetricConstr= aint')) self.metric_expr =3D None if 'MetricExpr' in jd: diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu= -events.h index b7dff8f1021f..80349685cf4d 100644 --- a/tools/perf/pmu-events/pmu-events.h +++ b/tools/perf/pmu-events/pmu-events.h @@ -59,6 +59,7 @@ struct pmu_metric { const char *compat; const char *desc; const char *long_desc; + const char *metricgroup_no_group; enum aggr_mode_class aggr_mode; enum metric_event_groups event_grouping; }; diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c index 4b9a16291b96..e1acb0d23b95 100644 --- a/tools/perf/util/metricgroup.c +++ b/tools/perf/util/metricgroup.c @@ -1144,12 +1144,12 @@ static int metricgroup__add_metric_callback(const s= truct pmu_metric *pm, struct metricgroup__add_metric_data *data =3D vdata; int ret =3D 0; =20 - if (pm->metric_expr && - (match_metric(pm->metric_group, data->metric_name) || - match_metric(pm->metric_name, data->metric_name))) { + if (pm->metric_expr && match_pm_metric(pm, data->metric_name)) { + bool metric_no_group =3D data->metric_no_group || + match_metric(data->metric_name, pm->metricgroup_no_group); =20 data->has_match =3D true; - ret =3D add_metric(data->list, pm, data->modifier, data->metric_no_group= , + ret =3D add_metric(data->list, pm, data->modifier, metric_no_group, data->metric_no_threshold, data->user_requested_cpu_list, data->system_wide, /*root_metric=3D*/NULL, /*visited_metrics=3D*/NULL, table); --=20 2.40.1.495.gc816e09b53d-goog