Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp12670660rwd; Fri, 23 Jun 2023 09:04:28 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4cmLd9hG44g+MPzHh9MZ9MsgKLwtxaQS7zjsdJEeEA3NahxdX6Yrgm4xeEhIOS3wKqtJmI X-Received: by 2002:aca:1c16:0:b0:3a0:5497:3c57 with SMTP id c22-20020aca1c16000000b003a054973c57mr6341659oic.36.1687536266872; Fri, 23 Jun 2023 09:04:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687536266; cv=none; d=google.com; s=arc-20160816; b=APe3/bLdmGzeNbsO/QpKOG7o9kP4FdWa/r08kI+i45l8Pla7BWy1t6Tyf3IHH1yg/b 0KMGHFMIAIq2ulctMYPyWDqgsO/Pywt3UPNDBSGqCObWROMqloSitI+i1Di9EMLBOdBc xKwKykGeIvceNqPfiL7Fi6hDMagdolRqrYnewN5ABa5AWt5eXJYPk7Z29XGuzMn98eGG hHeHG+dmSUrSqc3QYz+fIowtSTYhVHZnanN4e1iVmrlKLmZgRA+cFLiI4Jwk+ULmgOBS 5kx/agwzRSnr10nalYRYR9BgDenpFxBC+E9sxPTOkm1liDdUXlrBc6CZlTI9hAo+vr/X z6oA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:to:from:subject :references:mime-version:message-id:in-reply-to:date:dkim-signature; bh=0+Um1tF9FSQdUoamJQd/sqVSIoCCImKhekb+n80xvS8=; b=rUZeYgD1vWfF4MMx6VUFPkTrEAEP2IMIwHbw+SP1yvmEloppYPBwdvXPKYxsPnf98X 5KqPrBTozGSU0BX7g2ZtXurZFdSJNoV5qdhX+VoMiT05WtghbkYHh1Jepxu432f3sX2C YGRXy4OMKPxr2+RPkk6Ik+e5ntS42aXWa0WsGhiOmO+VJxhr+mO0XhPyXyhI+evokv+B dTIrEhB0zrbDbBGDVFr5QinhEZ7UnKsavy3SIRGY+96h2ARMYjgQNtJUpyb3atI3q/x0 7P8id4xcpLrJaO8iUuOvUrhH0uA+pVMI6SMMKj/R+5SCfWkuywpCL3+Y2m8TW6ytICfh sIFw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=y4My8xsh; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x17-20020a63b211000000b005538bf65e5bsi2135557pge.435.2023.06.23.09.03.42; Fri, 23 Jun 2023 09:04:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=y4My8xsh; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231556AbjFWPLt (ORCPT + 99 others); Fri, 23 Jun 2023 11:11:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58034 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232245AbjFWPLR (ORCPT ); Fri, 23 Jun 2023 11:11:17 -0400 Received: from mail-pf1-x44a.google.com (mail-pf1-x44a.google.com [IPv6:2607:f8b0:4864:20::44a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EEF662D51 for ; Fri, 23 Jun 2023 08:10:45 -0700 (PDT) Received: by mail-pf1-x44a.google.com with SMTP id d2e1a72fcca58-66872889417so338988b3a.1 for ; Fri, 23 Jun 2023 08:10:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687533044; x=1690125044; h=content-transfer-encoding:to:from:subject:references:mime-version :message-id:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=0+Um1tF9FSQdUoamJQd/sqVSIoCCImKhekb+n80xvS8=; b=y4My8xshl1rjzLfqow3ojLgYzCqnGjp5wOMtF7ZyRLS+zsAZU31bz6Nanl7lLHyumY ocmNDwNBKy5CvSKCORqHgTQivtYvDl7/DQrRBNzVZB9Zz5e9t2LlR9r9/xCrY6g8sy0l DbdaxZGk8Rjt+7aDLscHMPpnGiSxK4LMwihJZlSDYBtYBf9Zzdmdsx72mYvpQKMZMZ5y AcXKb7CF0esh7ZvkhJ1T66lrrN6f1Xmez3CCu38T4bQe9Uun/9goVwc13mPAwvBycyX+ VXvjkZGY5TXstdnxBfGYatYTrwzu0gVe9V83LF7vQY7KUoOd0tM5mB2Bc/OiGwGnCs39 Rbjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687533044; x=1690125044; h=content-transfer-encoding:to:from:subject:references:mime-version :message-id:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=0+Um1tF9FSQdUoamJQd/sqVSIoCCImKhekb+n80xvS8=; b=JUHOb1fiWp9pcF80/OEaYuaNWkYMUCZIJdWbJmc1lMq4IcRoyM8YAF+csTFt+VACPJ VSKJdnDgNPetn7uXwFb8oRdSlaOqdXzE8Ifk+txw1iGMMkF+fgYwvyMLBx0+fg/jdk51 T8aNPDmeCpCng0JDkS58s6SJYLFm0bdOYgVzuaxpuk8T/uH4pkp3aaeMUvgQCd+LTJfo LZ+uxwUphVO59oUG9rcaMTKF8s9McOTayUM5QF1nzpMmvgu9MZlPfdgq9ivqfYVF+wgL xxaJ66mrICafACV2Thczn3FHZ/NZF5f5PwtLBEloADrPv0O+q83peQTWOcnA0m4i0WQm Xd1Q== X-Gm-Message-State: AC+VfDyebBM4afzkXwP/dHqTd7zRiDN3kM0TAyZipGFxfriFjjh/fJUM zqsE7V8hfrTX2aj6Pvh3ZdzKmWoxKVdj X-Received: from irogers.svl.corp.google.com ([2620:15c:2a3:200:6559:8968:cdfe:35b6]) (user=irogers job=sendgmr) by 2002:a05:6a00:2386:b0:668:70cd:5b37 with SMTP id f6-20020a056a00238600b0066870cd5b37mr5091784pfc.5.1687533043858; Fri, 23 Jun 2023 08:10:43 -0700 (PDT) Date: Fri, 23 Jun 2023 08:10:08 -0700 In-Reply-To: <20230623151016.4193660-1-irogers@google.com> Message-Id: <20230623151016.4193660-5-irogers@google.com> Mime-Version: 1.0 References: <20230623151016.4193660-1-irogers@google.com> X-Mailer: git-send-email 2.41.0.162.gfafddb0af9-goog Subject: [PATCH v2 04/12] perf vendor events intel: Add rocketlake events/metrics From: Ian Rogers To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , Kan Liang , Zhengjun Xing , Kajol Jain , John Garry , Andrii Nakryiko , Eduard Zingerman , Jing Zhang , Sohom Datta , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Perry Taylor , Samantha Alt , Caleb Biggers , Weilin Wang , Edward Baker Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add RocketLake events added to Intel perfmon in: https://github.com/intel/perfmon/commit/f5f47dc938d81d5cc491cf8ac1f90bee18e= 238cf mapfile.csv is updated accordingly with the CPUID matching one previously associated with icelake. Signed-off-by: Ian Rogers --- tools/perf/pmu-events/arch/x86/mapfile.csv | 3 +- .../pmu-events/arch/x86/rocketlake/cache.json | 894 ++++++++++ .../arch/x86/rocketlake/floating-point.json | 105 ++ .../arch/x86/rocketlake/frontend.json | 377 ++++ .../arch/x86/rocketlake/memory.json | 394 +++++ .../arch/x86/rocketlake/metricgroups.json | 113 ++ .../pmu-events/arch/x86/rocketlake/other.json | 242 +++ .../arch/x86/rocketlake/pipeline.json | 801 +++++++++ .../arch/x86/rocketlake/rkl-metrics.json | 1571 +++++++++++++++++ .../x86/rocketlake/uncore-interconnect.json | 74 + .../arch/x86/rocketlake/uncore-other.json | 9 + .../arch/x86/rocketlake/virtual-memory.json | 165 ++ 12 files changed, 4747 insertions(+), 1 deletion(-) create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/cache.json create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/floating-poin= t.json create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/frontend.json create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/memory.json create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/metricgroups.= json create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/other.json create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/pipeline.json create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.j= son create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/uncore-interc= onnect.json create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/uncore-other.= json create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/virtual-memor= y.json diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-ev= ents/arch/x86/mapfile.csv index 6543a68d4a17..a30f2483d99e 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -13,7 +13,7 @@ GenuineIntel-6-B6,v1.00,grandridge,core GenuineIntel-6-A[DE],v1.01,graniterapids,core GenuineIntel-6-(3C|45|46),v33,haswell,core GenuineIntel-6-3F,v27,haswellx,core -GenuineIntel-6-(7D|7E|A7),v1.18,icelake,core +GenuineIntel-6-7[DE],v1.18,icelake,core GenuineIntel-6-6[AC],v1.20,icelakex,core GenuineIntel-6-3A,v24,ivybridge,core GenuineIntel-6-3E,v23,ivytown,core @@ -22,6 +22,7 @@ GenuineIntel-6-(57|85),v10,knightslanding,core GenuineIntel-6-A[AC],v1.01,meteorlake,core GenuineIntel-6-1[AEF],v3,nehalemep,core GenuineIntel-6-2E,v3,nehalemex,core +GenuineIntel-6-A7,v1.01,rocketlake,core GenuineIntel-6-2A,v19,sandybridge,core GenuineIntel-6-(8F|CF),v1.13,sapphirerapids,core GenuineIntel-6-AF,v1.00,sierraforest,core diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/cache.json b/tools/p= erf/pmu-events/arch/x86/rocketlake/cache.json new file mode 100644 index 000000000000..b0f54a6650fe --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/cache.json @@ -0,0 +1,894 @@ +[ + { + "BriefDescription": "Counts the number of cache lines replaced in = L1 data cache.", + "EventCode": "0x51", + "EventName": "L1D.REPLACEMENT", + "PublicDescription": "Counts L1D data line replacements including = opportunistic replacements, and replacements that require stall-for-replace= or block-for-replace.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of cycles a demand request has waited = due to L1D Fill Buffer (FB) unavailability.", + "EventCode": "0x48", + "EventName": "L1D_PEND_MISS.FB_FULL", + "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", + "SampleAfterValue": "1000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of phases a demand request has waited = due to L1D Fill Buffer (FB) unavailability.", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0x48", + "EventName": "L1D_PEND_MISS.FB_FULL_PERIODS", + "PublicDescription": "Counts number of phases a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", + "SampleAfterValue": "1000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of cycles a demand request has waited = due to L1D due to lack of L2 resources.", + "EventCode": "0x48", + "EventName": "L1D_PEND_MISS.L2_STALL", + "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D due to lack of L2 resources. Demand requests include cac= heable/uncacheable demand load, store, lock or SW prefetch accesses.", + "SampleAfterValue": "1000003", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of L1D misses that are outstanding", + "EventCode": "0x48", + "EventName": "L1D_PEND_MISS.PENDING", + "PublicDescription": "Counts number of L1D misses that are outstan= ding in each cycle, that is each cycle the number of Fill Buffers (FB) outs= tanding required by Demand Reads. FB either is held by demand loads, or it = is held by non-demand loads and gets hit at least once by demand. The valid= outstanding interval is defined until the FB deallocation by one of the fo= llowing ways: from FB allocation, if FB is allocated by demand from the dem= and Hit FB, if it is allocated by hardware or software prefetch. Note: In t= he L1D, a Demand Read contains cacheable or noncacheable demand loads, incl= uding ones causing cache-line splits and reads due to page walks resulted f= rom any request type.", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles with L1D load Misses outstanding.", + "CounterMask": "1", + "EventCode": "0x48", + "EventName": "L1D_PEND_MISS.PENDING_CYCLES", + "PublicDescription": "Counts duration of L1D miss outstanding in c= ycles.", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "L2 cache lines filling L2", + "EventCode": "0xF1", + "EventName": "L2_LINES_IN.ALL", + "PublicDescription": "Counts the number of L2 cache lines filling = the L2. Counting does not cover rejects.", + "SampleAfterValue": "100003", + "UMask": "0x1f" + }, + { + "BriefDescription": "Modified cache lines that are evicted by L2 c= ache when triggered by an L2 cache fill.", + "EventCode": "0xF2", + "EventName": "L2_LINES_OUT.NON_SILENT", + "PublicDescription": "Counts the number of lines that are evicted = by L2 cache when triggered by an L2 cache fill. Those lines are in Modified= state. Modified lines are written back to L3", + "SampleAfterValue": "200003", + "UMask": "0x2" + }, + { + "BriefDescription": "Non-modified cache lines that are silently dr= opped by L2 cache when triggered by an L2 cache fill.", + "EventCode": "0xF2", + "EventName": "L2_LINES_OUT.SILENT", + "PublicDescription": "Counts the number of lines that are silently= dropped by L2 cache when triggered by an L2 cache fill. These lines are ty= pically in Shared or Exclusive state. A non-threaded event.", + "SampleAfterValue": "200003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cache lines that have been L2 hardware prefet= ched but not used by demand accesses", + "EventCode": "0xf2", + "EventName": "L2_LINES_OUT.USELESS_HWPF", + "PublicDescription": "Counts the number of cache lines that have b= een prefetched by the L2 hardware prefetcher but not used by demand access = when evicted from the L2 cache", + "SampleAfterValue": "200003", + "UMask": "0x4" + }, + { + "BriefDescription": "L2 code requests", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_CODE_RD", + "PublicDescription": "Counts the total number of L2 code requests.= ", + "SampleAfterValue": "200003", + "UMask": "0xe4" + }, + { + "BriefDescription": "Demand Data Read requests", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_DEMAND_DATA_RD", + "PublicDescription": "Counts the number of demand Data Read reques= ts (including requests from L1D hardware prefetchers). These loads may hit = or miss L2 cache. Only non rejected loads are counted.", + "SampleAfterValue": "200003", + "UMask": "0xe1" + }, + { + "BriefDescription": "Demand requests that miss L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_DEMAND_MISS", + "PublicDescription": "Counts demand requests that miss L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0x27" + }, + { + "BriefDescription": "Demand requests to L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_DEMAND_REFERENCES", + "PublicDescription": "Counts demand requests to L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0xe7" + }, + { + "BriefDescription": "RFO requests to L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_RFO", + "PublicDescription": "Counts the total number of RFO (read for own= ership) requests to L2 cache. L2 RFO requests include both L1D demand RFO m= isses as well as L1D RFO prefetches.", + "SampleAfterValue": "200003", + "UMask": "0xe2" + }, + { + "BriefDescription": "L2 cache hits when fetching instructions, cod= e reads.", + "EventCode": "0x24", + "EventName": "L2_RQSTS.CODE_RD_HIT", + "PublicDescription": "Counts L2 cache hits when fetching instructi= ons, code reads.", + "SampleAfterValue": "200003", + "UMask": "0xc4" + }, + { + "BriefDescription": "L2 cache misses when fetching instructions", + "EventCode": "0x24", + "EventName": "L2_RQSTS.CODE_RD_MISS", + "PublicDescription": "Counts L2 cache misses when fetching instruc= tions.", + "SampleAfterValue": "200003", + "UMask": "0x24" + }, + { + "BriefDescription": "Demand Data Read requests that hit L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.DEMAND_DATA_RD_HIT", + "PublicDescription": "Counts the number of demand Data Read reques= ts initiated by load instructions that hit L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0xc1" + }, + { + "BriefDescription": "Demand Data Read miss L2, no rejects", + "EventCode": "0x24", + "EventName": "L2_RQSTS.DEMAND_DATA_RD_MISS", + "PublicDescription": "Counts the number of demand Data Read reques= ts that miss L2 cache. Only not rejected loads are counted.", + "SampleAfterValue": "200003", + "UMask": "0x21" + }, + { + "BriefDescription": "All requests that miss L2 cache.", + "EventCode": "0x24", + "EventName": "L2_RQSTS.MISS", + "PublicDescription": "Counts all requests that miss L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0x3f" + }, + { + "BriefDescription": "All L2 requests.", + "EventCode": "0x24", + "EventName": "L2_RQSTS.REFERENCES", + "PublicDescription": "Counts all L2 requests.", + "SampleAfterValue": "200003", + "UMask": "0xff" + }, + { + "BriefDescription": "RFO requests that hit L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.RFO_HIT", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that hit L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0xc2" + }, + { + "BriefDescription": "RFO requests that miss L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.RFO_MISS", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that miss L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0x22" + }, + { + "BriefDescription": "SW prefetch requests that hit L2 cache.", + "EventCode": "0x24", + "EventName": "L2_RQSTS.SWPF_HIT", + "PublicDescription": "Counts Software prefetch requests that hit t= he L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when = FB is not full.", + "SampleAfterValue": "200003", + "UMask": "0xc8" + }, + { + "BriefDescription": "SW prefetch requests that miss L2 cache.", + "EventCode": "0x24", + "EventName": "L2_RQSTS.SWPF_MISS", + "PublicDescription": "Counts Software prefetch requests that miss = the L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when= FB is not full.", + "SampleAfterValue": "200003", + "UMask": "0x28" + }, + { + "BriefDescription": "L2 writebacks that access L2 cache", + "EventCode": "0xF0", + "EventName": "L2_TRANS.L2_WB", + "PublicDescription": "Counts L2 writebacks that access L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0x40" + }, + { + "BriefDescription": "Core-originated cacheable requests that misse= d L3 (Except hardware prefetches to the L3)", + "EventCode": "0x2e", + "EventName": "LONGEST_LAT_CACHE.MISS", + "PublicDescription": "Counts core-originated cacheable requests th= at miss the L3 cache (Longest Latency cache). Requests include data and cod= e reads, Reads-for-Ownership (RFOs), speculative accesses and hardware pref= etches to the L1 and L2. It does not include hardware prefetches to the L3= , and may not count other types of requests to the L3.", + "SampleAfterValue": "100003", + "UMask": "0x41" + }, + { + "BriefDescription": "Retired load instructions.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.ALL_LOADS", + "PEBS": "1", + "PublicDescription": "Counts all retired load instructions. This e= vent accounts for SW prefetch instructions of PREFETCHNTA or PREFETCHT0/1/2= or PREFETCHW.", + "SampleAfterValue": "1000003", + "UMask": "0x81" + }, + { + "BriefDescription": "Retired store instructions.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.ALL_STORES", + "PEBS": "1", + "PublicDescription": "Counts all retired store instructions.", + "SampleAfterValue": "1000003", + "UMask": "0x82" + }, + { + "BriefDescription": "All retired memory instructions.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.ANY", + "PEBS": "1", + "PublicDescription": "Counts all retired memory instructions - loa= ds and stores.", + "SampleAfterValue": "1000003", + "UMask": "0x83" + }, + { + "BriefDescription": "Retired load instructions with locked access.= ", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.LOCK_LOADS", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with locked= access.", + "SampleAfterValue": "100007", + "UMask": "0x21" + }, + { + "BriefDescription": "Retired load instructions that split across a= cacheline boundary.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.SPLIT_LOADS", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions that split = across a cacheline boundary.", + "SampleAfterValue": "100003", + "UMask": "0x41" + }, + { + "BriefDescription": "Retired store instructions that split across = a cacheline boundary.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.SPLIT_STORES", + "PEBS": "1", + "PublicDescription": "Counts retired store instructions that split= across a cacheline boundary.", + "SampleAfterValue": "100003", + "UMask": "0x42" + }, + { + "BriefDescription": "Retired load instructions that miss the STLB.= ", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_MISS_LOADS", + "PEBS": "1", + "PublicDescription": "Number of retired load instructions that (st= art a) miss in the 2nd-level TLB (STLB).", + "SampleAfterValue": "100003", + "UMask": "0x11" + }, + { + "BriefDescription": "Retired store instructions that miss the STLB= .", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_MISS_STORES", + "PEBS": "1", + "PublicDescription": "Number of retired store instructions that (s= tart a) miss in the 2nd-level TLB (STLB).", + "SampleAfterValue": "100003", + "UMask": "0x12" + }, + { + "BriefDescription": "Retired load instructions whose data sources = were L3 and cross-core snoop hits in on-pkg core cache", + "Data_LA": "1", + "EventCode": "0xd2", + "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions whose data = sources were L3 and cross-core snoop hits in on-pkg core cache.", + "SampleAfterValue": "20011", + "UMask": "0x2" + }, + { + "BriefDescription": "Retired load instructions whose data sources = were HitM responses from shared L3", + "Data_LA": "1", + "EventCode": "0xd2", + "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions whose data = sources were HitM responses from shared L3.", + "SampleAfterValue": "20011", + "UMask": "0x4" + }, + { + "BriefDescription": "Retired load instructions whose data sources = were L3 hit and cross-core snoop missed in on-pkg core cache.", + "Data_LA": "1", + "EventCode": "0xd2", + "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS", + "PEBS": "1", + "PublicDescription": "Counts the retired load instructions whose d= ata sources were L3 hit and cross-core snoop missed in on-pkg core cache.", + "SampleAfterValue": "20011", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired load instructions whose data sources = were hits in L3 without snoops required", + "Data_LA": "1", + "EventCode": "0xd2", + "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_NONE", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions whose data = sources were hits in L3 without snoops required.", + "SampleAfterValue": "100003", + "UMask": "0x8" + }, + { + "BriefDescription": "Retired instructions with at least 1 uncachea= ble load or Bus Lock.", + "Data_LA": "1", + "EventCode": "0xd4", + "EventName": "MEM_LOAD_MISC_RETIRED.UC", + "PEBS": "1", + "PublicDescription": "Retired instructions with at least one load = to uncacheable memory-type, or at least one cache-line split locked access = (Bus Lock).", + "SampleAfterValue": "100007", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of completed demand load requests that= missed the L1, but hit the FB(fill buffer), because a preceding miss to th= e same cacheline initiated the line to be brought into L1, but data is not = yet ready in L1.", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.FB_HIT", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with at lea= st one uop was load missed in L1 but hit FB (Fill Buffers) due to preceding= miss to the same cache line with data not ready.", + "SampleAfterValue": "100007", + "UMask": "0x40" + }, + { + "BriefDescription": "Retired load instructions with L1 cache hits = as data sources", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.L1_HIT", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the L1 data cache. This event includes all SW prefet= ches and lock instructions regardless of the data source.", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired load instructions missed L1 cache as = data sources", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.L1_MISS", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that missed in the L1 cache.", + "SampleAfterValue": "200003", + "UMask": "0x8" + }, + { + "BriefDescription": "Retired load instructions with L2 cache hits = as data sources", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.L2_HIT", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with L2 cac= he hits as data sources.", + "SampleAfterValue": "200003", + "UMask": "0x2" + }, + { + "BriefDescription": "Retired load instructions missed L2 cache as = data sources", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.L2_MISS", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions missed L2 c= ache as data sources.", + "SampleAfterValue": "100021", + "UMask": "0x10" + }, + { + "BriefDescription": "Retired load instructions with L3 cache hits = as data sources", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.L3_HIT", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the L3 cache.", + "SampleAfterValue": "100021", + "UMask": "0x4" + }, + { + "BriefDescription": "Retired load instructions missed L3 cache as = data sources", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.L3_MISS", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that missed in the L3 cache.", + "SampleAfterValue": "50021", + "UMask": "0x20" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 inst= ruction cache prefetches that hit a cacheline in the L3 where a snoop was s= ent or not.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.ANY", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FC03C0004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 inst= ruction cache prefetches that hit a cacheline in the L3 where a snoop hit i= n another cores caches, data forwarding is required as the data is modified= .", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_HITM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10003C0004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 inst= ruction cache prefetches that hit a cacheline in the L3 where a snoop hit i= n another core, data forwarding is not required.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_HIT_NO_FWD", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x4003C0004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 inst= ruction cache prefetches that hit a cacheline in the L3 where a snoop was s= ent but no other cores had the data.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x2003C0004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 inst= ruction cache prefetches that hit a cacheline in the L3 where a snoop was n= ot needed to satisfy the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_NOT_NEEDED", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x1003C0004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 inst= ruction cache prefetches that hit a cacheline in the L3 where a snoop was s= ent.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_SENT", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x1E003C0004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that hit a cacheline= in the L3 where a snoop was sent or not.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.ANY", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FC03C0001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that hit a cacheline= in the L3 where a snoop hit in another cores caches, data forwarding is re= quired as the data is modified.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10003C0001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that hit a cacheline= in the L3 where a snoop hit in another core, data forwarding is not requir= ed.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_NO_FWD", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x4003C0001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that hit a cacheline= in the L3 where a snoop was sent but no other cores had the data.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x2003C0001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that hit a cacheline= in the L3 where a snoop was not needed to satisfy the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_NOT_NEEDED", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x1003C0001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that hit a cacheline= in the L3 where a snoop was sent.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_SENT", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x1E003C0001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that hit a = cacheline in the L3 where a snoop was sent or not.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.L3_HIT.ANY", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FC03C0002", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that hit a = cacheline in the L3 where a snoop hit in another cores caches, data forward= ing is required as the data is modified.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10003C0002", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that hit a = cacheline in the L3 where a snoop hit in another core, data forwarding is n= ot required.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_HIT_NO_FWD", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x4003C0002", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that hit a = cacheline in the L3 where a snoop was sent but no other cores had the data.= ", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x2003C0002", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that hit a = cacheline in the L3 where a snoop was not needed to satisfy the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_NOT_NEEDED", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x1003C0002", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that hit a = cacheline in the L3 where a snoop was sent.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_SENT", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x1E003C0002", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts L1 data cache prefetch requests and so= ftware prefetches (except PREFETCHW) that hit a cacheline in the L3 where a= snoop was sent or not.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L1D_AND_SWPF.L3_HIT.ANY", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FC03C0400", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts L1 data cache prefetch requests and so= ftware prefetches (except PREFETCHW) that hit a cacheline in the L3 where a= snoop was sent but no other cores had the data.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L1D_AND_SWPF.L3_HIT.SNOOP_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x2003C0400", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts L1 data cache prefetch requests and so= ftware prefetches (except PREFETCHW) that hit a cacheline in the L3 where a= snoop was not needed to satisfy the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L1D_AND_SWPF.L3_HIT.SNOOP_NOT_NEEDED", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x1003C0400", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that hit a cacheline in the L3 where a snoop was sent or n= ot.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.ANY", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FC03C0010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that hit a cacheline in the L3 where a snoop hit in anothe= r cores caches, data forwarding is required as the data is modified.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_HITM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10003C0010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that hit a cacheline in the L3 where a snoop hit in anothe= r core, data forwarding is not required.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_HIT_NO_FWD", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x4003C0010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that hit a cacheline in the L3 where a snoop was sent but = no other cores had the data.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x2003C0010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that hit a cacheline in the L3 where a snoop was not neede= d to satisfy the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_NOT_NEEDED", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x1003C0010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that hit a cacheline in the L3 where a snoop was sent.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_SENT", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x1E003C0010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that hit a cacheline in the L3 where a snoop was sent or not.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.L3_HIT.ANY", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FC03C0020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that hit a cacheline in the L3 where a snoop hit in another cores= caches, data forwarding is required as the data is modified.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_HITM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10003C0020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that hit a cacheline in the L3 where a snoop hit in another core,= data forwarding is not required.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_HIT_NO_FWD", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x4003C0020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that hit a cacheline in the L3 where a snoop was sent but no othe= r cores had the data.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x2003C0020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that hit a cacheline in the L3 where a snoop was not needed to sa= tisfy the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_NOT_NEEDED", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x1003C0020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that hit a cacheline in the L3 where a snoop was sent.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_SENT", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x1E003C0020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetches to the L3 only tha= t hit a cacheline in the L3 where a snoop was sent or not.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L3.L3_HIT.ANY", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FC03C2380", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts miscellaneous requests, such as I/O an= d un-cacheable accesses that hit a cacheline in the L3 where a snoop hit in= another core, data forwarding is not required.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.OTHER.L3_HIT.SNOOP_HIT_NO_FWD", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x4003C8000", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts miscellaneous requests, such as I/O an= d un-cacheable accesses that hit a cacheline in the L3 where a snoop was se= nt but no other cores had the data.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.OTHER.L3_HIT.SNOOP_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x2003C8000", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts miscellaneous requests, such as I/O an= d un-cacheable accesses that hit a cacheline in the L3 where a snoop was no= t needed to satisfy the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.OTHER.L3_HIT.SNOOP_NOT_NEEDED", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x1003C8000", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts miscellaneous requests, such as I/O an= d un-cacheable accesses that hit a cacheline in the L3 where a snoop was se= nt.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.OTHER.L3_HIT.SNOOP_SENT", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x1E003C8000", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts streaming stores that hit a cacheline = in the L3 where a snoop was sent or not.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.STREAMING_WR.L3_HIT.ANY", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FC03C0800", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Demand and prefetch data reads", + "EventCode": "0xB0", + "EventName": "OFFCORE_REQUESTS.ALL_DATA_RD", + "PublicDescription": "Counts the demand and prefetch data reads. A= ll Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 p= refetchers). Counting also covers reads due to page walks resulted from any= request type.", + "SampleAfterValue": "100003", + "UMask": "0x8" + }, + { + "BriefDescription": "Counts memory transactions sent to the uncore= .", + "EventCode": "0xB0", + "EventName": "OFFCORE_REQUESTS.ALL_REQUESTS", + "PublicDescription": "Counts memory transactions sent to the uncor= e including requests initiated by the core, all L3 prefetches, reads result= ing from page walks, and snoop responses.", + "SampleAfterValue": "100003", + "UMask": "0x80" + }, + { + "BriefDescription": "Demand Data Read requests sent to uncore", + "EventCode": "0xb0", + "EventName": "OFFCORE_REQUESTS.DEMAND_DATA_RD", + "PublicDescription": "Counts the Demand Data Read requests sent to= uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determi= ne average latency in the uncore.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Demand RFO requests including regular RFOs, l= ocks, ItoM", + "EventCode": "0xb0", + "EventName": "OFFCORE_REQUESTS.DEMAND_RFO", + "PublicDescription": "Counts the demand RFO (read for ownership) r= equests including regular RFOs, locks, ItoM.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "For every cycle, increments by the number of = outstanding data read requests pending.", + "EventCode": "0x60", + "EventName": "OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD", + "PublicDescription": "For every cycle, increments by the number of= outstanding data read requests pending. Data read requests include cachea= ble demand reads and L2 prefetches, but do not include RFOs, code reads or = prefetches to the L3. Reads due to page walks resulting from any request t= ype will also be counted. Requests are considered outstanding from the tim= e they miss the core's L2 cache until the transaction completion message is= sent to the requestor.", + "SampleAfterValue": "1000003", + "UMask": "0x8" + }, + { + "BriefDescription": "Cycles where at least 1 outstanding data read= request is pending.", + "CounterMask": "1", + "EventCode": "0x60", + "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD", + "PublicDescription": "Cycles where at least 1 outstanding data rea= d request is pending. Data read requests include cacheable demand reads an= d L2 prefetches, but do not include RFOs, code reads or prefetches to the L= 3. Reads due to page walks resulting from any request type will also be co= unted. Requests are considered outstanding from the time they miss the cor= e's L2 cache until the transaction completion message is sent to the reques= tor.", + "SampleAfterValue": "1000003", + "UMask": "0x8" + }, + { + "BriefDescription": "Cycles where at least 1 outstanding Demand RF= O request is pending.", + "CounterMask": "1", + "EventCode": "0x60", + "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO"= , + "PublicDescription": "Cycles where at least 1 outstanding Demand R= FO request is pending. RFOs are initiated by a core as part of a data sto= re operation. Demand RFO requests include RFOs, locks, and ItoM transactio= ns. Requests are considered outstanding from the time they miss the core's= L2 cache until the transaction completion message is sent to the requestor= .", + "SampleAfterValue": "1000003", + "UMask": "0x4" + }, + { + "BriefDescription": "For every cycle, increments by the number of = outstanding demand data read requests pending.", + "EventCode": "0x60", + "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD", + "PublicDescription": "For every cycle, increments by the number of= outstanding demand data read requests pending. Requests are considered o= utstanding from the time they miss the core's L2 cache until the transactio= n completion message is sent to the requestor.", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Store Read transactions pending for off-core.= Highly correlated.", + "EventCode": "0x60", + "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO", + "PublicDescription": "Counts the number of off-core outstanding re= ad-for-ownership (RFO) store transactions every cycle. An RFO transaction i= s considered to be in the Off-core outstanding state between L2 cache miss = and transaction completion.", + "SampleAfterValue": "1000003", + "UMask": "0x4" + }, + { + "BriefDescription": "Counts bus locks, accounts for cache line spl= it locks and UC locks.", + "EventCode": "0xF4", + "EventName": "SQ_MISC.BUS_LOCK", + "PublicDescription": "Counts the more expensive bus lock needed to= enforce cache coherency for certain memory accesses that need to be done a= tomically. Can be created by issuing an atomic instruction (via the LOCK p= refix) which causes a cache line split or accesses uncacheable memory.", + "SampleAfterValue": "100003", + "UMask": "0x10" + }, + { + "BriefDescription": "Cycles the queue waiting for offcore response= s is full.", + "EventCode": "0xf4", + "EventName": "SQ_MISC.SQ_FULL", + "PublicDescription": "Counts the cycles for which the thread is ac= tive and the queue waiting for responses from the uncore cannot take any mo= re entries.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of PREFETCHNTA instructions executed."= , + "EventCode": "0x32", + "EventName": "SW_PREFETCH_ACCESS.NTA", + "PublicDescription": "Counts the number of PREFETCHNTA instruction= s executed.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of PREFETCHW instructions executed.", + "EventCode": "0x32", + "EventName": "SW_PREFETCH_ACCESS.PREFETCHW", + "PublicDescription": "Counts the number of PREFETCHW instructions = executed.", + "SampleAfterValue": "100003", + "UMask": "0x8" + }, + { + "BriefDescription": "Number of PREFETCHT0 instructions executed.", + "EventCode": "0x32", + "EventName": "SW_PREFETCH_ACCESS.T0", + "PublicDescription": "Counts the number of PREFETCHT0 instructions= executed.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of PREFETCHT1 or PREFETCHT2 instructio= ns executed.", + "EventCode": "0x32", + "EventName": "SW_PREFETCH_ACCESS.T1_T2", + "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT= 2 instructions executed.", + "SampleAfterValue": "100003", + "UMask": "0x4" + } +] diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/floating-point.json = b/tools/perf/pmu-events/arch/x86/rocketlake/floating-point.json new file mode 100644 index 000000000000..85c26c889088 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/floating-point.json @@ -0,0 +1,105 @@ +[ + { + "BriefDescription": "Counts all microcode FP assists.", + "EventCode": "0xc1", + "EventName": "ASSISTS.FP", + "PublicDescription": "Counts all microcode Floating Point assists.= ", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Counts number of SSE/AVX computational 128-bi= t packed double precision floating-point instructions retired; some instruc= tions will count twice as noted below. Each count represents 2 computation= operations, one for each element. Applies to SSE* and AVX* packed double = precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN= MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice = as they perform 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 2 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as the= y perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR re= gister need to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of SSE/AVX computational 128-bit packe= d single precision floating-point instructions retired; some instructions w= ill count twice as noted below. Each count represents 4 computation operat= ions, one for each element. Applies to SSE* and AVX* packed single precisi= on floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT = DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they pe= rform 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x8" + }, + { + "BriefDescription": "Counts number of SSE/AVX computational 256-bi= t packed double precision floating-point instructions retired; some instruc= tions will count twice as noted below. Each count represents 4 computation= operations, one for each element. Applies to SSE* and AVX* packed double = precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN= MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perf= orm 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 = calculations per element. The DAZ and FTZ flags in the MXCSR register need = to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x10" + }, + { + "BriefDescription": "Counts number of SSE/AVX computational 256-bi= t packed single precision floating-point instructions retired; some instruc= tions will count twice as noted below. Each count represents 8 computation= operations, one for each element. Applies to SSE* and AVX* packed single = precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN= MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x20" + }, + { + "BriefDescription": "Number of SSE/AVX computational 128-bit packe= d single and 256-bit packed double precision FP instructions retired; some = instructions will count twice as noted below. Each count represents 2 or/a= nd 4 computation operations, 1 for each element. Applies to SSE* and AVX* = packed single precision and packed double precision FP instructions: ADD SU= B HADD HSUB SUBADD MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DP= P and FM(N)ADD/SUB count twice as they perform 2 calculations per element."= , + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.4_FLOPS", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision and 256-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 2 or/and 4 computation operations, one for each element. = Applies to SSE* and AVX* packed single precision floating-point and packed= double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL= DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB ins= tructions count twice as they perform 2 calculations per element. The DAZ a= nd FTZ flags in the MXCSR register need to be set when using these events."= , + "SampleAfterValue": "100003", + "UMask": "0x18" + }, + { + "BriefDescription": "Counts number of SSE/AVX computational 512-bi= t packed double precision floating-point instructions retired; some instruc= tions will count twice as noted below. Each count represents 8 computation= operations, one for each element. Applies to SSE* and AVX* packed double = precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14= RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform = 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE", + "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14= FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calc= ulations per element. The DAZ and FTZ flags in the MXCSR register need to b= e set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x40" + }, + { + "BriefDescription": "Counts number of SSE/AVX computational 512-bi= t packed single precision floating-point instructions retired; some instruc= tions will count twice as noted below. Each count represents 16 computatio= n operations, one for each element. Applies to SSE* and AVX* packed single= precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT1= 4 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform= 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE", + "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 16 computation oper= ations, one for each element. Applies to SSE* and AVX* packed single preci= sion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP1= 4 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 cal= culations per element. The DAZ and FTZ flags in the MXCSR register need to = be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x80" + }, + { + "BriefDescription": "Number of SSE/AVX computational 256-bit packe= d single precision and 512-bit packed double precision FP instructions ret= ired; some instructions will count twice as noted below. Each count repres= ents 8 computation operations, 1 for each element. Applies to SSE* and AVX= * packed single precision and double precision FP instructions: ADD SUB HAD= D HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14 RCP RCP14 DPP FM(N)ADD/SUB= . DPP and FM(N)ADD/SUB count twice as they perform 2 calculations per elem= ent.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.8_FLOPS", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision and 512-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 8 computation operations, one for each element. Applies = to SSE* and AVX* packed single precision and double precision floating-poin= t instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14= RCP RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice= as they perform 2 calculations per element. The DAZ and FTZ flags in the M= XCSR register need to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x60" + }, + { + "BriefDescription": "Number of SSE/AVX computational scalar floati= ng-point instructions retired; some instructions will count twice as noted = below. Applies to SSE* and AVX* scalar, double and single precision floati= ng-point: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP= and FM(N)ADD/SUB instructions count twice as they perform multiple calcula= tions per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.SCALAR", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision and double precision floating-point instructions retired; some = instructions will count twice as noted below. Each count represents 1 comp= utational operation. Applies to SSE* and AVX* scalar single precision float= ing-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB= . FM(N)ADD/SUB instructions count twice as they perform 2 calculations per= element. The DAZ and FTZ flags in the MXCSR register need to be set when u= sing these events.", + "SampleAfterValue": "1000003", + "UMask": "0x3" + }, + { + "BriefDescription": "Counts number of SSE/AVX computational scalar= double precision floating-point instructions retired; some instructions wi= ll count twice as noted below. Each count represents 1 computational opera= tion. Applies to SSE* and AVX* scalar double precision floating-point instr= uctions: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructi= ons count twice as they perform 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE", + "PublicDescription": "Number of SSE/AVX computational scalar doubl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar double precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element. The DAZ and FTZ flags= in the MXCSR register need to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts number of SSE/AVX computational scalar= single precision floating-point instructions retired; some instructions wi= ll count twice as noted below. Each count represents 1 computational opera= tion. Applies to SSE* and AVX* scalar single precision floating-point instr= uctions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB= instructions count twice as they perform 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar single precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instr= uctions count twice as they perform 2 calculations per element. The DAZ and= FTZ flags in the MXCSR register need to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of any Vector retired FP arithmetic in= structions", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.VECTOR", + "SampleAfterValue": "1000003", + "UMask": "0xfc" + } +] diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/frontend.json b/tool= s/perf/pmu-events/arch/x86/rocketlake/frontend.json new file mode 100644 index 000000000000..2b539a08d2bf --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/frontend.json @@ -0,0 +1,377 @@ +[ + { + "BriefDescription": "Counts the total number when the front end is= resteered, mainly when the BPU cannot provide a correct prediction and thi= s is corrected by other branch handling mechanisms at the front end.", + "EventCode": "0xe6", + "EventName": "BACLEARS.ANY", + "PublicDescription": "Counts the number of times the front-end is = resteered when it finds a branch instruction in a fetch line. This occurs f= or the first time a branch instruction is fetched or when the branch is not= tracked by the BPU (Branch Prediction Unit) anymore.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Stalls caused by changing prefix length of th= e instruction. [This event is alias to ILD_STALL.LCP]", + "EventCode": "0x87", + "EventName": "DECODE.LCP", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is= alias to ILD_STALL.LCP]", + "SampleAfterValue": "500009", + "UMask": "0x1" + }, + { + "BriefDescription": "Decode Stream Buffer (DSB)-to-MITE transition= s count.", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0xab", + "EventName": "DSB2MITE_SWITCHES.COUNT", + "PublicDescription": "Counts the number of Decode Stream Buffer (D= SB a.k.a. Uop Cache)-to-MITE speculative transitions.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "DSB-to-MITE switch true penalty cycles.", + "EventCode": "0xab", + "EventName": "DSB2MITE_SWITCHES.PENALTY_CYCLES", + "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache th= at holds translations of previously fetched instructions that were decoded = by the legacy x86 decode pipeline (MITE). This event counts fetch penalty c= ycles when a transition occurs from DSB to MITE.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Retired Instructions who experienced DSB miss= .", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.ANY_DSB_MISS", + "MSRIndex": "0x3F7", + "MSRValue": "0x1", + "PEBS": "1", + "PublicDescription": "Counts retired Instructions that experienced= DSB (Decode stream buffer i.e. the decoded instruction-cache) miss.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired Instructions who experienced a critic= al DSB miss.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.DSB_MISS", + "MSRIndex": "0x3F7", + "MSRValue": "0x11", + "PEBS": "1", + "PublicDescription": "Number of retired Instructions that experien= ced a critical DSB (Decode stream buffer i.e. the decoded instruction-cache= ) miss. Critical means stalls were exposed to the back-end as a result of t= he DSB miss.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired Instructions who experienced iTLB tru= e miss.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.ITLB_MISS", + "MSRIndex": "0x3F7", + "MSRValue": "0x14", + "PEBS": "1", + "PublicDescription": "Counts retired Instructions that experienced= iTLB (Instruction TLB) true miss.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired Instructions who experienced Instruct= ion L1 Cache true miss.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.L1I_MISS", + "MSRIndex": "0x3F7", + "MSRValue": "0x12", + "PEBS": "1", + "PublicDescription": "Counts retired Instructions who experienced = Instruction L1 Cache true miss.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired Instructions who experienced Instruct= ion L2 Cache true miss.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.L2_MISS", + "MSRIndex": "0x3F7", + "MSRValue": "0x13", + "PEBS": "1", + "PublicDescription": "Counts retired Instructions who experienced = Instruction L2 Cache true miss.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired instructions after front-end starvati= on of at least 1 cycle", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_1", + "MSRIndex": "0x3F7", + "MSRValue": "0x500106", + "PEBS": "1", + "PublicDescription": "Retired instructions that are fetched after = an interval where the front-end delivered no uops for a period of at least = 1 cycle which was not interrupted by a back-end stall.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 128 cycles= which was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_128", + "MSRIndex": "0x3F7", + "MSRValue": "0x508006", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 12= 8 cycles which was not interrupted by a back-end stall.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 16 cycles = which was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_16", + "MSRIndex": "0x3F7", + "MSRValue": "0x501006", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 16 cycles. During th= is period the front-end delivered no uops.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired instructions after front-end starvati= on of at least 2 cycles", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_2", + "MSRIndex": "0x3F7", + "MSRValue": "0x500206", + "PEBS": "1", + "PublicDescription": "Retired instructions that are fetched after = an interval where the front-end delivered no uops for a period of at least = 2 cycles which was not interrupted by a back-end stall.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 256 cycles= which was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_256", + "MSRIndex": "0x3F7", + "MSRValue": "0x510006", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 25= 6 cycles which was not interrupted by a back-end stall.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end had at least 1 bubble-slot for a period of 2= cycles which was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1", + "MSRIndex": "0x3F7", + "MSRValue": "0x100206", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after the front-end had at least 1 bubble-slot for a per= iod of 2 cycles. A bubble-slot is an empty issue-pipeline slot while there = was no RAT stall.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 32 cycles = which was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_32", + "MSRIndex": "0x3F7", + "MSRValue": "0x502006", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 32 cycles. During th= is period the front-end delivered no uops.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 4 cycles w= hich was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_4", + "MSRIndex": "0x3F7", + "MSRValue": "0x500406", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 4 = cycles which was not interrupted by a back-end stall.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 512 cycles= which was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_512", + "MSRIndex": "0x3F7", + "MSRValue": "0x520006", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 51= 2 cycles which was not interrupted by a back-end stall.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 64 cycles = which was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_64", + "MSRIndex": "0x3F7", + "MSRValue": "0x504006", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 64= cycles which was not interrupted by a back-end stall.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 8 cycles w= hich was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_8", + "MSRIndex": "0x3F7", + "MSRValue": "0x500806", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 8 cycles. During thi= s period the front-end delivered no uops.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired Instructions who experienced STLB (2n= d level TLB) true miss.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.STLB_MISS", + "MSRIndex": "0x3F7", + "MSRValue": "0x15", + "PEBS": "1", + "PublicDescription": "Counts retired Instructions that experienced= STLB (2nd level TLB) true miss.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache miss. [This event is alias to ICACHE_DATA.STALLS]", + "EventCode": "0x80", + "EventName": "ICACHE_16B.IFDATA_STALL", + "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The legacy decode pipeline works = at a 16 Byte granularity. [This event is alias to ICACHE_DATA.STALLS]", + "SampleAfterValue": "500009", + "UMask": "0x4" + }, + { + "BriefDescription": "Instruction fetch tag lookups that hit in the= instruction cache (L1I). Counts at 64-byte cache-line granularity.", + "EventCode": "0x83", + "EventName": "ICACHE_64B.IFTAG_HIT", + "PublicDescription": "Counts instruction fetch tag lookups that hi= t in the instruction cache (L1I). Counts at 64-byte cache-line granularity.= Accounts for both cacheable and uncacheable accesses.", + "SampleAfterValue": "200003", + "UMask": "0x1" + }, + { + "BriefDescription": "Instruction fetch tag lookups that miss in th= e instruction cache (L1I). Counts at 64-byte cache-line granularity.", + "EventCode": "0x83", + "EventName": "ICACHE_64B.IFTAG_MISS", + "PublicDescription": "Counts instruction fetch tag lookups that mi= ss in the instruction cache (L1I). Counts at 64-byte cache-line granularity= . Accounts for both cacheable and uncacheable accesses.", + "SampleAfterValue": "200003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]", + "EventCode": "0x83", + "EventName": "ICACHE_64B.IFTAG_STALL", + "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.ST= ALLS]", + "SampleAfterValue": "200003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache miss. [This event is alias to ICACHE_16B.IFDATA_STALL]"= , + "EventCode": "0x80", + "EventName": "ICACHE_DATA.STALLS", + "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The legacy decode pipeline works = at a 16 Byte granularity. [This event is alias to ICACHE_16B.IFDATA_STALL]"= , + "SampleAfterValue": "500009", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STAL= L]", + "EventCode": "0x83", + "EventName": "ICACHE_TAG.STALLS", + "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IF= TAG_STALL]", + "SampleAfterValue": "200003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is deliveri= ng any Uop", + "CounterMask": "1", + "EventCode": "0x79", + "EventName": "IDQ.DSB_CYCLES_ANY", + "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath.", + "SampleAfterValue": "2000003", + "UMask": "0x8" + }, + { + "BriefDescription": "Cycles DSB is delivering optimal number of Uo= ps", + "CounterMask": "5", + "EventCode": "0x79", + "EventName": "IDQ.DSB_CYCLES_OK", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB).", + "SampleAfterValue": "2000003", + "UMask": "0x8" + }, + { + "BriefDescription": "Uops delivered to Instruction Decode Queue (I= DQ) from the Decode Stream Buffer (DSB) path", + "EventCode": "0x79", + "EventName": "IDQ.DSB_UOPS", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path.", + "SampleAfterValue": "2000003", + "UMask": "0x8" + }, + { + "BriefDescription": "Cycles MITE is delivering any Uop", + "CounterMask": "1", + "EventCode": "0x79", + "EventName": "IDQ.MITE_CYCLES_ANY", + "PublicDescription": "Counts the number of cycles uops were delive= red to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipe= line) path. During these cycles uops are not being delivered from the Decod= e Stream Buffer (DSB).", + "SampleAfterValue": "2000003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles MITE is delivering optimal number of U= ops", + "CounterMask": "5", + "EventCode": "0x79", + "EventName": "IDQ.MITE_CYCLES_OK", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB).", + "SampleAfterValue": "2000003", + "UMask": "0x4" + }, + { + "BriefDescription": "Uops delivered to Instruction Decode Queue (I= DQ) from MITE path", + "EventCode": "0x79", + "EventName": "IDQ.MITE_UOPS", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the MITE path. This also means that uops are= not being delivered from the Decode Stream Buffer (DSB).", + "SampleAfterValue": "2000003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles when uops are being delivered to IDQ w= hile MS is busy", + "CounterMask": "1", + "EventCode": "0x79", + "EventName": "IDQ.MS_CYCLES_ANY", + "PublicDescription": "Counts cycles during which uops are being de= livered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS= ) is busy. Uops maybe initiated by Decode Stream Buffer (DSB) or MITE.", + "SampleAfterValue": "2000003", + "UMask": "0x30" + }, + { + "BriefDescription": "Number of switches from DSB or MITE to the MS= ", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0x79", + "EventName": "IDQ.MS_SWITCHES", + "PublicDescription": "Number of switches from DSB (Decode Stream B= uffer) or MITE (legacy decode pipeline) to the Microcode Sequencer.", + "SampleAfterValue": "100003", + "UMask": "0x30" + }, + { + "BriefDescription": "Uops delivered to IDQ while MS is busy", + "EventCode": "0x79", + "EventName": "IDQ.MS_UOPS", + "PublicDescription": "Counts the total number of uops delivered by= the Microcode Sequencer (MS). Any instruction over 4 uops will be delivere= d by the MS. Some instructions such as transcendentals may additionally gen= erate uops from the MS.", + "SampleAfterValue": "100003", + "UMask": "0x30" + }, + { + "BriefDescription": "Uops not delivered by IDQ when backend of the= machine is not stalled", + "EventCode": "0x9c", + "EventName": "IDQ_UOPS_NOT_DELIVERED.CORE", + "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle.", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles when no uops are not delivered by the = IDQ when backend of the machine is not stalled", + "CounterMask": "5", + "EventCode": "0x9c", + "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE", + "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle.", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles when optimal number of uops was delive= red to the back-end when the back-end is not stalled", + "CounterMask": "1", + "EventCode": "0x9C", + "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK", + "Invert": "1", + "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle.", + "SampleAfterValue": "1000003", + "UMask": "0x1" + } +] diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/memory.json b/tools/= perf/pmu-events/arch/x86/rocketlake/memory.json new file mode 100644 index 000000000000..e8d2ec1c029b --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/memory.json @@ -0,0 +1,394 @@ +[ + { + "BriefDescription": "Cycles while L3 cache miss demand load is out= standing.", + "CounterMask": "2", + "EventCode": "0xA3", + "EventName": "CYCLE_ACTIVITY.CYCLES_L3_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Execution stalls while L3 cache miss demand l= oad is outstanding.", + "CounterMask": "6", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.STALLS_L3_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x6" + }, + { + "BriefDescription": "Number of times an HLE execution aborted due = to any reasons (multiple categories may count as one).", + "EventCode": "0xc8", + "EventName": "HLE_RETIRED.ABORTED", + "PublicDescription": "Counts the number of times HLE abort was tri= ggered.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of times an HLE execution aborted due = to unfriendly events (such as interrupts).", + "EventCode": "0xc8", + "EventName": "HLE_RETIRED.ABORTED_EVENTS", + "PublicDescription": "Counts the number of times an HLE execution = aborted due to unfriendly events (such as interrupts).", + "SampleAfterValue": "100003", + "UMask": "0x80" + }, + { + "BriefDescription": "Number of times an HLE execution aborted due = to various memory events (e.g., read/write capacity and conflicts).", + "EventCode": "0xc8", + "EventName": "HLE_RETIRED.ABORTED_MEM", + "PublicDescription": "Counts the number of times an HLE execution = aborted due to various memory events (e.g., read/write capacity and conflic= ts).", + "SampleAfterValue": "100003", + "UMask": "0x8" + }, + { + "BriefDescription": "Number of times an HLE execution aborted due = to HLE-unfriendly instructions and certain unfriendly events (such as AD as= sists etc.).", + "EventCode": "0xc8", + "EventName": "HLE_RETIRED.ABORTED_UNFRIENDLY", + "PublicDescription": "Counts the number of times an HLE execution = aborted due to HLE-unfriendly instructions and certain unfriendly events (s= uch as AD assists etc.).", + "SampleAfterValue": "100003", + "UMask": "0x20" + }, + { + "BriefDescription": "Number of times an HLE execution successfully= committed", + "EventCode": "0xc8", + "EventName": "HLE_RETIRED.COMMIT", + "PublicDescription": "Counts the number of times HLE commit succee= ded.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of times an HLE execution started.", + "EventCode": "0xc8", + "EventName": "HLE_RETIRED.START", + "PublicDescription": "Counts the number of times we entered an HLE= region. Does not count nested transactions.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of machine clears due to memory orderi= ng conflicts.", + "EventCode": "0xc3", + "EventName": "MACHINE_CLEARS.MEMORY_ORDERING", + "PublicDescription": "Counts the number of Machine Clears detected= dye to memory ordering. Memory Ordering Machine Clears may apply when a me= mory read may not conform to the memory ordering rules of the x86 architect= ure", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 128 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_128", + "MSRIndex": "0x3F6", + "MSRValue": "0x80", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 128 cycles. Reported= latency may be longer than just the memory latency.", + "SampleAfterValue": "1009", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 16 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_16", + "MSRIndex": "0x3F6", + "MSRValue": "0x10", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 16 cycles. Reported = latency may be longer than just the memory latency.", + "SampleAfterValue": "20011", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 256 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_256", + "MSRIndex": "0x3F6", + "MSRValue": "0x100", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 256 cycles. Reported= latency may be longer than just the memory latency.", + "SampleAfterValue": "503", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 32 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_32", + "MSRIndex": "0x3F6", + "MSRValue": "0x20", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 32 cycles. Reported = latency may be longer than just the memory latency.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 4 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4", + "MSRIndex": "0x3F6", + "MSRValue": "0x4", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 4 cycles. Reported l= atency may be longer than just the memory latency.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 512 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_512", + "MSRIndex": "0x3F6", + "MSRValue": "0x200", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 512 cycles. Reported= latency may be longer than just the memory latency.", + "SampleAfterValue": "101", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 64 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_64", + "MSRIndex": "0x3F6", + "MSRValue": "0x40", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 64 cycles. Reported = latency may be longer than just the memory latency.", + "SampleAfterValue": "2003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 8 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_8", + "MSRIndex": "0x3F6", + "MSRValue": "0x8", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 8 cycles. Reported l= atency may be longer than just the memory latency.", + "SampleAfterValue": "50021", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 inst= ruction cache prefetches that was not supplied by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that was not supplie= d by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that was no= t supplied by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00002", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts L1 data cache prefetch requests and so= ftware prefetches (except PREFETCHW) that was not supplied by the L3 cache.= ", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L1D_AND_SWPF.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00400", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that was not supplied by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that was not supplied by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts miscellaneous requests, such as I/O an= d un-cacheable accesses that was not supplied by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.OTHER.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC08000", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts streaming stores that was not supplied= by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.STREAMING_WR.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00800", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data read requests that miss th= e L3 cache.", + "EventCode": "0xb0", + "EventName": "OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", + "SampleAfterValue": "100003", + "UMask": "0x10" + }, + { + "BriefDescription": "Cycles where at least one demand data read re= quest known to have missed the L3 cache is pending.", + "CounterMask": "1", + "EventCode": "0x60", + "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_L3_MISS_DEM= AND_DATA_RD", + "PublicDescription": "Cycles where at least one demand data read r= equest known to have missed the L3 cache is pending. Note that this does n= ot capture all elapsed cycles while requests are outstanding - only cycles = from when the requests were known to have missed the L3 cache.", + "SampleAfterValue": "1000003", + "UMask": "0x10" + }, + { + "BriefDescription": "Number of times an RTM execution aborted.", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.ABORTED", + "PublicDescription": "Counts the number of times RTM abort was tri= ggered.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of times an RTM execution aborted due = to none of the previous 4 categories (e.g. interrupt)", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.ABORTED_EVENTS", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to none of the previous 4 categories (e.g. interrupt).", + "SampleAfterValue": "100003", + "UMask": "0x80" + }, + { + "BriefDescription": "Number of times an RTM execution aborted due = to various memory events (e.g. read/write capacity and conflicts)", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.ABORTED_MEM", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to various memory events (e.g. read/write capacity and conflict= s).", + "SampleAfterValue": "100003", + "UMask": "0x8" + }, + { + "BriefDescription": "Number of times an RTM execution aborted due = to incompatible memory type", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.ABORTED_MEMTYPE", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to incompatible memory type.", + "SampleAfterValue": "100003", + "UMask": "0x40" + }, + { + "BriefDescription": "Number of times an RTM execution aborted due = to HLE-unfriendly instructions", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.ABORTED_UNFRIENDLY", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to HLE-unfriendly instructions.", + "SampleAfterValue": "100003", + "UMask": "0x20" + }, + { + "BriefDescription": "Number of times an RTM execution successfully= committed", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.COMMIT", + "PublicDescription": "Counts the number of times RTM commit succee= ded.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of times an RTM execution started.", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.START", + "PublicDescription": "Counts the number of times we entered an RTM= region. Does not count nested transactions.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts the number of times a class of instruc= tions that may cause a transactional abort was executed inside a transactio= nal region", + "EventCode": "0x5d", + "EventName": "TX_EXEC.MISC2", + "PublicDescription": "Counts Unfriendly TSX abort triggered by a v= zeroupper instruction.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of times an instruction execution caus= ed the transactional nest count supported to be exceeded", + "EventCode": "0x5d", + "EventName": "TX_EXEC.MISC3", + "PublicDescription": "Counts Unfriendly TSX abort triggered by a n= est count that is too deep.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Speculatively counts the number of TSX aborts= due to a data capacity limitation for transactional reads", + "EventCode": "0x54", + "EventName": "TX_MEM.ABORT_CAPACITY_READ", + "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional reads", + "SampleAfterValue": "100003", + "UMask": "0x80" + }, + { + "BriefDescription": "Speculatively counts the number of TSX aborts= due to a data capacity limitation for transactional writes.", + "EventCode": "0x54", + "EventName": "TX_MEM.ABORT_CAPACITY_WRITE", + "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional writes.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of times a transactional abort was sig= naled due to a data conflict on a transactionally accessed address", + "EventCode": "0x54", + "EventName": "TX_MEM.ABORT_CONFLICT", + "PublicDescription": "Counts the number of times a TSX line had a = cache conflict.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of times an HLE transactional executio= n aborted due to XRELEASE lock not satisfying the address and value require= ments in the elision buffer", + "EventCode": "0x54", + "EventName": "TX_MEM.ABORT_HLE_ELISION_BUFFER_MISMATCH", + "PublicDescription": "Counts the number of times a TSX Abort was t= riggered due to release/commit but data and address mismatch.", + "SampleAfterValue": "100003", + "UMask": "0x10" + }, + { + "BriefDescription": "Number of times an HLE transactional executio= n aborted due to NoAllocatedElisionBuffer being non-zero.", + "EventCode": "0x54", + "EventName": "TX_MEM.ABORT_HLE_ELISION_BUFFER_NOT_EMPTY", + "PublicDescription": "Counts the number of times a TSX Abort was t= riggered due to commit but Lock Buffer not empty.", + "SampleAfterValue": "100003", + "UMask": "0x8" + }, + { + "BriefDescription": "Number of times an HLE transactional executio= n aborted due to an unsupported read alignment from the elision buffer.", + "EventCode": "0x54", + "EventName": "TX_MEM.ABORT_HLE_ELISION_BUFFER_UNSUPPORTED_ALIGNMEN= T", + "PublicDescription": "Counts the number of times a TSX Abort was t= riggered due to attempting an unsupported alignment from Lock Buffer.", + "SampleAfterValue": "100003", + "UMask": "0x20" + }, + { + "BriefDescription": "Number of times a HLE transactional region ab= orted due to a non XRELEASE prefixed instruction writing to an elided lock = in the elision buffer", + "EventCode": "0x54", + "EventName": "TX_MEM.ABORT_HLE_STORE_TO_ELIDED_LOCK", + "PublicDescription": "Counts the number of times a TSX Abort was t= riggered due to a non-release/commit store to lock.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of times HLE lock could not be elided = due to ElisionBufferAvailable being zero.", + "EventCode": "0x54", + "EventName": "TX_MEM.HLE_ELISION_BUFFER_FULL", + "PublicDescription": "Counts the number of times we could not allo= cate Lock Buffer.", + "SampleAfterValue": "100003", + "UMask": "0x40" + } +] diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/metricgroups.json b/= tools/perf/pmu-events/arch/x86/rocketlake/metricgroups.json new file mode 100644 index 000000000000..a151ba9cccb0 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/metricgroups.json @@ -0,0 +1,113 @@ +{ + "Backend": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "Bad": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "BadSpec": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "BigFoot": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "BrMispredicts": "Grouping from Top-down Microarchitecture Analysis Me= trics spreadsheet", + "Branches": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "CacheMisses": "Grouping from Top-down Microarchitecture Analysis Metr= ics spreadsheet", + "CodeGen": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "Compute": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "Cor": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "DSB": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "DSBmiss": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "DataSharing": "Grouping from Top-down Microarchitecture Analysis Metr= ics spreadsheet", + "Fed": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "FetchBW": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "FetchLat": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "Flops": "Grouping from Top-down Microarchitecture Analysis Metrics sp= readsheet", + "FpScalar": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "FpVector": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "Frontend": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "HPC": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "IcMiss": "Grouping from Top-down Microarchitecture Analysis Metrics s= preadsheet", + "InsType": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "L2Evicts": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "LSD": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "MachineClears": "Grouping from Top-down Microarchitecture Analysis Me= trics spreadsheet", + "Mem": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "MemoryBW": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "MemoryBound": "Grouping from Top-down Microarchitecture Analysis Metr= ics spreadsheet", + "MemoryLat": "Grouping from Top-down Microarchitecture Analysis Metric= s spreadsheet", + "MemoryTLB": "Grouping from Top-down Microarchitecture Analysis Metric= s spreadsheet", + "Memory_BW": "Grouping from Top-down Microarchitecture Analysis Metric= s spreadsheet", + "Memory_Lat": "Grouping from Top-down Microarchitecture Analysis Metri= cs spreadsheet", + "MicroSeq": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "OS": "Grouping from Top-down Microarchitecture Analysis Metrics sprea= dsheet", + "Offcore": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "PGO": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "Pipeline": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "PortsUtil": "Grouping from Top-down Microarchitecture Analysis Metric= s spreadsheet", + "Power": "Grouping from Top-down Microarchitecture Analysis Metrics sp= readsheet", + "Prefetches": "Grouping from Top-down Microarchitecture Analysis Metri= cs spreadsheet", + "Ret": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "Retire": "Grouping from Top-down Microarchitecture Analysis Metrics s= preadsheet", + "SMT": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "Server": "Grouping from Top-down Microarchitecture Analysis Metrics s= preadsheet", + "Snoop": "Grouping from Top-down Microarchitecture Analysis Metrics sp= readsheet", + "SoC": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "Summary": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "TmaL1": "Grouping from Top-down Microarchitecture Analysis Metrics sp= readsheet", + "TmaL2": "Grouping from Top-down Microarchitecture Analysis Metrics sp= readsheet", + "TmaL3mem": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "TopdownL1": "Metrics for top-down breakdown at level 1", + "TopdownL2": "Metrics for top-down breakdown at level 2", + "TopdownL3": "Metrics for top-down breakdown at level 3", + "TopdownL4": "Metrics for top-down breakdown at level 4", + "TopdownL5": "Metrics for top-down breakdown at level 5", + "TopdownL6": "Metrics for top-down breakdown at level 6", + "tma_L1_group": "Metrics for top-down breakdown at level 1", + "tma_L2_group": "Metrics for top-down breakdown at level 2", + "tma_L3_group": "Metrics for top-down breakdown at level 3", + "tma_L4_group": "Metrics for top-down breakdown at level 4", + "tma_L5_group": "Metrics for top-down breakdown at level 5", + "tma_L6_group": "Metrics for top-down breakdown at level 6", + "tma_alu_op_utilization_group": "Metrics contributing to tma_alu_op_ut= ilization category", + "tma_backend_bound_group": "Metrics contributing to tma_backend_bound = category", + "tma_bad_speculation_group": "Metrics contributing to tma_bad_speculat= ion category", + "tma_branch_resteers_group": "Metrics contributing to tma_branch_reste= ers category", + "tma_core_bound_group": "Metrics contributing to tma_core_bound catego= ry", + "tma_dram_bound_group": "Metrics contributing to tma_dram_bound catego= ry", + "tma_dtlb_load_group": "Metrics contributing to tma_dtlb_load category= ", + "tma_dtlb_store_group": "Metrics contributing to tma_dtlb_store catego= ry", + "tma_fetch_bandwidth_group": "Metrics contributing to tma_fetch_bandwi= dth category", + "tma_fetch_latency_group": "Metrics contributing to tma_fetch_latency = category", + "tma_fp_arith_group": "Metrics contributing to tma_fp_arith category", + "tma_fp_vector_group": "Metrics contributing to tma_fp_vector category= ", + "tma_frontend_bound_group": "Metrics contributing to tma_frontend_boun= d category", + "tma_heavy_operations_group": "Metrics contributing to tma_heavy_opera= tions category", + "tma_issue2P": "Metrics related by the issue $issue2P", + "tma_issueBC": "Metrics related by the issue $issueBC", + "tma_issueBM": "Metrics related by the issue $issueBM", + "tma_issueBW": "Metrics related by the issue $issueBW", + "tma_issueD0": "Metrics related by the issue $issueD0", + "tma_issueFB": "Metrics related by the issue $issueFB", + "tma_issueFL": "Metrics related by the issue $issueFL", + "tma_issueL1": "Metrics related by the issue $issueL1", + "tma_issueLat": "Metrics related by the issue $issueLat", + "tma_issueMC": "Metrics related by the issue $issueMC", + "tma_issueMS": "Metrics related by the issue $issueMS", + "tma_issueMV": "Metrics related by the issue $issueMV", + "tma_issueRFO": "Metrics related by the issue $issueRFO", + "tma_issueSL": "Metrics related by the issue $issueSL", + "tma_issueSO": "Metrics related by the issue $issueSO", + "tma_issueSmSt": "Metrics related by the issue $issueSmSt", + "tma_issueSpSt": "Metrics related by the issue $issueSpSt", + "tma_issueSyncxn": "Metrics related by the issue $issueSyncxn", + "tma_issueTLB": "Metrics related by the issue $issueTLB", + "tma_l1_bound_group": "Metrics contributing to tma_l1_bound category", + "tma_l3_bound_group": "Metrics contributing to tma_l3_bound category", + "tma_light_operations_group": "Metrics contributing to tma_light_opera= tions category", + "tma_load_op_utilization_group": "Metrics contributing to tma_load_op_= utilization category", + "tma_mem_latency_group": "Metrics contributing to tma_mem_latency cate= gory", + "tma_memory_bound_group": "Metrics contributing to tma_memory_bound ca= tegory", + "tma_microcode_sequencer_group": "Metrics contributing to tma_microcod= e_sequencer category", + "tma_mite_group": "Metrics contributing to tma_mite category", + "tma_ports_utilization_group": "Metrics contributing to tma_ports_util= ization category", + "tma_ports_utilized_0_group": "Metrics contributing to tma_ports_utili= zed_0 category", + "tma_ports_utilized_3m_group": "Metrics contributing to tma_ports_util= ized_3m category", + "tma_retiring_group": "Metrics contributing to tma_retiring category", + "tma_serializing_operation_group": "Metrics contributing to tma_serial= izing_operation category", + "tma_store_bound_group": "Metrics contributing to tma_store_bound cate= gory", + "tma_store_op_utilization_group": "Metrics contributing to tma_store_o= p_utilization category" +} diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/other.json b/tools/p= erf/pmu-events/arch/x86/rocketlake/other.json new file mode 100644 index 000000000000..cfb590632918 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/other.json @@ -0,0 +1,242 @@ +[ + { + "BriefDescription": "Core cycles where the core was running in a m= anner where Turbo may be clipped to the Non-AVX turbo schedule.", + "EventCode": "0x28", + "EventName": "CORE_POWER.LVL0_TURBO_LICENSE", + "PublicDescription": "Counts Core cycles where the core was runnin= g with power-delivery for baseline license level 0. This includes non-AVX = codes, SSE, AVX 128-bit, and low-current AVX 256-bit codes.", + "SampleAfterValue": "200003", + "UMask": "0x7" + }, + { + "BriefDescription": "Core cycles where the core was running in a m= anner where Turbo may be clipped to the AVX2 turbo schedule.", + "EventCode": "0x28", + "EventName": "CORE_POWER.LVL1_TURBO_LICENSE", + "PublicDescription": "Counts Core cycles where the core was runnin= g with power-delivery for license level 1. This includes high current AVX = 256-bit instructions as well as low current AVX 512-bit instructions.", + "SampleAfterValue": "200003", + "UMask": "0x18" + }, + { + "BriefDescription": "Core cycles where the core was running in a m= anner where Turbo may be clipped to the AVX512 turbo schedule.", + "EventCode": "0x28", + "EventName": "CORE_POWER.LVL2_TURBO_LICENSE", + "PublicDescription": "Core cycles where the core was running with = power-delivery for license level 2 (introduced in Skylake Server microarcht= ecture). This includes high current AVX 512-bit instructions.", + "SampleAfterValue": "200003", + "UMask": "0x20" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 inst= ruction cache prefetches that have any type of response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 inst= ruction cache prefetches that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 inst= ruction cache prefetches that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that have any type o= f response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that DRAM supplied t= he request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that DRAM supplied t= he request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that have a= ny type of response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10002", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that DRAM s= upplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000002", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that DRAM s= upplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000002", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts L1 data cache prefetch requests and so= ftware prefetches (except PREFETCHW) that have any type of response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L1D_AND_SWPF.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10400", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts L1 data cache prefetch requests and so= ftware prefetches (except PREFETCHW) that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L1D_AND_SWPF.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000400", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts L1 data cache prefetch requests and so= ftware prefetches (except PREFETCHW) that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L1D_AND_SWPF.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000400", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that have any type of response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that have any type of response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts miscellaneous requests, such as I/O an= d un-cacheable accesses that have any type of response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.OTHER.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x18000", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts miscellaneous requests, such as I/O an= d un-cacheable accesses that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.OTHER.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184008000", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts miscellaneous requests, such as I/O an= d un-cacheable accesses that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.OTHER.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184008000", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts streaming stores that have any type of= response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.STREAMING_WR.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10800", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts streaming stores that DRAM supplied th= e request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.STREAMING_WR.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000800", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts streaming stores that DRAM supplied th= e request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.STREAMING_WR.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000800", + "SampleAfterValue": "100003", + "UMask": "0x1" + } +] diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/pipeline.json b/tool= s/perf/pmu-events/arch/x86/rocketlake/pipeline.json new file mode 100644 index 000000000000..375b78044f14 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/pipeline.json @@ -0,0 +1,801 @@ +[ + { + "BriefDescription": "Cycles when divide unit is busy executing div= ide or square root operations.", + "CounterMask": "1", + "EventCode": "0x14", + "EventName": "ARITH.DIVIDER_ACTIVE", + "PublicDescription": "Counts cycles when divide unit is busy execu= ting divide or square root operations. Accounts for integer and floating-po= int operations.", + "SampleAfterValue": "1000003", + "UMask": "0x9" + }, + { + "BriefDescription": "Number of occurrences where a microcode assis= t is invoked by hardware.", + "EventCode": "0xc1", + "EventName": "ASSISTS.ANY", + "PublicDescription": "Counts the number of occurrences where a mic= rocode assist is invoked by hardware Examples include AD (page Access Dirty= ), FP and AVX related assists.", + "SampleAfterValue": "100003", + "UMask": "0x7" + }, + { + "BriefDescription": "All branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.ALL_BRANCHES", + "PEBS": "1", + "PublicDescription": "Counts all branch instructions retired.", + "SampleAfterValue": "400009" + }, + { + "BriefDescription": "Conditional branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.COND", + "PEBS": "1", + "PublicDescription": "Counts conditional branch instructions retir= ed.", + "SampleAfterValue": "400009", + "UMask": "0x11" + }, + { + "BriefDescription": "Not taken branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.COND_NTAKEN", + "PEBS": "1", + "PublicDescription": "Counts not taken branch instructions retired= .", + "SampleAfterValue": "400009", + "UMask": "0x10" + }, + { + "BriefDescription": "Taken conditional branch instructions retired= .", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.COND_TAKEN", + "PEBS": "1", + "PublicDescription": "Counts taken conditional branch instructions= retired.", + "SampleAfterValue": "400009", + "UMask": "0x1" + }, + { + "BriefDescription": "Far branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.FAR_BRANCH", + "PEBS": "1", + "PublicDescription": "Counts far branch instructions retired.", + "SampleAfterValue": "100007", + "UMask": "0x40" + }, + { + "BriefDescription": "Indirect near branch instructions retired (ex= cluding returns)", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.INDIRECT", + "PEBS": "1", + "PublicDescription": "Counts near indirect branch instructions ret= ired excluding returns. TSX abort is an indirect branch.", + "SampleAfterValue": "100003", + "UMask": "0x80" + }, + { + "BriefDescription": "Direct and indirect near call instructions re= tired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.NEAR_CALL", + "PEBS": "1", + "PublicDescription": "Counts both direct and indirect near call in= structions retired.", + "SampleAfterValue": "100007", + "UMask": "0x2" + }, + { + "BriefDescription": "Return instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.NEAR_RETURN", + "PEBS": "1", + "PublicDescription": "Counts return instructions retired.", + "SampleAfterValue": "100007", + "UMask": "0x8" + }, + { + "BriefDescription": "Taken branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.NEAR_TAKEN", + "PEBS": "1", + "PublicDescription": "Counts taken branch instructions retired.", + "SampleAfterValue": "400009", + "UMask": "0x20" + }, + { + "BriefDescription": "All mispredicted branch instructions retired.= ", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.ALL_BRANCHES", + "PEBS": "1", + "PublicDescription": "Counts all the retired branch instructions t= hat were mispredicted by the processor. A branch misprediction occurs when = the processor incorrectly predicts the destination of the branch. When the= misprediction is discovered at execution, all the instructions executed in= the wrong (speculative) path must be discarded, and the processor must sta= rt fetching from the correct path.", + "SampleAfterValue": "50021" + }, + { + "BriefDescription": "Mispredicted conditional branch instructions = retired.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.COND", + "PEBS": "1", + "PublicDescription": "Counts mispredicted conditional branch instr= uctions retired.", + "SampleAfterValue": "50021", + "UMask": "0x11" + }, + { + "BriefDescription": "Mispredicted non-taken conditional branch ins= tructions retired.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.COND_NTAKEN", + "PEBS": "1", + "PublicDescription": "Counts the number of conditional branch inst= ructions retired that were mispredicted and the branch direction was not ta= ken.", + "SampleAfterValue": "50021", + "UMask": "0x10" + }, + { + "BriefDescription": "number of branch instructions retired that we= re mispredicted and taken.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.COND_TAKEN", + "PEBS": "1", + "PublicDescription": "Counts taken conditional mispredicted branch= instructions retired.", + "SampleAfterValue": "50021", + "UMask": "0x1" + }, + { + "BriefDescription": "All miss-predicted indirect branch instructio= ns retired (excluding RETs. TSX aborts is considered indirect branch).", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.INDIRECT", + "PEBS": "1", + "PublicDescription": "Counts all miss-predicted indirect branch in= structions retired (excluding RETs. TSX aborts is considered indirect branc= h).", + "SampleAfterValue": "50021", + "UMask": "0x80" + }, + { + "BriefDescription": "Mispredicted indirect CALL instructions retir= ed.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.INDIRECT_CALL", + "PEBS": "1", + "PublicDescription": "Counts retired mispredicted indirect (near t= aken) CALL instructions, including both register and memory indirect.", + "SampleAfterValue": "50021", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of near branch instructions retired th= at were mispredicted and taken.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.NEAR_TAKEN", + "PEBS": "1", + "PublicDescription": "Counts number of near branch instructions re= tired that were mispredicted and taken.", + "SampleAfterValue": "50021", + "UMask": "0x20" + }, + { + "BriefDescription": "This event counts the number of mispredicted = ret instructions retired. Non PEBS", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.RET", + "PEBS": "1", + "PublicDescription": "This is a non-precise version (that is, does= not use PEBS) of the event that counts mispredicted return instructions re= tired.", + "SampleAfterValue": "50021", + "UMask": "0x8" + }, + { + "BriefDescription": "Cycle counts are evenly distributed between a= ctive threads in the Core.", + "EventCode": "0xec", + "EventName": "CPU_CLK_UNHALTED.DISTRIBUTED", + "PublicDescription": "This event distributes cycle counts between = active hyperthreads, i.e., those in C0. A hyperthread becomes inactive whe= n it executes the HLT or MWAIT instructions. If all other hyperthreads are= inactive (or disabled or do not exist), all counts are attributed to this = hyperthread. To obtain the full count when the Core is active, sum the coun= ts from each hyperthread.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Core crystal clock cycles when this thread is= unhalted and the other thread is halted.", + "EventCode": "0x3C", + "EventName": "CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE", + "PublicDescription": "Counts Core crystal clock cycles when curren= t thread is unhalted and the other thread is halted.", + "SampleAfterValue": "25003", + "UMask": "0x2" + }, + { + "BriefDescription": "Core crystal clock cycles. Cycle counts are e= venly distributed between active threads in the Core.", + "EventCode": "0x3c", + "EventName": "CPU_CLK_UNHALTED.REF_DISTRIBUTED", + "PublicDescription": "This event distributes Core crystal clock cy= cle counts between active hyperthreads, i.e., those in C0 sleep-state. A hy= perthread becomes inactive when it executes the HLT or MWAIT instructions. = If one thread is active in a core, all counts are attributed to this hypert= hread. To obtain the full count when the Core is active, sum the counts fro= m each hyperthread.", + "SampleAfterValue": "2000003", + "UMask": "0x8" + }, + { + "BriefDescription": "Reference cycles when the core is not in halt= state.", + "EventName": "CPU_CLK_UNHALTED.REF_TSC", + "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. This e= vent has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is c= ounted on a dedicated fixed counter, leaving the eight programmable counter= s available for other events. Note: On all current platforms this event sto= ps counting during 'throttling (TM)' states duty off periods the processor = is 'halted'. The counter update is done at a lower clock rate then the cor= e clock the overflow status bit for this counter may appear 'sticky'. Afte= r the counter has overflowed and software clears the overflow status bit an= d resets the counter to less than MAX. The reset value to the counter is no= t clocked immediately so the overflow status bit will flip 'high (1)' and g= enerate another PMI (if enabled) after which the reset value gets clocked i= nto the counter. Therefore, software will get the interrupt, read the overf= low status bit '1 for bit 34 while the counter value is less than MAX. Soft= ware should ignore this case.", + "SampleAfterValue": "2000003", + "UMask": "0x3" + }, + { + "BriefDescription": "Core crystal clock cycles when the thread is = unhalted.", + "EventCode": "0x3C", + "EventName": "CPU_CLK_UNHALTED.REF_XCLK", + "PublicDescription": "Counts core crystal clock cycles when the th= read is unhalted.", + "SampleAfterValue": "25003", + "UMask": "0x1" + }, + { + "BriefDescription": "Core cycles when the thread is not in halt st= ate", + "EventName": "CPU_CLK_UNHALTED.THREAD", + "PublicDescription": "Counts the number of core cycles while the t= hread is not in a halt state. The thread enters the halt state when it is r= unning the HLT instruction. This event is a component in many key event rat= ios. The core frequency may change from time to time due to transitions ass= ociated with Enhanced Intel SpeedStep Technology or TM2. For this reason th= is event may have a changing ratio with regards to time. When the core freq= uency is constant, this event can approximate elapsed time while the core w= as not in the halt state. It is counted on a dedicated fixed counter, leavi= ng the eight programmable counters available for other events.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Thread cycles when thread is not in halt stat= e", + "EventCode": "0x3C", + "EventName": "CPU_CLK_UNHALTED.THREAD_P", + "PublicDescription": "This is an architectural event that counts t= he number of thread cycles while the thread is not in a halt state. The thr= ead enters the halt state when it is running the HLT instruction. The core = frequency may change from time to time due to power or thermal throttling. = For this reason, this event may have a changing ratio with regards to wall = clock time.", + "SampleAfterValue": "2000003" + }, + { + "BriefDescription": "Cycles while L1 cache miss demand load is out= standing.", + "CounterMask": "8", + "EventCode": "0xA3", + "EventName": "CYCLE_ACTIVITY.CYCLES_L1D_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x8" + }, + { + "BriefDescription": "Cycles while L2 cache miss demand load is out= standing.", + "CounterMask": "1", + "EventCode": "0xA3", + "EventName": "CYCLE_ACTIVITY.CYCLES_L2_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles while memory subsystem has an outstand= ing load.", + "CounterMask": "16", + "EventCode": "0xA3", + "EventName": "CYCLE_ACTIVITY.CYCLES_MEM_ANY", + "SampleAfterValue": "1000003", + "UMask": "0x10" + }, + { + "BriefDescription": "Execution stalls while L1 cache miss demand l= oad is outstanding.", + "CounterMask": "12", + "EventCode": "0xA3", + "EventName": "CYCLE_ACTIVITY.STALLS_L1D_MISS", + "SampleAfterValue": "1000003", + "UMask": "0xc" + }, + { + "BriefDescription": "Execution stalls while L2 cache miss demand l= oad is outstanding.", + "CounterMask": "5", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.STALLS_L2_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x5" + }, + { + "BriefDescription": "Execution stalls while memory subsystem has a= n outstanding load.", + "CounterMask": "20", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.STALLS_MEM_ANY", + "SampleAfterValue": "1000003", + "UMask": "0x14" + }, + { + "BriefDescription": "Total execution stalls.", + "CounterMask": "4", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.STALLS_TOTAL", + "SampleAfterValue": "1000003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles total of 1 uop is executed on all port= s and Reservation Station was not empty.", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.1_PORTS_UTIL", + "PublicDescription": "Counts cycles during which a total of 1 uop = was executed on all ports and Reservation Station (RS) was not empty.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles total of 2 uops are executed on all po= rts and Reservation Station was not empty.", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.2_PORTS_UTIL", + "PublicDescription": "Counts cycles during which a total of 2 uops= were executed on all ports and Reservation Station (RS) was not empty.", + "SampleAfterValue": "2000003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles total of 3 uops are executed on all po= rts and Reservation Station was not empty.", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.3_PORTS_UTIL", + "PublicDescription": "Cycles total of 3 uops are executed on all p= orts and Reservation Station (RS) was not empty.", + "SampleAfterValue": "2000003", + "UMask": "0x8" + }, + { + "BriefDescription": "Cycles total of 4 uops are executed on all po= rts and Reservation Station was not empty.", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.4_PORTS_UTIL", + "PublicDescription": "Cycles total of 4 uops are executed on all p= orts and Reservation Station (RS) was not empty.", + "SampleAfterValue": "2000003", + "UMask": "0x10" + }, + { + "BriefDescription": "Cycles where the Store Buffer was full and no= loads caused an execution stall.", + "CounterMask": "2", + "EventCode": "0xA6", + "EventName": "EXE_ACTIVITY.BOUND_ON_STORES", + "PublicDescription": "Counts cycles where the Store Buffer was ful= l and no loads caused an execution stall.", + "SampleAfterValue": "1000003", + "UMask": "0x40" + }, + { + "BriefDescription": "Stalls caused by changing prefix length of th= e instruction. [This event is alias to DECODE.LCP]", + "EventCode": "0x87", + "EventName": "ILD_STALL.LCP", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is= alias to DECODE.LCP]", + "SampleAfterValue": "500009", + "UMask": "0x1" + }, + { + "BriefDescription": "Instruction decoders utilized in a cycle", + "EventCode": "0x55", + "EventName": "INST_DECODED.DECODERS", + "PublicDescription": "Number of decoders utilized in a cycle when = the MITE (legacy decode pipeline) fetches instructions.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of instructions retired. Fixed Counter= - architectural event", + "EventName": "INST_RETIRED.ANY", + "PEBS": "1", + "PublicDescription": "Counts the number of instructions retired - = an Architectural PerfMon event. Counting continues during hardware interrup= ts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counte= d by a designated fixed counter freeing up programmable counters to count o= ther events. INST_RETIRED.ANY_P is counted by a programmable counter.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of instructions retired. General Count= er - architectural event", + "EventCode": "0xc0", + "EventName": "INST_RETIRED.ANY_P", + "PEBS": "1", + "PublicDescription": "Counts the number of instructions retired - = an Architectural PerfMon event. Counting continues during hardware interrup= ts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counte= d by a designated fixed counter freeing up programmable counters to count o= ther events. INST_RETIRED.ANY_P is counted by a programmable counter.", + "SampleAfterValue": "2000003" + }, + { + "BriefDescription": "Number of all retired NOP instructions.", + "EventCode": "0xc0", + "EventName": "INST_RETIRED.NOP", + "PEBS": "1", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Precise instruction retired event with a redu= ced effect of PEBS shadow in IP distribution", + "EventName": "INST_RETIRED.PREC_DIST", + "PEBS": "1", + "PublicDescription": "A version of INST_RETIRED that allows for a = more unbiased distribution of samples across instructions retired. It utili= zes the Precise Distribution of Instructions Retired (PDIR) feature to miti= gate some bias in how retired instructions get sampled. Use on Fixed Counte= r 0.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles without actually retired instructions.= ", + "CounterMask": "1", + "EventCode": "0xc0", + "EventName": "INST_RETIRED.STALL_CYCLES", + "Invert": "1", + "PublicDescription": "This event counts cycles without actually re= tired instructions.", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles the Backend cluster is recovering afte= r a miss-speculation or a Store Buffer or Load Buffer drain stall.", + "CounterMask": "1", + "EventCode": "0x0D", + "EventName": "INT_MISC.ALL_RECOVERY_CYCLES", + "PublicDescription": "Counts cycles the Backend cluster is recover= ing after a miss-speculation or a Store Buffer or Load Buffer drain stall."= , + "SampleAfterValue": "2000003", + "UMask": "0x3" + }, + { + "BriefDescription": "Clears speculative count", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0x0D", + "EventName": "INT_MISC.CLEARS_COUNT", + "PublicDescription": "Counts the number of speculative clears due = to any type of branch misprediction or machine clears", + "SampleAfterValue": "500009", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts cycles after recovery from a branch mi= sprediction or machine clear till the first uop is issued from the resteere= d path.", + "EventCode": "0x0d", + "EventName": "INT_MISC.CLEAR_RESTEER_CYCLES", + "PublicDescription": "Cycles after recovery from a branch mispredi= ction or machine clear till the first uop is issued from the resteered path= .", + "SampleAfterValue": "500009", + "UMask": "0x80" + }, + { + "BriefDescription": "Core cycles the allocator was stalled due to = recovery from earlier clear event for this thread", + "EventCode": "0x0D", + "EventName": "INT_MISC.RECOVERY_CYCLES", + "PublicDescription": "Counts core cycles when the Resource allocat= or was stalled due to recovery from an earlier branch misprediction or mach= ine clear event.", + "SampleAfterValue": "500009", + "UMask": "0x1" + }, + { + "BriefDescription": "TMA slots where uops got dropped", + "EventCode": "0x0d", + "EventName": "INT_MISC.UOP_DROPPING", + "PublicDescription": "Estimated number of Top-down Microarchitectu= re Analysis slots that got dropped due to non front-end reasons", + "SampleAfterValue": "1000003", + "UMask": "0x10" + }, + { + "BriefDescription": "The number of times that split load operation= s are temporarily blocked because all resources for handling the split acce= sses are in use.", + "EventCode": "0x03", + "EventName": "LD_BLOCKS.NO_SR", + "PublicDescription": "Counts the number of times that split load o= perations are temporarily blocked because all resources for handling the sp= lit accesses are in use.", + "SampleAfterValue": "100003", + "UMask": "0x8" + }, + { + "BriefDescription": "Loads blocked due to overlapping with a prece= ding store that cannot be forwarded.", + "EventCode": "0x03", + "EventName": "LD_BLOCKS.STORE_FORWARD", + "PublicDescription": "Counts the number of times where store forwa= rding was prevented for a load operation. The most common case is a load bl= ocked due to the address of memory access (partially) overlapping with a pr= eceding uncompleted store. Note: See the table of not supported store forwa= rds in the Optimization Guide.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "False dependencies due to partial compare on = address.", + "EventCode": "0x07", + "EventName": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS", + "PublicDescription": "Counts the number of times a load got blocke= d due to false dependencies due to partial compare on address.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts the number of demand load dispatches t= hat hit L1D fill buffer (FB) allocated for software prefetch.", + "EventCode": "0x4c", + "EventName": "LOAD_HIT_PREFETCH.SWPF", + "PublicDescription": "Counts all not software-prefetch load dispat= ches that hit the fill buffer (FB) allocated for the software prefetch. It = can also be incremented by some lock instructions. So it should only be use= d with profiling so that the locks can be excluded by ASM (Assembly File) i= nspection of the nearby instructions.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles Uops delivered by the LSD, but didn't = come from the decoder.", + "CounterMask": "1", + "EventCode": "0xA8", + "EventName": "LSD.CYCLES_ACTIVE", + "PublicDescription": "Counts the cycles when at least one uop is d= elivered by the LSD (Loop-stream detector).", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles optimal number of Uops delivered by th= e LSD, but did not come from the decoder.", + "CounterMask": "5", + "EventCode": "0xa8", + "EventName": "LSD.CYCLES_OK", + "PublicDescription": "Counts the cycles when optimal number of uop= s is delivered by the LSD (Loop-stream detector).", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of Uops delivered by the LSD.", + "EventCode": "0xa8", + "EventName": "LSD.UOPS", + "PublicDescription": "Counts the number of uops delivered to the b= ack-end by the LSD(Loop Stream Detector).", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of machine clears (nukes) of any type.= ", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0xc3", + "EventName": "MACHINE_CLEARS.COUNT", + "PublicDescription": "Counts the number of machine clears (nukes) = of any type.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Self-modifying code (SMC) detected.", + "EventCode": "0xc3", + "EventName": "MACHINE_CLEARS.SMC", + "PublicDescription": "Counts self-modifying code (SMC) detected, w= hich causes a machine clear.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Increments whenever there is an update to the= LBR array.", + "EventCode": "0xcc", + "EventName": "MISC_RETIRED.LBR_INSERTS", + "PublicDescription": "Increments when an entry is added to the Las= t Branch Record (LBR) array (or removed from the array in case of RETURNs i= n call stack mode). The event requires LBR to be enabled properly.", + "SampleAfterValue": "100003", + "UMask": "0x20" + }, + { + "BriefDescription": "Number of retired PAUSE instructions. This ev= ent is not supported on first SKL and KBL products.", + "EventCode": "0xcc", + "EventName": "MISC_RETIRED.PAUSE_INST", + "PublicDescription": "Counts number of retired PAUSE instructions.= This event is not supported on first SKL and KBL products.", + "SampleAfterValue": "100003", + "UMask": "0x40" + }, + { + "BriefDescription": "Cycles stalled due to no store buffers availa= ble. (not including draining form sync).", + "EventCode": "0xa2", + "EventName": "RESOURCE_STALLS.SB", + "PublicDescription": "Counts allocation stall cycles caused by the= store buffer (SB) being full. This counts cycles that the pipeline back-en= d blocked uop delivery from the front-end.", + "SampleAfterValue": "100003", + "UMask": "0x8" + }, + { + "BriefDescription": "Counts cycles where the pipeline is stalled d= ue to serializing operations.", + "EventCode": "0xa2", + "EventName": "RESOURCE_STALLS.SCOREBOARD", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles when Reservation Station (RS) is empty= for the thread", + "EventCode": "0x5e", + "EventName": "RS_EVENTS.EMPTY_CYCLES", + "PublicDescription": "Counts cycles during which the reservation s= tation (RS) is empty for this logical processor. This is usually caused whe= n the front-end pipeline runs into stravation periods (e.g. branch mispredi= ctions or i-cache misses)", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts end of periods where the Reservation S= tation (RS) was empty.", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0x5E", + "EventName": "RS_EVENTS.EMPTY_END", + "Invert": "1", + "PublicDescription": "Counts end of periods where the Reservation = Station (RS) was empty. Could be useful to closely sample on front-end late= ncy issues (see the FRONTEND_RETIRED event of designated precise events)", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "TMA slots where no uops were being issued due= to lack of back-end resources.", + "EventCode": "0xa4", + "EventName": "TOPDOWN.BACKEND_BOUND_SLOTS", + "PublicDescription": "Counts the number of Top-down Microarchitect= ure Analysis (TMA) method's slots where no micro-operations were being iss= ued from front-end to back-end of the machine due to lack of back-end resou= rces.", + "SampleAfterValue": "10000003", + "UMask": "0x2" + }, + { + "BriefDescription": "TMA slots wasted due to incorrect speculation= by branch mispredictions", + "EventCode": "0xa4", + "EventName": "TOPDOWN.BR_MISPREDICT_SLOTS", + "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by branch mispredictions. This event estimates number= of operations that were issued but not retired from the speculative path a= s well as the out-of-order engine recovery past a branch misprediction.", + "SampleAfterValue": "10000003", + "UMask": "0x8" + }, + { + "BriefDescription": "TMA slots available for an unhalted logical p= rocessor. Fixed counter - architectural event", + "EventName": "TOPDOWN.SLOTS", + "PublicDescription": "Number of available slots for an unhalted lo= gical processor. The event increments by machine-width of the narrowest pip= eline as employed by the Top-down Microarchitecture Analysis method (TMA). = The count is distributed among unhalted logical processors (hyper-threads) = who share the same physical core. Software can use this event as the denomi= nator for the top-level metrics of the TMA method. This architectural event= is counted on a designated fixed counter (Fixed Counter 3).", + "SampleAfterValue": "10000003", + "UMask": "0x4" + }, + { + "BriefDescription": "TMA slots available for an unhalted logical p= rocessor. General counter - architectural event", + "EventCode": "0xa4", + "EventName": "TOPDOWN.SLOTS_P", + "PublicDescription": "Counts the number of available slots for an = unhalted logical processor. The event increments by machine-width of the na= rrowest pipeline as employed by the Top-down Microarchitecture Analysis met= hod. The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core.", + "SampleAfterValue": "10000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of uops decoded out of instructions ex= clusively fetched by decoder 0", + "EventCode": "0x56", + "EventName": "UOPS_DECODED.DEC0", + "PublicDescription": "Uops exclusively fetched by decoder 0", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of uops executed on port 0", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_0", + "PublicDescription": "Counts, on the per-thread basis, cycles duri= ng which at least one uop is dispatched from the Reservation Station (RS) t= o port 0.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of uops executed on port 1", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_1", + "PublicDescription": "Counts, on the per-thread basis, cycles duri= ng which at least one uop is dispatched from the Reservation Station (RS) t= o port 1.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of uops executed on port 2 and 3", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_2_3", + "PublicDescription": "Counts, on the per-thread basis, cycles duri= ng which at least one uop is dispatched from the Reservation Station (RS) t= o ports 2 and 3.", + "SampleAfterValue": "2000003", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of uops executed on port 4 and 9", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_4_9", + "PublicDescription": "Counts, on the per-thread basis, cycles duri= ng which at least one uop is dispatched from the Reservation Station (RS) t= o ports 5 and 9.", + "SampleAfterValue": "2000003", + "UMask": "0x10" + }, + { + "BriefDescription": "Number of uops executed on port 5", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_5", + "PublicDescription": "Counts, on the per-thread basis, cycles duri= ng which at least one uop is dispatched from the Reservation Station (RS) t= o port 5.", + "SampleAfterValue": "2000003", + "UMask": "0x20" + }, + { + "BriefDescription": "Number of uops executed on port 6", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_6", + "PublicDescription": "Counts, on the per-thread basis, cycles duri= ng which at least one uop is dispatched from the Reservation Station (RS) t= o port 6.", + "SampleAfterValue": "2000003", + "UMask": "0x40" + }, + { + "BriefDescription": "Number of uops executed on port 7 and 8", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_7_8", + "PublicDescription": "Counts, on the per-thread basis, cycles duri= ng which at least one uop is dispatched from the Reservation Station (RS) t= o ports 7 and 8.", + "SampleAfterValue": "2000003", + "UMask": "0x80" + }, + { + "BriefDescription": "Number of uops executed on the core.", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.CORE", + "PublicDescription": "Counts the number of uops executed from any = thread.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles at least 1 micro-op is executed from a= ny thread on physical core.", + "CounterMask": "1", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_1", + "PublicDescription": "Counts cycles when at least 1 micro-op is ex= ecuted from any thread on physical core.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles at least 2 micro-op is executed from a= ny thread on physical core.", + "CounterMask": "2", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_2", + "PublicDescription": "Counts cycles when at least 2 micro-ops are = executed from any thread on physical core.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles at least 3 micro-op is executed from a= ny thread on physical core.", + "CounterMask": "3", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_3", + "PublicDescription": "Counts cycles when at least 3 micro-ops are = executed from any thread on physical core.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles at least 4 micro-op is executed from a= ny thread on physical core.", + "CounterMask": "4", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_4", + "PublicDescription": "Counts cycles when at least 4 micro-ops are = executed from any thread on physical core.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles where at least 1 uop was executed per-= thread", + "CounterMask": "1", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CYCLES_GE_1", + "PublicDescription": "Cycles where at least 1 uop was executed per= -thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles where at least 2 uops were executed pe= r-thread", + "CounterMask": "2", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CYCLES_GE_2", + "PublicDescription": "Cycles where at least 2 uops were executed p= er-thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles where at least 3 uops were executed pe= r-thread", + "CounterMask": "3", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CYCLES_GE_3", + "PublicDescription": "Cycles where at least 3 uops were executed p= er-thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles where at least 4 uops were executed pe= r-thread", + "CounterMask": "4", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CYCLES_GE_4", + "PublicDescription": "Cycles where at least 4 uops were executed p= er-thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts number of cycles no uops were dispatch= ed to be executed on this thread.", + "CounterMask": "1", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.STALL_CYCLES", + "Invert": "1", + "PublicDescription": "Counts cycles during which no uops were disp= atched from the Reservation Station (RS) per thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts the number of uops to be executed per-= thread each cycle.", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.THREAD", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts the number of x87 uops dispatched.", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.X87", + "PublicDescription": "Counts the number of x87 uops executed.", + "SampleAfterValue": "2000003", + "UMask": "0x10" + }, + { + "BriefDescription": "Uops that RAT issues to RS", + "EventCode": "0x0e", + "EventName": "UOPS_ISSUED.ANY", + "PublicDescription": "Counts the number of uops that the Resource = Allocation Table (RAT) issues to the Reservation Station (RS).", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles when RAT does not issue Uops to RS for= the thread", + "CounterMask": "1", + "EventCode": "0x0E", + "EventName": "UOPS_ISSUED.STALL_CYCLES", + "Invert": "1", + "PublicDescription": "Counts cycles during which the Resource Allo= cation Table (RAT) does not issue any Uops to the reservation station (RS) = for the current thread.", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Uops inserted at issue-stage in order to pres= erve upper bits of vector registers.", + "EventCode": "0x0e", + "EventName": "UOPS_ISSUED.VECTOR_WIDTH_MISMATCH", + "PublicDescription": "Counts the number of Blend Uops issued by th= e Resource Allocation Table (RAT) to the reservation station (RS) in order = to preserve upper bits of vector registers. Starting with the Skylake micro= architecture, these Blend uops are needed since every Intel SSE instruction= executed in Dirty Upper State needs to preserve bits 128-255 of the destin= ation register. For more information, refer to 'Mixing Intel AVX and Intel = SSE Code' section of the Optimization Guide.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Retirement slots used.", + "EventCode": "0xc2", + "EventName": "UOPS_RETIRED.SLOTS", + "PublicDescription": "Counts the retirement slots used each cycle.= ", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles without actually retired uops.", + "CounterMask": "1", + "EventCode": "0xc2", + "EventName": "UOPS_RETIRED.STALL_CYCLES", + "Invert": "1", + "PublicDescription": "This event counts cycles without actually re= tired uops.", + "SampleAfterValue": "1000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles with less than 10 actually retired uop= s.", + "CounterMask": "10", + "EventCode": "0xc2", + "EventName": "UOPS_RETIRED.TOTAL_CYCLES", + "Invert": "1", + "PublicDescription": "Counts the number of cycles using always tru= e condition (uops_ret < 16) applied to non PEBS uops retired event.", + "SampleAfterValue": "1000003", + "UMask": "0x2" + } +] diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json b/t= ools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json new file mode 100644 index 000000000000..1bb9cededa56 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json @@ -0,0 +1,1571 @@ +[ + { + "BriefDescription": "C10 residency percent per package", + "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C10_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C2 residency percent per package", + "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C2_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C3 residency percent per package", + "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C3_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C6 residency percent per core", + "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C6_Core_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C6 residency percent per package", + "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C6_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C7 residency percent per core", + "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C7_Core_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C7 residency percent per package", + "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C7_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C8 residency percent per package", + "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C8_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C9 residency percent per package", + "MetricExpr": "cstate_pkg@c9\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C9_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "Uncore frequency per die [GHZ]", + "MetricExpr": "tma_info_system_socket_clks / #num_dies / duration_= time / 1e9", + "MetricGroup": "SoC", + "MetricName": "UNCORE_FREQ" + }, + { + "BriefDescription": "Percentage of cycles spent in System Manageme= nt Interrupts.", + "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0= else 0)", + "MetricGroup": "smi", + "MetricName": "smi_cycles", + "MetricThreshold": "smi_cycles > 0.1", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "Number of SMI interrupts.", + "MetricExpr": "msr@smi@", + "MetricGroup": "smi", + "MetricName": "smi_num", + "ScaleUnit": "1SMI#" + }, + { + "BriefDescription": "This metric estimates how often memory load a= ccesses were aliased by preceding stores (in program order) with a 4K addre= ss offset", + "MetricExpr": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS / tma_info_thread_c= lks", + "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", + "MetricName": "tma_4k_aliasing", + "MetricThreshold": "tma_4k_aliasing > 0.2 & (tma_l1_bound > 0.1 & = (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric estimates how often memory load = accesses were aliased by preceding stores (in program order) with a 4K addr= ess offset. False match is possible; which incur a few cycles load re-issue= . However; the short re-issue duration is often hidden by the out-of-order = core and HW optimizations; hence a user may safely ignore a high value of t= his metric unless it manages to propagate up into parent nodes of the hiera= rchy (e.g. to L1_Bound).", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution ports for ALU operations.", + "MetricExpr": "(UOPS_DISPATCHED.PORT_0 + UOPS_DISPATCHED.PORT_1 + = UOPS_DISPATCHED.PORT_5 + UOPS_DISPATCHED.PORT_6) / (4 * tma_info_core_core_= clks)", + "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", + "MetricName": "tma_alu_op_utilization", + "MetricThreshold": "tma_alu_op_utilization > 0.6", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates fraction of slots the C= PU retired uops delivered by the Microcode_Sequencer as a result of Assists= ", + "MetricExpr": "100 * ASSISTS.ANY / tma_info_thread_slots", + "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_gro= up", + "MetricName": "tma_assists", + "MetricThreshold": "tma_assists > 0.1 & (tma_microcode_sequencer >= 0.05 & tma_heavy_operations > 0.1)", + "PublicDescription": "This metric estimates fraction of slots the = CPU retired uops delivered by the Microcode_Sequencer as a result of Assist= s. Assists are long sequences of uops that are required in certain corner-c= ases for operations that cannot be handled natively by the execution pipeli= ne. For example; when working with very small floating point values (so-cal= led Denormals); the FP units are not set up to perform these operations nat= ively. Instead; a sequence of instructions to perform the computation on th= e Denormals is injected into the pipeline. Since these microcode sequences = might be dozens of uops long; Assists can be extremely deleterious to perfo= rmance and they can be avoided in many cases. Sample with: ASSISTS.ANY", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", + "DefaultMetricgroupName": "TopdownL1", + "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT= _MISC.RECOVERY_CYCLES\\,cmask\\=3D1\\,edge@ / tma_info_thread_slots", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", + "MetricName": "tma_backend_bound", + "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1;Default", + "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This category represents fraction of slots wa= sted due to incorrect speculations", + "DefaultMetricgroupName": "TopdownL1", + "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + t= ma_retiring), 0)", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", + "MetricName": "tma_bad_speculation", + "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1;Default", + "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring branch instructions.", + "MetricExpr": "tma_light_operations * BR_INST_RETIRED.ALL_BRANCHES= / (tma_retiring * tma_info_thread_slots)", + "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", + "MetricName": "tma_branch_instructions", + "MetricThreshold": "tma_branch_instructions > 0.1 & tma_light_oper= ations > 0.6", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Branch Misprediction", + "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL= _BRANCHES + MACHINE_CLEARS.COUNT) * tma_bad_speculation", + "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", + "MetricName": "tma_branch_mispredicts", + "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", + "MetricgroupNoGroup": "TopdownL2", + "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:= tma_info_bad_spec_branch_misprediction_cost, tma_info_bottleneck_mispredic= tions, tma_mispredicts_resteers", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to Branch Resteers", + "MetricExpr": "INT_MISC.CLEAR_RESTEER_CYCLES / tma_info_thread_clk= s + tma_unknown_branches", + "MetricGroup": "FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_= group", + "MetricName": "tma_branch_resteers", + "MetricThreshold": "tma_branch_resteers > 0.05 & (tma_fetch_latenc= y > 0.1 & tma_frontend_bound > 0.15)", + "PublicDescription": "This metric represents fraction of cycles th= e CPU was stalled due to Branch Resteers. Branch Resteers estimates the Fro= ntend delay in fetching operations from corrected path; following all sorts= of miss-predicted branches. For example; branchy code with lots of miss-pr= edictions might get categorized under Branch Resteers. Note the value of th= is node may overlap with its siblings. Sample with: BR_MISP_RETIRED.ALL_BRA= NCHES", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates fraction of cycles the = CPU retired uops originated from CISC (complex instruction set computer) in= struction", + "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)", + "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_gro= up", + "MetricName": "tma_cisc", + "MetricThreshold": "tma_cisc > 0.1 & (tma_microcode_sequencer > 0.= 05 & tma_heavy_operations > 0.1)", + "PublicDescription": "This metric estimates fraction of cycles the= CPU retired uops originated from CISC (complex instruction set computer) i= nstruction. A CISC instruction has multiple uops that are required to perfo= rm the instruction's functionality as in the case of read-modify-write as a= n example. Since these instructions require multiple uops they may or may n= ot imply sub-optimal use of machine resources.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to Branch Resteers as a result of Machine Clears", + "MetricExpr": "(1 - BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRE= D.ALL_BRANCHES + MACHINE_CLEARS.COUNT)) * INT_MISC.CLEAR_RESTEER_CYCLES / t= ma_info_thread_clks", + "MetricGroup": "BadSpec;MachineClears;TopdownL4;tma_L4_group;tma_b= ranch_resteers_group;tma_issueMC", + "MetricName": "tma_clears_resteers", + "MetricThreshold": "tma_clears_resteers > 0.05 & (tma_branch_reste= ers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))", + "PublicDescription": "This metric represents fraction of cycles th= e CPU was stalled due to Branch Resteers as a result of Machine Clears. Sam= ple with: INT_MISC.CLEAR_RESTEER_CYCLES. Related metrics: tma_l1_bound, tma= _machine_clears, tma_microcode_sequencer, tma_ms_switches", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to contested acces= ses", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "(29 * tma_info_system_average_frequency * MEM_LOAD_= L3_HIT_RETIRED.XSNP_HITM + 23.5 * tma_info_system_average_frequency * MEM_L= OAD_L3_HIT_RETIRED.XSNP_MISS) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RET= IRED.L1_MISS / 2) / tma_info_thread_clks", + "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;t= ma_issueSyncxn;tma_l3_bound_group", + "MetricName": "tma_contested_accesses", + "MetricThreshold": "tma_contested_accesses > 0.05 & (tma_l3_bound = > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric estimates fraction of cycles whi= le the memory subsystem was handling synchronizations due to contested acce= sses. Contested accesses occur when data written by one Logical Processor a= re read by another Logical Processor on a different Physical Core. Examples= of contested accesses include synchronizations such as locks; true data sh= aring such as modified locked variables; and false sharing. Sample with: ME= M_LOAD_L3_HIT_RETIRED.XSNP_HITM_PS;MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS_PS. Re= lated metrics: tma_data_sharing, tma_false_sharing, tma_machine_clears, tma= _remote_cache", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots wher= e Core non-memory issues were of a bottleneck", + "MetricExpr": "max(0, tma_backend_bound - tma_memory_bound)", + "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", + "MetricName": "tma_core_bound", + "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", + "MetricgroupNoGroup": "TopdownL2", + "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to data-sharing ac= cesses", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "23.5 * tma_info_system_average_frequency * MEM_LOAD= _L3_HIT_RETIRED.XSNP_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.= L1_MISS / 2) / tma_info_thread_clks", + "MetricGroup": "Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSync= xn;tma_l3_bound_group", + "MetricName": "tma_data_sharing", + "MetricThreshold": "tma_data_sharing > 0.05 & (tma_l3_bound > 0.05= & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric estimates fraction of cycles whi= le the memory subsystem was handling synchronizations due to data-sharing a= ccesses. Data shared by multiple Logical Processors (even just read shared)= may cause increased access latency due to cache coherency. Excessive data = sharing can drastically harm multithreaded performance. Sample with: MEM_LO= AD_L3_HIT_RETIRED.XSNP_HIT_PS. Related metrics: tma_contested_accesses, tma= _false_sharing, tma_machine_clears, tma_remote_cache", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles whe= re decoder-0 was the only active decoder", + "MetricExpr": "(cpu@INST_DECODED.DECODERS\\,cmask\\=3D1@ - cpu@INS= T_DECODED.DECODERS\\,cmask\\=3D2@) / tma_info_core_core_clks / 2", + "MetricGroup": "DSBmiss;FetchBW;TopdownL4;tma_L4_group;tma_issueD0= ;tma_mite_group", + "MetricName": "tma_decoder0_alone", + "MetricThreshold": "tma_decoder0_alone > 0.1 & (tma_mite > 0.1 & (= tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_thread_ipc= / 5 > 0.35))", + "PublicDescription": "This metric represents fraction of cycles wh= ere decoder-0 was the only active decoder. Related metrics: tma_few_uops_in= structions", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles whe= re the Divider unit was active", + "MetricExpr": "ARITH.DIVIDER_ACTIVE / tma_info_thread_clks", + "MetricGroup": "TopdownL3;tma_L3_group;tma_core_bound_group", + "MetricName": "tma_divider", + "MetricThreshold": "tma_divider > 0.2 & (tma_core_bound > 0.1 & tm= a_backend_bound > 0.2)", + "PublicDescription": "This metric represents fraction of cycles wh= ere the Divider unit was active. Divide and square root instructions are pe= rformed by the Divider unit and can take considerably longer latency than i= nteger or Floating Point addition; subtraction; or multiplication. Sample w= ith: ARITH.DIVIDER_ACTIVE", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates how often the CPU was s= talled on accesses to external memory (DRAM) by loads", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "CYCLE_ACTIVITY.STALLS_L3_MISS / tma_info_thread_clk= s + (CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_= info_thread_clks - tma_l2_bound", + "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_me= mory_bound_group", + "MetricName": "tma_dram_bound", + "MetricThreshold": "tma_dram_bound > 0.1 & (tma_memory_bound > 0.2= & tma_backend_bound > 0.2)", + "PublicDescription": "This metric estimates how often the CPU was = stalled on accesses to external memory (DRAM) by loads. Better caching can = improve the latency and increase performance. Sample with: MEM_LOAD_RETIRED= .L3_MISS_PS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to DSB (decoded uop cache) fetch pipe= line", + "MetricExpr": "(IDQ.DSB_CYCLES_ANY - IDQ.DSB_CYCLES_OK) / tma_info= _core_core_clks / 2", + "MetricGroup": "DSB;FetchBW;TopdownL3;tma_L3_group;tma_fetch_bandw= idth_group", + "MetricName": "tma_dsb", + "MetricThreshold": "tma_dsb > 0.15 & (tma_fetch_bandwidth > 0.1 & = tma_frontend_bound > 0.15 & tma_info_thread_ipc / 5 > 0.35)", + "PublicDescription": "This metric represents Core fraction of cycl= es in which CPU was likely limited due to DSB (decoded uop cache) fetch pip= eline. For example; inefficient utilization of the DSB cache structure or = bank conflict when reading from it; are categorized here.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to switches from DSB to MITE pipelines", + "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / tma_info_thread_= clks", + "MetricGroup": "DSBmiss;FetchLat;TopdownL3;tma_L3_group;tma_fetch_= latency_group;tma_issueFB", + "MetricName": "tma_dsb_switches", + "MetricThreshold": "tma_dsb_switches > 0.05 & (tma_fetch_latency >= 0.1 & tma_frontend_bound > 0.15)", + "PublicDescription": "This metric represents fraction of cycles th= e CPU was stalled due to switches from DSB to MITE pipelines. The DSB (deco= ded i-cache) is a Uop Cache where the front-end directly delivers Uops (mic= ro operations) avoiding heavy x86 decoding. The DSB pipeline has shorter la= tency and delivered higher bandwidth than the MITE (legacy instruction deco= de pipeline). Switching between the two pipelines can cause penalties hence= this metric measures the exposed penalty. Sample with: FRONTEND_RETIRED.DS= B_MISS_PS. Related metrics: tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_mis= ses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric roughly estimates the fraction of= cycles where the Data TLB (DTLB) was missed by load accesses", + "MetricExpr": "min(7 * cpu@DTLB_LOAD_MISSES.STLB_HIT\\,cmask\\=3D1= @ + DTLB_LOAD_MISSES.WALK_ACTIVE, max(CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYCLE= _ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", + "MetricGroup": "MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_= l1_bound_group", + "MetricName": "tma_dtlb_load", + "MetricThreshold": "tma_dtlb_load > 0.1 & (tma_l1_bound > 0.1 & (t= ma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric roughly estimates the fraction o= f cycles where the Data TLB (DTLB) was missed by load accesses. TLBs (Trans= lation Look-aside Buffers) are processor caches for recently used entries o= ut of the Page Tables that are used to map virtual- to physical-addresses b= y the operating system. This metric approximates the potential delay of dem= and loads missing the first-level data TLB (assuming worst case scenario wi= th back to back misses to different pages). This includes hitting in the se= cond-level TLB (STLB) as well as performing a hardware page walk on an STLB= miss. Sample with: MEM_INST_RETIRED.STLB_MISS_LOADS_PS. Related metrics: t= ma_dtlb_store, tma_info_bottleneck_memory_data_tlbs", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric roughly estimates the fraction of= cycles spent handling first-level data TLB store misses", + "MetricExpr": "(7 * cpu@DTLB_STORE_MISSES.STLB_HIT\\,cmask\\=3D1@ = + DTLB_STORE_MISSES.WALK_ACTIVE) / tma_info_core_core_clks", + "MetricGroup": "MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_= store_bound_group", + "MetricName": "tma_dtlb_store", + "MetricThreshold": "tma_dtlb_store > 0.05 & (tma_store_bound > 0.2= & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric roughly estimates the fraction o= f cycles spent handling first-level data TLB store misses. As with ordinar= y data caching; focus on improving data locality and reducing working-set s= ize to reduce DTLB overhead. Additionally; consider using profile-guided o= ptimization (PGO) to collocate frequently-used data on the same page. Try = using larger page sizes for large amounts of frequently-used data. Sample w= ith: MEM_INST_RETIRED.STLB_MISS_STORES_PS. Related metrics: tma_dtlb_load, = tma_info_bottleneck_memory_data_tlbs", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric roughly estimates how often CPU w= as handling synchronizations due to False Sharing", + "MetricExpr": "32.5 * tma_info_system_average_frequency * OCR.DEMA= ND_RFO.L3_HIT.SNOOP_HITM / tma_info_thread_clks", + "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;t= ma_issueSyncxn;tma_store_bound_group", + "MetricName": "tma_false_sharing", + "MetricThreshold": "tma_false_sharing > 0.05 & (tma_store_bound > = 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric roughly estimates how often CPU = was handling synchronizations due to False Sharing. False Sharing is a mult= ithreading hiccup; where multiple Logical Processors contend on different d= ata-elements mapped into the same cache line. Sample with: OCR.DEMAND_RFO.L= 3_HIT.SNOOP_HITM. Related metrics: tma_contested_accesses, tma_data_sharing= , tma_machine_clears, tma_remote_cache", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric does a *rough estimation* of how = often L1D Fill Buffer unavailability limited additional L1D miss memory acc= ess requests to proceed", + "MetricExpr": "L1D_PEND_MISS.FB_FULL / tma_info_thread_clks", + "MetricGroup": "MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;tma_is= sueSL;tma_issueSmSt;tma_l1_bound_group", + "MetricName": "tma_fb_full", + "MetricThreshold": "tma_fb_full > 0.3", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_info_bottleneck_memory_bandwidth, t= ma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_laten= cy, tma_streaming_stores", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots the = CPU was stalled due to Frontend bandwidth issues", + "MetricExpr": "max(0, tma_frontend_bound - tma_fetch_latency)", + "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", + "MetricName": "tma_fetch_bandwidth", + "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_thread_ipc / 5 > 0.35", + "MetricgroupNoGroup": "TopdownL2", + "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Sam= ple with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LA= TENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_sw= itches, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_= info_inst_mix_iptb, tma_lcp", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots the = CPU was stalled due to Frontend latency issues", + "MetricExpr": "(5 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.COR= E - INT_MISC.UOP_DROPPING) / tma_info_thread_slots", + "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", + "MetricName": "tma_fetch_latency", + "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", + "MetricgroupNoGroup": "TopdownL2", + "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_= 16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring instructions that that are decoder into two or up to= ([SNB+] four; [ADL+] five) uops", + "MetricExpr": "tma_heavy_operations - tma_microcode_sequencer", + "MetricGroup": "TopdownL3;tma_L3_group;tma_heavy_operations_group;= tma_issueD0", + "MetricName": "tma_few_uops_instructions", + "MetricThreshold": "tma_few_uops_instructions > 0.05 & tma_heavy_o= perations > 0.1", + "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring instructions that that are decoder into two or up t= o ([SNB+] four; [ADL+] five) uops. This highly-correlates with the number o= f uops in such instructions. Related metrics: tma_decoder0_alone", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents overall arithmetic flo= ating-point (FP) operations fraction the CPU has executed (retired)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector", + "MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_gr= oup", + "MetricName": "tma_fp_arith", + "MetricThreshold": "tma_fp_arith > 0.2 & tma_light_operations > 0.= 6", + "PublicDescription": "This metric represents overall arithmetic fl= oating-point (FP) operations fraction the CPU has executed (retired). Note = this metric's value may exceed its parent due to use of \"Uops\" CountDomai= n and FMA double-counting.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric approximates arithmetic floating-= point (FP) scalar uops fraction the CPU has retired", + "MetricExpr": "cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\= =3D0x03@ / (tma_retiring * tma_info_thread_slots)", + "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_= group;tma_issue2P", + "MetricName": "tma_fp_scalar", + "MetricThreshold": "tma_fp_scalar > 0.1 & (tma_fp_arith > 0.2 & tm= a_light_operations > 0.6)", + "PublicDescription": "This metric approximates arithmetic floating= -point (FP) scalar uops fraction the CPU has retired. May overcount due to = FMA double counting. Related metrics: tma_fp_vector, tma_fp_vector_128b, tm= a_fp_vector_256b, tma_fp_vector_512b, tma_port_0, tma_port_1, tma_port_5, t= ma_port_6, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric approximates arithmetic floating-= point (FP) vector uops fraction the CPU has retired aggregated across all v= ector widths", + "MetricExpr": "cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umas= k\\=3D0xfc@ / (tma_retiring * tma_info_thread_slots)", + "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_= group;tma_issue2P", + "MetricName": "tma_fp_vector", + "MetricThreshold": "tma_fp_vector > 0.1 & (tma_fp_arith > 0.2 & tm= a_light_operations > 0.6)", + "PublicDescription": "This metric approximates arithmetic floating= -point (FP) vector uops fraction the CPU has retired aggregated across all = vector widths. May overcount due to FMA double counting. Related metrics: t= ma_fp_scalar, tma_fp_vector_128b, tma_fp_vector_256b, tma_fp_vector_512b, t= ma_port_0, tma_port_1, tma_port_5, tma_port_6, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric approximates arithmetic FP vector= uops fraction the CPU has retired for 128-bit wide vectors", + "MetricExpr": "(FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARIT= H_INST_RETIRED.128B_PACKED_SINGLE) / (tma_retiring * tma_info_thread_slots)= ", + "MetricGroup": "Compute;Flops;TopdownL5;tma_L5_group;tma_fp_vector= _group;tma_issue2P", + "MetricName": "tma_fp_vector_128b", + "MetricThreshold": "tma_fp_vector_128b > 0.1 & (tma_fp_vector > 0.= 1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6))", + "PublicDescription": "This metric approximates arithmetic FP vecto= r uops fraction the CPU has retired for 128-bit wide vectors. May overcount= due to FMA double counting. Related metrics: tma_fp_scalar, tma_fp_vector,= tma_fp_vector_256b, tma_fp_vector_512b, tma_port_0, tma_port_1, tma_port_5= , tma_port_6, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric approximates arithmetic FP vector= uops fraction the CPU has retired for 256-bit wide vectors", + "MetricExpr": "(FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARIT= H_INST_RETIRED.256B_PACKED_SINGLE) / (tma_retiring * tma_info_thread_slots)= ", + "MetricGroup": "Compute;Flops;TopdownL5;tma_L5_group;tma_fp_vector= _group;tma_issue2P", + "MetricName": "tma_fp_vector_256b", + "MetricThreshold": "tma_fp_vector_256b > 0.1 & (tma_fp_vector > 0.= 1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6))", + "PublicDescription": "This metric approximates arithmetic FP vecto= r uops fraction the CPU has retired for 256-bit wide vectors. May overcount= due to FMA double counting. Related metrics: tma_fp_scalar, tma_fp_vector,= tma_fp_vector_128b, tma_fp_vector_512b, tma_port_0, tma_port_1, tma_port_5= , tma_port_6, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric approximates arithmetic FP vector= uops fraction the CPU has retired for 512-bit wide vectors", + "MetricExpr": "(FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE + FP_ARIT= H_INST_RETIRED.512B_PACKED_SINGLE) / (tma_retiring * tma_info_thread_slots)= ", + "MetricGroup": "Compute;Flops;TopdownL5;tma_L5_group;tma_fp_vector= _group;tma_issue2P", + "MetricName": "tma_fp_vector_512b", + "MetricThreshold": "tma_fp_vector_512b > 0.1 & (tma_fp_vector > 0.= 1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6))", + "PublicDescription": "This metric approximates arithmetic FP vecto= r uops fraction the CPU has retired for 512-bit wide vectors. May overcount= due to FMA double counting. Related metrics: tma_fp_scalar, tma_fp_vector,= tma_fp_vector_128b, tma_fp_vector_256b, tma_port_0, tma_port_1, tma_port_5= , tma_port_6, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This category represents fraction of slots wh= ere the processor's Frontend undersupplies its Backend", + "DefaultMetricgroupName": "TopdownL1", + "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UO= P_DROPPING / tma_info_thread_slots", + "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group", + "MetricName": "tma_frontend_bound", + "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1;Default", + "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring heavy-weight operations -- instructions that require= two or more uops or micro-coded sequences", + "MetricExpr": "tma_microcode_sequencer + tma_retiring * (UOPS_DECO= DED.DEC0 - cpu@UOPS_DECODED.DEC0\\,cmask\\=3D1@) / IDQ.MITE_UOPS", + "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", + "MetricName": "tma_heavy_operations", + "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", + "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to instruction cache misses", + "MetricExpr": "ICACHE_16B.IFDATA_STALL / tma_info_thread_clks", + "MetricGroup": "BigFoot;FetchLat;IcMiss;TopdownL3;tma_L3_group;tma= _fetch_latency_group", + "MetricName": "tma_icache_misses", + "MetricThreshold": "tma_icache_misses > 0.05 & (tma_fetch_latency = > 0.1 & tma_frontend_bound > 0.15)", + "PublicDescription": "This metric represents fraction of cycles th= e CPU was stalled due to instruction cache misses. Sample with: FRONTEND_RE= TIRED.L2_MISS_PS;FRONTEND_RETIRED.L1I_MISS_PS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "Branch Misprediction Cost: Fraction of TMA sl= ots wasted per non-speculative branch misprediction (retired JEClear)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_m= ispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_= misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * tma_info_thread_sl= ots / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;BrMispredicts;tma_issueBM", + "MetricName": "tma_info_bad_spec_branch_misprediction_cost", + "PublicDescription": "Branch Misprediction Cost: Fraction of TMA s= lots wasted per non-speculative branch misprediction (retired JEClear). Rel= ated metrics: tma_branch_mispredicts, tma_info_bottleneck_mispredictions, t= ma_mispredicts_resteers" + }, + { + "BriefDescription": "Instructions per retired mispredicts for cond= itional non-taken branches (lower number means higher occurrence rate).", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.COND_NTAKEN", + "MetricGroup": "Bad;BrMispredicts", + "MetricName": "tma_info_bad_spec_ipmisp_cond_ntaken", + "MetricThreshold": "tma_info_bad_spec_ipmisp_cond_ntaken < 200" + }, + { + "BriefDescription": "Instructions per retired mispredicts for cond= itional taken branches (lower number means higher occurrence rate).", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.COND_TAKEN", + "MetricGroup": "Bad;BrMispredicts", + "MetricName": "tma_info_bad_spec_ipmisp_cond_taken", + "MetricThreshold": "tma_info_bad_spec_ipmisp_cond_taken < 200" + }, + { + "BriefDescription": "Instructions per retired mispredicts for indi= rect CALL or JMP branches (lower number means higher occurrence rate).", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.INDIRECT", + "MetricGroup": "Bad;BrMispredicts", + "MetricName": "tma_info_bad_spec_ipmisp_indirect", + "MetricThreshold": "tma_info_bad_spec_ipmisp_indirect < 1e3" + }, + { + "BriefDescription": "Instructions per retired mispredicts for retu= rn branches (lower number means higher occurrence rate).", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.RET", + "MetricGroup": "Bad;BrMispredicts", + "MetricName": "tma_info_bad_spec_ipmisp_ret", + "MetricThreshold": "tma_info_bad_spec_ipmisp_ret < 500" + }, + { + "BriefDescription": "Number of Instructions per non-speculative Br= anch Misprediction (JEClear) (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;BadSpec;BrMispredicts", + "MetricName": "tma_info_bad_spec_ipmispredict", + "MetricThreshold": "tma_info_bad_spec_ipmispredict < 200" + }, + { + "BriefDescription": "Probability of Core Bound bottleneck hidden b= y SMT-profiling artifacts", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization = if tma_core_bound < tma_ports_utilization else 1) if tma_info_system_smt_2t= _utilization > 0.5 else 0)", + "MetricGroup": "Cor;SMT", + "MetricName": "tma_info_botlnk_l0_core_bound_likely", + "MetricThreshold": "tma_info_botlnk_l0_core_bound_likely > 0.5" + }, + { + "BriefDescription": "Total pipeline cost of DSB (uop cache) misses= - subset of the Instruction_Fetch_BW Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_= branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + = tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tm= a_lsd + tma_mite))", + "MetricGroup": "DSBmiss;Fed;tma_issueFB", + "MetricName": "tma_info_botlnk_l2_dsb_misses", + "MetricThreshold": "tma_info_botlnk_l2_dsb_misses > 10", + "PublicDescription": "Total pipeline cost of DSB (uop cache) misse= s - subset of the Instruction_Fetch_BW Bottleneck. Related metrics: tma_dsb= _switches, tma_fetch_bandwidth, tma_info_frontend_dsb_coverage, tma_info_in= st_mix_iptb, tma_lcp" + }, + { + "BriefDescription": "Total pipeline cost of Instruction Cache miss= es - subset of the Big_Code Bottleneck", + "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma= _branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses += tma_lcp + tma_ms_switches))", + "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL", + "MetricName": "tma_info_botlnk_l2_ic_misses", + "MetricThreshold": "tma_info_botlnk_l2_ic_misses > 5", + "PublicDescription": "Total pipeline cost of Instruction Cache mis= ses - subset of the Big_Code Bottleneck. Related metrics: " + }, + { + "BriefDescription": "Total pipeline cost of instruction fetch rela= ted bottlenecks by large code footprint programs (i-side cache; TLB and BTB= misses)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_ic= ache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switch= es + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", + "MetricGroup": "BigFoot;Fed;Frontend;IcMiss;MemoryTLB;tma_issueBC"= , + "MetricName": "tma_info_bottleneck_big_code", + "MetricThreshold": "tma_info_bottleneck_big_code > 20", + "PublicDescription": "Total pipeline cost of instruction fetch rel= ated bottlenecks by large code footprint programs (i-side cache; TLB and BT= B misses). Related metrics: tma_info_bottleneck_branching_overhead" + }, + { + "BriefDescription": "Total pipeline cost of branch related instruc= tions (used for program control-flow including function calls)", + "MetricExpr": "100 * ((BR_INST_RETIRED.COND + 3 * BR_INST_RETIRED.= NEAR_CALL + (BR_INST_RETIRED.NEAR_TAKEN - BR_INST_RETIRED.COND_TAKEN - 2 * = BR_INST_RETIRED.NEAR_CALL)) / tma_info_thread_slots)", + "MetricGroup": "Ret;tma_issueBC", + "MetricName": "tma_info_bottleneck_branching_overhead", + "MetricThreshold": "tma_info_bottleneck_branching_overhead > 10", + "PublicDescription": "Total pipeline cost of branch related instru= ctions (used for program control-flow including function calls). Related me= trics: tma_info_bottleneck_big_code" + }, + { + "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_frontend_bound - tma_fetch_latency * tma= _mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icach= e_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_bottlen= eck_big_code", + "MetricGroup": "Fed;FetchBW;Frontend", + "MetricName": "tma_info_bottleneck_instruction_fetch_bw", + "MetricThreshold": "tma_info_bottleneck_instruction_fetch_bw > 20" + }, + { + "BriefDescription": "Total pipeline cost of (external) Memory Band= width related bottlenecks", + "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (= tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound /= (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_b= ound) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_= hit_latency + tma_sq_full))) + tma_l1_bound / (tma_dram_bound + tma_l1_boun= d + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_fb_full / (tma_4k= _aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_load= s + tma_store_fwd_blk))", + "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW", + "MetricName": "tma_info_bottleneck_memory_bandwidth", + "MetricThreshold": "tma_info_bottleneck_memory_bandwidth > 20", + "PublicDescription": "Total pipeline cost of (external) Memory Ban= dwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_system_d= ram_bw_use, tma_mem_bandwidth, tma_sq_full" + }, + { + "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_me= mory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + t= ma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tma_= dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fw= d_blk)) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound += tma_l3_bound + tma_store_bound) * (tma_dtlb_store / (tma_dtlb_store + tma_= false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores= )))", + "MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB", + "MetricName": "tma_info_bottleneck_memory_data_tlbs", + "MetricThreshold": "tma_info_bottleneck_memory_data_tlbs > 20", + "PublicDescription": "Total pipeline cost of Memory Address Transl= ation related bottlenecks (data-side TLBs). Related metrics: tma_dtlb_load,= tma_dtlb_store" + }, + { + "BriefDescription": "Total pipeline cost of Memory Latency related= bottlenecks (external memory and off-core caches)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (= tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (= tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bou= nd) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tm= a_l3_hit_latency + tma_sq_full)) + tma_l2_bound / (tma_dram_bound + tma_l1_= bound + tma_l2_bound + tma_l3_bound + tma_store_bound))", + "MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat", + "MetricName": "tma_info_bottleneck_memory_latency", + "MetricThreshold": "tma_info_bottleneck_memory_latency > 20", + "PublicDescription": "Total pipeline cost of Memory Latency relate= d bottlenecks (external memory and off-core caches). Related metrics: tma_l= 3_hit_latency, tma_mem_latency" + }, + { + "BriefDescription": "Total pipeline cost of Branch Misprediction r= elated bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_branch_mispredicts + tma_fetch_latency *= tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_i= cache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", + "MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM", + "MetricName": "tma_info_bottleneck_mispredictions", + "MetricThreshold": "tma_info_bottleneck_mispredictions > 20", + "PublicDescription": "Total pipeline cost of Branch Misprediction = related bottlenecks. Related metrics: tma_branch_mispredicts, tma_info_bad_= spec_branch_misprediction_cost, tma_mispredicts_resteers" + }, + { + "BriefDescription": "Fraction of branches that are CALL or RET", + "MetricExpr": "(BR_INST_RETIRED.NEAR_CALL + BR_INST_RETIRED.NEAR_R= ETURN) / BR_INST_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;Branches", + "MetricName": "tma_info_branches_callret" + }, + { + "BriefDescription": "Fraction of branches that are non-taken condi= tionals", + "MetricExpr": "BR_INST_RETIRED.COND_NTAKEN / BR_INST_RETIRED.ALL_B= RANCHES", + "MetricGroup": "Bad;Branches;CodeGen;PGO", + "MetricName": "tma_info_branches_cond_nt" + }, + { + "BriefDescription": "Fraction of branches that are taken condition= als", + "MetricExpr": "BR_INST_RETIRED.COND_TAKEN / BR_INST_RETIRED.ALL_BR= ANCHES", + "MetricGroup": "Bad;Branches;CodeGen;PGO", + "MetricName": "tma_info_branches_cond_tk" + }, + { + "BriefDescription": "Fraction of branches that are unconditional (= direct or indirect) jumps", + "MetricExpr": "(BR_INST_RETIRED.NEAR_TAKEN - BR_INST_RETIRED.COND_= TAKEN - 2 * BR_INST_RETIRED.NEAR_CALL) / BR_INST_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;Branches", + "MetricName": "tma_info_branches_jump" + }, + { + "BriefDescription": "Fraction of branches of other types (not indi= vidually covered by other metrics in Info.Branches group)", + "MetricExpr": "1 - (tma_info_branches_cond_nt + tma_info_branches_= cond_tk + tma_info_branches_callret + tma_info_branches_jump)", + "MetricGroup": "Bad;Branches", + "MetricName": "tma_info_branches_other_branches" + }, + { + "BriefDescription": "Core actual clocks when any Logical Processor= is active on the Physical Core", + "MetricExpr": "CPU_CLK_UNHALTED.DISTRIBUTED", + "MetricGroup": "SMT", + "MetricName": "tma_info_core_core_clks" + }, + { + "BriefDescription": "Instructions Per Cycle across hyper-threads (= per physical core)", + "MetricExpr": "INST_RETIRED.ANY / tma_info_core_core_clks", + "MetricGroup": "Ret;SMT;TmaL1;tma_L1_group", + "MetricName": "tma_info_core_coreipc" + }, + { + "BriefDescription": "Floating Point Operations Per Cycle", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\= =3D0x03@ + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * cpu@FP_ARITH_= INST_RETIRED.128B_PACKED_SINGLE\\,umask\\=3D0x18@ + 8 * cpu@FP_ARITH_INST_R= ETIRED.256B_PACKED_SINGLE\\,umask\\=3D0x60@ + 16 * FP_ARITH_INST_RETIRED.51= 2B_PACKED_SINGLE) / tma_info_core_core_clks", + "MetricGroup": "Flops;Ret", + "MetricName": "tma_info_core_flopc" + }, + { + "BriefDescription": "Actual per-core usage of the Floating Point n= on-X87 execution units (regardless of precision or vector-width)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\= =3D0x03@ + cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=3D0xfc@) = / (2 * tma_info_core_core_clks)", + "MetricGroup": "Cor;Flops;HPC", + "MetricName": "tma_info_core_fp_arith_utilization", + "PublicDescription": "Actual per-core usage of the Floating Point = non-X87 execution units (regardless of precision or vector-width). Values >= 1 are possible due to ([BDW+] Fused-Multiply Add (FMA) counting - common; = [ADL+] use all of ADD/MUL/FMA in Scalar or 128/256-bit vectors - less commo= n)." + }, + { + "BriefDescription": "Instruction-Level-Parallelism (average number= of uops executed when there is execution) per-core", + "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_G= E_1 / 2 if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)", + "MetricGroup": "Backend;Cor;Pipeline;PortsUtil", + "MetricName": "tma_info_core_ilp" + }, + { + "BriefDescription": "Fraction of Uops delivered by the DSB (aka De= coded ICache; or Uop Cache)", + "MetricExpr": "IDQ.DSB_UOPS / UOPS_ISSUED.ANY", + "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB", + "MetricName": "tma_info_frontend_dsb_coverage", + "MetricThreshold": "tma_info_frontend_dsb_coverage < 0.7 & tma_inf= o_thread_ipc / 5 > 0.35", + "PublicDescription": "Fraction of Uops delivered by the DSB (aka D= ecoded ICache; or Uop Cache). Related metrics: tma_dsb_switches, tma_fetch_= bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_inst_mix_iptb, tma_lcp" + }, + { + "BriefDescription": "Average number of cycles of a switch from the= DSB fetch-unit to MITE fetch unit - see DSB_Switches tree node for details= .", + "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / cpu@DSB2MITE_SWI= TCHES.PENALTY_CYCLES\\,cmask\\=3D1\\,edge@", + "MetricGroup": "DSBmiss", + "MetricName": "tma_info_frontend_dsb_switch_cost" + }, + { + "BriefDescription": "Average number of Uops issued by front-end wh= en it issued something", + "MetricExpr": "UOPS_ISSUED.ANY / cpu@UOPS_ISSUED.ANY\\,cmask\\=3D1= @", + "MetricGroup": "Fed;FetchBW", + "MetricName": "tma_info_frontend_fetch_upc" + }, + { + "BriefDescription": "Average Latency for L1 instruction cache miss= es", + "MetricExpr": "ICACHE_16B.IFDATA_STALL / cpu@ICACHE_16B.IFDATA_STA= LL\\,cmask\\=3D1\\,edge@", + "MetricGroup": "Fed;FetchLat;IcMiss", + "MetricName": "tma_info_frontend_icache_miss_latency" + }, + { + "BriefDescription": "Instructions per non-speculative DSB miss (lo= wer number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / FRONTEND_RETIRED.ANY_DSB_MISS", + "MetricGroup": "DSBmiss;Fed", + "MetricName": "tma_info_frontend_ipdsb_miss_ret", + "MetricThreshold": "tma_info_frontend_ipdsb_miss_ret < 50" + }, + { + "BriefDescription": "Instructions per speculative Unknown Branch M= isprediction (BAClear) (lower number means higher occurrence rate)", + "MetricExpr": "tma_info_inst_mix_instructions / BACLEARS.ANY", + "MetricGroup": "Fed", + "MetricName": "tma_info_frontend_ipunknown_branch" + }, + { + "BriefDescription": "L2 cache true code cacheline misses per kilo = instruction", + "MetricExpr": "1e3 * FRONTEND_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "IcMiss", + "MetricName": "tma_info_frontend_l2mpki_code" + }, + { + "BriefDescription": "L2 cache speculative code cacheline misses pe= r kilo instruction", + "MetricExpr": "1e3 * L2_RQSTS.CODE_RD_MISS / INST_RETIRED.ANY", + "MetricGroup": "IcMiss", + "MetricName": "tma_info_frontend_l2mpki_code_all" + }, + { + "BriefDescription": "Fraction of Uops delivered by the LSD (Loop S= tream Detector; aka Loop Cache)", + "MetricExpr": "LSD.UOPS / UOPS_ISSUED.ANY", + "MetricGroup": "Fed;LSD", + "MetricName": "tma_info_frontend_lsd_coverage" + }, + { + "BriefDescription": "Branch instructions per taken branch.", + "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR= _TAKEN", + "MetricGroup": "Branches;Fed;PGO", + "MetricName": "tma_info_inst_mix_bptkbranch" + }, + { + "BriefDescription": "Total number of retired Instructions", + "MetricExpr": "INST_RETIRED.ANY", + "MetricGroup": "Summary;TmaL1;tma_L1_group", + "MetricName": "tma_info_inst_mix_instructions", + "PublicDescription": "Total number of retired Instructions. Sample= with: INST_RETIRED.PREC_DIST" + }, + { + "BriefDescription": "Instructions per FP Arithmetic instruction (l= ower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / (cpu@FP_ARITH_INST_RETIRED.SCALA= R_SINGLE\\,umask\\=3D0x03@ + cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\= ,umask\\=3D0xfc@)", + "MetricGroup": "Flops;InsType", + "MetricName": "tma_info_inst_mix_iparith", + "MetricThreshold": "tma_info_inst_mix_iparith < 10", + "PublicDescription": "Instructions per FP Arithmetic instruction (= lower number means higher occurrence rate). May undercount due to FMA doubl= e counting. Approximated prior to BDW." + }, + { + "BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bi= t instruction (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.128B_PACK= ED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE)", + "MetricGroup": "Flops;FpVector;InsType", + "MetricName": "tma_info_inst_mix_iparith_avx128", + "MetricThreshold": "tma_info_inst_mix_iparith_avx128 < 10", + "PublicDescription": "Instructions per FP Arithmetic AVX/SSE 128-b= it instruction (lower number means higher occurrence rate). May undercount = due to FMA double counting." + }, + { + "BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit i= nstruction (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.256B_PACK= ED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE)", + "MetricGroup": "Flops;FpVector;InsType", + "MetricName": "tma_info_inst_mix_iparith_avx256", + "MetricThreshold": "tma_info_inst_mix_iparith_avx256 < 10", + "PublicDescription": "Instructions per FP Arithmetic AVX* 256-bit = instruction (lower number means higher occurrence rate). May undercount due= to FMA double counting." + }, + { + "BriefDescription": "Instructions per FP Arithmetic AVX 512-bit in= struction (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.512B_PACK= ED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE)", + "MetricGroup": "Flops;FpVector;InsType", + "MetricName": "tma_info_inst_mix_iparith_avx512", + "MetricThreshold": "tma_info_inst_mix_iparith_avx512 < 10", + "PublicDescription": "Instructions per FP Arithmetic AVX 512-bit i= nstruction (lower number means higher occurrence rate). May undercount due = to FMA double counting." + }, + { + "BriefDescription": "Instructions per FP Arithmetic Scalar Double-= Precision instruction (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / FP_ARITH_INST_RETIRED.SCALAR_DOU= BLE", + "MetricGroup": "Flops;FpScalar;InsType", + "MetricName": "tma_info_inst_mix_iparith_scalar_dp", + "MetricThreshold": "tma_info_inst_mix_iparith_scalar_dp < 10", + "PublicDescription": "Instructions per FP Arithmetic Scalar Double= -Precision instruction (lower number means higher occurrence rate). May und= ercount due to FMA double counting." + }, + { + "BriefDescription": "Instructions per FP Arithmetic Scalar Single-= Precision instruction (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / FP_ARITH_INST_RETIRED.SCALAR_SIN= GLE", + "MetricGroup": "Flops;FpScalar;InsType", + "MetricName": "tma_info_inst_mix_iparith_scalar_sp", + "MetricThreshold": "tma_info_inst_mix_iparith_scalar_sp < 10", + "PublicDescription": "Instructions per FP Arithmetic Scalar Single= -Precision instruction (lower number means higher occurrence rate). May und= ercount due to FMA double counting." + }, + { + "BriefDescription": "Instructions per Branch (lower number means h= igher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES", + "MetricGroup": "Branches;Fed;InsType", + "MetricName": "tma_info_inst_mix_ipbranch", + "MetricThreshold": "tma_info_inst_mix_ipbranch < 8" + }, + { + "BriefDescription": "Instructions per (near) call (lower number me= ans higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL", + "MetricGroup": "Branches;Fed;PGO", + "MetricName": "tma_info_inst_mix_ipcall", + "MetricThreshold": "tma_info_inst_mix_ipcall < 200" + }, + { + "BriefDescription": "Instructions per Floating Point (FP) Operatio= n (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / (cpu@FP_ARITH_INST_RETIRED.SCALA= R_SINGLE\\,umask\\=3D0x03@ + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE += 4 * cpu@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE\\,umask\\=3D0x18@ + 8 * c= pu@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE\\,umask\\=3D0x60@ + 16 * FP_ARI= TH_INST_RETIRED.512B_PACKED_SINGLE)", + "MetricGroup": "Flops;InsType", + "MetricName": "tma_info_inst_mix_ipflop", + "MetricThreshold": "tma_info_inst_mix_ipflop < 10" + }, + { + "BriefDescription": "Instructions per Load (lower number means hig= her occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_LOADS", + "MetricGroup": "InsType", + "MetricName": "tma_info_inst_mix_ipload", + "MetricThreshold": "tma_info_inst_mix_ipload < 3" + }, + { + "BriefDescription": "Instructions per Store (lower number means hi= gher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_STORES", + "MetricGroup": "InsType", + "MetricName": "tma_info_inst_mix_ipstore", + "MetricThreshold": "tma_info_inst_mix_ipstore < 8" + }, + { + "BriefDescription": "Instructions per Software prefetch instructio= n (of any type: NTA/T0/T1/T2/Prefetch) (lower number means higher occurrenc= e rate)", + "MetricExpr": "INST_RETIRED.ANY / cpu@SW_PREFETCH_ACCESS.T0\\,umas= k\\=3D0xF@", + "MetricGroup": "Prefetches", + "MetricName": "tma_info_inst_mix_ipswpf", + "MetricThreshold": "tma_info_inst_mix_ipswpf < 100" + }, + { + "BriefDescription": "Instruction per taken branch", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", + "MetricGroup": "Branches;Fed;FetchBW;Frontend;PGO;tma_issueFB", + "MetricName": "tma_info_inst_mix_iptb", + "MetricThreshold": "tma_info_inst_mix_iptb < 11", + "PublicDescription": "Instruction per taken branch. Related metric= s: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tm= a_info_frontend_dsb_coverage, tma_lcp" + }, + { + "BriefDescription": "Average per-core data fill bandwidth to the L= 1 data cache [GB / sec]", + "MetricExpr": "64 * L1D.REPLACEMENT / 1e9 / duration_time", + "MetricGroup": "Mem;MemoryBW", + "MetricName": "tma_info_memory_core_l1d_cache_fill_bw" + }, + { + "BriefDescription": "Average per-core data fill bandwidth to the L= 2 cache [GB / sec]", + "MetricExpr": "64 * L2_LINES_IN.ALL / 1e9 / duration_time", + "MetricGroup": "Mem;MemoryBW", + "MetricName": "tma_info_memory_core_l2_cache_fill_bw" + }, + { + "BriefDescription": "Average per-core data access bandwidth to the= L3 cache [GB / sec]", + "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1e9 / duration= _time", + "MetricGroup": "Mem;MemoryBW;Offcore", + "MetricName": "tma_info_memory_core_l3_cache_access_bw" + }, + { + "BriefDescription": "Average per-core data fill bandwidth to the L= 3 cache [GB / sec]", + "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1e9 / duration_time", + "MetricGroup": "Mem;MemoryBW", + "MetricName": "tma_info_memory_core_l3_cache_fill_bw" + }, + { + "BriefDescription": "Fill Buffer (FB) hits per kilo instructions f= or retired demand loads (L1D misses that merge into ongoing miss-handling e= ntries)", + "MetricExpr": "1e3 * MEM_LOAD_RETIRED.FB_HIT / INST_RETIRED.ANY", + "MetricGroup": "CacheMisses;Mem", + "MetricName": "tma_info_memory_fb_hpki" + }, + { + "BriefDescription": "L1 cache true misses per kilo instruction for= retired demand loads", + "MetricExpr": "1e3 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY", + "MetricGroup": "CacheMisses;Mem", + "MetricName": "tma_info_memory_l1mpki" + }, + { + "BriefDescription": "L1 cache true misses per kilo instruction for= all demand loads (including speculative)", + "MetricExpr": "1e3 * L2_RQSTS.ALL_DEMAND_DATA_RD / INST_RETIRED.AN= Y", + "MetricGroup": "CacheMisses;Mem", + "MetricName": "tma_info_memory_l1mpki_load" + }, + { + "BriefDescription": "L2 cache hits per kilo instruction for all re= quest types (including speculative)", + "MetricExpr": "1e3 * (L2_RQSTS.REFERENCES - L2_RQSTS.MISS) / INST_= RETIRED.ANY", + "MetricGroup": "CacheMisses;Mem", + "MetricName": "tma_info_memory_l2hpki_all" + }, + { + "BriefDescription": "L2 cache hits per kilo instruction for all de= mand loads (including speculative)", + "MetricExpr": "1e3 * L2_RQSTS.DEMAND_DATA_RD_HIT / INST_RETIRED.AN= Y", + "MetricGroup": "CacheMisses;Mem", + "MetricName": "tma_info_memory_l2hpki_load" + }, + { + "BriefDescription": "L2 cache true misses per kilo instruction for= retired demand loads", + "MetricExpr": "1e3 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "Backend;CacheMisses;Mem", + "MetricName": "tma_info_memory_l2mpki" + }, + { + "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instru= ction for all request types (including speculative)", + "MetricExpr": "1e3 * L2_RQSTS.MISS / INST_RETIRED.ANY", + "MetricGroup": "CacheMisses;Mem;Offcore", + "MetricName": "tma_info_memory_l2mpki_all" + }, + { + "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instru= ction for all demand loads (including speculative)", + "MetricExpr": "1e3 * L2_RQSTS.DEMAND_DATA_RD_MISS / INST_RETIRED.A= NY", + "MetricGroup": "CacheMisses;Mem", + "MetricName": "tma_info_memory_l2mpki_load" + }, + { + "BriefDescription": "L3 cache true misses per kilo instruction for= retired demand loads", + "MetricExpr": "1e3 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY", + "MetricGroup": "CacheMisses;Mem", + "MetricName": "tma_info_memory_l3mpki" + }, + { + "BriefDescription": "Actual Average Latency for L1 data-cache miss= demand load operations (in core cycles)", + "MetricExpr": "L1D_PEND_MISS.PENDING / (MEM_LOAD_RETIRED.L1_MISS += MEM_LOAD_RETIRED.FB_HIT)", + "MetricGroup": "Mem;MemoryBound;MemoryLat", + "MetricName": "tma_info_memory_load_miss_real_latency" + }, + { + "BriefDescription": "Memory-Level-Parallelism (average number of L= 1 miss demand load when there is at least one such miss", + "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLE= S", + "MetricGroup": "Mem;MemoryBW;MemoryBound", + "MetricName": "tma_info_memory_mlp", + "PublicDescription": "Memory-Level-Parallelism (average number of = L1 miss demand load when there is at least one such miss. Per-Logical Proce= ssor)" + }, + { + "BriefDescription": "Average Parallel L2 cache miss data reads", + "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD / OFFCORE_= REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD", + "MetricGroup": "Memory_BW;Offcore", + "MetricName": "tma_info_memory_oro_data_l2_mlp" + }, + { + "BriefDescription": "Average Latency for L2 cache miss demand Load= s", + "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD / OFFCO= RE_REQUESTS.DEMAND_DATA_RD", + "MetricGroup": "Memory_Lat;Offcore", + "MetricName": "tma_info_memory_oro_load_l2_miss_latency" + }, + { + "BriefDescription": "Average Parallel L2 cache miss demand Loads", + "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD / cpu@O= FFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD\\,cmask\\=3D1@", + "MetricGroup": "Memory_BW;Offcore", + "MetricName": "tma_info_memory_oro_load_l2_mlp" + }, + { + "BriefDescription": "Average Latency for L3 cache miss demand Load= s", + "MetricExpr": "cpu@OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD\\,u= mask\\=3D0x10@ / OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", + "MetricGroup": "Memory_Lat;Offcore", + "MetricName": "tma_info_memory_oro_load_l3_miss_latency" + }, + { + "BriefDescription": "Average per-thread data fill bandwidth to the= L1 data cache [GB / sec]", + "MetricExpr": "tma_info_memory_core_l1d_cache_fill_bw", + "MetricGroup": "Mem;MemoryBW", + "MetricName": "tma_info_memory_thread_l1d_cache_fill_bw_1t" + }, + { + "BriefDescription": "Average per-thread data fill bandwidth to the= L2 cache [GB / sec]", + "MetricExpr": "tma_info_memory_core_l2_cache_fill_bw", + "MetricGroup": "Mem;MemoryBW", + "MetricName": "tma_info_memory_thread_l2_cache_fill_bw_1t" + }, + { + "BriefDescription": "Average per-thread data access bandwidth to t= he L3 cache [GB / sec]", + "MetricExpr": "tma_info_memory_core_l3_cache_access_bw", + "MetricGroup": "Mem;MemoryBW;Offcore", + "MetricName": "tma_info_memory_thread_l3_cache_access_bw_1t" + }, + { + "BriefDescription": "Average per-thread data fill bandwidth to the= L3 cache [GB / sec]", + "MetricExpr": "tma_info_memory_core_l3_cache_fill_bw", + "MetricGroup": "Mem;MemoryBW", + "MetricName": "tma_info_memory_thread_l3_cache_fill_bw_1t" + }, + { + "BriefDescription": "STLB (2nd level TLB) code speculative misses = per kilo instruction (misses of any page-size that complete the page walk)"= , + "MetricExpr": "1e3 * ITLB_MISSES.WALK_COMPLETED / INST_RETIRED.ANY= ", + "MetricGroup": "Fed;MemoryTLB", + "MetricName": "tma_info_memory_tlb_code_stlb_mpki" + }, + { + "BriefDescription": "STLB (2nd level TLB) data load speculative mi= sses per kilo instruction (misses of any page-size that complete the page w= alk)", + "MetricExpr": "1e3 * DTLB_LOAD_MISSES.WALK_COMPLETED / INST_RETIRE= D.ANY", + "MetricGroup": "Mem;MemoryTLB", + "MetricName": "tma_info_memory_tlb_load_stlb_mpki" + }, + { + "BriefDescription": "Utilization of the core's Page Walker(s) serv= ing STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "(ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_P= ENDING + DTLB_STORE_MISSES.WALK_PENDING) / (2 * tma_info_core_core_clks)", + "MetricGroup": "Mem;MemoryTLB", + "MetricName": "tma_info_memory_tlb_page_walks_utilization", + "MetricThreshold": "tma_info_memory_tlb_page_walks_utilization > 0= .5" + }, + { + "BriefDescription": "STLB (2nd level TLB) data store speculative m= isses per kilo instruction (misses of any page-size that complete the page = walk)", + "MetricExpr": "1e3 * DTLB_STORE_MISSES.WALK_COMPLETED / INST_RETIR= ED.ANY", + "MetricGroup": "Mem;MemoryTLB", + "MetricName": "tma_info_memory_tlb_store_stlb_mpki" + }, + { + "BriefDescription": "Instruction-Level-Parallelism (average number= of uops executed when there is execution) per-thread", + "MetricExpr": "UOPS_EXECUTED.THREAD / cpu@UOPS_EXECUTED.THREAD\\,c= mask\\=3D1@", + "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", + "MetricName": "tma_info_pipeline_execute" + }, + { + "BriefDescription": "Average number of Uops retired in cycles wher= e at least one uop has retired.", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "tma_retiring * tma_info_thread_slots / cpu@UOPS_RET= IRED.SLOTS\\,cmask\\=3D1@", + "MetricGroup": "Pipeline;Ret", + "MetricName": "tma_info_pipeline_retire" + }, + { + "BriefDescription": "Measured Average Frequency for unhalted proce= ssors [GHz]", + "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / dur= ation_time", + "MetricGroup": "Power;Summary", + "MetricName": "tma_info_system_average_frequency" + }, + { + "BriefDescription": "Average CPU Utilization", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricGroup": "HPC;Summary", + "MetricName": "tma_info_system_cpu_utilization" + }, + { + "BriefDescription": "Average external Memory Bandwidth Use for rea= ds and writes [GB / sec]", + "MetricExpr": "64 * (UNC_ARB_TRK_REQUESTS.ALL + UNC_ARB_COH_TRK_RE= QUESTS.ALL) / 1e6 / duration_time / 1e3", + "MetricGroup": "HPC;Mem;MemoryBW;SoC;tma_issueBW", + "MetricName": "tma_info_system_dram_bw_use", + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_fb_full, tma_info_bottlenec= k_memory_bandwidth, tma_mem_bandwidth, tma_sq_full" + }, + { + "BriefDescription": "Giga Floating Point Operations Per Second", + "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\= =3D0x03@ + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * cpu@FP_ARITH_= INST_RETIRED.128B_PACKED_SINGLE\\,umask\\=3D0x18@ + 8 * cpu@FP_ARITH_INST_R= ETIRED.256B_PACKED_SINGLE\\,umask\\=3D0x60@ + 16 * FP_ARITH_INST_RETIRED.51= 2B_PACKED_SINGLE) / 1e9 / duration_time", + "MetricGroup": "Cor;Flops;HPC", + "MetricName": "tma_info_system_gflops", + "PublicDescription": "Giga Floating Point Operations Per Second. A= ggregate across all supported options of: FP precisions, scalar and vector = instructions, vector-width and AMX engine." + }, + { + "BriefDescription": "Instructions per Far Branch ( Far Branches ap= ply upon transition from application to operating system, handling interrup= ts, exceptions) [lower number means higher occurrence rate]", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.FAR_BRANCH:u", + "MetricGroup": "Branches;OS", + "MetricName": "tma_info_system_ipfarbranch", + "MetricThreshold": "tma_info_system_ipfarbranch < 1e6" + }, + { + "BriefDescription": "Cycles Per Instruction for the Operating Syst= em (OS) Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / INST_RETIRED.ANY_P:k"= , + "MetricGroup": "OS", + "MetricName": "tma_info_system_kernel_cpi" + }, + { + "BriefDescription": "Fraction of cycles spent in the Operating Sys= tem (OS) Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / CPU_CLK_UNHALTED.THRE= AD", + "MetricGroup": "OS", + "MetricName": "tma_info_system_kernel_utilization", + "MetricThreshold": "tma_info_system_kernel_utilization > 0.05" + }, + { + "BriefDescription": "Average number of parallel data read requests= to external memory", + "MetricExpr": "UNC_ARB_DAT_OCCUPANCY.RD / cpu@UNC_ARB_DAT_OCCUPANC= Y.RD\\,cmask\\=3D1@", + "MetricGroup": "Mem;MemoryBW;SoC", + "MetricName": "tma_info_system_mem_parallel_reads", + "PublicDescription": "Average number of parallel data read request= s to external memory. Accounts for demand loads and L1/L2 prefetches" + }, + { + "BriefDescription": "Average latency of data read request to exter= nal memory (in nanoseconds)", + "MetricExpr": "(UNC_ARB_TRK_OCCUPANCY.RD + UNC_ARB_DAT_OCCUPANCY.R= D) / UNC_ARB_TRK_REQUESTS.RD", + "MetricGroup": "Mem;MemoryLat;SoC", + "MetricName": "tma_info_system_mem_read_latency", + "PublicDescription": "Average latency of data read request to exte= rnal memory (in nanoseconds). Accounts for demand loads and L1/L2 prefetche= s. ([RKL+]memory-controller only)" + }, + { + "BriefDescription": "Average latency of all requests to external m= emory (in Uncore cycles)", + "MetricExpr": "(UNC_ARB_TRK_OCCUPANCY.ALL + UNC_ARB_DAT_OCCUPANCY.= RD) / UNC_ARB_TRK_REQUESTS.ALL", + "MetricGroup": "Mem;SoC", + "MetricName": "tma_info_system_mem_request_latency" + }, + { + "BriefDescription": "Fraction of Core cycles where the core was ru= nning with power-delivery for baseline license level 0", + "MetricExpr": "CORE_POWER.LVL0_TURBO_LICENSE / tma_info_core_core_= clks", + "MetricGroup": "Power", + "MetricName": "tma_info_system_power_license0_utilization", + "PublicDescription": "Fraction of Core cycles where the core was r= unning with power-delivery for baseline license level 0. This includes non= -AVX codes, SSE, AVX 128-bit, and low-current AVX 256-bit codes." + }, + { + "BriefDescription": "Fraction of Core cycles where the core was ru= nning with power-delivery for license level 1", + "MetricExpr": "CORE_POWER.LVL1_TURBO_LICENSE / tma_info_core_core_= clks", + "MetricGroup": "Power", + "MetricName": "tma_info_system_power_license1_utilization", + "MetricThreshold": "tma_info_system_power_license1_utilization > 0= .5", + "PublicDescription": "Fraction of Core cycles where the core was r= unning with power-delivery for license level 1. This includes high current= AVX 256-bit instructions as well as low current AVX 512-bit instructions." + }, + { + "BriefDescription": "Fraction of Core cycles where the core was ru= nning with power-delivery for license level 2 (introduced in SKX)", + "MetricExpr": "CORE_POWER.LVL2_TURBO_LICENSE / tma_info_core_core_= clks", + "MetricGroup": "Power", + "MetricName": "tma_info_system_power_license2_utilization", + "MetricThreshold": "tma_info_system_power_license2_utilization > 0= .5", + "PublicDescription": "Fraction of Core cycles where the core was r= unning with power-delivery for license level 2 (introduced in SKX). This i= ncludes high current AVX 512-bit instructions." + }, + { + "BriefDescription": "Fraction of cycles where both hardware Logica= l Processors were active", + "MetricExpr": "(1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_U= NHALTED.REF_DISTRIBUTED if #SMT_on else 0)", + "MetricGroup": "SMT", + "MetricName": "tma_info_system_smt_2t_utilization" + }, + { + "BriefDescription": "Socket actual clocks when any core is active = on that socket", + "MetricExpr": "UNC_CLOCK.SOCKET", + "MetricGroup": "SoC", + "MetricName": "tma_info_system_socket_clks" + }, + { + "BriefDescription": "Average Frequency Utilization relative nomina= l frequency", + "MetricExpr": "tma_info_thread_clks / CPU_CLK_UNHALTED.REF_TSC", + "MetricGroup": "Power", + "MetricName": "tma_info_system_turbo_utilization" + }, + { + "BriefDescription": "Per-Logical Processor actual clocks when the = Logical Processor is active.", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD", + "MetricGroup": "Pipeline", + "MetricName": "tma_info_thread_clks" + }, + { + "BriefDescription": "Cycles Per Instruction (per Logical Processor= )", + "MetricExpr": "1 / tma_info_thread_ipc", + "MetricGroup": "Mem;Pipeline", + "MetricName": "tma_info_thread_cpi" + }, + { + "BriefDescription": "The ratio of Executed- by Issued-Uops", + "MetricExpr": "UOPS_EXECUTED.THREAD / UOPS_ISSUED.ANY", + "MetricGroup": "Cor;Pipeline", + "MetricName": "tma_info_thread_execute_per_issue", + "PublicDescription": "The ratio of Executed- by Issued-Uops. Ratio= > 1 suggests high rate of uop micro-fusions. Ratio < 1 suggest high rate o= f \"execute\" at rename stage." + }, + { + "BriefDescription": "Instructions Per Cycle (per Logical Processor= )", + "MetricExpr": "INST_RETIRED.ANY / tma_info_thread_clks", + "MetricGroup": "Ret;Summary", + "MetricName": "tma_info_thread_ipc" + }, + { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core= till ICL; per-Logical Processor ICL onward)", + "MetricExpr": "TOPDOWN.SLOTS", + "MetricGroup": "TmaL1;tma_L1_group", + "MetricName": "tma_info_thread_slots" + }, + { + "BriefDescription": "Fraction of Physical Core issue-slots utilize= d by this Logical Processor", + "MetricExpr": "(tma_info_thread_slots / (TOPDOWN.SLOTS / 2) if #SM= T_on else 1)", + "MetricGroup": "SMT;TmaL1;tma_L1_group", + "MetricName": "tma_info_thread_slots_utilization" + }, + { + "BriefDescription": "Uops Per Instruction", + "MetricExpr": "tma_retiring * tma_info_thread_slots / INST_RETIRED= .ANY", + "MetricGroup": "Pipeline;Ret;Retire", + "MetricName": "tma_info_thread_uoppi", + "MetricThreshold": "tma_info_thread_uoppi > 1.05" + }, + { + "BriefDescription": "Instruction per taken branch", + "MetricExpr": "tma_retiring * tma_info_thread_slots / BR_INST_RETI= RED.NEAR_TAKEN", + "MetricGroup": "Branches;Fed;FetchBW", + "MetricName": "tma_info_thread_uptb", + "MetricThreshold": "tma_info_thread_uptb < 7.5" + }, + { + "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to Instruction TLB (ITLB) misses", + "MetricExpr": "ICACHE_64B.IFTAG_STALL / tma_info_thread_clks", + "MetricGroup": "BigFoot;FetchLat;MemoryTLB;TopdownL3;tma_L3_group;= tma_fetch_latency_group", + "MetricName": "tma_itlb_misses", + "MetricThreshold": "tma_itlb_misses > 0.05 & (tma_fetch_latency > = 0.1 & tma_frontend_bound > 0.15)", + "PublicDescription": "This metric represents fraction of cycles th= e CPU was stalled due to Instruction TLB (ITLB) misses. Sample with: FRONTE= ND_RETIRED.STLB_MISS_PS;FRONTEND_RETIRED.ITLB_MISS_PS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates how often the CPU was s= talled without loads missing the L1 data cache", + "MetricExpr": "max((CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY= .STALLS_L1D_MISS) / tma_info_thread_clks, 0)", + "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_= group;tma_issueL1;tma_issueMC;tma_memory_bound_group", + "MetricName": "tma_l1_bound", + "MetricThreshold": "tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 &= tma_backend_bound > 0.2)", + "PublicDescription": "This metric estimates how often the CPU was = stalled without loads missing the L1 data cache. The L1 data cache typical= ly has the shortest latency. However; in certain cases like loads blocked = on older stores; a load might suffer due to high latency even though it is = being satisfied by the L1. Another example is loads who miss in the TLB. Th= ese cases are characterized by execution unit stalls; while some non-comple= ted demand load lives in the machine without having that demand load missin= g the L1 cache. Sample with: MEM_LOAD_RETIRED.L1_HIT_PS;MEM_LOAD_RETIRED.FB= _HIT_PS. Related metrics: tma_clears_resteers, tma_machine_clears, tma_micr= ocode_sequencer, tma_ms_switches, tma_ports_utilized_1", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates how often the CPU was s= talled due to L2 cache accesses by loads", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "MEM_LOAD_RETIRED.L2_HIT * (1 + MEM_LOAD_RETIRED.FB_= HIT / MEM_LOAD_RETIRED.L1_MISS) / (MEM_LOAD_RETIRED.L2_HIT * (1 + MEM_LOAD_= RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + L1D_PEND_MISS.FB_FULL_PERIODS)= * ((CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_= info_thread_clks)", + "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_= group;tma_memory_bound_group", + "MetricName": "tma_l2_bound", + "MetricThreshold": "tma_l2_bound > 0.05 & (tma_memory_bound > 0.2 = & tma_backend_bound > 0.2)", + "PublicDescription": "This metric estimates how often the CPU was = stalled due to L2 cache accesses by loads. Avoiding cache misses (i.e. L1 = misses/L2 hits) can improve the latency and increase performance. Sample wi= th: MEM_LOAD_RETIRED.L2_HIT_PS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates how often the CPU was s= talled due to loads accesses to L3 cache or contended with a sibling Core", + "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STA= LLS_L3_MISS) / tma_info_thread_clks", + "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_= group;tma_memory_bound_group", + "MetricName": "tma_l3_bound", + "MetricThreshold": "tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 = & tma_backend_bound > 0.2)", + "PublicDescription": "This metric estimates how often the CPU was = stalled due to loads accesses to L3 cache or contended with a sibling Core.= Avoiding cache misses (i.e. L2 misses/L3 hits) can improve the latency an= d increase performance. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited)", + "MetricExpr": "9 * tma_info_system_average_frequency * MEM_LOAD_RE= TIRED.L3_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2)= / tma_info_thread_clks", + "MetricGroup": "MemoryLat;TopdownL4;tma_L4_group;tma_issueLat;tma_= l3_bound_group", + "MetricName": "tma_l3_hit_latency", + "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric represents fraction of cycles wi= th demand load accesses that hit the L3 cache under unloaded scenarios (pos= sibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L= 3 hits) will improve the latency; reduce contention with sibling physical c= ores and increase performance. Note the value of this node may overlap wit= h its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: t= ma_info_bottleneck_memory_latency, tma_mem_latency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles CPU= was stalled due to Length Changing Prefixes (LCPs)", + "MetricExpr": "ILD_STALL.LCP / tma_info_thread_clks", + "MetricGroup": "FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_= group;tma_issueFB", + "MetricName": "tma_lcp", + "MetricThreshold": "tma_lcp > 0.05 & (tma_fetch_latency > 0.1 & tm= a_frontend_bound > 0.15)", + "PublicDescription": "This metric represents fraction of cycles CP= U was stalled due to Length Changing Prefixes (LCPs). Using proper compiler= flags or Intel Compiler by default will certainly avoid this. #Link: Optim= ization Guide about LCP BKMs. Related metrics: tma_dsb_switches, tma_fetch_= bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, t= ma_info_inst_mix_iptb", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring light-weight operations -- instructions that require= no more than one uop (micro-operation)", + "MetricExpr": "max(0, tma_retiring - tma_heavy_operations)", + "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", + "MetricName": "tma_light_operations", + "MetricThreshold": "tma_light_operations > 0.6", + "MetricgroupNoGroup": "TopdownL2", + "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port for Load operations", + "MetricExpr": "UOPS_DISPATCHED.PORT_2_3 / (2 * tma_info_core_core_= clks)", + "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", + "MetricName": "tma_load_op_utilization", + "MetricThreshold": "tma_load_op_utilization > 0.6", + "PublicDescription": "This metric represents Core fraction of cycl= es CPU dispatched uops on execution port for Load operations. Sample with: = UOPS_DISPATCHED.PORT_2_3", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric roughly estimates the fraction of= cycles where the (first level) DTLB was missed by load accesses, that late= r on hit in second-level TLB (STLB)", + "MetricExpr": "tma_dtlb_load - tma_load_stlb_miss", + "MetricGroup": "MemoryTLB;TopdownL5;tma_L5_group;tma_dtlb_load_gro= up", + "MetricName": "tma_load_stlb_hit", + "MetricThreshold": "tma_load_stlb_hit > 0.05 & (tma_dtlb_load > 0.= 1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2= )))", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates the fraction of cycles = where the Second-level TLB (STLB) was missed by load accesses, performing a= hardware page walk", + "MetricExpr": "DTLB_LOAD_MISSES.WALK_ACTIVE / tma_info_thread_clks= ", + "MetricGroup": "MemoryTLB;TopdownL5;tma_L5_group;tma_dtlb_load_gro= up", + "MetricName": "tma_load_stlb_miss", + "MetricThreshold": "tma_load_stlb_miss > 0.05 & (tma_dtlb_load > 0= .1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.= 2)))", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "(16 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS= .ALL_RFO) + MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES * (10= * L2_RQSTS.RFO_HIT + min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTAN= DING.CYCLES_WITH_DEMAND_RFO))) / tma_info_thread_clks", + "MetricGroup": "Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_l1= _bound_group", + "MetricName": "tma_lock_latency", + "MetricThreshold": "tma_lock_latency > 0.2 & (tma_l1_bound > 0.1 &= (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric represents fraction of cycles th= e CPU spent handling cache misses due to lock operations. Due to the microa= rchitecture handling of locks; they are classified as L1_Bound regardless o= f what memory source satisfied them. Sample with: MEM_INST_RETIRED.LOCK_LOA= DS_PS. Related metrics: tma_store_latency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to LSD (Loop Stream Detector) unit", + "MetricExpr": "(LSD.CYCLES_ACTIVE - LSD.CYCLES_OK) / tma_info_core= _core_clks / 2", + "MetricGroup": "FetchBW;LSD;TopdownL3;tma_L3_group;tma_fetch_bandw= idth_group", + "MetricName": "tma_lsd", + "MetricThreshold": "tma_lsd > 0.15 & (tma_fetch_bandwidth > 0.1 & = tma_frontend_bound > 0.15 & tma_info_thread_ipc / 5 > 0.35)", + "PublicDescription": "This metric represents Core fraction of cycl= es in which CPU was likely limited due to LSD (Loop Stream Detector) unit. = LSD typically does well sustaining Uop supply. However; in some rare cases= ; optimal uop-delivery could not be reached for small loops whose size (in = terms of number of uops) does not suit well the LSD structure.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Machine Clears", + "MetricExpr": "max(0, tma_bad_speculation - tma_branch_mispredicts= )", + "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", + "MetricName": "tma_machine_clears", + "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", + "MetricgroupNoGroup": "TopdownL2", + "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates fraction of cycles wher= e the core's performance was likely hurt due to approaching bandwidth limit= s of external memory (DRAM)", + "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, cpu@OFFCORE_REQUESTS_O= UTSTANDING.ALL_DATA_RD\\,cmask\\=3D4@) / tma_info_thread_clks", + "MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_dram_b= ound_group;tma_issueBW", + "MetricName": "tma_mem_bandwidth", + "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory (DRAM). The underlying heuristic assumes that a simi= lar off-core traffic is generated by all IA cores. This metric does not agg= regate non-data-read requests by this logical processor; requests from othe= r IA Logical Processors/Physical Cores/sockets; or other non-IA devices lik= e GPU; hence the maximum external memory bandwidth limits may or may not be= approached when this metric is flagged (see Uncore counters for that). Rel= ated metrics: tma_fb_full, tma_info_bottleneck_memory_bandwidth, tma_info_s= ystem_dram_bw_use, tma_sq_full", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates fraction of cycles wher= e the performance was likely hurt due to latency from external memory (DRAM= )", + "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTST= ANDING.CYCLES_WITH_DATA_RD) / tma_info_thread_clks - tma_mem_bandwidth", + "MetricGroup": "MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_dram_= bound_group;tma_issueLat", + "MetricName": "tma_mem_latency", + "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory (DRA= M). This metric does not aggregate requests from other Logical Processors/= Physical Cores/sockets (see Uncore counters for that). Related metrics: tma= _info_bottleneck_memory_latency, tma_l3_hit_latency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots the = Memory subsystem within the Backend was a bottleneck", + "MetricExpr": "(CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND= _ON_STORES) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + t= ma_retiring * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES) * = tma_backend_bound", + "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", + "MetricName": "tma_memory_bound", + "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", + "MetricgroupNoGroup": "TopdownL2", + "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring memory operations -- uops for memory load or store a= ccesses.", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "tma_light_operations * MEM_INST_RETIRED.ANY / INST_= RETIRED.ANY", + "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", + "MetricName": "tma_memory_operations", + "MetricThreshold": "tma_memory_operations > 0.1 & tma_light_operat= ions > 0.6", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots the = CPU was retiring uops fetched by the Microcode Sequencer (MS) unit", + "MetricExpr": "tma_retiring * tma_info_thread_slots / UOPS_ISSUED.= ANY * IDQ.MS_UOPS / tma_info_thread_slots", + "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_heavy_operatio= ns_group;tma_issueMC;tma_issueMS", + "MetricName": "tma_microcode_sequencer", + "MetricThreshold": "tma_microcode_sequencer > 0.05 & tma_heavy_ope= rations > 0.1", + "PublicDescription": "This metric represents fraction of slots the= CPU was retiring uops fetched by the Microcode Sequencer (MS) unit. The M= S is used for CISC instructions not supported by the default decoders (like= repeat move strings; or CPUID); or by microcode assists used to address so= me operation modes (like in Floating Point assists). These cases can often = be avoided. Sample with: IDQ.MS_UOPS. Related metrics: tma_clears_resteers,= tma_l1_bound, tma_machine_clears, tma_ms_switches", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to Branch Resteers as a result of Branch Misprediction= at execution stage", + "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL= _BRANCHES + MACHINE_CLEARS.COUNT) * INT_MISC.CLEAR_RESTEER_CYCLES / tma_inf= o_thread_clks", + "MetricGroup": "BadSpec;BrMispredicts;TopdownL4;tma_L4_group;tma_b= ranch_resteers_group;tma_issueBM", + "MetricName": "tma_mispredicts_resteers", + "MetricThreshold": "tma_mispredicts_resteers > 0.05 & (tma_branch_= resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))", + "PublicDescription": "This metric represents fraction of cycles th= e CPU was stalled due to Branch Resteers as a result of Branch Mispredictio= n at execution stage. Sample with: INT_MISC.CLEAR_RESTEER_CYCLES. Related m= etrics: tma_branch_mispredicts, tma_info_bad_spec_branch_misprediction_cost= , tma_info_bottleneck_mispredictions", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the MITE pipeline (the legacy deco= de pipeline)", + "MetricExpr": "(IDQ.MITE_CYCLES_ANY - IDQ.MITE_CYCLES_OK) / tma_in= fo_core_core_clks / 2", + "MetricGroup": "DSBmiss;FetchBW;TopdownL3;tma_L3_group;tma_fetch_b= andwidth_group", + "MetricName": "tma_mite", + "MetricThreshold": "tma_mite > 0.1 & (tma_fetch_bandwidth > 0.1 & = tma_frontend_bound > 0.15 & tma_info_thread_ipc / 5 > 0.35)", + "PublicDescription": "This metric represents Core fraction of cycl= es in which CPU was likely limited due to the MITE pipeline (the legacy dec= ode pipeline). This pipeline is used for code that was not pre-cached in th= e DSB or LSD. For example; inefficiencies due to asymmetric decoders; use o= f long immediate or LCP can manifest as MITE fetch bandwidth bottleneck. Sa= mple with: FRONTEND_RETIRED.ANY_DSB_MISS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles whe= re (only) 4 uops were delivered by the MITE pipeline", + "MetricExpr": "(cpu@IDQ.MITE_UOPS\\,cmask\\=3D4@ - cpu@IDQ.MITE_UO= PS\\,cmask\\=3D5@) / tma_info_thread_clks", + "MetricGroup": "DSBmiss;FetchBW;TopdownL4;tma_L4_group;tma_mite_gr= oup", + "MetricName": "tma_mite_4wide", + "MetricThreshold": "tma_mite_4wide > 0.05 & (tma_mite > 0.1 & (tma= _fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_thread_ipc / = 5 > 0.35))", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "The Mixing_Vectors metric gives the percentag= e of injected blend uops out of all uops issued", + "MetricExpr": "UOPS_ISSUED.VECTOR_WIDTH_MISMATCH / UOPS_ISSUED.ANY= ", + "MetricGroup": "TopdownL5;tma_L5_group;tma_issueMV;tma_ports_utili= zed_0_group", + "MetricName": "tma_mixing_vectors", + "MetricThreshold": "tma_mixing_vectors > 0.05", + "PublicDescription": "The Mixing_Vectors metric gives the percenta= ge of injected blend uops out of all uops issued. Usually a Mixing_Vectors = over 5% is worth investigating. Read more in Appendix B1 of the Optimizatio= ns Guide for this topic. Related metrics: tma_ms_switches", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates the fraction of cycles = when the CPU was stalled due to switches of uop delivery to the Microcode S= equencer (MS)", + "MetricExpr": "3 * IDQ.MS_SWITCHES / tma_info_thread_clks", + "MetricGroup": "FetchLat;MicroSeq;TopdownL3;tma_L3_group;tma_fetch= _latency_group;tma_issueMC;tma_issueMS;tma_issueMV;tma_issueSO", + "MetricName": "tma_ms_switches", + "MetricThreshold": "tma_ms_switches > 0.05 & (tma_fetch_latency > = 0.1 & tma_frontend_bound > 0.15)", + "PublicDescription": "This metric estimates the fraction of cycles= when the CPU was stalled due to switches of uop delivery to the Microcode = Sequencer (MS). Commonly used instructions are optimized for delivery by th= e DSB (decoded i-cache) or MITE (legacy instruction decode) pipelines. Cert= ain operations cannot be handled natively by the execution pipeline; and mu= st be performed by microcode (small programs injected into the execution st= ream). Switching to the MS too often can negatively impact performance. The= MS is designated to deliver long uop flows required by CISC instructions l= ike CPUID; or uncommon conditions like Floating Point Assists when dealing = with Denormals. Sample with: IDQ.MS_SWITCHES. Related metrics: tma_clears_r= esteers, tma_l1_bound, tma_machine_clears, tma_microcode_sequencer, tma_mix= ing_vectors, tma_serializing_operation", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring NOP (no op) instructions", + "MetricExpr": "tma_light_operations * INST_RETIRED.NOP / (tma_reti= ring * tma_info_thread_slots)", + "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", + "MetricName": "tma_nop_instructions", + "MetricThreshold": "tma_nop_instructions > 0.1 & tma_light_operati= ons > 0.6", + "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring NOP (no op) instructions. Compilers often use NOPs = for certain address alignments - e.g. start address of a function or loop b= ody. Sample with: INST_RETIRED.NOP", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents the remaining light uo= ps fraction the CPU has executed - remaining means not covered by other sib= ling nodes", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_m= emory_operations + tma_branch_instructions + tma_nop_instructions))", + "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", + "MetricName": "tma_other_light_ops", + "MetricThreshold": "tma_other_light_ops > 0.3 & tma_light_operatio= ns > 0.6", + "PublicDescription": "This metric represents the remaining light u= ops fraction the CPU has executed - remaining means not covered by other si= bling nodes. May undercount due to FMA double counting", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd b= ranch)", + "MetricExpr": "UOPS_DISPATCHED.PORT_0 / tma_info_core_core_clks", + "MetricGroup": "Compute;TopdownL6;tma_L6_group;tma_alu_op_utilizat= ion_group;tma_issue2P", + "MetricName": "tma_port_0", + "MetricThreshold": "tma_port_0 > 0.6", + "PublicDescription": "This metric represents Core fraction of cycl= es CPU dispatched uops on execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd = branch). Sample with: UOPS_DISPATCHED.PORT_0. Related metrics: tma_fp_scala= r, tma_fp_vector, tma_fp_vector_128b, tma_fp_vector_256b, tma_fp_vector_512= b, tma_port_1, tma_port_5, tma_port_6, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port 1 (ALU)", + "MetricExpr": "UOPS_DISPATCHED.PORT_1 / tma_info_core_core_clks", + "MetricGroup": "TopdownL6;tma_L6_group;tma_alu_op_utilization_grou= p;tma_issue2P", + "MetricName": "tma_port_1", + "MetricThreshold": "tma_port_1 > 0.6", + "PublicDescription": "This metric represents Core fraction of cycl= es CPU dispatched uops on execution port 1 (ALU). Sample with: UOPS_DISPATC= HED.PORT_1. Related metrics: tma_fp_scalar, tma_fp_vector, tma_fp_vector_12= 8b, tma_fp_vector_256b, tma_fp_vector_512b, tma_port_0, tma_port_5, tma_por= t_6, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port 5 ([SNB+] Branches and ALU; [HSW+] = ALU)", + "MetricExpr": "UOPS_DISPATCHED.PORT_5 / tma_info_core_core_clks", + "MetricGroup": "TopdownL6;tma_L6_group;tma_alu_op_utilization_grou= p;tma_issue2P", + "MetricName": "tma_port_5", + "MetricThreshold": "tma_port_5 > 0.6", + "PublicDescription": "This metric represents Core fraction of cycl= es CPU dispatched uops on execution port 5 ([SNB+] Branches and ALU; [HSW+]= ALU). Sample with: UOPS_DISPATCHED.PORT_5. Related metrics: tma_fp_scalar,= tma_fp_vector, tma_fp_vector_128b, tma_fp_vector_256b, tma_fp_vector_512b,= tma_port_0, tma_port_1, tma_port_6, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port 6 ([HSW+]Primary Branch and simple = ALU)", + "MetricExpr": "UOPS_DISPATCHED.PORT_6 / tma_info_core_core_clks", + "MetricGroup": "TopdownL6;tma_L6_group;tma_alu_op_utilization_grou= p;tma_issue2P", + "MetricName": "tma_port_6", + "MetricThreshold": "tma_port_6 > 0.6", + "PublicDescription": "This metric represents Core fraction of cycl= es CPU dispatched uops on execution port 6 ([HSW+]Primary Branch and simple= ALU). Sample with: UOPS_DISPATCHED.PORT_6. Related metrics: tma_fp_scalar,= tma_fp_vector, tma_fp_vector_128b, tma_fp_vector_256b, tma_fp_vector_512b,= tma_port_0, tma_port_1, tma_port_5, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", + "MetricExpr": "((cpu@EXE_ACTIVITY.3_PORTS_UTIL\\,umask\\=3D0x80@ += tma_serializing_operation * (CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.= STALLS_MEM_ANY) + (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring * EXE_ACTIVITY.= 2_PORTS_UTIL)) / tma_info_thread_clks if ARITH.DIVIDER_ACTIVE < CYCLE_ACTIV= ITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALLS_MEM_ANY else (EXE_ACTIVITY.1_PORTS= _UTIL + tma_retiring * EXE_ACTIVITY.2_PORTS_UTIL) / tma_info_thread_clks)", + "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_gr= oup", + "MetricName": "tma_ports_utilization", + "MetricThreshold": "tma_ports_utilization > 0.15 & (tma_core_bound= > 0.1 & tma_backend_bound > 0.2)", + "PublicDescription": "This metric estimates fraction of cycles the= CPU performance was potentially limited due to Core computation issues (no= n divider-related). Two distinct categories can be attributed into this me= tric: (1) heavy data-dependency among contiguous instructions would manifes= t in this metric - such cases are often referred to as low Instruction Leve= l Parallelism (ILP). (2) Contention on some hardware execution unit other t= han Divider. For example; when there are too many multiply operations.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", + "MetricExpr": "cpu@EXE_ACTIVITY.3_PORTS_UTIL\\,umask\\=3D0x80@ / t= ma_info_thread_clks + tma_serializing_operation * (CYCLE_ACTIVITY.STALLS_TO= TAL - CYCLE_ACTIVITY.STALLS_MEM_ANY) / tma_info_thread_clks", + "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", + "MetricName": "tma_ports_utilized_0", + "MetricThreshold": "tma_ports_utilized_0 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric represents fraction of cycles CP= U executed no uops on any execution port (Logical Processor cycles since IC= L, Physical Core cycles otherwise). Long-latency instructions like divides = may contribute to this metric.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", + "MetricExpr": "EXE_ACTIVITY.1_PORTS_UTIL / tma_info_thread_clks", + "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", + "MetricName": "tma_ports_utilized_1", + "MetricThreshold": "tma_ports_utilized_1 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric represents fraction of cycles wh= ere the CPU executed total of 1 uop per cycle on all execution ports (Logic= al Processor cycles since ICL, Physical Core cycles otherwise). This can be= due to heavy data-dependency among software instructions; or over oversubs= cribing a particular hardware resource. In some other cases with high 1_Por= t_Utilized and L1_Bound; this metric can point to L1 data-cache latency bot= tleneck that may not necessarily manifest with complete execution starvatio= n (due to the short L1 latency e.g. walking a linked list) - looking at the= assembly can be helpful. Sample with: EXE_ACTIVITY.1_PORTS_UTIL. Related m= etrics: tma_l1_bound", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", + "MetricExpr": "EXE_ACTIVITY.2_PORTS_UTIL / tma_info_thread_clks", + "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", + "MetricName": "tma_ports_utilized_2", + "MetricThreshold": "tma_ports_utilized_2 > 0.15 & (tma_ports_utili= zation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric represents fraction of cycles CP= U executed total of 2 uops per cycle on all execution ports (Logical Proces= sor cycles since ICL, Physical Core cycles otherwise). Loop Vectorization = -most compilers feature auto-Vectorization options today- reduces pressure = on the execution ports as multiple elements are calculated with same uop. S= ample with: EXE_ACTIVITY.2_PORTS_UTIL. Related metrics: tma_fp_scalar, tma_= fp_vector, tma_fp_vector_128b, tma_fp_vector_256b, tma_fp_vector_512b, tma_= port_0, tma_port_1, tma_port_5, tma_port_6", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 3 or more uops per cycle on all execution ports (Logical= Processor cycles since ICL, Physical Core cycles otherwise)", + "MetricExpr": "UOPS_EXECUTED.CYCLES_GE_3 / tma_info_thread_clks", + "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", + "MetricName": "tma_ports_utilized_3m", + "MetricThreshold": "tma_ports_utilized_3m > 0.7 & (tma_ports_utili= zation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric represents fraction of cycles CP= U executed total of 3 or more uops per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise). Sample with:= UOPS_EXECUTED.CYCLES_GE_3", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", + "DefaultMetricgroupName": "TopdownL1", + "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", + "MetricName": "tma_retiring", + "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", + "MetricgroupNoGroup": "TopdownL1;Default", + "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .SLOTS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles the= CPU issue-pipeline was stalled due to serializing operations", + "MetricExpr": "RESOURCE_STALLS.SCOREBOARD / tma_info_thread_clks", + "MetricGroup": "PortsUtil;TopdownL5;tma_L5_group;tma_issueSO;tma_p= orts_utilized_0_group", + "MetricName": "tma_serializing_operation", + "MetricThreshold": "tma_serializing_operation > 0.1 & (tma_ports_u= tilized_0 > 0.2 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & t= ma_backend_bound > 0.2)))", + "PublicDescription": "This metric represents fraction of cycles th= e CPU issue-pipeline was stalled due to serializing operations. Instruction= s like CPUID; WRMSR or LFENCE serialize the out-of-order execution which ma= y limit performance. Sample with: RESOURCE_STALLS.SCOREBOARD. Related metri= cs: tma_ms_switches", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to PAUSE Instructions", + "MetricExpr": "140 * MISC_RETIRED.PAUSE_INST / tma_info_thread_clk= s", + "MetricGroup": "TopdownL6;tma_L6_group;tma_serializing_operation_g= roup", + "MetricName": "tma_slow_pause", + "MetricThreshold": "tma_slow_pause > 0.05 & (tma_serializing_opera= tion > 0.1 & (tma_ports_utilized_0 > 0.2 & (tma_ports_utilization > 0.15 & = (tma_core_bound > 0.1 & tma_backend_bound > 0.2))))", + "PublicDescription": "This metric represents fraction of cycles th= e CPU was stalled due to PAUSE Instructions. Sample with: MISC_RETIRED.PAUS= E_INST", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates fraction of cycles hand= ling memory load split accesses - load that cross 64-byte cache line bounda= ry", + "MetricExpr": "tma_info_memory_load_miss_real_latency * LD_BLOCKS.= NO_SR / tma_info_thread_clks", + "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", + "MetricName": "tma_split_loads", + "MetricThreshold": "tma_split_loads > 0.2 & (tma_l1_bound > 0.1 & = (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric estimates fraction of cycles han= dling memory load split accesses - load that cross 64-byte cache line bound= ary. Sample with: MEM_INST_RETIRED.SPLIT_LOADS_PS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents rate of split store ac= cesses", + "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_= clks", + "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bou= nd_group", + "MetricName": "tma_split_stores", + "MetricThreshold": "tma_split_stores > 0.2 & (tma_store_bound > 0.= 2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric represents rate of split store a= ccesses. Consider aligning your data to the 64-byte cache line granularity= . Sample with: MEM_INST_RETIRED.SPLIT_STORES_PS. Related metrics: tma_port_= 4", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric measures fraction of cycles where= the Super Queue (SQ) was full taking into account all request-types and bo= th hardware SMT threads (Logical Processors)", + "MetricExpr": "L1D_PEND_MISS.L2_STALL / tma_info_thread_clks", + "MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_issueB= W;tma_l3_bound_group", + "MetricName": "tma_sq_full", + "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_fb_full= , tma_info_bottleneck_memory_bandwidth, tma_info_system_dram_bw_use, tma_me= m_bandwidth", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates how often CPU was stall= ed due to RFO store memory accesses; RFO store issue a read-for-ownership = request before the write", + "MetricExpr": "EXE_ACTIVITY.BOUND_ON_STORES / tma_info_thread_clks= ", + "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_me= mory_bound_group", + "MetricName": "tma_store_bound", + "MetricThreshold": "tma_store_bound > 0.2 & (tma_memory_bound > 0.= 2 & tma_backend_bound > 0.2)", + "PublicDescription": "This metric estimates how often CPU was stal= led due to RFO store memory accesses; RFO store issue a read-for-ownership= request before the write. Even though store accesses do not typically stal= l out-of-order CPUs; there are few cases where stores can lead to actual st= alls. This metric will be flagged should RFO stores be a bottleneck. Sample= with: MEM_INST_RETIRED.ALL_STORES_PS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", + "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks= ", + "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", + "MetricName": "tma_store_fwd_blk", + "MetricThreshold": "tma_store_fwd_blk > 0.1 & (tma_l1_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric roughly estimates fraction of cy= cles when the memory subsystem had loads blocked since they could not forwa= rd data from earlier (in program order) overlapping stores. To streamline m= emory operations in the pipeline; a load can avoid waiting for memory if a = prior in-flight store is writing the data that the load wants to read (stor= e forwarding process). However; in some cases the load may be blocked for a= significant time pending the store forward. For example; when the prior st= ore is writing a smaller region than the load is reading.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates fraction of cycles the = CPU spent handling L1D store misses", + "MetricExpr": "(L2_RQSTS.RFO_HIT * 10 * (1 - MEM_INST_RETIRED.LOCK= _LOADS / MEM_INST_RETIRED.ALL_STORES) + (1 - MEM_INST_RETIRED.LOCK_LOADS / = MEM_INST_RETIRED.ALL_STORES) * min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUEST= S_OUTSTANDING.CYCLES_WITH_DEMAND_RFO)) / tma_info_thread_clks", + "MetricGroup": "MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_issue= RFO;tma_issueSL;tma_store_bound_group", + "MetricName": "tma_store_latency", + "MetricThreshold": "tma_store_latency > 0.1 & (tma_store_bound > 0= .2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric estimates fraction of cycles the= CPU spent handling L1D store misses. Store accesses usually less impact ou= t-of-order core performance; however; holding resources for longer time can= lead into undesired implications (e.g. contention on L1D fill-buffer entri= es - see FB_Full). Related metrics: tma_fb_full, tma_lock_latency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port for Store operations", + "MetricExpr": "(UOPS_DISPATCHED.PORT_4_9 + UOPS_DISPATCHED.PORT_7_= 8) / (4 * tma_info_core_core_clks)", + "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", + "MetricName": "tma_store_op_utilization", + "MetricThreshold": "tma_store_op_utilization > 0.6", + "PublicDescription": "This metric represents Core fraction of cycl= es CPU dispatched uops on execution port for Store operations. Sample with:= UOPS_DISPATCHED.PORT_7_8", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric roughly estimates the fraction of= cycles where the TLB was missed by store accesses, hitting in the second-l= evel TLB (STLB)", + "MetricExpr": "tma_dtlb_store - tma_store_stlb_miss", + "MetricGroup": "MemoryTLB;TopdownL5;tma_L5_group;tma_dtlb_store_gr= oup", + "MetricName": "tma_store_stlb_hit", + "MetricThreshold": "tma_store_stlb_hit > 0.05 & (tma_dtlb_store > = 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound= > 0.2)))", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates the fraction of cycles = where the STLB was missed by store accesses, performing a hardware page wal= k", + "MetricExpr": "DTLB_STORE_MISSES.WALK_ACTIVE / tma_info_core_core_= clks", + "MetricGroup": "MemoryTLB;TopdownL5;tma_L5_group;tma_dtlb_store_gr= oup", + "MetricName": "tma_store_stlb_miss", + "MetricThreshold": "tma_store_stlb_miss > 0.05 & (tma_dtlb_store >= 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_boun= d > 0.2)))", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates how often CPU was stall= ed due to Streaming store memory accesses; Streaming store optimize out a = read request required by RFO stores", + "MetricExpr": "9 * OCR.STREAMING_WR.ANY_RESPONSE / tma_info_thread= _clks", + "MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_issueS= mSt;tma_store_bound_group", + "MetricName": "tma_streaming_stores", + "MetricThreshold": "tma_streaming_stores > 0.2 & (tma_store_bound = > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric estimates how often CPU was stal= led due to Streaming store memory accesses; Streaming store optimize out a= read request required by RFO stores. Even though store accesses do not typ= ically stall out-of-order CPUs; there are few cases where stores can lead t= o actual stalls. This metric will be flagged should Streaming stores be a b= ottleneck. Sample with: OCR.STREAMING_WR.ANY_RESPONSE. Related metrics: tma= _fb_full", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to new branch address clears", + "MetricExpr": "10 * BACLEARS.ANY / tma_info_thread_clks", + "MetricGroup": "BigFoot;FetchLat;TopdownL4;tma_L4_group;tma_branch= _resteers_group", + "MetricName": "tma_unknown_branches", + "MetricThreshold": "tma_unknown_branches > 0.05 & (tma_branch_rest= eers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))", + "PublicDescription": "This metric represents fraction of cycles th= e CPU was stalled due to new branch address clears. These are fetched branc= hes the Branch Prediction Unit was unable to recognize (e.g. first time the= branch is fetched or hitting BTB capacity limit). Sample with: BACLEARS.AN= Y", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric serves as an approximation of leg= acy x87 usage", + "MetricExpr": "tma_retiring * UOPS_EXECUTED.X87 / UOPS_EXECUTED.TH= READ", + "MetricGroup": "Compute;TopdownL4;tma_L4_group;tma_fp_arith_group"= , + "MetricName": "tma_x87_use", + "MetricThreshold": "tma_x87_use > 0.1 & (tma_fp_arith > 0.2 & tma_= light_operations > 0.6)", + "PublicDescription": "This metric serves as an approximation of le= gacy x87 usage. It accounts for instructions beyond X87 FP arithmetic opera= tions; hence may be used as a thermometer to avoid X87 high usage and prefe= rably upgrade to modern ISA. See Tip under Tuning Hint.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "Percentage of cycles in aborted transactions.= ", + "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_e= vent(cycles\\-t) else 0)", + "MetricGroup": "transaction", + "MetricName": "tsx_aborted_cycles", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "Number of cycles within a transaction divided= by the number of elisions.", + "MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) e= lse 0)", + "MetricGroup": "transaction", + "MetricName": "tsx_cycles_per_elision", + "ScaleUnit": "1cycles / elision" + }, + { + "BriefDescription": "Number of cycles within a transaction divided= by the number of transactions.", + "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) e= lse 0)", + "MetricGroup": "transaction", + "MetricName": "tsx_cycles_per_transaction", + "ScaleUnit": "1cycles / transaction" + }, + { + "BriefDescription": "Percentage of cycles within a transaction reg= ion.", + "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else = 0)", + "MetricGroup": "transaction", + "MetricName": "tsx_transactional_cycles", + "ScaleUnit": "100%" + } +] diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/uncore-interconnect.= json b/tools/perf/pmu-events/arch/x86/rocketlake/uncore-interconnect.json new file mode 100644 index 000000000000..8027590f1776 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/uncore-interconnect.json @@ -0,0 +1,74 @@ +[ + { + "BriefDescription": "Number of entries allocated. Account for Any = type: e.g. Snoop, etc.", + "EventCode": "0x84", + "EventName": "UNC_ARB_COH_TRK_REQUESTS.ALL", + "PerPkg": "1", + "UMask": "0x1", + "Unit": "ARB" + }, + { + "BriefDescription": "Each cycle counts number of any coherent requ= est at memory controller that were issued by any core. This event is not su= pported on ICL products but is supported on RKL products.", + "EventCode": "0x85", + "EventName": "UNC_ARB_DAT_OCCUPANCY.ALL", + "PerPkg": "1", + "UMask": "0x1", + "Unit": "ARB" + }, + { + "BriefDescription": "Each cycle counts number of coherent reads pe= nding on data return from memory controller that were issued by any core. T= his event is not supported on ICL products but is supported on RKL products= .", + "EventCode": "0x85", + "EventName": "UNC_ARB_DAT_OCCUPANCY.RD", + "PerPkg": "1", + "UMask": "0x2", + "Unit": "ARB" + }, + { + "BriefDescription": "Each cycle count number of 'valid' coherent D= ata Read entries . Such entry is defined as valid when it is allocated till= deallocation. Doesn't include prefetches. This event is not supported on I= CL products but is supported on RKL products.", + "EventCode": "0x80", + "EventName": "UNC_ARB_REQ_TRK_OCCUPANCY.DRD", + "PerPkg": "1", + "UMask": "0x2", + "Unit": "ARB" + }, + { + "BriefDescription": "Number of all coherent Data Read entries. Doe= sn't include prefetches", + "EventCode": "0x81", + "EventName": "UNC_ARB_REQ_TRK_REQUEST.DRD", + "PerPkg": "1", + "UMask": "0x2", + "Unit": "ARB" + }, + { + "BriefDescription": "Each cycle counts number of all outgoing vali= d entries in ReqTrk. Such entry is defined as valid from its allocation in = ReqTrk till deallocation. Accounts for Coherent and non-coherent traffic. T= his event is not supported on ICL products but is supported on RKL products= .", + "EventCode": "0x80", + "EventName": "UNC_ARB_TRK_OCCUPANCY.ALL", + "PerPkg": "1", + "UMask": "0x1", + "Unit": "ARB" + }, + { + "BriefDescription": "Each cycle count number of 'valid' coherent D= ata Read entries . Such entry is defined as valid when it is allocated till= deallocation. Doesn't include prefetches. This event is not supported on I= CL products but is supported on RKL products.", + "EventCode": "0x80", + "EventName": "UNC_ARB_TRK_OCCUPANCY.RD", + "PerPkg": "1", + "UMask": "0x2", + "Unit": "ARB" + }, + { + "BriefDescription": "Total number of all outgoing entries allocate= d. Accounts for Coherent and non-coherent traffic.", + "EventCode": "0x81", + "EventName": "UNC_ARB_TRK_REQUESTS.ALL", + "PerPkg": "1", + "UMask": "0x1", + "Unit": "ARB" + }, + { + "BriefDescription": "Number of all coherent Data Read entries. Doe= sn't include prefetches. This event is not supported on ICL products but is= supported on RKL products.", + "EventCode": "0x81", + "EventName": "UNC_ARB_TRK_REQUESTS.RD", + "PerPkg": "1", + "UMask": "0x2", + "Unit": "ARB" + } +] diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/uncore-other.json b/= tools/perf/pmu-events/arch/x86/rocketlake/uncore-other.json new file mode 100644 index 000000000000..c6596ba09195 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/uncore-other.json @@ -0,0 +1,9 @@ +[ + { + "BriefDescription": "UNC_CLOCK.SOCKET", + "EventCode": "0xff", + "EventName": "UNC_CLOCK.SOCKET", + "PerPkg": "1", + "Unit": "CLOCK" + } +] diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/virtual-memory.json = b/tools/perf/pmu-events/arch/x86/rocketlake/virtual-memory.json new file mode 100644 index 000000000000..b28f62ce1f39 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/virtual-memory.json @@ -0,0 +1,165 @@ +[ + { + "BriefDescription": "Loads that miss the DTLB and hit the STLB.", + "EventCode": "0x08", + "EventName": "DTLB_LOAD_MISSES.STLB_HIT", + "PublicDescription": "Counts loads that miss the DTLB (Data TLB) a= nd hit the STLB (Second level TLB).", + "SampleAfterValue": "100003", + "UMask": "0x20" + }, + { + "BriefDescription": "Cycles when at least one PMH is busy with a p= age walk for a demand load.", + "CounterMask": "1", + "EventCode": "0x08", + "EventName": "DTLB_LOAD_MISSES.WALK_ACTIVE", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a demand load.", + "SampleAfterValue": "100003", + "UMask": "0x10" + }, + { + "BriefDescription": "Load miss in all TLB levels causes a page wal= k that completes. (All page sizes)", + "EventCode": "0x08", + "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED", + "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data loads. This implies it missed in the DTLB and furth= er levels of TLB. The page walk can end with or without a fault.", + "SampleAfterValue": "100003", + "UMask": "0xe" + }, + { + "BriefDescription": "Page walks completed due to a demand data loa= d to a 2M/4M page.", + "EventCode": "0x08", + "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M", + "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data loads. This implies address translations missed in the= DTLB and further levels of TLB. The page walk can end with or without a fa= ult.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Page walks completed due to a demand data loa= d to a 4K page.", + "EventCode": "0x08", + "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_4K", + "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= .", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of page walks outstanding for a demand= load in the PMH each cycle.", + "EventCode": "0x08", + "EventName": "DTLB_LOAD_MISSES.WALK_PENDING", + "PublicDescription": "Counts the number of page walks outstanding = for a demand load in the PMH (Page Miss Handler) each cycle.", + "SampleAfterValue": "100003", + "UMask": "0x10" + }, + { + "BriefDescription": "Stores that miss the DTLB and hit the STLB.", + "EventCode": "0x49", + "EventName": "DTLB_STORE_MISSES.STLB_HIT", + "PublicDescription": "Counts stores that miss the DTLB (Data TLB) = and hit the STLB (2nd Level TLB).", + "SampleAfterValue": "100003", + "UMask": "0x20" + }, + { + "BriefDescription": "Cycles when at least one PMH is busy with a p= age walk for a store.", + "CounterMask": "1", + "EventCode": "0x49", + "EventName": "DTLB_STORE_MISSES.WALK_ACTIVE", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a store.", + "SampleAfterValue": "100003", + "UMask": "0x10" + }, + { + "BriefDescription": "Store misses in all TLB levels causes a page = walk that completes. (All page sizes)", + "EventCode": "0x49", + "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED", + "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data stores. This implies it missed in the DTLB and furt= her levels of TLB. The page walk can end with or without a fault.", + "SampleAfterValue": "100003", + "UMask": "0xe" + }, + { + "BriefDescription": "Page walks completed due to a demand data sto= re to a 2M/4M page.", + "EventCode": "0x49", + "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M", + "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data stores. This implies address translations missed in th= e DTLB and further levels of TLB. The page walk can end with or without a f= ault.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Page walks completed due to a demand data sto= re to a 4K page.", + "EventCode": "0x49", + "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_4K", + "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of page walks outstanding for a store = in the PMH each cycle.", + "EventCode": "0x49", + "EventName": "DTLB_STORE_MISSES.WALK_PENDING", + "PublicDescription": "Counts the number of page walks outstanding = for a store in the PMH (Page Miss Handler) each cycle.", + "SampleAfterValue": "100003", + "UMask": "0x10" + }, + { + "BriefDescription": "Instruction fetch requests that miss the ITLB= and hit the STLB.", + "EventCode": "0x85", + "EventName": "ITLB_MISSES.STLB_HIT", + "PublicDescription": "Counts instruction fetch requests that miss = the ITLB (Instruction TLB) and hit the STLB (Second-level TLB).", + "SampleAfterValue": "100003", + "UMask": "0x20" + }, + { + "BriefDescription": "Cycles when at least one PMH is busy with a p= age walk for code (instruction fetch) request.", + "CounterMask": "1", + "EventCode": "0x85", + "EventName": "ITLB_MISSES.WALK_ACTIVE", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a code (instruction fetch) request= .", + "SampleAfterValue": "100003", + "UMask": "0x10" + }, + { + "BriefDescription": "Code miss in all TLB levels causes a page wal= k that completes. (All page sizes)", + "EventCode": "0x85", + "EventName": "ITLB_MISSES.WALK_COMPLETED", + "PublicDescription": "Counts completed page walks (all page sizes)= caused by a code fetch. This implies it missed in the ITLB (Instruction TL= B) and further levels of TLB. The page walk can end with or without a fault= .", + "SampleAfterValue": "100003", + "UMask": "0xe" + }, + { + "BriefDescription": "Code miss in all TLB levels causes a page wal= k that completes. (2M/4M)", + "EventCode": "0x85", + "EventName": "ITLB_MISSES.WALK_COMPLETED_2M_4M", + "PublicDescription": "Counts completed page walks (2M/4M page size= s) caused by a code fetch. This implies it missed in the ITLB (Instruction = TLB) and further levels of TLB. The page walk can end with or without a fau= lt.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Code miss in all TLB levels causes a page wal= k that completes. (4K)", + "EventCode": "0x85", + "EventName": "ITLB_MISSES.WALK_COMPLETED_4K", + "PublicDescription": "Counts completed page walks (4K page sizes) = caused by a code fetch. This implies it missed in the ITLB (Instruction TLB= ) and further levels of TLB. The page walk can end with or without a fault.= ", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of page walks outstanding for an outst= anding code request in the PMH each cycle.", + "EventCode": "0x85", + "EventName": "ITLB_MISSES.WALK_PENDING", + "PublicDescription": "Counts the number of page walks outstanding = for an outstanding code (instruction fetch) request in the PMH (Page Miss H= andler) each cycle.", + "SampleAfterValue": "100003", + "UMask": "0x10" + }, + { + "BriefDescription": "DTLB flush attempts of the thread-specific en= tries", + "EventCode": "0xBD", + "EventName": "TLB_FLUSH.DTLB_THREAD", + "PublicDescription": "Counts the number of DTLB flush attempts of = the thread-specific entries.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "STLB flush attempts", + "EventCode": "0xBD", + "EventName": "TLB_FLUSH.STLB_ANY", + "PublicDescription": "Counts the number of any STLB flush attempts= (such as entire, VPID, PCID, InvPage, CR3 write, etc.).", + "SampleAfterValue": "100007", + "UMask": "0x20" + } +] --=20 2.41.0.162.gfafddb0af9-goog