Date: Tue, 12 Sep 2023 17:34:52 -0300
From: Arnaldo Carvalho de Melo
To: Jing Zhang
Cc: Ian Rogers, John Garry, Will Deacon, James Clark, Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org, linux-doc@vger.kernel.org, Zhuo Song, Shuai Xue
Subject: Re: [PATCH v8 3/8] perf vendor events: Supplement the omitted EventCode
References: <1694087913-46144-1-git-send-email-renyu.zj@linux.alibaba.com> <1694087913-46144-4-git-send-email-renyu.zj@linux.alibaba.com> <01eecef3-a918-a6d0-6f9f-d3b99c9680a8@linux.alibaba.com>
In-Reply-To: <01eecef3-a918-a6d0-6f9f-d3b99c9680a8@linux.alibaba.com>
X-Mailing-List: linux-kernel@vger.kernel.org
On Mon, Sep 11, 2023 at 10:41:16AM +0800, Jing Zhang wrote:
>
>
> On 2023/9/9 at 5:18 AM, Ian Rogers wrote:
> > On Thu, Sep 7, 2023 at 4:58 AM Jing Zhang wrote:
> >>
> >> If there is an "event=0" in the event description, the EventCode can
> >> be omitted in the JSON file, and jevents.py will automatically fill in
> >> "event=0" during parsing.
> >>
> >> However, for some events where EventCode and ConfigCode are missing,
> >> it is not necessary to automatically fill in "event=0", such as the
> >> CMN event description, which is typically "type=xxx, eventid=xxx".
> >>
> >> Therefore, before modifying jevents.py to prevent it from automatically
> >> adding "event=0" by default, it is necessary to fill in all omitted
> >> EventCodes first.
> >>
> >> Signed-off-by: Jing Zhang
> >
> > I thought you were going to change the behavior in jevents.py so this
> > change would be unnecessary. The next time the json is generated by
> > the script:
> > https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
> > then this will break. It seems easier to work around the issue in jevents.py.
> >
>
> Okay, I will work around the issue in jevents.py. Thank you!

So does this mean you will resubmit the whole 8-patch series, or should we
merge this one and then get a follow-up patch?

- Arnaldo

> > Thanks,
> > Ian
> >
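For context, the jevents.py behavior being discussed boils down to something like the following minimal Python sketch. This is illustrative only, not the actual jevents.py code: the function name and the exact set of terms handled here are assumptions. It contrasts unconditionally defaulting to "event=0" whenever EventCode is absent with only defaulting when no other term (such as a CMN-style ConfigCode) programs the counter:

def event_terms(jd):
    # Build the perf term string for one pmu-events JSON entry
    # (illustrative; the real jevents.py handles many more fields).
    terms = []
    if "EventCode" in jd:
        terms.append("event=%s" % jd["EventCode"])
    if "ConfigCode" in jd:
        terms.append("config=%s" % jd["ConfigCode"])
    if "UMask" in jd:
        terms.append("umask=%s" % jd["UMask"])
    # Old behavior: always assume event=0 when EventCode is missing.
    # Workaround under discussion: only default when nothing else in
    # the entry programs the counter, so CMN-style entries that use
    # other terms do not gain a spurious "event=0".
    if not terms:
        terms.append("event=0")
    return ",".join(terms)

event_terms({"EventCode": "0x0", "UMask": "0x2"})  # -> 'event=0x0,umask=0x2'
event_terms({"ConfigCode": "0x01"})                # -> 'config=0x01'

With a guard like this, the explicit "EventCode": "0x0" entries added by the patch below keep their meaning, while entries that carry no event term at all would no longer be given one implicitly.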
tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json | 2 ++ > >> tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json | 1 + > >> tools/perf/pmu-events/arch/x86/icelake/pipeline.json | 5 +++++ > >> tools/perf/pmu-events/arch/x86/icelakex/pipeline.json | 5 +++++ > >> tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json | 1 + > >> .../pmu-events/arch/x86/icelakex/uncore-interconnect.json | 1 + > >> tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json | 1 + > >> tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json | 1 + > >> tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json | 3 +++ > >> tools/perf/pmu-events/arch/x86/ivytown/pipeline.json | 4 ++++ > >> tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json | 2 ++ > >> .../pmu-events/arch/x86/ivytown/uncore-interconnect.json | 11 +++++= ++++++ > >> tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json | 1 + > >> tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json | 1 + > >> tools/perf/pmu-events/arch/x86/jaketown/pipeline.json | 4 ++++ > >> tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json | 2 ++ > >> .../pmu-events/arch/x86/jaketown/uncore-interconnect.json | 12 +++++= +++++++ > >> tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json | 1 + > >> tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json | 2 ++ > >> .../perf/pmu-events/arch/x86/knightslanding/pipeline.json | 3 +++ > >> .../pmu-events/arch/x86/knightslanding/uncore-cache.json | 1 + > >> .../pmu-events/arch/x86/knightslanding/uncore-memory.json | 4 ++++ > >> tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json | 9 +++++= ++++ > >> tools/perf/pmu-events/arch/x86/rocketlake/pipeline.json | 3 +++ > >> tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json | 4 ++++ > >> .../perf/pmu-events/arch/x86/sapphirerapids/pipeline.json | 5 +++++ > >> tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json | 4 ++++ > >> tools/perf/pmu-events/arch/x86/silvermont/pipeline.json | 3 +++ > >> tools/perf/pmu-events/arch/x86/skylake/pipeline.json | 4 ++++ > >> tools/perf/pmu-events/arch/x86/skylakex/pipeline.json | 4 ++++ > >> tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json | 2 ++ > >> .../pmu-events/arch/x86/skylakex/uncore-interconnect.json | 1 + > >> tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json | 1 + > >> tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json | 1 + > >> tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json | 1 + > >> tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json | 2 ++ > >> .../perf/pmu-events/arch/x86/snowridgex/uncore-cache.json | 1 + > >> .../arch/x86/snowridgex/uncore-interconnect.json | 1 + > >> .../perf/pmu-events/arch/x86/snowridgex/uncore-memory.json | 1 + > >> .../perf/pmu-events/arch/x86/snowridgex/uncore-power.json | 1 + > >> tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json | 5 +++++ > >> 69 files changed, 217 insertions(+) > >> > >> diff --git a/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json b/= tools/perf/pmu-events/arch/x86/alderlake/pipeline.json > >> index a92013c..9e30943 100644 > >> --- a/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json > >> +++ b/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json > >> @@ -489,6 +489,7 @@ > >> }, > >> { > >> "BriefDescription": "Counts the number of unhalted core clock= cycles. (Fixed event)", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.CORE", > >> "PublicDescription": "Counts the number of core cycles while = the core is not in a halt state. 
The core enters the halt state when it is = running the HLT instruction. The core frequency may change from time to tim= e. For this reason this event may have a changing ratio with regards to tim= e. This event uses fixed counter 1.", > >> "SampleAfterValue": "2000003", > >> @@ -550,6 +551,7 @@ > >> }, > >> { > >> "BriefDescription": "Counts the number of unhalted reference = clock cycles at TSC frequency. (Fixed event)", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.REF_TSC", > >> "PublicDescription": "Counts the number of reference cycles t= hat the core is not in a halt state. The core enters the halt state when it= is running the HLT instruction. This event is not affected by core frequen= cy changes and increments at a fixed frequency that is also used for the Ti= me Stamp Counter (TSC). This event uses fixed counter 2.", > >> "SampleAfterValue": "2000003", > >> @@ -558,6 +560,7 @@ > >> }, > >> { > >> "BriefDescription": "Reference cycles when the core is not in= halt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.REF_TSC", > >> "PublicDescription": "Counts the number of reference cycles w= hen the core is not in a halt state. The core enters the halt state when it= is running the HLT instruction or the MWAIT instruction. This event is not= affected by core frequency changes (for example, P states, TM2 transitions= ) but has the same incrementing frequency as the time stamp counter. This e= vent can approximate elapsed time while the core was not in a halt state. I= t is counted on a dedicated fixed counter, leaving the eight programmable c= ounters available for other events. Note: On all current platforms this eve= nt stops counting during 'throttling (TM)' states duty off periods the proc= essor is 'halted'. The counter update is done at a lower clock rate then t= he core clock the overflow status bit for this counter may appear 'sticky'.= After the counter has overflowed and software clears the overflow status = bit and resets the counter to less than MAX. The reset value to the counter= is not clocked immediately so the overflow status bit will flip 'high (1)'= and generate another PMI (if enabled) after which the reset value gets clo= cked into the counter. Therefore, software will get the interrupt, read the= overflow status bit '1 for bit 34 while the counter value is less than MAX= =2E Software should ignore this case.", > >> "SampleAfterValue": "2000003", > >> @@ -584,6 +587,7 @@ > >> }, > >> { > >> "BriefDescription": "Counts the number of unhalted core clock= cycles. (Fixed event)", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD", > >> "PublicDescription": "Counts the number of core cycles while = the core is not in a halt state. The core enters the halt state when it is= running the HLT instruction. The core frequency may change from time to ti= me. For this reason this event may have a changing ratio with regards to ti= me. This event uses fixed counter 1.", > >> "SampleAfterValue": "2000003", > >> @@ -592,6 +596,7 @@ > >> }, > >> { > >> "BriefDescription": "Core cycles when the thread is not in ha= lt state", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD", > >> "PublicDescription": "Counts the number of core cycles while = the thread is not in a halt state. The thread enters the halt state when it= is running the HLT instruction. This event is a component in many key even= t ratios. 
The core frequency may change from time to time due to transition= s associated with Enhanced Intel SpeedStep Technology or TM2. For this reas= on this event may have a changing ratio with regards to time. When the core= frequency is constant, this event can approximate elapsed time while the c= ore was not in the halt state. It is counted on a dedicated fixed counter, = leaving the eight programmable counters available for other events.", > >> "SampleAfterValue": "2000003", > >> @@ -743,6 +748,7 @@ > >> }, > >> { > >> "BriefDescription": "Counts the total number of instructions = retired. (Fixed event)", > >> + "EventCode": "0x0", > >> "EventName": "INST_RETIRED.ANY", > >> "PEBS": "1", > >> "PublicDescription": "Counts the total number of instructions= that retired. For instructions that consist of multiple uops, this event c= ounts the retirement of the last uop of the instruction. This event continu= es counting during hardware interrupts, traps, and inside interrupt handler= s. This event uses fixed counter 0.", > >> @@ -752,6 +758,7 @@ > >> }, > >> { > >> "BriefDescription": "Number of instructions retired. Fixed Co= unter - architectural event", > >> + "EventCode": "0x0", > >> "EventName": "INST_RETIRED.ANY", > >> "PEBS": "1", > >> "PublicDescription": "Counts the number of X86 instructions r= etired - an Architectural PerfMon event. Counting continues during hardware= interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY = is counted by a designated fixed counter freeing up programmable counters t= o count other events. INST_RETIRED.ANY_P is counted by a programmable count= er.", > >> @@ -796,6 +803,7 @@ > >> }, > >> { > >> "BriefDescription": "Precise instruction retired with PEBS pr= ecise-distribution", > >> + "EventCode": "0x0", > >> "EventName": "INST_RETIRED.PREC_DIST", > >> "PEBS": "1", > >> "PublicDescription": "A version of INST_RETIRED that allows f= or a precise distribution of samples across instructions retired. It utiliz= es the Precise Distribution of Instructions Retired (PDIR++) feature to fix= bias in how retired instructions get sampled. Use on Fixed Counter 0.", > >> @@ -1160,6 +1168,7 @@ > >> }, > >> { > >> "BriefDescription": "TMA slots available for an unhalted logi= cal processor. Fixed counter - architectural event", > >> + "EventCode": "0x0", > >> "EventName": "TOPDOWN.SLOTS", > >> "PublicDescription": "Number of available slots for an unhalt= ed logical processor. The event increments by machine-width of the narrowes= t pipeline as employed by the Top-down Microarchitecture Analysis method (T= MA). The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core. Software can use this event as the d= enominator for the top-level metrics of the TMA method. This architectural = event is counted on a designated fixed counter (Fixed Counter 3).", > >> "SampleAfterValue": "10000003", > >> diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json b= /tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json > >> index fa53ff1..345d1c8 100644 > >> --- a/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json > >> +++ b/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json > >> @@ -211,6 +211,7 @@ > >> }, > >> { > >> "BriefDescription": "Counts the number of unhalted core clock= cycles. (Fixed event)", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.CORE", > >> "PublicDescription": "Counts the number of core cycles while = the core is not in a halt state. 
The core enters the halt state when it is = running the HLT instruction. The core frequency may change from time to tim= e. For this reason this event may have a changing ratio with regards to tim= e. This event uses fixed counter 1.", > >> "SampleAfterValue": "2000003", > >> @@ -225,6 +226,7 @@ > >> }, > >> { > >> "BriefDescription": "Counts the number of unhalted reference = clock cycles at TSC frequency. (Fixed event)", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.REF_TSC", > >> "PublicDescription": "Counts the number of reference cycles t= hat the core is not in a halt state. The core enters the halt state when it= is running the HLT instruction. This event is not affected by core frequen= cy changes and increments at a fixed frequency that is also used for the Ti= me Stamp Counter (TSC). This event uses fixed counter 2.", > >> "SampleAfterValue": "2000003", > >> @@ -240,6 +242,7 @@ > >> }, > >> { > >> "BriefDescription": "Counts the number of unhalted core clock= cycles. (Fixed event)", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD", > >> "PublicDescription": "Counts the number of core cycles while = the core is not in a halt state. The core enters the halt state when it is= running the HLT instruction. The core frequency may change from time to ti= me. For this reason this event may have a changing ratio with regards to ti= me. This event uses fixed counter 1.", > >> "SampleAfterValue": "2000003", > >> diff --git a/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json b/= tools/perf/pmu-events/arch/x86/broadwell/pipeline.json > >> index 9a902d2..b114d0d 100644 > >> --- a/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json > >> +++ b/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json > >> @@ -336,6 +336,7 @@ > >> }, > >> { > >> "BriefDescription": "Reference cycles when the core is not in= halt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.REF_TSC", > >> "PublicDescription": "This event counts the number of referen= ce cycles when the core is not in a halt state. The core enters the halt st= ate when it is running the HLT instruction or the MWAIT instruction. This e= vent is not affected by core frequency changes (for example, P states, TM2 = transitions) but has the same incrementing frequency as the time stamp coun= ter. This event can approximate elapsed time while the core was not in a ha= lt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCL= K event. It is counted on a dedicated fixed counter, leaving the four (eigh= t when Hyperthreading is disabled) programmable counters available for othe= r events. \nNote: On all current platforms this event stops counting during= 'throttling (TM)' states duty off periods the processor is 'halted'. This= event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter upda= te is done at a lower clock rate then the core clock the overflow status bi= t for this counter may appear 'sticky'. After the counter has overflowed a= nd software clears the overflow status bit and resets the counter to less t= han MAX. The reset value to the counter is not clocked immediately so the o= verflow status bit will flip 'high (1)' and generate another PMI (if enable= d) after which the reset value gets clocked into the counter. Therefore, so= ftware will get the interrupt, read the overflow status bit '1 for bit 34 w= hile the counter value is less than MAX. 
Software should ignore this case.", > >> "SampleAfterValue": "2000003", > >> @@ -359,6 +360,7 @@ > >> }, > >> { > >> "BriefDescription": "Core cycles when the thread is not in ha= lt state", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD", > >> "PublicDescription": "This event counts the number of core cy= cles while the thread is not in a halt state. The thread enters the halt st= ate when it is running the HLT instruction. This event is a component in ma= ny key event ratios. The core frequency may change from time to time due to= transitions associated with Enhanced Intel SpeedStep Technology or TM2. Fo= r this reason this event may have a changing ratio with regards to time. Wh= en the core frequency is constant, this event can approximate elapsed time = while the core was not in the halt state. It is counted on a dedicated fixe= d counter, leaving the four (eight when Hyperthreading is disabled) program= mable counters available for other events.", > >> "SampleAfterValue": "2000003", > >> @@ -366,6 +368,7 @@ > >> }, > >> { > >> "AnyThread": "1", > >> + "EventCode": "0x0", > >> "BriefDescription": "Core cycles when at least one thread on = the physical core is not in halt state.", > >> "EventName": "CPU_CLK_UNHALTED.THREAD_ANY", > >> "SampleAfterValue": "2000003", > >> @@ -514,6 +517,7 @@ > >> }, > >> { > >> "BriefDescription": "Instructions retired from execution.", > >> + "EventCode": "0x0", > >> "EventName": "INST_RETIRED.ANY", > >> "PublicDescription": "This event counts the number of instruc= tions retired from execution. For instructions that consist of multiple mic= ro-ops, this event counts the retirement of the last micro-op of the instru= ction. Counting continues during hardware interrupts, traps, and inside int= errupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed= counter, leaving the four (eight when Hyperthreading is disabled) programm= able counters available for other events. INST_RETIRED.ANY_P is counted by = a programmable counter and it is an architectural performance event. \nCoun= ting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count a= s retired instructions.", > >> "SampleAfterValue": "2000003", > >> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json = b/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json > >> index 9a902d2..ce90d058 100644 > >> --- a/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json > >> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json > >> @@ -336,6 +336,7 @@ > >> }, > >> { > >> "BriefDescription": "Reference cycles when the core is not in= halt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.REF_TSC", > >> "PublicDescription": "This event counts the number of referen= ce cycles when the core is not in a halt state. The core enters the halt st= ate when it is running the HLT instruction or the MWAIT instruction. This e= vent is not affected by core frequency changes (for example, P states, TM2 = transitions) but has the same incrementing frequency as the time stamp coun= ter. This event can approximate elapsed time while the core was not in a ha= lt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCL= K event. It is counted on a dedicated fixed counter, leaving the four (eigh= t when Hyperthreading is disabled) programmable counters available for othe= r events. 
\nNote: On all current platforms this event stops counting during= 'throttling (TM)' states duty off periods the processor is 'halted'. This= event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter upda= te is done at a lower clock rate then the core clock the overflow status bi= t for this counter may appear 'sticky'. After the counter has overflowed a= nd software clears the overflow status bit and resets the counter to less t= han MAX. The reset value to the counter is not clocked immediately so the o= verflow status bit will flip 'high (1)' and generate another PMI (if enable= d) after which the reset value gets clocked into the counter. Therefore, so= ftware will get the interrupt, read the overflow status bit '1 for bit 34 w= hile the counter value is less than MAX. Software should ignore this case.", > >> "SampleAfterValue": "2000003", > >> @@ -359,6 +360,7 @@ > >> }, > >> { > >> "BriefDescription": "Core cycles when the thread is not in ha= lt state", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD", > >> "PublicDescription": "This event counts the number of core cy= cles while the thread is not in a halt state. The thread enters the halt st= ate when it is running the HLT instruction. This event is a component in ma= ny key event ratios. The core frequency may change from time to time due to= transitions associated with Enhanced Intel SpeedStep Technology or TM2. Fo= r this reason this event may have a changing ratio with regards to time. Wh= en the core frequency is constant, this event can approximate elapsed time = while the core was not in the halt state. It is counted on a dedicated fixe= d counter, leaving the four (eight when Hyperthreading is disabled) program= mable counters available for other events.", > >> "SampleAfterValue": "2000003", > >> @@ -367,6 +369,7 @@ > >> { > >> "AnyThread": "1", > >> "BriefDescription": "Core cycles when at least one thread on = the physical core is not in halt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD_ANY", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x2" > >> @@ -514,6 +517,7 @@ > >> }, > >> { > >> "BriefDescription": "Instructions retired from execution.", > >> + "EventCode": "0x0", > >> "EventName": "INST_RETIRED.ANY", > >> "PublicDescription": "This event counts the number of instruc= tions retired from execution. For instructions that consist of multiple mic= ro-ops, this event counts the retirement of the last micro-op of the instru= ction. Counting continues during hardware interrupts, traps, and inside int= errupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed= counter, leaving the four (eight when Hyperthreading is disabled) programm= able counters available for other events. INST_RETIRED.ANY_P is counted by = a programmable counter and it is an architectural performance event. 
\nCoun= ting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count a= s retired instructions.", > >> "SampleAfterValue": "2000003", > >> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.j= son b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json > >> index 56bba6d..117be19 100644 > >> --- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json > >> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json > >> @@ -8,6 +8,7 @@ > >> }, > >> { > >> "BriefDescription": "Uncore Clocks", > >> + "EventCode": "0x0", > >> "EventName": "UNC_C_CLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "CBOX" > >> @@ -1501,6 +1502,7 @@ > >> }, > >> { > >> "BriefDescription": "uclks", > >> + "EventCode": "0x0", > >> "EventName": "UNC_H_CLOCKTICKS", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of uclks in the HA. = This will be slightly different than the count in the Ubox because of enabl= e/freeze delays. The HA is on the other side of the die from the fixed Ubo= x uclk counter, so the drift could be somewhat larger than in units that ar= e closer like the QPI Agent.", > >> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interco= nnect.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect= =2Ejson > >> index 9103959..3ed95a6 100644 > >> --- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.j= son > >> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.j= son > >> @@ -19,6 +19,7 @@ > >> }, > >> { > >> "BriefDescription": "Clocks in the IRP", > >> + "EventCode": "0x0", > >> "EventName": "UNC_I_CLOCKTICKS", > >> "PerPkg": "1", > >> "PublicDescription": "Number of clocks in the IRP.", > >> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.= json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json > >> index a764234..32c46bd 100644 > >> --- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json > >> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json > >> @@ -131,6 +131,7 @@ > >> }, > >> { > >> "BriefDescription": "DRAM Clockticks", > >> + "EventCode": "0x0", > >> "EventName": "UNC_M_DCLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "iMC" > >> diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.j= son b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json > >> index 83d2013..f57eb8e 100644 > >> --- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json > >> +++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json > >> @@ -1,6 +1,7 @@ > >> [ > >> { > >> "BriefDescription": "pclk Cycles", > >> + "EventCode": "0x0", > >> "EventName": "UNC_P_CLOCKTICKS", > >> "PerPkg": "1", > >> "PublicDescription": "The PCU runs off a fixed 1 GHz clock. = This event counts the number of pclk cycles measured while the counter was = enabled. 
The pclk, like the Memory Controller's dclk, counts at a constant= rate making it a good measure of actual wall time.", > >> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json b= /tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json > >> index 9a902d2..ce90d058 100644 > >> --- a/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json > >> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json > >> @@ -336,6 +336,7 @@ > >> }, > >> { > >> "BriefDescription": "Reference cycles when the core is not in= halt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.REF_TSC", > >> "PublicDescription": "This event counts the number of referen= ce cycles when the core is not in a halt state. The core enters the halt st= ate when it is running the HLT instruction or the MWAIT instruction. This e= vent is not affected by core frequency changes (for example, P states, TM2 = transitions) but has the same incrementing frequency as the time stamp coun= ter. This event can approximate elapsed time while the core was not in a ha= lt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCL= K event. It is counted on a dedicated fixed counter, leaving the four (eigh= t when Hyperthreading is disabled) programmable counters available for othe= r events. \nNote: On all current platforms this event stops counting during= 'throttling (TM)' states duty off periods the processor is 'halted'. This= event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter upda= te is done at a lower clock rate then the core clock the overflow status bi= t for this counter may appear 'sticky'. After the counter has overflowed a= nd software clears the overflow status bit and resets the counter to less t= han MAX. The reset value to the counter is not clocked immediately so the o= verflow status bit will flip 'high (1)' and generate another PMI (if enable= d) after which the reset value gets clocked into the counter. Therefore, so= ftware will get the interrupt, read the overflow status bit '1 for bit 34 w= hile the counter value is less than MAX. Software should ignore this case.", > >> "SampleAfterValue": "2000003", > >> @@ -359,6 +360,7 @@ > >> }, > >> { > >> "BriefDescription": "Core cycles when the thread is not in ha= lt state", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD", > >> "PublicDescription": "This event counts the number of core cy= cles while the thread is not in a halt state. The thread enters the halt st= ate when it is running the HLT instruction. This event is a component in ma= ny key event ratios. The core frequency may change from time to time due to= transitions associated with Enhanced Intel SpeedStep Technology or TM2. Fo= r this reason this event may have a changing ratio with regards to time. Wh= en the core frequency is constant, this event can approximate elapsed time = while the core was not in the halt state. 
It is counted on a dedicated fixe= d counter, leaving the four (eight when Hyperthreading is disabled) program= mable counters available for other events.", > >> "SampleAfterValue": "2000003", > >> @@ -367,6 +369,7 @@ > >> { > >> "AnyThread": "1", > >> "BriefDescription": "Core cycles when at least one thread on = the physical core is not in halt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD_ANY", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x2" > >> @@ -514,6 +517,7 @@ > >> }, > >> { > >> "BriefDescription": "Instructions retired from execution.", > >> + "EventCode": "0x0", > >> "EventName": "INST_RETIRED.ANY", > >> "PublicDescription": "This event counts the number of instruc= tions retired from execution. For instructions that consist of multiple mic= ro-ops, this event counts the retirement of the last micro-op of the instru= ction. Counting continues during hardware interrupts, traps, and inside int= errupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed= counter, leaving the four (eight when Hyperthreading is disabled) programm= able counters available for other events. INST_RETIRED.ANY_P is counted by = a programmable counter and it is an architectural performance event. \nCoun= ting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count a= s retired instructions.", > >> "SampleAfterValue": "2000003", > >> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.js= on b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json > >> index 400d784..346f5cf 100644 > >> --- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json > >> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json > >> @@ -183,6 +183,7 @@ > >> }, > >> { > >> "BriefDescription": "Uncore Clocks", > >> + "EventCode": "0x0", > >> "EventName": "UNC_C_CLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "CBOX" > >> @@ -1689,6 +1690,7 @@ > >> }, > >> { > >> "BriefDescription": "uclks", > >> + "EventCode": "0x0", > >> "EventName": "UNC_H_CLOCKTICKS", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of uclks in the HA. = This will be slightly different than the count in the Ubox because of enabl= e/freeze delays. The HA is on the other side of the die from the fixed Ubo= x uclk counter, so the drift could be somewhat larger than in units that ar= e closer like the QPI Agent.", > >> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-intercon= nect.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.j= son > >> index b9fb216..68232e7 100644 > >> --- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.js= on > >> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.js= on > >> @@ -1,6 +1,7 @@ > >> [ > >> { > >> "BriefDescription": "Number of non data (control) flits trans= mitted . Derived from unc_q_txl_flits_g0.non_data", > >> + "EventCode": "0x0", > >> "EventName": "QPI_CTL_BANDWIDTH_TX", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. It includes filters for Idle, protocol, and Data Flit= s. Each flit is made up of 80 bits of information (in addition to some ECC= data). In full-width (L0) mode, flits are made up of four fits, each of w= hich contains 20 bits of data (along with some additional ECC data). In h= alf-width (L0p) mode, the fits are only 10 bits, and therefore it takes twi= ce as many fits to transmit a flit. 
When one talks about QPI speed (for ex= ample, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the = system will transfer 1 flit at the rate of 1/4th the QPI speed. One can ca= lculate the bandwidth of the link by taking: flits*80b/time. Note that thi= s is not the same as data bandwidth. For example, when we are transferring= a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header= information and 8 with 64 bits of actual data and an additional 16 bits of= other information. To calculate data bandwidth, one should therefore do: = data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non= -NULL non-data flits transmitted across QPI. This basically tracks the pro= tocol overhead on the QPI link. One can get a good picture of the QPI-link= characteristics by evaluating the protocol flits, data flits, and idle/nul= l flits. This includes the header flits for data packets.", > >> @@ -10,6 +11,7 @@ > >> }, > >> { > >> "BriefDescription": "Number of data flits transmitted . Deriv= ed from unc_q_txl_flits_g0.data", > >> + "EventCode": "0x0", > >> "EventName": "QPI_DATA_BANDWIDTH_TX", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. It includes filters for Idle, protocol, and Data Flit= s. Each flit is made up of 80 bits of information (in addition to some ECC= data). In full-width (L0) mode, flits are made up of four fits, each of w= hich contains 20 bits of data (along with some additional ECC data). In h= alf-width (L0p) mode, the fits are only 10 bits, and therefore it takes twi= ce as many fits to transmit a flit. When one talks about QPI speed (for ex= ample, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the = system will transfer 1 flit at the rate of 1/4th the QPI speed. One can ca= lculate the bandwidth of the link by taking: flits*80b/time. Note that thi= s is not the same as data bandwidth. For example, when we are transferring= a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header= information and 8 with 64 bits of actual data and an additional 16 bits of= other information. To calculate data bandwidth, one should therefore do: = data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of dat= a flits transmitted over QPI. Each flit contains 64b of data. This includ= es both DRS and NCB data flits (coherent and non-coherent). This can be us= ed to calculate the data bandwidth of the QPI link. One can get a good pic= ture of the QPI-link characteristics by evaluating the protocol flits, data= flits, and idle/null flits. This does not include the header flits that g= o in data packets.", > >> @@ -37,6 +39,7 @@ > >> }, > >> { > >> "BriefDescription": "Clocks in the IRP", > >> + "EventCode": "0x0", > >> "EventName": "UNC_I_CLOCKTICKS", > >> "PerPkg": "1", > >> "PublicDescription": "Number of clocks in the IRP.", > >> @@ -1400,6 +1403,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 0; Data Tx Fli= ts", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G0.DATA", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. It includes filters for Idle, protocol, and Data Flit= s. Each flit is made up of 80 bits of information (in addition to some ECC= data). In full-width (L0) mode, flits are made up of four fits, each of w= hich contains 20 bits of data (along with some additional ECC data). 
In h= alf-width (L0p) mode, the fits are only 10 bits, and therefore it takes twi= ce as many fits to transmit a flit. When one talks about QPI speed (for ex= ample, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the = system will transfer 1 flit at the rate of 1/4th the QPI speed. One can ca= lculate the bandwidth of the link by taking: flits*80b/time. Note that thi= s is not the same as data bandwidth. For example, when we are transferring= a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header= information and 8 with 64 bits of actual data and an additional 16 bits of= other information. To calculate data bandwidth, one should therefore do: = data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of dat= a flits transmitted over QPI. Each flit contains 64b of data. This includ= es both DRS and NCB data flits (coherent and non-coherent). This can be us= ed to calculate the data bandwidth of the QPI link. One can get a good pic= ture of the QPI-link characteristics by evaluating the protocol flits, data= flits, and idle/null flits. This does not include the header flits that g= o in data packets.", > >> @@ -1408,6 +1412,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 0; Non-Data pr= otocol Tx Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. It includes filters for Idle, protocol, and Data Flit= s. Each flit is made up of 80 bits of information (in addition to some ECC= data). In full-width (L0) mode, flits are made up of four fits, each of w= hich contains 20 bits of data (along with some additional ECC data). In h= alf-width (L0p) mode, the fits are only 10 bits, and therefore it takes twi= ce as many fits to transmit a flit. When one talks about QPI speed (for ex= ample, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the = system will transfer 1 flit at the rate of 1/4th the QPI speed. One can ca= lculate the bandwidth of the link by taking: flits*80b/time. Note that thi= s is not the same as data bandwidth. For example, when we are transferring= a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header= information and 8 with 64 bits of actual data and an additional 16 bits of= other information. To calculate data bandwidth, one should therefore do: = data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non= -NULL non-data flits transmitted across QPI. This basically tracks the pro= tocol overhead on the QPI link. One can get a good picture of the QPI-link= characteristics by evaluating the protocol flits, data flits, and idle/nul= l flits. This includes the header flits for data packets.", > >> @@ -1416,6 +1421,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; DRS Flits (= both Header and Data)", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.DRS", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three groups that allow us to track fl= its. It includes filters for SNP, HOM, and DRS message classes. Each flit= is made up of 80 bits of information (in addition to some ECC data). In f= ull-width (L0) mode, flits are made up of four fits, each of which contains= 20 bits of data (along with some additional ECC data). 
In half-width (L0= p) mode, the fits are only 10 bits, and therefore it takes twice as many fi= ts to transmit a flit. When one talks about QPI speed (for example, 8.0 GT= /s), the transfers here refer to fits. Therefore, in L0, the system will t= ransfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the b= andwidth of the link by taking: flits*80b/time. Note that this is not the = same as data bandwidth. For example, when we are transferring a 64B cachel= ine across QPI, we will break it into 9 flits -- 1 with header information = and 8 with 64 bits of actual data and an additional 16 bits of other inform= ation. To calculate data bandwidth, one should therefore do: data flits * = 8B / time.; Counts the total number of flits transmitted over QPI on the DR= S (Data Response) channel. DRS flits are used to transmit data with cohere= ncy.", > >> @@ -1424,6 +1430,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; DRS Data Fl= its", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three groups that allow us to track fl= its. It includes filters for SNP, HOM, and DRS message classes. Each flit= is made up of 80 bits of information (in addition to some ECC data). In f= ull-width (L0) mode, flits are made up of four fits, each of which contains= 20 bits of data (along with some additional ECC data). In half-width (L0= p) mode, the fits are only 10 bits, and therefore it takes twice as many fi= ts to transmit a flit. When one talks about QPI speed (for example, 8.0 GT= /s), the transfers here refer to fits. Therefore, in L0, the system will t= ransfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the b= andwidth of the link by taking: flits*80b/time. Note that this is not the = same as data bandwidth. For example, when we are transferring a 64B cachel= ine across QPI, we will break it into 9 flits -- 1 with header information = and 8 with 64 bits of actual data and an additional 16 bits of other inform= ation. To calculate data bandwidth, one should therefore do: data flits * = 8B / time.; Counts the total number of data flits transmitted over QPI on t= he DRS (Data Response) channel. DRS flits are used to transmit data with c= oherency. This does not count data flits transmitted over the NCB channel = which transmits non-coherent data. This includes only the data flits (not = the header).", > >> @@ -1432,6 +1439,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; DRS Header = Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three groups that allow us to track fl= its. It includes filters for SNP, HOM, and DRS message classes. Each flit= is made up of 80 bits of information (in addition to some ECC data). In f= ull-width (L0) mode, flits are made up of four fits, each of which contains= 20 bits of data (along with some additional ECC data). In half-width (L0= p) mode, the fits are only 10 bits, and therefore it takes twice as many fi= ts to transmit a flit. When one talks about QPI speed (for example, 8.0 GT= /s), the transfers here refer to fits. Therefore, in L0, the system will t= ransfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the b= andwidth of the link by taking: flits*80b/time. 
Note that this is not the = same as data bandwidth. For example, when we are transferring a 64B cachel= ine across QPI, we will break it into 9 flits -- 1 with header information = and 8 with 64 bits of actual data and an additional 16 bits of other inform= ation. To calculate data bandwidth, one should therefore do: data flits * = 8B / time.; Counts the total number of protocol flits transmitted over QPI = on the DRS (Data Response) channel. DRS flits are used to transmit data wi= th coherency. This does not count data flits transmitted over the NCB chan= nel which transmits non-coherent data. This includes only the header flits= (not the data). This includes extended headers.", > >> @@ -1440,6 +1448,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; HOM Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.HOM", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three groups that allow us to track fl= its. It includes filters for SNP, HOM, and DRS message classes. Each flit= is made up of 80 bits of information (in addition to some ECC data). In f= ull-width (L0) mode, flits are made up of four fits, each of which contains= 20 bits of data (along with some additional ECC data). In half-width (L0= p) mode, the fits are only 10 bits, and therefore it takes twice as many fi= ts to transmit a flit. When one talks about QPI speed (for example, 8.0 GT= /s), the transfers here refer to fits. Therefore, in L0, the system will t= ransfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the b= andwidth of the link by taking: flits*80b/time. Note that this is not the = same as data bandwidth. For example, when we are transferring a 64B cachel= ine across QPI, we will break it into 9 flits -- 1 with header information = and 8 with 64 bits of actual data and an additional 16 bits of other inform= ation. To calculate data bandwidth, one should therefore do: data flits * = 8B / time.; Counts the number of flits transmitted over QPI on the home cha= nnel.", > >> @@ -1448,6 +1457,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; HOM Non-Req= uest Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three groups that allow us to track fl= its. It includes filters for SNP, HOM, and DRS message classes. Each flit= is made up of 80 bits of information (in addition to some ECC data). In f= ull-width (L0) mode, flits are made up of four fits, each of which contains= 20 bits of data (along with some additional ECC data). In half-width (L0= p) mode, the fits are only 10 bits, and therefore it takes twice as many fi= ts to transmit a flit. When one talks about QPI speed (for example, 8.0 GT= /s), the transfers here refer to fits. Therefore, in L0, the system will t= ransfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the b= andwidth of the link by taking: flits*80b/time. Note that this is not the = same as data bandwidth. For example, when we are transferring a 64B cachel= ine across QPI, we will break it into 9 flits -- 1 with header information = and 8 with 64 bits of actual data and an additional 16 bits of other inform= ation. To calculate data bandwidth, one should therefore do: data flits * = 8B / time.; Counts the number of non-request flits transmitted over QPI on = the home channel. 
These are most commonly snoop responses, and this event = can be used as a proxy for that.", > >> @@ -1456,6 +1466,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; HOM Request= Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three groups that allow us to track fl= its. It includes filters for SNP, HOM, and DRS message classes. Each flit= is made up of 80 bits of information (in addition to some ECC data). In f= ull-width (L0) mode, flits are made up of four fits, each of which contains= 20 bits of data (along with some additional ECC data). In half-width (L0= p) mode, the fits are only 10 bits, and therefore it takes twice as many fi= ts to transmit a flit. When one talks about QPI speed (for example, 8.0 GT= /s), the transfers here refer to fits. Therefore, in L0, the system will t= ransfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the b= andwidth of the link by taking: flits*80b/time. Note that this is not the = same as data bandwidth. For example, when we are transferring a 64B cachel= ine across QPI, we will break it into 9 flits -- 1 with header information = and 8 with 64 bits of actual data and an additional 16 bits of other inform= ation. To calculate data bandwidth, one should therefore do: data flits * = 8B / time.; Counts the number of data request transmitted over QPI on the h= ome channel. This basically counts the number of remote memory requests tr= ansmitted over QPI. In conjunction with the local read count in the Home A= gent, one can calculate the number of LLC Misses.", > >> @@ -1464,6 +1475,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; SNP Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.SNP", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three groups that allow us to track fl= its. It includes filters for SNP, HOM, and DRS message classes. Each flit= is made up of 80 bits of information (in addition to some ECC data). In f= ull-width (L0) mode, flits are made up of four fits, each of which contains= 20 bits of data (along with some additional ECC data). In half-width (L0= p) mode, the fits are only 10 bits, and therefore it takes twice as many fi= ts to transmit a flit. When one talks about QPI speed (for example, 8.0 GT= /s), the transfers here refer to fits. Therefore, in L0, the system will t= ransfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the b= andwidth of the link by taking: flits*80b/time. Note that this is not the = same as data bandwidth. For example, when we are transferring a 64B cachel= ine across QPI, we will break it into 9 flits -- 1 with header information = and 8 with 64 bits of actual data and an additional 16 bits of other inform= ation. To calculate data bandwidth, one should therefore do: data flits * = 8B / time.; Counts the number of snoop request flits transmitted over QPI. = These requests are contained in the snoop channel. 
> >> This does not include snoop responses, which are transmitted on the home channel.",
> >> @@ -3162,6 +3174,7 @@
> >> },
> >> {
> >> "BriefDescription": "Uncore Clocks",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_S_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "Unit": "SBOX"
> >> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
> >> index b5a33e7a..0c5888d 100644
> >> --- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
> >> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
> >> @@ -158,12 +158,14 @@
> >> },
> >> {
> >> "BriefDescription": "Clockticks in the Memory Controller using one of the programmable counters",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_M_CLOCKTICKS_P",
> >> "PerPkg": "1",
> >> "Unit": "iMC"
> >> },
> >> {
> >> "BriefDescription": "This event is deprecated. Refer to new event UNC_M_CLOCKTICKS_P",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_M_DCLOCKTICKS",
> >> "PerPkg": "1",
> >> "Unit": "iMC"
> >> diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
> >> index 83d2013..f57eb8e 100644
> >> --- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
> >> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
> >> @@ -1,6 +1,7 @@
> >> [
> >> {
> >> "BriefDescription": "pclk Cycles",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_P_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "PublicDescription": "The PCU runs off a fixed 1 GHz clock. This event counts the number of pclk cycles measured while the counter was enabled. The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
> >> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json b/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
> >> index 66d686c..efda247 100644
> >> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
> >> @@ -200,6 +200,7 @@
> >> },
> >> {
> >> "BriefDescription": "Reference cycles when the core is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >> "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
> >> "SampleAfterValue": "2000003",
> >> @@ -231,6 +232,7 @@
> >> },
> >> {
> >> "BriefDescription": "Core cycles when the thread is not in halt state",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD",
> >> "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
> >> "SampleAfterValue": "2000003",
> >> @@ -239,6 +241,7 @@
> >> {
> >> "AnyThread": "1",
> >> "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
> >> "SampleAfterValue": "2000003",
> >> "UMask": "0x2"
> >> @@ -378,6 +381,7 @@
> >> },
> >> {
> >> "BriefDescription": "Instructions retired from execution.",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
> >> "SampleAfterValue": "2000003",
> >> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
> >> index 2c88053..ba7a6f6 100644
> >> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
> >> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
> >> @@ -512,6 +512,7 @@
> >> },
> >> {
> >> "BriefDescription": "Uncore cache clock ticks",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_CHA_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts clockticks of the clock controlling the uncore caching and home agent (CHA).",
> >> @@ -5792,6 +5793,7 @@
> >> },
> >> {
> >> "BriefDescription": "This event is deprecated. Refer to new event UNC_CHA_CLOCKTICKS",
> >> + "EventCode": "0x0",
> >> "Deprecated": "1",
> >> "EventName": "UNC_C_CLOCKTICKS",
> >> "PerPkg": "1",
> >> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json
> >> index 1a342df..deae678 100644
> >> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json
> >> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json
> >> @@ -1090,6 +1090,7 @@
> >> },
> >> {
> >> "BriefDescription": "Cycles - at UCLK",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_M2M_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "Unit": "M2M"
> >> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json
> >> index 743c91f..377d54f 100644
> >> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json
> >> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json
> >> @@ -1271,6 +1271,7 @@
> >> },
> >> {
> >> "BriefDescription": "Counting disabled",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_IIO_NOTHING",
> >> "PerPkg": "1",
> >> "Unit": "IIO"
> >> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json
> >> index d82d2cc..6b1217e 100644
> >> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json
> >> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json
> >> @@ -167,6 +167,7 @@
> >> },
> >> {
> >> "BriefDescription": "Memory controller clock ticks",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_M_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts clockticks of the fixed frequency clock of the memory controller using one of the programmable counters.",
> >> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json
> >> index c6254af..a01b279 100644
> >> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json
> >> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json
> >> @@ -1,6 +1,7 @@
> >> [
> >> {
> >> "BriefDescription": "pclk Cycles",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_P_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "PublicDescription": "The PCU runs off a fixed 1 GHz clock. This event counts the number of pclk cycles measured while the counter was enabled. The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
> >> diff --git a/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json b/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json
> >> index c483c08..2e40cd0 100644
> >> --- a/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json
> >> @@ -150,6 +150,7 @@
> >> },
> >> {
> >> "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >> "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC).
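
Since REF_TSC increments at the TSC frequency regardless of P-states, the usual trick is to turn it into unhalted wall time. A sketch, assuming the platform's TSC frequency is known (the 2.1 GHz value below is just an example):

  # Sketch: unhalted wall time from CPU_CLK_UNHALTED.REF_TSC.
  tsc_hz  = 2_100_000_000        # platform-specific; assumed here
  ref_tsc = 4_200_000_000        # counter delta over the measured window
  unhalted_seconds = ref_tsc / tsc_hz   # ~2.0 s not halted
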
> >> This event uses fixed counter 2.",
> >> "SampleAfterValue": "2000003",
> >> @@ -180,6 +181,7 @@
> >> },
> >> {
> >> "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PEBS": "1",
> >> "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0.",
> >> diff --git a/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json b/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
> >> index acb8974..79806e7 100644
> >> --- a/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
> >> @@ -143,6 +143,7 @@
> >> },
> >> {
> >> "BriefDescription": "Core cycles when core is not halted (Fixed event)",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.CORE",
> >> "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1. You cannot collect a PEBs record for this event.",
> >> "SampleAfterValue": "2000003",
> >> @@ -165,6 +166,7 @@
> >> },
> >> {
> >> "BriefDescription": "Reference cycles when core is not halted (Fixed event)",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >> "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time. This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time. This event uses fixed counter 2. You cannot collect a PEBs record for this event.",
> >> "SampleAfterValue": "2000003",
> >> @@ -187,6 +189,7 @@
> >> },
> >> {
> >> "BriefDescription": "Instructions retired (Fixed event)",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PublicDescription": "Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0. You cannot collect a PEBs record for this event.",
> >> "SampleAfterValue": "2000003",
> >> diff --git a/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json b/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
> >> index 33ef331..1be1b50 100644
> >> --- a/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
> >> @@ -143,6 +143,7 @@
> >> },
> >> {
> >> "BriefDescription": "Core cycles when core is not halted (Fixed event)",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.CORE",
> >> "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1. You cannot collect a PEBs record for this event.",
> >> "SampleAfterValue": "2000003",
> >> @@ -165,6 +166,7 @@
> >> },
> >> {
> >> "BriefDescription": "Reference cycles when core is not halted (Fixed event)",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >> "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time. This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time. This event uses fixed counter 2. You cannot collect a PEBs record for this event.",
> >> "SampleAfterValue": "2000003",
> >> @@ -187,6 +189,7 @@
> >> },
> >> {
> >> "BriefDescription": "Instructions retired (Fixed event)",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PEBS": "2",
> >> "PublicDescription": "Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0. You cannot collect a PEBs record for this event.",
> >> diff --git a/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json b/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
> >> index 4121295..5335a7b 100644
> >> --- a/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
> >> @@ -29,6 +29,7 @@
> >> },
> >> {
> >> "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >> "SampleAfterValue": "2000003",
> >> "UMask": "0x3"
> >> @@ -43,6 +44,7 @@
> >> },
> >> {
> >> "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD",
> >> "SampleAfterValue": "2000003",
> >> "UMask": "0x2"
> >> @@ -55,6 +57,7 @@
> >> },
> >> {
> >> "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PEBS": "1",
> >> "SampleAfterValue": "2000003",
> >> diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json b/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
> >> index 764c043..6ca34b9 100644
> >> --- a/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
> >> @@ -17,6 +17,7 @@
> >> },
> >> {
> >> "BriefDescription": "Reference cycles when the core is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >> "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
> >> "SampleAfterValue": "2000003",
> >> @@ -32,6 +33,7 @@
> >> },
> >> {
> >> "BriefDescription": "Core cycles when the thread is not in halt state",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD",
> >> "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
> >> "SampleAfterValue": "2000003",
> >> @@ -46,6 +48,7 @@
> >> },
> >> {
> >> "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PEBS": "1",
> >> "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
> >> @@ -78,6 +81,7 @@
> >> },
> >> {
> >> "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
> >> + "EventCode": "0x0",
> >> "EventName": "TOPDOWN.SLOTS",
> >> "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method.
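
And this is the event the description says should be the level-1 TMA denominator. A sketch of that usage; the numerator below is a stand-in, not a specific event:

  # Sketch: TOPDOWN.SLOTS as the denominator for level-1 TMA fractions.
  slots         = 40_000_000_000   # TOPDOWN.SLOTS (fixed counter 3)
  retiring_uops = 14_000_000_000   # hypothetical level-1 numerator
  retiring_fraction = retiring_uops / slots
  # retiring + bad_speculation + frontend_bound + backend_bound ~= 1.0,
  # each computed against the same slots value.
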
> >> This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
> >> "SampleAfterValue": "10000003",
> >> diff --git a/tools/perf/pmu-events/arch/x86/haswell/pipeline.json b/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
> >> index 540f437..0d5eafd 100644
> >> --- a/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
> >> @@ -303,6 +303,7 @@
> >> },
> >> {
> >> "BriefDescription": "Reference cycles when the core is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >> "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state.",
> >> "SampleAfterValue": "2000003",
> >> @@ -327,6 +328,7 @@
> >> },
> >> {
> >> "BriefDescription": "Core cycles when the thread is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD",
> >> "PublicDescription": "This event counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.",
> >> "SampleAfterValue": "2000003",
> >> @@ -335,6 +337,7 @@
> >> {
> >> "AnyThread": "1",
> >> "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
> >> "SampleAfterValue": "2000003",
> >> "UMask": "0x2"
> >> @@ -436,6 +439,7 @@
> >> },
> >> {
> >> "BriefDescription": "Instructions retired from execution.",
> >> + "EventCode": "0x0",
> >> "Errata": "HSD140, HSD143",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. INST_RETIRED.ANY is counted by a designated fixed counter, leaving the programmable counters available for other events. Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
> >> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json b/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json
> >> index 540f437..0d5eafd 100644
> >> --- a/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json
> >> @@ -303,6 +303,7 @@
> >> },
> >> {
> >> "BriefDescription": "Reference cycles when the core is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >> "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state.",
> >> "SampleAfterValue": "2000003",
> >> @@ -327,6 +328,7 @@
> >> },
> >> {
> >> "BriefDescription": "Core cycles when the thread is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD",
> >> "PublicDescription": "This event counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.",
> >> "SampleAfterValue": "2000003",
> >> @@ -335,6 +337,7 @@
> >> {
> >> "AnyThread": "1",
> >> "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
> >> "SampleAfterValue": "2000003",
> >> "UMask": "0x2"
> >> @@ -436,6 +439,7 @@
> >> },
> >> {
> >> "BriefDescription": "Instructions retired from execution.",
> >> + "EventCode": "0x0",
> >> "Errata": "HSD140, HSD143",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. INST_RETIRED.ANY is counted by a designated fixed counter, leaving the programmable counters available for other events. Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
> >> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json
> >> index 9227cc2..64e2fb4 100644
> >> --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json
> >> +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json
> >> @@ -183,6 +183,7 @@
> >> },
> >> {
> >> "BriefDescription": "Uncore Clocks",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_C_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "Unit": "CBOX"
> >> @@ -1698,6 +1699,7 @@
> >> },
> >> {
> >> "BriefDescription": "uclks",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_H_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of uclks in the HA. This will be slightly different than the count in the Ubox because of enable/freeze delays. The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
> >> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json
> >> index bef1f5e..57268d6 100644
> >> --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json
> >> +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json
> >> @@ -1,6 +1,7 @@
> >> [
> >> {
> >> "BriefDescription": "Number of non data (control) flits transmitted . Derived from unc_q_txl_flits_g0.non_data",
> >> + "EventCode": "0x0",
> >> "EventName": "QPI_CTL_BANDWIDTH_TX",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI. This basically tracks the protocol overhead on the QPI link. One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This includes the header flits for data packets.",
> >> @@ -10,6 +11,7 @@
> >> },
> >> {
> >> "BriefDescription": "Number of data flits transmitted . Derived from unc_q_txl_flits_g0.data",
> >> + "EventCode": "0x0",
> >> "EventName": "QPI_DATA_BANDWIDTH_TX",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI. Each flit contains 64b of data. This includes both DRS and NCB data flits (coherent and non-coherent). This can be used to calculate the data bandwidth of the QPI link. One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This does not include the header flits that go in data packets.",
> >> @@ -37,6 +39,7 @@
> >> },
> >> {
> >> "BriefDescription": "Clocks in the IRP",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_I_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "PublicDescription": "Number of clocks in the IRP.",
> >> @@ -1401,6 +1404,7 @@
> >> },
> >> {
> >> "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits.
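
Note the extra twist these Group 0 descriptions add over the Group 1 text: in half-width L0p mode the recipe swaps 8B for 4B per data flit. A sketch of that branch (counts hypothetical, factors from the quoted text):

  # Sketch: QPI data bandwidth, honoring the L0 vs. L0p factor above.
  data_flits = 1.5e9              # e.g. QPI_DATA_BANDWIDTH_TX delta
  seconds    = 1.0
  full_width = True               # L0; False would mean half-width L0p
  bytes_per_data_flit = 8 if full_width else 4
  data_bw = data_flits * bytes_per_data_flit / seconds   # bytes/s
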
> >> Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI. Each flit contains 64b of data. This includes both DRS and NCB data flits (coherent and non-coherent). This can be used to calculate the data bandwidth of the QPI link. One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This does not include the header flits that go in data packets.",
> >> @@ -1409,6 +1413,7 @@
> >> },
> >> {
> >> "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI. This basically tracks the protocol overhead on the QPI link. One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This includes the header flits for data packets.",
> >> @@ -1417,6 +1422,7 @@
> >> },
> >> {
> >> "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of flits transmitted over QPI on the DRS (Data Response) channel. DRS flits are used to transmit data with coherency.",
> >> @@ -1425,6 +1431,7 @@
> >> },
> >> {
> >> "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of data flits transmitted over QPI on the DRS (Data Response) channel. DRS flits are used to transmit data with coherency. This does not count data flits transmitted over the NCB channel which transmits non-coherent data. This includes only the data flits (not the header).",
> >> @@ -1433,6 +1440,7 @@
> >> },
> >> {
> >> "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of protocol flits transmitted over QPI on the DRS (Data Response) channel. DRS flits are used to transmit data with coherency. This does not count data flits transmitted over the NCB channel which transmits non-coherent data. This includes only the header flits (not the data). This includes extended headers.",
> >> @@ -1441,6 +1449,7 @@
> >> },
> >> {
> >> "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of flits transmitted over QPI on the home channel.",
> >> @@ -1449,6 +1458,7 @@
> >> },
> >> {
> >> "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of non-request flits transmitted over QPI on the home channel. These are most commonly snoop responses, and this event can be used as a proxy for that.",
> >> @@ -1457,6 +1467,7 @@
> >> },
> >> {
> >> "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of data request transmitted over QPI on the home channel. This basically counts the number of remote memory requests transmitted over QPI. In conjunction with the local read count in the Home Agent, one can calculate the number of LLC Misses.",
> >> @@ -1465,6 +1476,7 @@
> >> },
> >> {
> >> "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of snoop request flits transmitted over QPI. These requests are contained in the snoop channel.
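
The HOM_REQ description above hints at an LLC-miss estimate; a sketch of what I read that to mean, where the additive model and the local-read value are my assumptions, not a documented formula:

  # Sketch: rough LLC-miss estimate per the HOM_REQ description.
  local_ha_reads = 3.0e9   # "local read count in the Home Agent" (assumed)
  remote_hom_req = 0.8e9   # UNC_Q_TxL_FLITS_G1.HOM_REQ delta
  llc_misses_estimate = local_ha_reads + remote_hom_req
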
> >> This does not include snoop responses, which are transmitted on the home channel.",
> >> @@ -3136,6 +3148,7 @@
> >> },
> >> {
> >> "BriefDescription": "Uncore Clocks",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_S_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "Unit": "SBOX"
> >> @@ -3823,6 +3836,7 @@
> >> },
> >> {
> >> "BriefDescription": "UNC_U_CLOCKTICKS",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_U_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "Unit": "UBOX"
> >> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json
> >> index c005f51..124c3ae 100644
> >> --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json
> >> +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json
> >> @@ -151,12 +151,14 @@
> >> },
> >> {
> >> "BriefDescription": "DRAM Clockticks",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_M_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "Unit": "iMC"
> >> },
> >> {
> >> "BriefDescription": "DRAM Clockticks",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_M_DCLOCKTICKS",
> >> "PerPkg": "1",
> >> "Unit": "iMC"
> >> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json
> >> index daebf10..9276058 100644
> >> --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json
> >> +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json
> >> @@ -1,6 +1,7 @@
> >> [
> >> {
> >> "BriefDescription": "pclk Cycles",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_P_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "PublicDescription": "The PCU runs off a fixed 800 MHz clock. This event counts the number of pclk cycles measured while the counter was enabled. The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
> >> diff --git a/tools/perf/pmu-events/arch/x86/icelake/pipeline.json b/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
> >> index 375b780..14e6d7c 100644
> >> --- a/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
> >> @@ -193,6 +193,7 @@
> >> },
> >> {
> >> "BriefDescription": "Reference cycles when the core is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >> "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
> >> "SampleAfterValue": "2000003",
> >> @@ -208,6 +209,7 @@
> >> },
> >> {
> >> "BriefDescription": "Core cycles when the thread is not in halt state",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD",
> >> "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
> >> "SampleAfterValue": "2000003",
> >> @@ -335,6 +337,7 @@
> >> },
> >> {
> >> "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PEBS": "1",
> >> "PublicDescription": "Counts the number of instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
> >> @@ -359,6 +362,7 @@
> >> },
> >> {
> >> "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.PREC_DIST",
> >> "PEBS": "1",
> >> "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
> >> @@ -562,6 +566,7 @@
> >> },
> >> {
> >> "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
> >> + "EventCode": "0x0",
> >> "EventName": "TOPDOWN.SLOTS",
> >> "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
> >> "SampleAfterValue": "10000003",
> >> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json b/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
> >> index 176e5ef..9303c70 100644
> >> --- a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
> >> @@ -193,6 +193,7 @@
> >> },
> >> {
> >> "BriefDescription": "Reference cycles when the core is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >> "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
> >> "SampleAfterValue": "2000003",
> >> @@ -208,6 +209,7 @@
> >> },
> >> {
> >> "BriefDescription": "Core cycles when the thread is not in halt state",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD",
> >> "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
> >> "SampleAfterValue": "2000003",
> >> @@ -335,6 +337,7 @@
> >> },
> >> {
> >> "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PEBS": "1",
> >> "PublicDescription": "Counts the number of instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events.
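
With INST_RETIRED.ANY and CPU_CLK_UNHALTED.THREAD both on fixed counters, the classic IPC ratio costs no programmable counters at all. A trivial sketch with made-up deltas:

  # Sketch: IPC from the two fixed counters quoted above.
  inst_retired = 9_000_000_000   # INST_RETIRED.ANY
  core_cycles  = 6_000_000_000   # CPU_CLK_UNHALTED.THREAD
  ipc = inst_retired / core_cycles   # ~1.5 instructions per cycle
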
INST_RETIRED.ANY_P is counted by a programmable counter.", > >> @@ -359,6 +362,7 @@ > >> }, > >> { > >> "BriefDescription": "Precise instruction retired event with a= reduced effect of PEBS shadow in IP distribution", > >> + "EventCode": "0x0", > >> "EventName": "INST_RETIRED.PREC_DIST", > >> "PEBS": "1", > >> "PublicDescription": "A version of INST_RETIRED that allows f= or a more unbiased distribution of samples across instructions retired. It = utilizes the Precise Distribution of Instructions Retired (PDIR) feature to= mitigate some bias in how retired instructions get sampled. Use on Fixed C= ounter 0.", > >> @@ -544,6 +548,7 @@ > >> }, > >> { > >> "BriefDescription": "TMA slots available for an unhalted logi= cal processor. Fixed counter - architectural event", > >> + "EventCode": "0x0", > >> "EventName": "TOPDOWN.SLOTS", > >> "PublicDescription": "Number of available slots for an unhalt= ed logical processor. The event increments by machine-width of the narrowes= t pipeline as employed by the Top-down Microarchitecture Analysis method (T= MA). The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core. Software can use this event as the d= enominator for the top-level metrics of the TMA method. This architectural = event is counted on a designated fixed counter (Fixed Counter 3).", > >> "SampleAfterValue": "10000003", > >> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json= b/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json > >> index b6ce14e..ae57663 100644 > >> --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json > >> +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json > >> @@ -892,6 +892,7 @@ > >> }, > >> { > >> "BriefDescription": "Clockticks of the uncore caching and hom= e agent (CHA)", > >> + "EventCode": "0x0", > >> "EventName": "UNC_CHA_CLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "CHA" > >> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconne= ct.json b/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json > >> index f87ea3f..0b88598 100644 > >> --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json > >> +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json > >> @@ -1419,6 +1419,7 @@ > >> }, > >> { > >> "BriefDescription": "Clockticks of the mesh to memory (M2M)", > >> + "EventCode": "0x0", > >> "EventName": "UNC_M2M_CLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "M2M" > >> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.jso= n b/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json > >> index 814d959..b0b2f27 100644 > >> --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json > >> +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json > >> @@ -100,6 +100,7 @@ > >> }, > >> { > >> "BriefDescription": "DRAM Clockticks", > >> + "EventCode": "0x0", > >> "EventName": "UNC_M_CLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "iMC" > >> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json= b/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json > >> index ee4dac6..9c4cd59 100644 > >> --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json > >> +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json > >> @@ -1,6 +1,7 @@ > >> [ > >> { > >> "BriefDescription": "Clockticks of the power control unit (PC= U)", > >> + "EventCode": "0x0", > >> "EventName": "UNC_P_CLOCKTICKS", > >> "PerPkg": "1", > >> "PublicDescription": "Clockticks of the power 
control unit (P= CU) : The PCU runs off a fixed 1 GHz clock. This event counts the number o= f pclk cycles measured while the counter was enabled. The pclk, like the M= emory Controller's dclk, counts at a constant rate making it a good measure= of actual wall time.", > >> diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json b/= tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json > >> index 30a3da9..2df2d21 100644 > >> --- a/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json > >> +++ b/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json > >> @@ -326,6 +326,7 @@ > >> }, > >> { > >> "BriefDescription": "Reference cycles when the core is not in= halt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.REF_TSC", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x3" > >> @@ -348,6 +349,7 @@ > >> }, > >> { > >> "BriefDescription": "Core cycles when the thread is not in ha= lt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x2" > >> @@ -355,6 +357,7 @@ > >> { > >> "AnyThread": "1", > >> "BriefDescription": "Core cycles when at least one thread on = the physical core is not in halt state", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD_ANY", > >> "PublicDescription": "Core cycles when at least one thread on= the physical core is not in halt state.", > >> "SampleAfterValue": "2000003", > >> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json b/to= ols/perf/pmu-events/arch/x86/ivytown/pipeline.json > >> index 30a3da9..6f6f281 100644 > >> --- a/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json > >> +++ b/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json > >> @@ -326,6 +326,7 @@ > >> }, > >> { > >> "BriefDescription": "Reference cycles when the core is not in= halt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.REF_TSC", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x3" > >> @@ -348,6 +349,7 @@ > >> }, > >> { > >> "BriefDescription": "Core cycles when the thread is not in ha= lt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x2" > >> @@ -355,6 +357,7 @@ > >> { > >> "AnyThread": "1", > >> "BriefDescription": "Core cycles when at least one thread on = the physical core is not in halt state", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD_ANY", > >> "PublicDescription": "Core cycles when at least one thread on= the physical core is not in halt state.", > >> "SampleAfterValue": "2000003", > >> @@ -510,6 +513,7 @@ > >> }, > >> { > >> "BriefDescription": "Instructions retired from execution.", > >> + "EventCode": "0x0", > >> "EventName": "INST_RETIRED.ANY", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x1" > >> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json = b/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json > >> index 8bf2706..31e58fb 100644 > >> --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json > >> +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json > >> @@ -1,6 +1,7 @@ > >> [ > >> { > >> "BriefDescription": "Uncore Clocks", > >> + "EventCode": "0x0", > >> "EventName": "UNC_C_CLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "CBOX" > >> @@ -1533,6 +1534,7 @@ > >> }, > >> { > >> "BriefDescription": "uclks", > >> + "EventCode": "0x0", > >> "EventName": "UNC_H_CLOCKTICKS", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of 
> >> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json
> >> index 914d2cf..10e315c 100644
> >> --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json
> >> +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json
> >> @@ -109,6 +109,7 @@
> >> },
> >> {
> >> "BriefDescription": "Clocks in the IRP",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_I_CLOCKTICKS",
> >> "PerPkg": "1",
> >> "PublicDescription": "Number of clocks in the IRP.",
> >> @@ -1522,6 +1523,7 @@
> >> },
> >> {
> >> "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI. Each flit contains 64b of data. This includes both DRS and NCB data flits (coherent and non-coherent). This can be used to calculate the data bandwidth of the QPI link. One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This does not include the header flits that go in data packets.",
> >> @@ -1530,6 +1532,7 @@
> >> },
> >> {
> >> "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI. This basically tracks the protocol overhead on the QPI link. One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This includes the header flits for data packets.",
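
(Not part of the patch, just a note for anyone puzzling over the flit math quoted above: a back-of-the-envelope check, assuming L0 mode at 8.0 GT/s as in the example, with one 80-bit flit completing per four 20-bit fits. The numbers are made up for illustration, not measured.)

  # Raw QPI link bandwidth per the quoted description (Python sketch)
  qpi_gt_per_s = 8.0e9            # fit transfers per second, per direction
  flits_per_s = qpi_gt_per_s / 4  # in L0, one flit completes per four fits
  link_bw_bytes = flits_per_s * 80 / 8   # flits * 80b / time, in bytes
  print(f"raw link bandwidth: {link_bw_bytes / 1e9:.0f} GB/s")  # -> 20 GB/s
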
> >> @@ -1538,6 +1541,7 @@
> >> },
> >> {
> >> "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of flits transmitted over QPI on the DRS (Data Response) channel. DRS flits are used to transmit data with coherency.",
> >> @@ -1546,6 +1550,7 @@
> >> },
> >> {
> >> "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
> >> + "EventCode": "0x0",
> >> "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
> >> "PerPkg": "1",
> >> "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.
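
(Again not part of the patch: the 64B-cacheline example quoted above works out as below, using only the numbers from the description itself.)

  # Payload efficiency of a 64B cacheline on the QPI wire (Python sketch)
  flit_bits = 80
  flits_per_line = 9       # 1 header flit + 8 data flits, per the text
  payload_bits = 8 * 64    # 64 bits of actual data in each data flit
  wire_bits = flits_per_line * flit_bits
  print(f"{payload_bits}/{wire_bits} = {payload_bits / wire_bits:.0%} payload")  # ~71%
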
To calculate data bandwidth, one should therefore do: data flits * = 8B / time.; Counts the total number of data flits transmitted over QPI on t= he DRS (Data Response) channel. DRS flits are used to transmit data with c= oherency. This does not count data flits transmitted over the NCB channel = which transmits non-coherent data. This includes only the data flits (not = the header).", > >> @@ -1554,6 +1559,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; DRS Header = Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three groups that allow us to track fl= its. It includes filters for SNP, HOM, and DRS message classes. Each flit= is made up of 80 bits of information (in addition to some ECC data). In f= ull-width (L0) mode, flits are made up of four fits, each of which contains= 20 bits of data (along with some additional ECC data). In half-width (L0= p) mode, the fits are only 10 bits, and therefore it takes twice as many fi= ts to transmit a flit. When one talks about QPI speed (for example, 8.0 GT= /s), the transfers here refer to fits. Therefore, in L0, the system will t= ransfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the b= andwidth of the link by taking: flits*80b/time. Note that this is not the = same as data bandwidth. For example, when we are transferring a 64B cachel= ine across QPI, we will break it into 9 flits -- 1 with header information = and 8 with 64 bits of actual data and an additional 16 bits of other inform= ation. To calculate data bandwidth, one should therefore do: data flits * = 8B / time.; Counts the total number of protocol flits transmitted over QPI = on the DRS (Data Response) channel. DRS flits are used to transmit data wi= th coherency. This does not count data flits transmitted over the NCB chan= nel which transmits non-coherent data. This includes only the header flits= (not the data). This includes extended headers.", > >> @@ -1562,6 +1568,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; HOM Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.HOM", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three groups that allow us to track fl= its. It includes filters for SNP, HOM, and DRS message classes. Each flit= is made up of 80 bits of information (in addition to some ECC data). In f= ull-width (L0) mode, flits are made up of four fits, each of which contains= 20 bits of data (along with some additional ECC data). In half-width (L0= p) mode, the fits are only 10 bits, and therefore it takes twice as many fi= ts to transmit a flit. When one talks about QPI speed (for example, 8.0 GT= /s), the transfers here refer to fits. Therefore, in L0, the system will t= ransfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the b= andwidth of the link by taking: flits*80b/time. Note that this is not the = same as data bandwidth. For example, when we are transferring a 64B cachel= ine across QPI, we will break it into 9 flits -- 1 with header information = and 8 with 64 bits of actual data and an additional 16 bits of other inform= ation. 
To calculate data bandwidth, one should therefore do: data flits * = 8B / time.; Counts the number of flits transmitted over QPI on the home cha= nnel.", > >> @@ -1570,6 +1577,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; HOM Non-Req= uest Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three groups that allow us to track fl= its. It includes filters for SNP, HOM, and DRS message classes. Each flit= is made up of 80 bits of information (in addition to some ECC data). In f= ull-width (L0) mode, flits are made up of four fits, each of which contains= 20 bits of data (along with some additional ECC data). In half-width (L0= p) mode, the fits are only 10 bits, and therefore it takes twice as many fi= ts to transmit a flit. When one talks about QPI speed (for example, 8.0 GT= /s), the transfers here refer to fits. Therefore, in L0, the system will t= ransfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the b= andwidth of the link by taking: flits*80b/time. Note that this is not the = same as data bandwidth. For example, when we are transferring a 64B cachel= ine across QPI, we will break it into 9 flits -- 1 with header information = and 8 with 64 bits of actual data and an additional 16 bits of other inform= ation. To calculate data bandwidth, one should therefore do: data flits * = 8B / time.; Counts the number of non-request flits transmitted over QPI on = the home channel. These are most commonly snoop responses, and this event = can be used as a proxy for that.", > >> @@ -1578,6 +1586,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; HOM Request= Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three groups that allow us to track fl= its. It includes filters for SNP, HOM, and DRS message classes. Each flit= is made up of 80 bits of information (in addition to some ECC data). In f= ull-width (L0) mode, flits are made up of four fits, each of which contains= 20 bits of data (along with some additional ECC data). In half-width (L0= p) mode, the fits are only 10 bits, and therefore it takes twice as many fi= ts to transmit a flit. When one talks about QPI speed (for example, 8.0 GT= /s), the transfers here refer to fits. Therefore, in L0, the system will t= ransfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the b= andwidth of the link by taking: flits*80b/time. Note that this is not the = same as data bandwidth. For example, when we are transferring a 64B cachel= ine across QPI, we will break it into 9 flits -- 1 with header information = and 8 with 64 bits of actual data and an additional 16 bits of other inform= ation. To calculate data bandwidth, one should therefore do: data flits * = 8B / time.; Counts the number of data request transmitted over QPI on the h= ome channel. This basically counts the number of remote memory requests tr= ansmitted over QPI. 
In conjunction with the local read count in the Home A= gent, one can calculate the number of LLC Misses.", > >> @@ -1586,6 +1595,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; SNP Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.SNP", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three groups that allow us to track fl= its. It includes filters for SNP, HOM, and DRS message classes. Each flit= is made up of 80 bits of information (in addition to some ECC data). In f= ull-width (L0) mode, flits are made up of four fits, each of which contains= 20 bits of data (along with some additional ECC data). In half-width (L0= p) mode, the fits are only 10 bits, and therefore it takes twice as many fi= ts to transmit a flit. When one talks about QPI speed (for example, 8.0 GT= /s), the transfers here refer to fits. Therefore, in L0, the system will t= ransfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the b= andwidth of the link by taking: flits*80b/time. Note that this is not the = same as data bandwidth. For example, when we are transferring a 64B cachel= ine across QPI, we will break it into 9 flits -- 1 with header information = and 8 with 64 bits of actual data and an additional 16 bits of other inform= ation. To calculate data bandwidth, one should therefore do: data flits * = 8B / time.; Counts the number of snoop request flits transmitted over QPI. = These requests are contained in the snoop channel. This does not include = snoop responses, which are transmitted on the home channel.", > >> @@ -3104,6 +3114,7 @@ > >> }, > >> { > >> "EventName": "UNC_U_CLOCKTICKS", > >> + "EventCode": "0x0", > >> "PerPkg": "1", > >> "Unit": "UBOX" > >> }, > >> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json= b/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json > >> index 6550934..869a320 100644 > >> --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json > >> +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json > >> @@ -131,6 +131,7 @@ > >> }, > >> { > >> "BriefDescription": "DRAM Clockticks", > >> + "EventCode": "0x0", > >> "EventName": "UNC_M_DCLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "iMC" > >> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json = b/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json > >> index 5df1ebf..0a5d0c3 100644 > >> --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json > >> +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json > >> @@ -1,6 +1,7 @@ > >> [ > >> { > >> "BriefDescription": "pclk Cycles", > >> + "EventCode": "0x0", > >> "EventName": "UNC_P_CLOCKTICKS", > >> "PerPkg": "1", > >> "PublicDescription": "The PCU runs off a fixed 800 MHz clock.= This event counts the number of pclk cycles measured while the counter wa= s enabled. 
The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
> >> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json b/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
> >> index d0edfde..76b515d 100644
> >> --- a/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
> >> @@ -329,6 +329,7 @@
> >> },
> >> {
> >> "BriefDescription": "Reference cycles when the core is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >> "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
> >> "SampleAfterValue": "2000003",
> >> @@ -351,6 +352,7 @@
> >> },
> >> {
> >> "BriefDescription": "Core cycles when the thread is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD",
> >> "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
> >> "SampleAfterValue": "2000003",
> >> @@ -359,6 +361,7 @@
> >> {
> >> "AnyThread": "1",
> >> "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
> >> "SampleAfterValue": "2000003",
> >> "UMask": "0x2"
> >> @@ -432,6 +435,7 @@
> >> },
> >> {
> >> "BriefDescription": "Instructions retired from execution.",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction.
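
(Side note, since INST_RETIRED.ANY and CPU_CLK_UNHALTED.THREAD are quoted together here: these two fixed-counter events are the usual building blocks of the IPC ratio. A toy computation with made-up counts:)

  # IPC from the two fixed-counter events above (hypothetical readings)
  inst_retired_any = 1_500_000_000       # INST_RETIRED.ANY count
  clk_unhalted_thread = 1_000_000_000    # CPU_CLK_UNHALTED.THREAD count
  print(f"IPC = {inst_retired_any / clk_unhalted_thread:.2f}")  # -> 1.50
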
Counting continues during hardware interrupts, traps, and inside int= errupt handlers.", > >> "SampleAfterValue": "2000003", > >> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json= b/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json > >> index 63395e7e..160f1c4 100644 > >> --- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json > >> +++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json > >> @@ -1,6 +1,7 @@ > >> [ > >> { > >> "BriefDescription": "Uncore Clocks", > >> + "EventCode": "0x0", > >> "EventName": "UNC_C_CLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "CBOX" > >> @@ -863,6 +864,7 @@ > >> }, > >> { > >> "BriefDescription": "uclks", > >> + "EventCode": "0x0", > >> "EventName": "UNC_H_CLOCKTICKS", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of uclks in the HA. = This will be slightly different than the count in the Ubox because of enabl= e/freeze delays. The HA is on the other side of the die from the fixed Ubo= x uclk counter, so the drift could be somewhat larger than in units that ar= e closer like the QPI Agent.", > >> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconne= ct.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json > >> index 0fc907e..addab93 100644 > >> --- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json > >> +++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json > >> @@ -109,6 +109,7 @@ > >> }, > >> { > >> "BriefDescription": "Clocks in the IRP", > >> + "EventCode": "0x0", > >> "EventName": "UNC_I_CLOCKTICKS", > >> "PerPkg": "1", > >> "PublicDescription": "Number of clocks in the IRP.", > >> @@ -847,6 +848,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 0; Data Tx Fli= ts", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G0.DATA", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. It includes filters for Idle, protocol, and Data Flit= s. Each 'flit' is made up of 80 bits of information (in addition to some E= CC data). In full-width (L0) mode, flits are made up of four 'fits', each = of which contains 20 bits of data (along with some additional ECC data). = In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes= twice as many fits to transmit a flit. When one talks about QPI 'speed' (= for example, 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, i= n L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.= One can calculate the bandwidth of the link by taking: flits*80b/time. N= ote that this is not the same as 'data' bandwidth. For example, when we ar= e transferring a 64B cacheline across QPI, we will break it into 9 flits --= 1 with header information and 8 with 64 bits of actual 'data' and an addit= ional 16 bits of other information. To calculate 'data' bandwidth, one sho= uld therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L= 0p.", > >> @@ -855,6 +857,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 0; Idle and Nu= ll Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G0.IDLE", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. It includes filters for Idle, protocol, and Data Flit= s. Each 'flit' is made up of 80 bits of information (in addition to some E= CC data). 
In full-width (L0) mode, flits are made up of four 'fits', each = of which contains 20 bits of data (along with some additional ECC data). = In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes= twice as many fits to transmit a flit. When one talks about QPI 'speed' (= for example, 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, i= n L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.= One can calculate the bandwidth of the link by taking: flits*80b/time. N= ote that this is not the same as 'data' bandwidth. For example, when we ar= e transferring a 64B cacheline across QPI, we will break it into 9 flits --= 1 with header information and 8 with 64 bits of actual 'data' and an addit= ional 16 bits of other information. To calculate 'data' bandwidth, one sho= uld therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L= 0p.", > >> @@ -863,6 +866,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 0; Non-Data pr= otocol Tx Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. It includes filters for Idle, protocol, and Data Flit= s. Each 'flit' is made up of 80 bits of information (in addition to some E= CC data). In full-width (L0) mode, flits are made up of four 'fits', each = of which contains 20 bits of data (along with some additional ECC data). = In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes= twice as many fits to transmit a flit. When one talks about QPI 'speed' (= for example, 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, i= n L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed.= One can calculate the bandwidth of the link by taking: flits*80b/time. N= ote that this is not the same as 'data' bandwidth. For example, when we ar= e transferring a 64B cacheline across QPI, we will break it into 9 flits --= 1 with header information and 8 with 64 bits of actual 'data' and an addit= ional 16 bits of other information. To calculate 'data' bandwidth, one sho= uld therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L= 0p.", > >> @@ -871,6 +875,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; DRS Flits (= both Header and Data)", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.DRS", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three 'groups' that allow us to track = flits. It includes filters for SNP, HOM, and DRS message classes. Each 'f= lit' is made up of 80 bits of information (in addition to some ECC data). = In full-width (L0) mode, flits are made up of four 'fits', each of which co= ntains 20 bits of data (along with some additional ECC data). In half-wid= th (L0p) mode, the fits are only 10 bits, and therefore it takes twice as m= any fits to transmit a flit. When one talks about QPI 'speed' (for example= , 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the s= ystem will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can c= alculate the bandwidth of the link by taking: flits*80b/time. Note that th= is is not the same as 'data' bandwidth. 
For example, when we are transferr= ing a 64B cacheline across QPI, we will break it into 9 flits -- 1 with hea= der information and 8 with 64 bits of actual 'data' and an additional 16 bi= ts of other information. To calculate 'data' bandwidth, one should therefo= re do: data flits * 8B / time.", > >> @@ -879,6 +884,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; DRS Data Fl= its", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three 'groups' that allow us to track = flits. It includes filters for SNP, HOM, and DRS message classes. Each 'f= lit' is made up of 80 bits of information (in addition to some ECC data). = In full-width (L0) mode, flits are made up of four 'fits', each of which co= ntains 20 bits of data (along with some additional ECC data). In half-wid= th (L0p) mode, the fits are only 10 bits, and therefore it takes twice as m= any fits to transmit a flit. When one talks about QPI 'speed' (for example= , 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the s= ystem will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can c= alculate the bandwidth of the link by taking: flits*80b/time. Note that th= is is not the same as 'data' bandwidth. For example, when we are transferr= ing a 64B cacheline across QPI, we will break it into 9 flits -- 1 with hea= der information and 8 with 64 bits of actual 'data' and an additional 16 bi= ts of other information. To calculate 'data' bandwidth, one should therefo= re do: data flits * 8B / time.", > >> @@ -887,6 +893,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; DRS Header = Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three 'groups' that allow us to track = flits. It includes filters for SNP, HOM, and DRS message classes. Each 'f= lit' is made up of 80 bits of information (in addition to some ECC data). = In full-width (L0) mode, flits are made up of four 'fits', each of which co= ntains 20 bits of data (along with some additional ECC data). In half-wid= th (L0p) mode, the fits are only 10 bits, and therefore it takes twice as m= any fits to transmit a flit. When one talks about QPI 'speed' (for example= , 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the s= ystem will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can c= alculate the bandwidth of the link by taking: flits*80b/time. Note that th= is is not the same as 'data' bandwidth. For example, when we are transferr= ing a 64B cacheline across QPI, we will break it into 9 flits -- 1 with hea= der information and 8 with 64 bits of actual 'data' and an additional 16 bi= ts of other information. To calculate 'data' bandwidth, one should therefo= re do: data flits * 8B / time.", > >> @@ -895,6 +902,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; HOM Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.HOM", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three 'groups' that allow us to track = flits. It includes filters for SNP, HOM, and DRS message classes. Each 'f= lit' is made up of 80 bits of information (in addition to some ECC data). 
= In full-width (L0) mode, flits are made up of four 'fits', each of which co= ntains 20 bits of data (along with some additional ECC data). In half-wid= th (L0p) mode, the fits are only 10 bits, and therefore it takes twice as m= any fits to transmit a flit. When one talks about QPI 'speed' (for example= , 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the s= ystem will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can c= alculate the bandwidth of the link by taking: flits*80b/time. Note that th= is is not the same as 'data' bandwidth. For example, when we are transferr= ing a 64B cacheline across QPI, we will break it into 9 flits -- 1 with hea= der information and 8 with 64 bits of actual 'data' and an additional 16 bi= ts of other information. To calculate 'data' bandwidth, one should therefo= re do: data flits * 8B / time.", > >> @@ -903,6 +911,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; HOM Non-Req= uest Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three 'groups' that allow us to track = flits. It includes filters for SNP, HOM, and DRS message classes. Each 'f= lit' is made up of 80 bits of information (in addition to some ECC data). = In full-width (L0) mode, flits are made up of four 'fits', each of which co= ntains 20 bits of data (along with some additional ECC data). In half-wid= th (L0p) mode, the fits are only 10 bits, and therefore it takes twice as m= any fits to transmit a flit. When one talks about QPI 'speed' (for example= , 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the s= ystem will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can c= alculate the bandwidth of the link by taking: flits*80b/time. Note that th= is is not the same as 'data' bandwidth. For example, when we are transferr= ing a 64B cacheline across QPI, we will break it into 9 flits -- 1 with hea= der information and 8 with 64 bits of actual 'data' and an additional 16 bi= ts of other information. To calculate 'data' bandwidth, one should therefo= re do: data flits * 8B / time.", > >> @@ -911,6 +920,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; HOM Request= Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three 'groups' that allow us to track = flits. It includes filters for SNP, HOM, and DRS message classes. Each 'f= lit' is made up of 80 bits of information (in addition to some ECC data). = In full-width (L0) mode, flits are made up of four 'fits', each of which co= ntains 20 bits of data (along with some additional ECC data). In half-wid= th (L0p) mode, the fits are only 10 bits, and therefore it takes twice as m= any fits to transmit a flit. When one talks about QPI 'speed' (for example= , 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the s= ystem will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can c= alculate the bandwidth of the link by taking: flits*80b/time. Note that th= is is not the same as 'data' bandwidth. For example, when we are transferr= ing a 64B cacheline across QPI, we will break it into 9 flits -- 1 with hea= der information and 8 with 64 bits of actual 'data' and an additional 16 bi= ts of other information. 
To calculate 'data' bandwidth, one should therefo= re do: data flits * 8B / time.", > >> @@ -919,6 +929,7 @@ > >> }, > >> { > >> "BriefDescription": "Flits Transferred - Group 1; SNP Flits", > >> + "EventCode": "0x0", > >> "EventName": "UNC_Q_TxL_FLITS_G1.SNP", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of flits transmitted = across the QPI Link. This is one of three 'groups' that allow us to track = flits. It includes filters for SNP, HOM, and DRS message classes. Each 'f= lit' is made up of 80 bits of information (in addition to some ECC data). = In full-width (L0) mode, flits are made up of four 'fits', each of which co= ntains 20 bits of data (along with some additional ECC data). In half-wid= th (L0p) mode, the fits are only 10 bits, and therefore it takes twice as m= any fits to transmit a flit. When one talks about QPI 'speed' (for example= , 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the s= ystem will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can c= alculate the bandwidth of the link by taking: flits*80b/time. Note that th= is is not the same as 'data' bandwidth. For example, when we are transferr= ing a 64B cacheline across QPI, we will break it into 9 flits -- 1 with hea= der information and 8 with 64 bits of actual 'data' and an additional 16 bi= ts of other information. To calculate 'data' bandwidth, one should therefo= re do: data flits * 8B / time.", > >> @@ -1576,6 +1587,7 @@ > >> }, > >> { > >> "EventName": "UNC_U_CLOCKTICKS", > >> + "EventCode": "0x0", > >> "PerPkg": "1", > >> "Unit": "UBOX" > >> }, > >> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.jso= n b/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json > >> index 6dcc9415..2385b0a 100644 > >> --- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json > >> +++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json > >> @@ -65,6 +65,7 @@ > >> }, > >> { > >> "BriefDescription": "uclks", > >> + "EventCode": "0x0", > >> "EventName": "UNC_M_CLOCKTICKS", > >> "PerPkg": "1", > >> "PublicDescription": "Uncore Fixed Counter - uclks", > >> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json= b/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json > >> index b3ee5d7..f453afd 100644 > >> --- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json > >> +++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json > >> @@ -1,6 +1,7 @@ > >> [ > >> { > >> "BriefDescription": "pclk Cycles", > >> + "EventCode": "0x0", > >> "EventName": "UNC_P_CLOCKTICKS", > >> "PerPkg": "1", > >> "PublicDescription": "The PCU runs off a fixed 800 MHz clock.= This event counts the number of pclk cycles measured while the counter wa= s enabled. The pclk, like the Memory Controller's dclk, counts at a consta= nt rate making it a good measure of actual wall time.", > >> @@ -216,6 +217,7 @@ > >> }, > >> { > >> "BriefDescription": "Cycles spent changing Frequency", > >> + "EventCode": "0x0", > >> "EventName": "UNC_P_FREQ_TRANS_CYCLES", > >> "PerPkg": "1", > >> "PublicDescription": "Counts the number of cycles when the sy= stem is changing frequency. This can not be filtered by thread ID. 
One ca= n also use it with the occupancy counter that monitors number of threads in= C0 to estimate the performance impact that frequency transitions had on th= e system.", > >> diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.js= on b/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json > >> index 3dc5321..a74d45a 100644 > >> --- a/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json > >> +++ b/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json > >> @@ -150,12 +150,14 @@ > >> }, > >> { > >> "BriefDescription": "Fixed Counter: Counts the number of unha= lted reference clock cycles", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.REF_TSC", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x3" > >> }, > >> { > >> "BriefDescription": "Fixed Counter: Counts the number of unha= lted core clock cycles", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD", > >> "PublicDescription": "This event counts the number of core cy= cles while the thread is not in a halt state. The thread enters the halt st= ate when it is running the HLT instruction. This event is a component in ma= ny key event ratios. The core frequency may change from time to time due to= transitions associated with Enhanced Intel SpeedStep Technology or TM2. Fo= r this reason this event may have a changing ratio with regards to time. Wh= en the core frequency is constant, this event can approximate elapsed time = while the core was not in the halt state. It is counted on a dedicated fixe= d counter", > >> "SampleAfterValue": "2000003", > >> @@ -177,6 +179,7 @@ > >> }, > >> { > >> "BriefDescription": "Fixed Counter: Counts the number of inst= ructions retired", > >> + "EventCode": "0x0", > >> "EventName": "INST_RETIRED.ANY", > >> "PublicDescription": "This event counts the number of instruc= tions that retire. For instructions that consist of multiple micro-ops, th= is event counts exactly once, as the last micro-op of the instruction retir= es. 
The event continues counting while instructions retire, including duri= ng interrupt service routines caused by hardware interrupts, faults or trap= s.", > >> "SampleAfterValue": "2000003", > >> diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cach= e.json b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json > >> index 1b8dcfa..c062253 100644 > >> --- a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json > >> +++ b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json > >> @@ -3246,6 +3246,7 @@ > >> }, > >> { > >> "BriefDescription": "Uncore Clocks", > >> + "EventCode": "0x0", > >> "EventName": "UNC_H_U_CLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "CHA" > >> diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memo= ry.json b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json > >> index fb75297..3575baa 100644 > >> --- a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json > >> +++ b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json > >> @@ -41,6 +41,7 @@ > >> }, > >> { > >> "BriefDescription": "ECLK count", > >> + "EventCode": "0x0", > >> "EventName": "UNC_E_E_CLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "EDC_ECLK" > >> @@ -55,6 +56,7 @@ > >> }, > >> { > >> "BriefDescription": "UCLK count", > >> + "EventCode": "0x0", > >> "EventName": "UNC_E_U_CLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "EDC_UCLK" > >> @@ -93,12 +95,14 @@ > >> }, > >> { > >> "BriefDescription": "DCLK count", > >> + "EventCode": "0x0", > >> "EventName": "UNC_M_D_CLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "iMC_DCLK" > >> }, > >> { > >> "BriefDescription": "UCLK count", > >> + "EventCode": "0x0", > >> "EventName": "UNC_M_U_CLOCKTICKS", > >> "PerPkg": "1", > >> "Unit": "iMC_UCLK" > >> diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json b= /tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json > >> index 352c5ef..27f2c81 100644 > >> --- a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json > >> +++ b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json > >> @@ -368,6 +368,7 @@ > >> }, > >> { > >> "BriefDescription": "Fixed Counter: Counts the number of unha= lted core clock cycles", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.CORE", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x2", > >> @@ -427,6 +428,7 @@ > >> }, > >> { > >> "BriefDescription": "Fixed Counter: Counts the number of unha= lted reference clock cycles", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.REF_TSC", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x3", > >> @@ -434,6 +436,7 @@ > >> }, > >> { > >> "BriefDescription": "Reference cycles when the core is not in= halt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.REF_TSC", > >> "PublicDescription": "Counts the number of reference cycles w= hen the core is not in a halt state. The core enters the halt state when it= is running the HLT instruction or the MWAIT instruction. This event is not= affected by core frequency changes (for example, P states, TM2 transitions= ) but has the same incrementing frequency as the time stamp counter. This e= vent can approximate elapsed time while the core was not in a halt state. I= t is counted on a dedicated fixed counter, leaving the eight programmable c= ounters available for other events. Note: On all current platforms this eve= nt stops counting during 'throttling (TM)' states duty off periods the proc= essor is 'halted'. 
The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
> >> "SampleAfterValue": "2000003",
> >> @@ -460,6 +463,7 @@
> >> },
> >> {
> >> "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD",
> >> "SampleAfterValue": "2000003",
> >> "UMask": "0x2",
> >> @@ -467,6 +471,7 @@
> >> },
> >> {
> >> "BriefDescription": "Core cycles when the thread is not in halt state",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD",
> >> "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
> >> "SampleAfterValue": "2000003",
> >> @@ -617,6 +622,7 @@
> >> },
> >> {
> >> "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PEBS": "1",
> >> "SampleAfterValue": "2000003",
> >> @@ -625,6 +631,7 @@
> >> },
> >> {
> >> "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PEBS": "1",
> >> "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
> >> @@ -668,6 +675,7 @@
> >> },
> >> {
> >> "BriefDescription": "Precise instruction retired with PEBS precise-distribution",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.PREC_DIST",
> >> "PEBS": "1",
> >> "PublicDescription": "A version of INST_RETIRED that allows for a precise distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR++) feature to fix bias in how retired instructions get sampled. Use on Fixed Counter 0.",
> >> @@ -1006,6 +1014,7 @@
> >> },
> >> {
> >> "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
> >> + "EventCode": "0x0",
> >> "EventName": "TOPDOWN.SLOTS",
> >> "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
> >> "SampleAfterValue": "10000003",
> >> diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/pipeline.json b/tools/perf/pmu-events/arch/x86/rocketlake/pipeline.json
> >> index 375b780..22085f4 100644
> >> --- a/tools/perf/pmu-events/arch/x86/rocketlake/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/rocketlake/pipeline.json
> >> @@ -193,6 +193,7 @@
> >> },
> >> {
> >> "BriefDescription": "Reference cycles when the core is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >> "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
> >> "SampleAfterValue": "2000003",
> >> @@ -359,6 +360,7 @@
> >> },
> >> {
> >> "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.PREC_DIST",
> >> "PEBS": "1",
> >> "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
> >> @@ -562,6 +564,7 @@
> >> },
> >> {
> >> "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
> >> + "EventCode": "0x0",
> >> "EventName": "TOPDOWN.SLOTS",
> >> "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core.
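
(For readers new to TMA: TOPDOWN.SLOTS is only the denominator; the level-1 numerator events differ per generation, so 'retiring_slots' below is a stand-in, not a real event name. Numbers are made up.)

  # TOPDOWN.SLOTS as the TMA level-1 denominator (Python sketch)
  topdown_slots = 40_000_000_000   # TOPDOWN.SLOTS reading
  retiring_slots = 12_000_000_000  # hypothetical level-1 numerator
  print(f"Retiring = {retiring_slots / topdown_slots:.1%} of slots")  # -> 30.0%
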
Software can use this event as the d= enominator for the top-level metrics of the TMA method. This architectural = event is counted on a designated fixed counter (Fixed Counter 3).", > >> "SampleAfterValue": "10000003", > >> diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json = b/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json > >> index ecaf94c..973a5f4 100644 > >> --- a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json > >> +++ b/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json > >> @@ -337,6 +337,7 @@ > >> }, > >> { > >> "BriefDescription": "Reference cycles when the core is not in= halt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.REF_TSC", > >> "PublicDescription": "This event counts the number of referen= ce cycles when the core is not in a halt state. The core enters the halt st= ate when it is running the HLT instruction or the MWAIT instruction. This e= vent is not affected by core frequency changes (for example, P states, TM2 = transitions) but has the same incrementing frequency as the time stamp coun= ter. This event can approximate elapsed time while the core was not in a ha= lt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCL= K event. It is counted on a dedicated fixed counter, leaving the four (eigh= t when Hyperthreading is disabled) programmable counters available for othe= r events.", > >> "SampleAfterValue": "2000003", > >> @@ -359,6 +360,7 @@ > >> }, > >> { > >> "BriefDescription": "Core cycles when the thread is not in ha= lt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD", > >> "PublicDescription": "This event counts the number of core cy= cles while the thread is not in a halt state. The thread enters the halt st= ate when it is running the HLT instruction. This event is a component in ma= ny key event ratios. The core frequency may change from time to time due to= transitions associated with Enhanced Intel SpeedStep Technology or TM2. Fo= r this reason this event may have a changing ratio with regards to time. Wh= en the core frequency is constant, this event can approximate elapsed time = while the core was not in the halt state. It is counted on a dedicated fixe= d counter, leaving the four (eight when Hyperthreading is disabled) program= mable counters available for other events.", > >> "SampleAfterValue": "2000003", > >> @@ -367,6 +369,7 @@ > >> { > >> "AnyThread": "1", > >> "BriefDescription": "Core cycles when at least one thread on = the physical core is not in halt state.", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD_ANY", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x2" > >> @@ -440,6 +443,7 @@ > >> }, > >> { > >> "BriefDescription": "Instructions retired from execution.", > >> + "EventCode": "0x0", > >> "EventName": "INST_RETIRED.ANY", > >> "PublicDescription": "This event counts the number of instruc= tions retired from execution. For instructions that consist of multiple mic= ro-ops, this event counts the retirement of the last micro-op of the instru= ction. 
Counting continues during hardware interrupts, traps, and inside interrupt handlers.",
> >> "SampleAfterValue": "2000003",
> >> diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json b/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
> >> index 6dcf3b7..cfbc0d2 100644
> >> --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
> >> @@ -284,6 +284,7 @@
> >> },
> >> {
> >> "BriefDescription": "Reference cycles when the core is not in halt state.",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >> "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
> >> "SampleAfterValue": "2000003",
> >> @@ -299,6 +300,7 @@
> >> },
> >> {
> >> "BriefDescription": "Core cycles when the thread is not in halt state",
> >> + "EventCode": "0x0",
> >> "EventName": "CPU_CLK_UNHALTED.THREAD",
> >> "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
> >> "SampleAfterValue": "2000003",
> >> @@ -426,6 +428,7 @@
> >> },
> >> {
> >> "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
> >> + "EventCode": "0x0",
> >> "EventName": "INST_RETIRED.ANY",
> >> "PEBS": "1",
> >> "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events.
INST_RETIRED.ANY_P is counted by a programmable count= er.", > >> @@ -457,6 +460,7 @@ > >> }, > >> { > >> "BriefDescription": "Precise instruction retired with PEBS pr= ecise-distribution", > >> + "EventCode": "0x0", > >> "EventName": "INST_RETIRED.PREC_DIST", > >> "PEBS": "1", > >> "PublicDescription": "A version of INST_RETIRED that allows f= or a precise distribution of samples across instructions retired. It utiliz= es the Precise Distribution of Instructions Retired (PDIR++) feature to fix= bias in how retired instructions get sampled. Use on Fixed Counter 0.", > >> @@ -719,6 +723,7 @@ > >> }, > >> { > >> "BriefDescription": "TMA slots available for an unhalted logi= cal processor. Fixed counter - architectural event", > >> + "EventCode": "0x0", > >> "EventName": "TOPDOWN.SLOTS", > >> "PublicDescription": "Number of available slots for an unhalt= ed logical processor. The event increments by machine-width of the narrowes= t pipeline as employed by the Top-down Microarchitecture Analysis method (T= MA). The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core. Software can use this event as the d= enominator for the top-level metrics of the TMA method. This architectural = event is counted on a designated fixed counter (Fixed Counter 3).", > >> "SampleAfterValue": "10000003", > >> diff --git a/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json= b/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json > >> index 4121295..67be689 100644 > >> --- a/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json > >> +++ b/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json > >> @@ -17,6 +17,7 @@ > >> }, > >> { > >> "BriefDescription": "Fixed Counter: Counts the number of unha= lted core clock cycles", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.CORE", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x2" > >> @@ -29,6 +30,7 @@ > >> }, > >> { > >> "BriefDescription": "Fixed Counter: Counts the number of unha= lted reference clock cycles", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.REF_TSC", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x3" > >> @@ -43,6 +45,7 @@ > >> }, > >> { > >> "BriefDescription": "Fixed Counter: Counts the number of unha= lted core clock cycles", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.THREAD", > >> "SampleAfterValue": "2000003", > >> "UMask": "0x2" > >> @@ -55,6 +58,7 @@ > >> }, > >> { > >> "BriefDescription": "Fixed Counter: Counts the number of inst= ructions retired", > >> + "EventCode": "0x0", > >> "EventName": "INST_RETIRED.ANY", > >> "PEBS": "1", > >> "SampleAfterValue": "2000003", > >> diff --git a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json b= /tools/perf/pmu-events/arch/x86/silvermont/pipeline.json > >> index 2d4214b..6423c01 100644 > >> --- a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json > >> +++ b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json > >> @@ -143,6 +143,7 @@ > >> }, > >> { > >> "BriefDescription": "Fixed Counter: Counts the number of unha= lted core clock cycles", > >> + "EventCode": "0x0", > >> "EventName": "CPU_CLK_UNHALTED.CORE", > >> "PublicDescription": "Counts the number of core cycles while = the core is not in a halt state. The core enters the halt state when it is = running the HLT instruction. This event is a component in many key event ra= tios. The core frequency may change from time to time. For this reason thi= s event may have a changing ratio with regards to time. 
> >> diff --git a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
> >> index 2d4214b..6423c01 100644
> >> --- a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
> >> @@ -143,6 +143,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
> >> +        "EventCode": "0x0",
> >>          "EventName": "CPU_CLK_UNHALTED.CORE",
> >>          "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. In systems with a constant core frequency, this event can give you a measurement of the elapsed time while the core was not in halt state by dividing the event count by the core frequency. This event is architecturally defined and is a designated fixed counter. CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.CORE_P use the core frequency which may change from time to time. CPU_CLK_UNHALTE.REF_TSC and CPU_CLK_UNHALTED.REF are not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time. The fixed events are CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.REF_TSC and the programmable events are CPU_CLK_UNHALTED.CORE_P and CPU_CLK_UNHALTED.REF.",
> >>          "SampleAfterValue": "2000003",
> >> @@ -165,6 +166,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
> >> +        "EventCode": "0x0",
> >>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >>          "PublicDescription": "Counts the number of reference cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time. This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time. Divide this event count by core frequency to determine the elapsed time while the core was not in halt state. Divide this event count by core frequency to determine the elapsed time while the core was not in halt state. This event is architecturally defined and is a designated fixed counter. CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.CORE_P use the core frequency which may change from time to time. CPU_CLK_UNHALTE.REF_TSC and CPU_CLK_UNHALTED.REF are not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time. The fixed events are CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.REF_TSC and the programmable events are CPU_CLK_UNHALTED.CORE_P and CPU_CLK_UNHALTED.REF.",
> >>          "SampleAfterValue": "2000003",
> >> @@ -180,6 +182,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
> >> +        "EventCode": "0x0",
> >>          "EventName": "INST_RETIRED.ANY",
> >>          "PublicDescription": "This event counts the number of instructions that retire. For instructions that consist of multiple micro-ops, this event counts exactly once, as the last micro-op of the instruction retires. The event continues counting while instructions retire, including during interrupt service routines caused by hardware interrupts, faults or traps. Background: Modern microprocessors employ extensive pipelining and speculative techniques. Since sometimes an instruction is started but never completed, the notion of \"retirement\" is introduced. A retired instruction is one that commits its states. Or stated differently, an instruction might be abandoned at some point. No instruction is truly finished until it retires. This counter measures the number of completed instructions. The fixed event is INST_RETIRED.ANY and the programmable event is INST_RETIRED.ANY_P.",
> >>          "SampleAfterValue": "2000003",
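
The silvermont descriptions spell the arithmetic out: divide the REF_TSC
count by the reference frequency to get non-halted wall time. As a small
helper (mine, with the TSC frequency as an input because it is machine
specific):

def unhalted_seconds(ref_tsc_count: int, tsc_hz: float) -> float:
    # CPU_CLK_UNHALTED.REF_TSC ticks at the TSC rate regardless of
    # P-state changes, so count / frequency approximates non-halt time.
    return ref_tsc_count / tsc_hz

# e.g. 4_800_000_000 reference cycles at a 2.4 GHz TSC -> ~2.0 s unhalted
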
> >> diff --git a/tools/perf/pmu-events/arch/x86/skylake/pipeline.json b/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
> >> index cd3e737..d790f82 100644
> >> --- a/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
> >> @@ -191,6 +191,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> >> +        "EventCode": "0x0",
> >>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
> >>          "SampleAfterValue": "2000003",
> >> @@ -222,6 +223,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Core cycles when the thread is not in halt state",
> >> +        "EventCode": "0x0",
> >>          "EventName": "CPU_CLK_UNHALTED.THREAD",
> >>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
> >>          "SampleAfterValue": "2000003",
> >> @@ -230,6 +232,7 @@
> >>      {
> >>          "AnyThread": "1",
> >>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> >> +        "EventCode": "0x0",
> >>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
> >>          "SampleAfterValue": "2000003",
> >>          "UMask": "0x2"
> >> @@ -369,6 +372,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Instructions retired from execution.",
> >> +        "EventCode": "0x0",
> >>          "EventName": "INST_RETIRED.ANY",
> >>          "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
> >>          "SampleAfterValue": "2000003",
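
Since INST_RETIRED.ANY and CPU_CLK_UNHALTED.THREAD both live on fixed
counters, the classic ratio they feed comes for free, with all the
programmable counters left untouched - this is what plain 'perf stat'
reports as insn per cycle. Sketch (my helper, not perf code):

def ipc(inst_retired_any: int, cpu_clk_unhalted_thread: int) -> float:
    # Instructions per cycle, one of the "key event ratios" the
    # descriptions mention; both inputs come from fixed counters.
    if cpu_clk_unhalted_thread == 0:
        raise ValueError("no unhalted cycles recorded")
    return inst_retired_any / cpu_clk_unhalted_thread
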
> >> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
> >> index 66d686c..efda247 100644
> >> --- a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
> >> @@ -200,6 +200,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> >> +        "EventCode": "0x0",
> >>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
> >>          "SampleAfterValue": "2000003",
> >> @@ -231,6 +232,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Core cycles when the thread is not in halt state",
> >> +        "EventCode": "0x0",
> >>          "EventName": "CPU_CLK_UNHALTED.THREAD",
> >>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
> >>          "SampleAfterValue": "2000003",
> >> @@ -239,6 +241,7 @@
> >>      {
> >>          "AnyThread": "1",
> >>          "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
> >> +        "EventCode": "0x0",
> >>          "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
> >>          "SampleAfterValue": "2000003",
> >>          "UMask": "0x2"
> >> @@ -378,6 +381,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Instructions retired from execution.",
> >> +        "EventCode": "0x0",
> >>          "EventName": "INST_RETIRED.ANY",
> >>          "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
> >>          "SampleAfterValue": "2000003",
> >> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
> >> index 543dfc1..4df1294 100644
> >> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
> >> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
> >> @@ -460,6 +460,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Clockticks of the uncore caching & home agent (CHA)",
> >> +        "EventCode": "0x0",
> >>          "EventName": "UNC_CHA_CLOCKTICKS",
> >>          "PerPkg": "1",
> >>          "PublicDescription": "Counts clockticks of the clock controlling the uncore caching and home agent (CHA).",
> >> @@ -5678,6 +5679,7 @@
> >>      {
> >>          "BriefDescription": "This event is deprecated. Refer to new event UNC_CHA_CLOCKTICKS",
> >>          "Deprecated": "1",
> >> +        "EventCode": "0x0",
> >>          "EventName": "UNC_C_CLOCKTICKS",
> >>          "PerPkg": "1",
> >>          "Unit": "CHA"
> >> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
> >> index 3eece8a7..771ce55 100644
> >> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
> >> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
> >> @@ -1090,6 +1090,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Cycles - at UCLK",
> >> +        "EventCode": "0x0",
> >>          "EventName": "UNC_M2M_CLOCKTICKS",
> >>          "PerPkg": "1",
> >>          "Unit": "M2M"
> >> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
> >> index 2a3a709..21a6a0f 100644
> >> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
> >> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
> >> @@ -1271,6 +1271,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Counting disabled",
> >> +        "EventCode": "0x0",
> >>          "EventName": "UNC_IIO_NOTHING",
> >>          "PerPkg": "1",
> >>          "Unit": "IIO"
> >> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
> >> index 7a40aa0..919ce2e 100644
> >> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
> >> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
> >> @@ -167,6 +167,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Memory controller clock ticks",
> >> +        "EventCode": "0x0",
> >>          "EventName": "UNC_M_CLOCKTICKS",
> >>          "PerPkg": "1",
> >>          "PublicDescription": "Counts clockticks of the fixed frequency clock of the memory controller using one of the programmable counters.",
> >> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
> >> index c6254af..a01b279 100644
> >> --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
> >> +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
> >> @@ -1,6 +1,7 @@
> >>  [
> >>      {
> >>          "BriefDescription": "pclk Cycles",
> >> +        "EventCode": "0x0",
> >>          "EventName": "UNC_P_CLOCKTICKS",
> >>          "PerPkg": "1",
> >>          "PublicDescription": "The PCU runs off a fixed 1 GHz clock. This event counts the number of pclk cycles measured while the counter was enabled. The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
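
The UNC_P_CLOCKTICKS description gives the conversion directly: pclk is a
fixed 1 GHz clock, so the tick count is wall time in nanoseconds. Trivial,
but for completeness (the 1 GHz figure comes from the description above):

PCLK_HZ = 1_000_000_000   # fixed 1 GHz PCU clock per the description

def pclk_wall_seconds(unc_p_clockticks: int) -> float:
    # pclk ticks at a constant rate, so ticks / rate is wall time.
    return unc_p_clockticks / PCLK_HZ
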
> >> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json b/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
> >> index c483c08..2e40cd0 100644
> >> --- a/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
> >> @@ -150,6 +150,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
> >> +        "EventCode": "0x0",
> >>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >>          "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
> >>          "SampleAfterValue": "2000003",
> >> @@ -180,6 +181,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
> >> +        "EventCode": "0x0",
> >>          "EventName": "INST_RETIRED.ANY",
> >>          "PEBS": "1",
> >>          "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0.",
> >> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
> >> index a68a5bb..279381b 100644
> >> --- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
> >> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
> >> @@ -872,6 +872,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Uncore cache clock ticks",
> >> +        "EventCode": "0x0",
> >>          "EventName": "UNC_CHA_CLOCKTICKS",
> >>          "PerPkg": "1",
> >>          "Unit": "CHA"
> >> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
> >> index 7e2895f..ba8b654 100644
> >> --- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
> >> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
> >> @@ -1419,6 +1419,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Clockticks of the mesh to memory (M2M)",
> >> +        "EventCode": "0x0",
> >>          "EventName": "UNC_M2M_CLOCKTICKS",
> >>          "PerPkg": "1",
> >>          "Unit": "M2M"
> >> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
> >> index b80911d..8278095 100644
> >> --- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
> >> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
> >> @@ -120,6 +120,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Memory controller clock ticks",
> >> +        "EventCode": "0x0",
> >>          "EventName": "UNC_M_CLOCKTICKS",
> >>          "PerPkg": "1",
> >>          "PublicDescription": "Clockticks of the integrated memory controller (IMC)",
> >> diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
> >> index a61ffca..5251c6d 100644
> >> --- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
> >> +++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
> >> @@ -1,6 +1,7 @@
> >>  [
> >>      {
> >>          "BriefDescription": "Clockticks of the power control unit (PCU)",
> >> +        "EventCode": "0x0",
> >>          "EventName": "UNC_P_CLOCKTICKS",
> >>          "PerPkg": "1",
> >>          "Unit": "PCU"
> >> diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
> >> index 541bf1d..215f253 100644
> >> --- a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
> >> +++ b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
> >> @@ -193,6 +193,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Reference cycles when the core is not in halt state.",
> >> +        "EventCode": "0x0",
> >>          "EventName": "CPU_CLK_UNHALTED.REF_TSC",
> >>          "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
> >>          "SampleAfterValue": "2000003",
> >> @@ -208,6 +209,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Core cycles when the thread is not in halt state",
> >> +        "EventCode": "0x0",
> >>          "EventName": "CPU_CLK_UNHALTED.THREAD",
> >>          "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
> >>          "SampleAfterValue": "2000003",
> >> @@ -352,6 +354,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
> >> +        "EventCode": "0x0",
> >>          "EventName": "INST_RETIRED.ANY",
> >>          "PEBS": "1",
> >>          "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
> >> @@ -377,6 +380,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution",
> >> +        "EventCode": "0x0",
> >>          "EventName": "INST_RETIRED.PREC_DIST",
> >>          "PEBS": "1",
> >>          "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
> >> @@ -570,6 +574,7 @@
> >>      },
> >>      {
> >>          "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
> >> +        "EventCode": "0x0",
> >>          "EventName": "TOPDOWN.SLOTS",
> >>          "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
> >>          "SampleAfterValue": "10000003",
> >> --
> >> 1.8.3.1
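
The TOPDOWN.SLOTS description names the denominator of the TMA method but
not the numerators, which differ per generation. As a sketch of how the
top-level fractions fall out once per-bucket slot counts are in hand (the
parameter names are mine; obtaining the bucket counts is generation
specific):

def tma_level1(slots: int, retiring: int, bad_speculation: int,
               frontend_bound: int, backend_bound: int) -> dict:
    # TOPDOWN.SLOTS is the denominator for the four top-level TMA
    # metrics, per the description above; the per-bucket slot counts
    # are plain inputs here.
    if slots == 0:
        raise ValueError("no slots counted")
    return {
        "retiring": retiring / slots,
        "bad_speculation": bad_speculation / slots,
        "frontend_bound": frontend_bound / slots,
        "backend_bound": backend_bound / slots,
    }
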
-- 

- Arnaldo