by Stephane Eranian

Subject: Re: [PATCH V5 RESEND 00/14] TopDown metrics support for Icelake

Hi,

On Fri, Jan 10, 2020 at 5:17 AM Peter Zijlstra <[email protected]> wrote:
>
> On Mon, Jan 06, 2020 at 12:29:05PM -0800, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > Icelake has support for measuring the level 1 TopDown metrics
> > directly in hardware. This is implemented by an additional METRICS
> > register, and a new Fixed Counter 3 that measures pipeline SLOTS.
> >
> > New in Icelake
> > - Do not require generic counters. This allows to collect TopDown always
> > in addition to other events.
> > - Measuring TopDown per thread/process instead of only per core
> >
> > For the Ice Lake implementation of performance metrics, the values in
> > PERF_METRICS MSR are derived from fixed counter 3. Software should start
> > both registers, PERF_METRICS and fixed counter 3, from zero.
> > Additionally, software is recommended to periodically clear both
> > registers in order to maintain accurate measurements. The latter is
> > required for certain scenarios that involve sampling metrics at high
> > rates. Software should always write fixed counter 3 before write to
> > PERF_METRICS.
>
> Do we really have to support this trainwreck? This is such ill designed
> hardware, I'm loath to support it, it might encourage more such
> 'creative' things and we really don't need that.
>
Yes, we do because it provides important information per hyper-thread.

I understand that the hardware is convoluted to support because it
introduces a new concept: a single counter computing multiple high
level metrics. It is difficult to abstract cleanly especially when you
add on top that it is connected with a new fixed counter (SLOTS).

The challenges I see:
- a single MSR contains multiple fields that do not increment
monotonically
- point of reference: you need to know when the fields were last zeroed
to understand which part of the execution the topdown percentages
cover
- the MSR must be combined with fixed counter 3 (SLOTS) to operate
correctly

I see two ways of supporting this new concept.

1/ Abstract as individual events

In Kan's approach, the nature of the PERF_METRICS MSR is hidden.
He exposes the individual metrics as pseudo-events: topdown-retiring,
topdown-bad-spec, slots, ...
Except for slots, these events are based on the fields of the
PERF_METRICS MSR.

Given that each field is a percentage, he chose to scale them by SLOTS
to expose them as monotonically incrementing events. This makes it
easier on the perf tool.
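
To make the scaling concrete, here is a minimal sketch of the arithmetic.
It is not taken from the patch series. The 8-bit field width, the field
order, and the two names topdown-fe-bound and topdown-be-bound are
assumptions made for illustration; scale_metrics() is a hypothetical
helper.

```python
# Sketch only: field width and field order are assumptions, not taken
# from the patch series.
FIELD_WIDTH = 8
FIELD_MAX = (1 << FIELD_WIDTH) - 1  # 0xff is assumed to mean 100% of SLOTS

# Assumed field order inside PERF_METRICS, lowest byte first.
FIELDS = (
    "topdown-retiring",
    "topdown-bad-spec",
    "topdown-fe-bound",
    "topdown-be-bound",
)


def scale_metrics(perf_metrics: int, slots: int) -> dict:
    """Turn each fraction field into a slot count: slots * field / 0xff."""
    counts = {}
    for idx, name in enumerate(FIELDS):
        fraction = (perf_metrics >> (idx * FIELD_WIDTH)) & FIELD_MAX
        counts[name] = slots * fraction // FIELD_MAX
    return counts
```

As long as both registers keep accumulating from the same zero point, the
scaled values behave like ordinary event counts, which is what the perf
tool expects.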

To ensure the pseudo-events make sense, it is necessary to put them
into a single event group.
That also helps the kernel with a single WRMSR/RDMSR for all 4 metrics.

Given that the point of reference is important, any read of the group
resets the fields.

With this approach, the perf tool needs no changes other than
recomputing the topdown percentages from the scaled counts.
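
On the tool side, the recomputation is a division by the slots count of
the same group. A minimal sketch, where topdown_percentages() is a
hypothetical helper and not part of the perf tool:

```python
def topdown_percentages(counts: dict, slots: int) -> dict:
    """Recover percentages from slot-scaled counts read with the same group."""
    if slots == 0:
        # Nothing was measured in this interval; avoid dividing by zero.
        return {name: 0.0 for name in counts}
    return {name: 100.0 * value / slots for name, value in counts.items()}
```

The counts and the slots value have to come from the same read of the
group, otherwise the percentages refer to different intervals.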

2/ Abstract the multi-metric MSR

This is another approach, whereby we could export a new abstraction of
a structured counter. The kernel could publish the structure of the
counter like it does today for the structure of the config register
(/sys/devices/cpu/format). The tool would parse the format and extract
the fields from the 64-bit value of the MSR. The width and unit would
be part of the format, just like what is done for some pseudo events
already.

To program this MSR, you'd have to add a single pseudo event, e.g.,
TOPDOWN_L1. The grouping would be implicit.

The point of reference approach would be the same as the first
approach: any read would reset the counts.

The kernel would still have to handle the SLOTS counter.

This approach requires fewer changes to the kernel but more in the tool.
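
A rough sketch of what the tool-side parsing could look like. The
"name:lo-hi" syntax mirrors the existing entries under
/sys/devices/cpu/format (e.g. "config:0-7"). The "value:" prefix, the
field names, the layout dictionary, and both helper functions are
hypothetical; no such sysfs files exist for counter values today.

```python
def parse_field(spec: str) -> tuple:
    """Parse 'name:lo-hi' or 'name:bit' into (lowest bit, width in bits)."""
    _, _, bits = spec.strip().partition(":")
    lo_text, _, hi_text = bits.partition("-")
    lo = int(lo_text)
    hi = int(hi_text) if hi_text else lo
    return lo, hi - lo + 1


def extract_fields(raw: int, layout: dict) -> dict:
    """Split a 64-bit counter value into the fields described by layout."""
    fields = {}
    for name, spec in layout.items():
        lo, width = parse_field(spec)
        fields[name] = (raw >> lo) & ((1 << width) - 1)
    return fields


# Hypothetical layout, as the kernel might publish it for the metrics MSR.
LAYOUT = {
    "retiring": "value:0-7",
    "bad-spec": "value:8-15",
    "fe-bound": "value:16-23",
    "be-bound": "value:24-31",
}
```

With a layout like this the tool would call extract_fields() on the raw
64-bit value returned by a read of the TOPDOWN_L1 pseudo event.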

If you have another approach in mind, please share it.
The PERF_METRICS hardware is very useful; we cannot really afford to
leave it unsupported.
I am happy to help.

2020-04-20 17:05:57

by Peter Zijlstra

Subject: Re: [PATCH V5 RESEND 00/14] TopDown metrics support for Icelake

On Mon, Apr 20, 2020 at 09:00:56AM -0700, Stephane Eranian wrote:
> Hi,
>
> On Fri, Jan 10, 2020 at 5:17 AM Peter Zijlstra <[email protected]> wrote:
> >
> > On Mon, Jan 06, 2020 at 12:29:05PM -0800, [email protected] wrote:
> > > From: Kan Liang <[email protected]>
> > >
> > > Icelake has support for measuring the level 1 TopDown metrics
> > > directly in hardware. This is implemented by an additional METRICS
> > > register, and a new Fixed Counter 3 that measures pipeline SLOTS.
> > >
> > > New in Icelake
> > > - Do not require generic counters. This allows to collect TopDown always
> > > in addition to other events.
> > > - Measuring TopDown per thread/process instead of only per core
> > >
> > > For the Ice Lake implementation of performance metrics, the values in
> > > PERF_METRICS MSR are derived from fixed counter 3. Software should start
> > > both registers, PERF_METRICS and fixed counter 3, from zero.
> > > Additionally, software is recommended to periodically clear both
> > > registers in order to maintain accurate measurements. The latter is
> > > required for certain scenarios that involve sampling metrics at high
> > > rates. Software should always write fixed counter 3 before write to
> > > PERF_METRICS.
> >
> > Do we really have to support this trainwreck? This is such ill designed
> > hardware, I'm loath to support it, it might encourage more such
> > 'creative' things and we really don't need that.
> >
> Yes, we do because it provides important information per hyper-thread.
>
> I understand that the hardware is convoluted to support because it
> introduces a new concept: a single counter computing multiple high
> level metrics. It is difficult to abstract cleanly especially when you
> add on top that it is connected with a new fixed counter (SLOTS).

It's not a new concept, it's just completely idiotic. It didn't need to
be this crazy. There is absolutely no sane reason for it to be this
crazy.

The 4 counters in a single msr thing is insane because it uses a
division.

Very much worse, it explicitly uses the exact value of another counter
(SLOTS) to drive that division, creating a tight coupling between the
registers and completely and utterly destroying the SLOTS counter.

Since it keeps internal 'shadow' counters for the 4 events anyway, it
might as well have kept a shadow counter for the SLOTS event and driven
it off of that, that would have kept the SLOTS counter sane, but nooo,
gotta wreck that.

> That also helps the kernel with a single WRMSR/RDMSR for all 4 metrics.

I also really don't buy that as a driver for all this insanity.
Optimizing MSRs to not be utterly stupid expensive would've been so much
saner and would've helped everyone.

This is just creating more wreckage.

What I really want to know is if future hardware is going to be as
stupid, or if there's going to be a change. I really don't want to
commit to an ABI here and then find out they fixed the hardware and we
can't do sane things anymore.

Obviously, future hardware is not something that is to be discussed, so
we're at a stand-still here.