2022-05-19 12:56:19

by Hector Martin

Subject: Re: [PATCH v4 0/8] Add hardware prefetch control driver for A64FX and x86

On 18/05/2022 15.30, Kohei Tarumizu wrote:
> This patch series adds a sysfs interface to control the CPU's hardware
> prefetch behavior for performance tuning from userspace on the
> A64FX and x86 processors (on supported CPUs).
>

[snip]

> In pattern A, a change of dist at L1 has a larger effect. On the other
> hand, in pattern B, the change of dist at L2 has a larger effect.
> As described above, the optimal dist combination depends on the
> characteristics of the application. Therefore, such a sysfs interface
> is useful for performance tuning.

If this is something to be tuned for specific applications, shouldn't it
be a prctl or similar and part of process context, so different
applications can use different settings (or even a single application
depending on what it's doing)? Especially if writing those sysregs/MSRs
is cheap.

In particular, configuring things separately for different cores feels
strange. You'd then have to pin applications to specific cores to get
the benefits, and wouldn't be able to optimize for multiple applications
running simultaneously that need different prefetch behavior if they
share cores.

--
Hector Martin ([email protected])
Public Key: https://mrcn.st/pub


2022-05-21 03:30:29

by [email protected]

Subject: RE: [PATCH v4 0/8] Add hardware prefetch control driver for A64FX and x86

Thanks for the comment.

> If this is something to be tuned for specific applications, shouldn't it be a prctl or
> similar and part of process context, so different applications can use different
> settings (or even a single application depending on what it's doing)? Especially if
> writing those sysregs/MSRs is cheap.

> In particular, configuring things separately for different cores feels strange. You'd
> then have to pin applications to specific cores to get the benefits, and wouldn't be
> able to optimize for multiple applications running simultaneously that need
> different prefetch behavior if they share cores.

As you say, this is intended for tuning specific applications.

I assume that users of this feature bind an application to a specific
core and give it exclusive use of that core. They do this not only for
prefetch control, but also to keep performance from being affected by
context switches and similar disturbances.

I agree that control in the process context would also be useful.
However, given the usage described above, I think it is sufficient to
provide a userspace interface that exposes the hardware prefetch
registers directly.
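
The usage assumed in this reply (one application bound to one core, so that
a per-core prefetch setting applies only to it) might look like the
following minimal sketch. It covers only the pinning step, using the
existing Linux affinity calls exposed through Python's os module. The
helper name pin_to_core is hypothetical, and the sysfs attribute that the
patch series adds is left out because its path does not appear in this
thread.

```python
import os


def pin_to_core(cpu):
    """Restrict the calling process to one CPU and return the new mask.

    pin_to_core is a hypothetical helper for illustration; it is not part
    of the patch series. It relies on sched_setaffinity(2), so it is
    Linux-only.
    """
    allowed = os.sched_getaffinity(0)  # pid 0 means the calling process
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Pick any CPU the process is already allowed to run on.
    target = min(os.sched_getaffinity(0))
    print("now restricted to CPUs:", sorted(pin_to_core(target)))
```

Pinning alone does not give the application exclusive use of the core;
keeping other tasks off it needs separate configuration (cpusets or
isolcpus, for example). That extra setup is part of the cost raised in the
first message when it contrasts per-core settings with a per-process
interface.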