2021-11-04 05:31:51

by [email protected]

Subject: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors

This patch series adds register/unregister functions for a hardware
prefetch driver. The purpose of this driver is to provide an interface
for controlling the hardware prefetch mechanism according to
application characteristics.

In an earlier RFC[1], it was suggested that we create a hardware
prefetch directory under /sys/devices/system/cpu/[CPUNUM]/cache.
Hardware prefetch is a cache-related feature, but it does not require
the cache sysfs feature. Therefore, we decided to isolate the code and
create the directory directly under cpu/[CPUNUM] instead.

[1]https://lore.kernel.org/lkml/OSBPR01MB2037D114B11153F00F233F8780389@OSBPR01MB2037.jpnprd01.prod.outlook.com/

Changes since v1:
- Add Intel hardware prefetch support
- Fix typo

This version adds Intel hardware prefetch support using Proposal A,
which was proposed in the v1 RFC PATCH[2] and is also described in the
[RFC & Future plan] section of this letter.
This is the first step toward supporting Intel processors, so we add
support only for INTEL_FAM6_BROADWELL_X.

[2]https://lore.kernel.org/lkml/[email protected]/

The patches are organized as follows:

- patch1: Add hardware prefetch core driver
This adds register/unregister functions that create the sysfs interface
with the attributes "enable", "dist", and "strong". A detailed description
of these is in Documentation/ABI/testing/sysfs-devices-system-cpu.

- patch2: Add support for A64FX
This adds module init/exit code for A64FX.

- patch3: Add support for Intel

- patch4: Add Kconfig/Makefile to build module

- patch5: Add documentation for the new sysfs interface

We tested this driver and measured its performance with the STREAM
benchmark on our x86 machine. The results are as follows:

| Hardware Prefetch status | Triad |
|--------------------------|------------|
| Enabled | 40300.4600 |
| Disabled | 31694.6333 |

Performance is better with hardware prefetch enabled, which is the
expected result. We also measured performance on our A64FX machine;
those results are in the v1 RFC PATCH.

[RFC & Future plan]
We plan to support Intel processors that have MSR 0x1A4 (1A4H)[3].
We would appreciate comments on how we should handle multiple hardware
prefetch types in the enable attribute file for Intel processors.
A detailed description follows.

[3]https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
Volume 4

In some cases the specification of MSR 0x1A4 differs depending on the
model. One such specification assigns the bits as follows:

[0] L2 Hardware Prefetcher Disable (R/W)
[1] L2 Adjacent Cache Line Prefetcher Disable (R/W)
[2] DCU Hardware Prefetcher Disable (R/W)
[3] DCU IP Prefetcher Disable (R/W)
[63:4] Reserved
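
As a rough illustration only (this is not part of the patch series), the
bit layout above can be decoded as sketched below. The helper name
decode_msr_1a4 is hypothetical, and the table covers only the one
specification listed here; other models may assign the bits differently.

```python
from __future__ import annotations

# Bit positions of MSR 0x1A4 as listed above. A set bit DISABLES the
# corresponding prefetcher. This covers one specification only.
MSR_1A4_BITS = {
    0: "L2 Hardware Prefetcher Disable",
    1: "L2 Adjacent Cache Line Prefetcher Disable",
    2: "DCU Hardware Prefetcher Disable",
    3: "DCU IP Prefetcher Disable",
}


def decode_msr_1a4(value: int) -> list[str]:
    """Return the names of the disable bits set in value (hypothetical helper)."""
    if value < 0 or value >> 4:
        raise ValueError("bits [63:4] are reserved")
    return [name for bit, name in MSR_1A4_BITS.items() if value & (1 << bit)]
```

For example, decode_msr_1a4(5) reports bits 0 and 2, i.e. the L2 and DCU
hardware prefetchers disabled while the adjacent-line and IP prefetchers
stay enabled.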

If a processor supports two types of hardware prefetch for each
cache, as in the specification above, we need to consider how to
handle them.

We would like to assign these features to an enable attribute file
(i.e. map l1/enable to bit[2:3] and l2/enable to bit[0:1]), and are
considering two proposals:

A) The enable file handles only one bit, and a change affects all
hardware prefetch types at a given cache level.

B) The enable file handles one or more bits, and a change to a single
bit affects the corresponding single hardware prefetch type.

For each proposal, an example of the result of writing to the enable
file when all bits of the MSR 0x1A4 are 0 is shown below.

| Value to write | bit[0] | bit[1] | bit[2] | bit[3] |
|-------------------------|--------|--------|--------|--------|
| A) write 1 to l1/enable | 0 | 0 | 1 | 1 |
| A) write 1 to l2/enable | 1 | 1 | 0 | 0 |
| B) write 1 to l1/enable | 0 | 0 | 1 | 0 |
| B) write 2 to l1/enable | 0 | 0 | 0 | 1 |
| B) write 3 to l2/enable | 1 | 1 | 0 | 0 |
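
The table above can be reproduced with a small model. This is a sketch
under the mapping stated in this letter (l1 -> bit[2:3], l2 -> bit[0:1]);
the names LEVEL_FIELD, write_enable and bits are hypothetical and do not
come from the driver code.

```python
# (base bit, width) of each cache level's field in MSR 0x1A4, following
# the mapping in the text: l1 -> bit[2:3], l2 -> bit[0:1].
LEVEL_FIELD = {"l1": (2, 2), "l2": (0, 2)}


def write_enable(proposal, level, value, msr=0):
    """Model the MSR value after writing `value` to <level>/enable."""
    base, width = LEVEL_FIELD[level]
    mask = (1 << width) - 1
    if proposal == "A":
        # One logical bit drives every prefetcher bit at this level.
        if value not in (0, 1):
            raise ValueError("proposal A accepts only 0 or 1")
        field = mask if value else 0
    elif proposal == "B":
        # The written value is a bitmask, one bit per prefetcher type.
        if value < 0 or value & ~mask:
            raise ValueError("value does not fit the level's field")
        field = value
    else:
        raise ValueError("unknown proposal")
    return (msr & ~(mask << base)) | (field << base)


def bits(msr):
    """Return [bit0, bit1, bit2, bit3] for comparison with the table."""
    return [(msr >> i) & 1 for i in range(4)]
```

With all bits initially 0, bits(write_enable("B", "l1", 2)) gives
[0, 0, 0, 1], matching the fourth row of the table.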

Proposal A is simple: it uniformly controls the enablement of all
hardware prefetch types at a given cache level. In this case, it is
easy to provide the same interface as on the A64FX. However, it does
not allow detailed tuning (e.g. writing 1 to only bit[1]).

Proposal B allows the same tuning as direct register access. However,
the user needs to know the hardware specification (e.g. the number of
features that can be enabled via the register) to use the interface.

We think proposal A is better for providing a standard interface, but
we are concerned that it cannot expose all the features of the register.
Do you have any comments on these proposals?

Best regards,
Kohei Tarumizu

Kohei Tarumizu (5):
driver: hwpf: Add hardware prefetch core driver register/unregister
functions
driver: hwpf: Add support for A64FX to hardware prefetch driver
driver: hwpf: Add support for Intel to hardware prefetch driver
driver: hwpf: Add Kconfig/Makefile to build hardware prefetch driver
docs: ABI: Add sysfs documentation interface of hardware prefetch
driver

.../ABI/testing/sysfs-devices-system-cpu | 58 +++
MAINTAINERS | 7 +
arch/arm64/Kconfig.platforms | 6 +
arch/x86/Kconfig | 12 +
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/hwpf/Kconfig | 24 +
drivers/hwpf/Makefile | 9 +
drivers/hwpf/fujitsu_hwpf.c | 460 ++++++++++++++++++
drivers/hwpf/hwpf.c | 452 +++++++++++++++++
drivers/hwpf/intel_hwpf.c | 219 +++++++++
include/linux/hwpf.h | 38 ++
12 files changed, 1288 insertions(+)
create mode 100644 drivers/hwpf/Kconfig
create mode 100644 drivers/hwpf/Makefile
create mode 100644 drivers/hwpf/fujitsu_hwpf.c
create mode 100644 drivers/hwpf/hwpf.c
create mode 100644 drivers/hwpf/intel_hwpf.c
create mode 100644 include/linux/hwpf.h

--
2.27.0


2021-11-04 05:32:10

by [email protected]

Subject: [RFC PATCH v2 4/5] driver: hwpf: Add Kconfig/Makefile to build hardware prefetch driver

This adds Kconfig/Makefile to build the hardware prefetch driver for
A64FX and Intel. This also adds a MAINTAINERS entry.

Note that because this is the first A64FX-specific driver,
this adds an A64FX entry to Kconfig.platforms of arm64.

Signed-off-by: Kohei Tarumizu <[email protected]>
---
MAINTAINERS | 7 +++++++
arch/arm64/Kconfig.platforms | 6 ++++++
arch/x86/Kconfig | 12 ++++++++++++
drivers/Kconfig | 2 ++
drivers/Makefile | 1 +
drivers/hwpf/Kconfig | 24 ++++++++++++++++++++++++
drivers/hwpf/Makefile | 9 +++++++++
7 files changed, 61 insertions(+)
create mode 100644 drivers/hwpf/Kconfig
create mode 100644 drivers/hwpf/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index f26920f0f..29ad0e613 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1588,6 +1588,13 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git
F: arch/arm/mach-*/
F: arch/arm/plat-*/

+HARDWARE PREFETCH DRIVERS
+M: Kohei Tarumizu <[email protected]>
+L: [email protected] (moderated for non-subscribers)
+S: Maintained
+F: drivers/hwpf/
+F: include/linux/hwpf.h
+
ARM/ACTIONS SEMI ARCHITECTURE
M: Andreas Färber <[email protected]>
M: Manivannan Sadhasivam <[email protected]>
diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
index b0ce18d4c..8ecbcd0b7 100644
--- a/arch/arm64/Kconfig.platforms
+++ b/arch/arm64/Kconfig.platforms
@@ -20,6 +20,12 @@ config ARCH_SUNXI
help
This enables support for Allwinner sunxi based SoCs like the A64.

+config ARCH_A64FX
+ bool "Fujitsu A64FX Platforms"
+ select ARCH_HAS_HARDWARE_PREFETCH
+ help
+ This enables support for Fujitsu A64FX SoC family.
+
config ARCH_ALPINE
bool "Annapurna Labs Alpine platform"
select ALPINE_MSI if PCI
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d9830e7e1..d60ec8eb7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1356,6 +1356,18 @@ config X86_CPUID
with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
/dev/cpu/31/cpuid.

+config INTEL_HARDWARE_PREFETCH
+ tristate "Intel Hardware Prefetch support"
+ select ARCH_HAS_HARDWARE_PREFETCH
+ select HARDWARE_PREFETCH
+ depends on X86_64
+ help
+ This option enables a Hardware Prefetch sysfs interface.
+ This requires an Intel processor that has MSR about Hardware Prefetch.
+
+ See Documentation/ABI/testing/sysfs-devices-system-cpu for more
+ information.
+
choice
prompt "High Memory Support"
default HIGHMEM4G
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 0d399ddaa..c46702569 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -236,4 +236,6 @@ source "drivers/interconnect/Kconfig"
source "drivers/counter/Kconfig"

source "drivers/most/Kconfig"
+
+source "drivers/hwpf/Kconfig"
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index be5d40ae1..8cb2e42f6 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -188,3 +188,4 @@ obj-$(CONFIG_GNSS) += gnss/
obj-$(CONFIG_INTERCONNECT) += interconnect/
obj-$(CONFIG_COUNTER) += counter/
obj-$(CONFIG_MOST) += most/
+obj-$(CONFIG_HARDWARE_PREFETCH) += hwpf/
diff --git a/drivers/hwpf/Kconfig b/drivers/hwpf/Kconfig
new file mode 100644
index 000000000..e011fa6e0
--- /dev/null
+++ b/drivers/hwpf/Kconfig
@@ -0,0 +1,24 @@
+config ARCH_HAS_HARDWARE_PREFETCH
+ bool
+
+menuconfig HARDWARE_PREFETCH
+ bool "Hardware Prefetch Control"
+ depends on ARCH_HAS_HARDWARE_PREFETCH
+ default y
+ help
+ Hardware Prefetch Control Driver
+
+ This driver allows you to control the Hardware Prefetch mechanism.
+ If the hardware supports the mechanism, it provides a sysfs interface
+ for changing the feature's enablement, prefetch distance and strongness.
+
+if HARDWARE_PREFETCH
+
+config A64FX_HARDWARE_PREFETCH
+ tristate "A64FX Hardware Prefetch support"
+ depends on ARCH_A64FX
+ default m
+ help
+ This adds Hardware Prefetch driver support for A64FX SOCs.
+
+endif
diff --git a/drivers/hwpf/Makefile b/drivers/hwpf/Makefile
new file mode 100644
index 000000000..6790eb2d2
--- /dev/null
+++ b/drivers/hwpf/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+# Hardware prefetch core driver
+obj-$(CONFIG_HARDWARE_PREFETCH) += hwpf.o
+
+# FUJITSU SoC driver
+obj-$(CONFIG_A64FX_HARDWARE_PREFETCH) += fujitsu_hwpf.o
+
+# Intel SoC driver
+obj-$(CONFIG_INTEL_HARDWARE_PREFETCH) += intel_hwpf.o
--
2.27.0

2021-11-04 05:33:32

by [email protected]

Subject: [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver

This describes the sysfs interface implemented by the hardware prefetch
driver.

Signed-off-by: Kohei Tarumizu <[email protected]>
---
.../ABI/testing/sysfs-devices-system-cpu | 58 +++++++++++++++++++
1 file changed, 58 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index b46ef1476..caeefd320 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -666,3 +666,61 @@ Description: Preferred MTE tag checking mode
================ ==============================================

See also: Documentation/arm64/memory-tagging-extension.rst
+
+What: /sys/devices/system/cpu/cpu*/hwpf/l*/enable
+ /sys/devices/system/cpu/cpu*/hwpf/l*/available_dist
+ /sys/devices/system/cpu/cpu*/hwpf/l*/dist
+ /sys/devices/system/cpu/cpu*/hwpf/l*/reliable
+Date: October 2021
+Contact: Kohei Tarumizu <[email protected]>
+ Linux kernel mailing list <[email protected]>
+Description: Parameters for the hardware prefetch driver
+
+ This sysfs interface provides Hardware Prefetch (HWPF) tunable
+ attribute files by using implementation defined registers.
+ These attribute files are corresponding to the cache level of
+ the parent directory.
+
+ enable:
+ Read/write interface to change hardware prefetch
+ enablement.
+ Read returns hardware prefetch enablement status:
+ 0: hardware prefetch is enabled
+ 1: hardware prefetch is disabled
+
+ Write '0' to enable Hardware Prefetch.
+ Write '1' to disable Hardware Prefetch.
+
+ available_dist:
+ Read only interface to get a list of values that can be
+ written to dist.
+
+ dist:
+ Read/write interface to specify the hardware prefetch
+ distance.
+ Read returns the current hardware prefetch distance value
+ in bytes or the string "auto".
+
+ Write either a value in bytes read from available_dist,
+ or the string "auto" to this attribute. If you write
+ a value less than these, the value is rounded up.
+
+ The value 0 and the string "auto" are the same and have
+ a special meaning. This means that instead of setting
+ dist to a user-specified value, it operates using
+ hardware-specific values.
+
+ strong:
+ Read/write interface to change hardware prefetch
+ strongness.
+ Strong prefetch operation is surely executed, if there
+ is no corresponding data in cache.
+ Weak prefetch operation allows the hardware not to
+ execute operation depending on hardware state.
+
+ Read returns hardware prefetch strongness status:
+ 0: hardware prefetch is generated strong
+ 1: hardware prefetch is generated weak
+
+ Write '0' to hardware prefetch generate strong.
+ Write '1' to hardware prefetch generate weak.
--
2.27.0

2021-11-04 14:57:27

by Dave Hansen

Subject: Re: [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver

On 11/3/21 10:21 PM, Kohei Tarumizu wrote:
> +What: /sys/devices/system/cpu/cpu*/hwpf/l*/enable
> + /sys/devices/system/cpu/cpu*/hwpf/l*/available_dist
> + /sys/devices/system/cpu/cpu*/hwpf/l*/dist
> + /sys/devices/system/cpu/cpu*/hwpf/l*/reliable

How does this look in practice?

# ls /sys/devices/system/cpu/cpu0/hwpf/
l0
l1
l2
...

?

Dumb question, but why don't we give these things names? If the Intel
one is called "L2 Hardware Prefetcher Disable", couldn't the directory
be "l2_prefetch"?

BTW, your "reliable" is mismatched with the "strong" value in the docs.

2021-11-04 15:15:47

by Borislav Petkov

Subject: Re: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors

On Thu, Nov 04, 2021 at 02:21:17PM +0900, Kohei Tarumizu wrote:
> This patch series add hardware prefetch driver register/unregister
> function. The purpose of this driver is to provide an interface to
> control the hardware prefetch mechanism depending on the application
> characteristics.

This is all fine and dandy but what I'm missing in this pile of text -
at least I couldn't find it - is why do we need this in the upstream
kernel?

Is there some real-life use case that would benefit from software
fiddling with prefetchers or is this one of those, well, we have those
controls, lets expose them in the OS?

IOW, you need to sell this stuff properly first - then talk design.

> create mode 100644 drivers/hwpf/Kconfig
> create mode 100644 drivers/hwpf/Makefile
> create mode 100644 drivers/hwpf/fujitsu_hwpf.c
> create mode 100644 drivers/hwpf/hwpf.c
> create mode 100644 drivers/hwpf/intel_hwpf.c
> create mode 100644 include/linux/hwpf.h

I'm not sure about a wholly separate drivers/hwpf/ - it's not like there
are gazillion different hw prefetch drivers.

HTH.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-11-04 17:17:07

by Peter Zijlstra

Subject: Re: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors

On Thu, Nov 04, 2021 at 02:21:17PM +0900, Kohei Tarumizu wrote:
> This patch series add hardware prefetch driver register/unregister
> function. The purpose of this driver is to provide an interface to
> control the hardware prefetch mechanism depending on the application
> characteristics.

Here you talk about applications..

> An earlier RFC[1], we were suggested that we create a hardware
> prefetch directory under /sys/devices/system/cpu/[CPUNUM]/cache.
> Hardware prefetch is a cache-related feature, but it does not require
> cache sysfs feature. Therefore, we decided to isolate the code.
> Specifically, create a directory under cpu/[CPUNUM].

Here you talk about CPUs..

How does that work?

2021-11-08 08:23:47

by Dave Hansen

Subject: Re: [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver

On 11/7/21 5:29 PM, [email protected] wrote:
>> How does this look in practice?
> It works on a x86 machine is shown below:
>
> # find /sys/devices/system/cpu/cpu0/hwpf/
> /sys/devices/system/cpu/cpu0/hwpf/
> /sys/devices/system/cpu/cpu0/hwpf/l2
> /sys/devices/system/cpu/cpu0/hwpf/l2/enable
> /sys/devices/system/cpu/cpu0/hwpf/l1
> /sys/devices/system/cpu/cpu0/hwpf/l1/enable
>
>> Dumb question, but why don't we give these things names? If the Intel one is
>> called "L2 Hardware Prefetcher Disable", couldn't the directory be "l2_prefetch"?
> There is no specific reason for directory names. We named it "l*"
> because it is related to a certain cache level. We would change it,
> if there is another suitable name.

Ahh, so you really do intend the l2 directory to be for *all* the L2
prefetchers? I guess that's OK, but will folks ever want to do "L2
Hardware Prefetcher Disable", but not "L2 Adjacent Cache Line Prefetcher
Disable"?

2021-11-08 08:30:44

by [email protected]

Subject: RE: [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver

Hi,

Thanks for your comment.

> How does this look in practice?

On an x86 machine, it looks like this:

# find /sys/devices/system/cpu/cpu0/hwpf/
/sys/devices/system/cpu/cpu0/hwpf/
/sys/devices/system/cpu/cpu0/hwpf/l2
/sys/devices/system/cpu/cpu0/hwpf/l2/enable
/sys/devices/system/cpu/cpu0/hwpf/l1
/sys/devices/system/cpu/cpu0/hwpf/l1/enable

> Dumb question, but why don't we give these things names? If the Intel one is
> called "L2 Hardware Prefetcher Disable", couldn't the directory be "l2_prefetch"?

There is no specific reason for directory names. We named it "l*"
because it is related to a certain cache level. We would change it,
if there is another suitable name.

> BTW, your "reliable" is mismatched with the "strong" value in the docs.

Sorry, that's our mistake. The "strong" is correct.

2021-11-08 08:52:51

by [email protected]

Subject: RE: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors

Hi,

Thanks for your comment.

> Here you talk about applications..

> Here you talk about CPUs..
>
> How does that work?

Is your question about how users tune their applications? We intend
for it to be used as follows:
1.) The user tunes the hardware prefetch parameters of particular CPUs
via the sysfs interface.
2.) The user runs the application bound to the CPUs tuned above.
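
The two steps above could be scripted roughly as follows. This is only a
sketch: the hwpf/l*/ paths are the ones proposed in this v2 series and may
change, the function names set_hwpf and run_bound are hypothetical, and
writing to sysfs would normally require root.

```python
import os
from pathlib import Path

# Layout proposed in this RFC (not a merged ABI):
#   /sys/devices/system/cpu/cpu<N>/hwpf/l<LEVEL>/<attr>
SYSFS_CPU = Path("/sys/devices/system/cpu")


def set_hwpf(cpu, level, attr, value, root=SYSFS_CPU):
    """Step 1: write one prefetch attribute for one CPU."""
    path = Path(root) / f"cpu{cpu}" / "hwpf" / f"l{level}" / attr
    path.write_text(f"{value}\n")
    return path


def run_bound(cpus, argv):
    """Step 2: pin this process to the tuned CPUs, then exec the application."""
    os.sched_setaffinity(0, set(cpus))  # Linux only
    os.execvp(argv[0], argv)
```

A caller would loop set_hwpf() over the chosen CPUs (for example setting
"dist" at level 1) and then call run_bound() with the same CPU list and
the application's command line.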

2021-11-08 08:52:51

by [email protected]

Subject: RE: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors

Hi,

Thanks for your comment.

> This is all fine and dandy but what I'm missing in this pile of text - at least I couldn't
> find it - is why do we need this in the upstream kernel?
>
> Is there some real-life use case that would benefit from software fiddling with
> prefetchers or is this one of those, well, we have those controls, lets expose them
> in the OS?
>
> IOW, you need to sell this stuff properly first - then talk design.

A64FX and some Intel processors have implementation-dependent registers
for controlling hardware prefetch. Intel has MSR_MISC_FEATURE_CONTROL,
and A64FX has IMP_PF_STREAM_DETECT_CTRL_EL0. These registers cannot be
accessed from userspace, so we provide a proper kernel interface.

The advantage of exposing this interface to userspace is that we can
expect performance improvements.

The following performance improvements have been reported for some
Intel processors.
https://github.com/xmrig/xmrig/issues/1433#issuecomment-572126184

A64FX also has several applications whose performance has actually
improved. In most of these cases, we tuned the hardware prefetch
distance parameter. One of them is the STREAM benchmark.

For reference, here are the STREAM Triad results when tuning with
the dist attribute file for the L1 and L2 caches on A64FX.

| dist combination | Pattern A | Pattern B |
|-------------------|-------------|-------------|
| L1:256, L2:1024 | 234505.2144 | 114600.0801 |
| L1:1536, L2:1024 | 279172.8742 | 118979.4542 |
| L1:256, L2:10240 | 247716.7757 | 127364.1533 |
| L1:1536, L2:10240 | 283675.6625 | 125950.6847 |

In pattern A, we set the size of the array to 174720, which is about
half the size of the L1d cache. In pattern B, we set the size of the
array to 10485120, which is about twice the size of the L2 cache.

In pattern A, a change of dist at L1 has a larger effect. On the other
hand, in pattern B, the change of dist at L2 has a larger effect.
As described above, the optimal dist combination depends on the
characteristics of the application. Therefore, such a sysfs interface
is useful for performance tuning.
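
To make the comparison concrete, the relative gains over the first row
(L1:256, L2:1024) can be computed from the table. The sketch below uses
only the numbers reported above; the names triad and gain are
hypothetical.

```python
# STREAM Triad results from the table above, keyed by (L1 dist, L2 dist).
# Each value is (pattern A, pattern B).
triad = {
    (256, 1024): (234505.2144, 114600.0801),
    (1536, 1024): (279172.8742, 118979.4542),
    (256, 10240): (247716.7757, 127364.1533),
    (1536, 10240): (283675.6625, 125950.6847),
}

BASELINE = (256, 1024)
PATTERN_A, PATTERN_B = 0, 1


def gain(config, pattern):
    """Relative change of `config` against the baseline row."""
    return triad[config][pattern] / triad[BASELINE][pattern] - 1.0


for name, pattern in (("A", PATTERN_A), ("B", PATTERN_B)):
    l1_only = gain((1536, 1024), pattern)   # only the L1 dist changed
    l2_only = gain((256, 10240), pattern)   # only the L2 dist changed
    print(f"pattern {name}: L1 dist change {l1_only:+.1%}, "
          f"L2 dist change {l2_only:+.1%}")
```

Changing only the L1 dist gives the larger gain in pattern A, and
changing only the L2 dist gives the larger gain in pattern B, which is
the asymmetry described above.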

For these reasons, we would like to add this interface to the
upstream kernel.

> I'm not sure about a wholly separate drivers/hwpf/ - it's not like there are
> gazillion different hw prefetch drivers.

We created a new directory to gather multiple separate files in one
place. We do not think this is a good approach. If there is a more
suitable way, we would like to change it.

2021-11-09 18:44:48

by [email protected]

Subject: RE: [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver

> Ahh, so you really do intend the l2 directory to be for *all* the L2
> prefetchers?

Yes, we intend to create the l2 directory for *all* the L2 prefetchers
(i.e. "L2 Hardware Prefetcher Disable" and "L2 Adjacent Cache Line
Prefetcher Disable").

> I guess that's OK, but will folks ever want to do "L2
> Hardware Prefetcher Disable", but not "L2 Adjacent Cache Line Prefetcher
> Disable"?

There are people who actually tested the performance improvement[1].

[1]https://github.com/xmrig/xmrig/issues/1433#issuecomment-572126184

In this report, writing 5 to MSR 0x1a4 (i.e. setting "L2 Hardware
Prefetcher Disable" but not "L2 Adjacent Cache Line Prefetcher Disable")
gave the best performance on an i7-5930K. If such tuning is possible,
it may be useful for some people.

We describe how to deal with these parameters in our sysfs interface in
the "[RFC & Future plan]" section of the cover letter (0/5), but we have
not come up with any good ideas.

We thought that the sysfs interface should be generic and common,
and avoid showing architecture-dependent specifications.

We considered Proposal B from that section, which handles multiple
hardware prefetch types in one enable attribute file. However, using
it requires knowing the register specification, so we think it is not
appropriate.

Do you have any ideas on how to represent architecture-dependent
specifications in the sysfs interface?

2021-11-10 00:10:19

by Dave Hansen

Subject: Re: [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver

On 11/9/21 1:41 AM, [email protected] wrote:
>> I guess that's OK, but will folks ever want to do "L2
>> Hardware Prefetcher Disable", but not "L2 Adjacent Cache Line Prefetcher
>> Disable"?
> There are people who actually tested the performance improvement[1].
>
> [1]https://github.com/xmrig/xmrig/issues/1433#issuecomment-572126184
>
> In this report, write 5 to MSR 0x1a4 (i.e. "L2 Hardware Prefetcher
> Disable", but not "L2 Adjacent Cache Line Prefetcher Disable")
> on i7-5930K for best performance. If such tuning is possible, it may
> be useful for some people.
>
> We describe how to deal these parameters in our sysfs interface at
> "[RFC & Future plan]" section in the cover letter(0/5), but we can't
> come up with any good ideas.
>
> We thought that the sysfs interface should be generic and common,
> and avoid showing architecture-dependent specifications.
>
> We have considered the Proposal B that multiple hardware prefetch
> types in one enable attribute file at above section. However, in
> order to use it, we have to know the register specification, so we
> think it is not appropriate.
>
> Do you have any idea how to represent architecture-dependent
> specifications in sysfs interface?

First, I'd give them real names.

Second, I'd link them to the level or levels of the cache that they affect.

Third, I'd make sure that it is clear what caches it affects.

We have a representation of the caches in:

/sys/devices/system/cpu/cpu*/cache

It would be a shame to ignore those.

2021-11-10 08:36:36

by Borislav Petkov

Subject: Re: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors

On Mon, Nov 08, 2021 at 02:17:43AM +0000, [email protected] wrote:
> The following performance improvements have been reported for some
> Intel processors.
> https://github.com/xmrig/xmrig/issues/1433#issuecomment-572126184

Yes, I know about that use case.

> For these reasons, we would like to add this interface to the
> upstream kernel.

So put all those justifications at the beginning of your 0th message
when you send a patchset so that it is clear to reviewers *why* you're
doing this. The "why" is the most important - everything else comes
after.

> > I'm not sure about a wholly separate drivers/hwpf/ - it's not like there are
> > gazillion different hw prefetch drivers.
>
> We created a new directory to lump multiple separate files into one
> place. We don't think this is a good way. If there is any other
> suitable way, we would like to change it.

Well, how many prefetcher drivers will be there?

On x86 there will be one per vendor, so 2-3 the most...

Also, as dhansen points out, we have already

/sys/devices/system/cpu/cpu*/cache

so all those knobs belong there on x86.

Also, I think that shoehorning all these different cache architectures
and different prefetcher knobs which are available from each CPU, into a
common sysfs hierarchy is going to cause a lot of ugly ifdeffery if not
done right.

Some caches will have control A while others won't - they will have
control B so people will wonder why control A works on box B_a but not
on box B_b...

So we have to be very careful what we expose to userspace because it
becomes an ABI which we have to support for an indefinite time.

Also, if you're going to give the xmrig example, then we should involve
the xmrig people and ask them whether the stuff you're exposing to
userspace is good for their use case.

And so on and so on...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-11-10 09:35:33

by [email protected]

Subject: RE: [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver

> First, I'd give them real names.
>
> Second, I'd link them to the level or levels of the cache that they effect.
>
> Third, I'd make sure that it is clear what caches it affects.
>
> We have a representation of the caches in:
>
> /sys/devices/system/cpu/cpu*/cache
>
> It would be a shame to ignore those.

Thank you for your advice.
I will reconsider the design.

2021-11-18 06:22:23

by [email protected]

Subject: RE: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors

I'm sorry for the late reply.

> So put all those justifications at the beginning of your 0th message
> when you send a patchset so that it is clear to reviewers *why* you're
> doing this. The "why" is the most important - everything else comes
> after.

I understand. The next time we send a patchset, we will put these
justifications at the beginning of our 0th message.

> Well, how many prefetcher drivers will be there?
>
> On x86 there will be one per vendor, so 2-3 the most…

Currently, we plan to support only two drivers, one for Intel and one
for A64FX. Even if we support other vendors, the number will probably
increase only a little.

>> We don't think this is a good way. If there is any other suitable
>> way, we would like to change it.

We meant that our current approach is not good. Therefore, we would
like to reconsider the file structure along with the changes to the
interface specification.

> Also, as dhansen points out, we have already
>
> /sys/devices/system/cpu/cpu*/cache
>
> so all those knobs belong there on x86.

Both the Intel MSR and A64FX have hardware prefetchers that affect the
L1d cache and the L2 cache. Does creating a prefetcher directory under
the cache directory, as below, match your intention?

/sys/devices/system/cpu/cpu*/cache/
index0/prefetcher/enable
index2/prefetcher/enable

The above example presumes that the L1d cache is at index0 (level: 1,
type: Data) and the L2 cache is at index2 (level:2, type: Unified).
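
Since the index numbering is only a presumption, userspace could look up
the right index directory from the existing level and type files under
cache/index*/ instead of hard-coding it. A minimal sketch follows; the
function name find_cache_index is hypothetical.

```python
from pathlib import Path


def find_cache_index(level, cache_type, cpu=0, root="/sys/devices/system/cpu"):
    """Return the cache/indexN directory matching level and type, or None.

    cache_type is the string found in the sysfs type file, e.g. "Data",
    "Instruction" or "Unified".
    """
    cache_dir = Path(root) / f"cpu{cpu}" / "cache"
    for index in sorted(cache_dir.glob("index[0-9]*")):
        try:
            found_level = int((index / "level").read_text())
            found_type = (index / "type").read_text().strip()
        except (OSError, ValueError):
            continue  # skip entries without readable level/type files
        if found_level == level and found_type == cache_type:
            return index
    return None
```

On a machine matching the presumption above, find_cache_index(1, "Data")
would return the index0 directory and find_cache_index(2, "Unified")
would return index2.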

> Also, I think that shoehorning all these different cache architectures
> and different prefetcher knobs which are available from each CPU, into a
> common sysfs hierarchy is going to cause a lot of ugly ifdeffery if not
> done right.
>
> Some caches will have control A while others won't - they will have
> control B so people will wonder why control A works on box B_a but not
> on box B_b...
>
> So we have to be very careful what we expose to userspace because it
> becomes an ABI which we have to support for an indefinite time.

To avoid shoehorning different prefetchers into a common sysfs
hierarchy, we would like to represent them in separate hierarchies.

The Intel MSR has three types of prefetchers, and we represent
"Hardware Prefetcher" as "hwpf", "Adjacent Cache Line Prefetcher" as
"aclpf", and "IP Prefetcher" as "ippf". Each of these prefetchers has
one controllable parameter, "disable".

A64FX has one type of prefetcher, and we represent it as "hwpf". This
prefetcher has three parameters: "disable", "dist" and "strong".

The following table shows which caches are affected by the combination
of prefetcher and parameter.

| Cache affected | Combination ([prefetcher]/[parameter]) |
|----------------|----------------------------------------|
| Intel MSR L1d | hwpf/disable, ippf/disable |
| Intel MSR L2 | hwpf/disable, aclpf/disable |
| A64FX L1d | hwpf/disable, hwpf/dist, hwpf/strong |
| A64FX L2 | hwpf/disable, hwpf/dist, hwpf/strong |

Does it make sense to create sysfs directories as below?

* For Intel MSR
/.../index0/prefetcher/hwpf/enable
/.../index0/prefetcher/ippf/enable
/.../index2/prefetcher/hwpf/enable
/.../index2/prefetcher/aclpf/enable

* For A64FX
/.../index[0,2]/prefetcher/hwpf/enable
/.../index[0,2]/prefetcher/hwpf/dist
/.../index[0,2]/prefetcher/hwpf/strong

> Also, if you're going to give the xmrig example, then we should involve
> the xmrig people and ask them whether the stuff you're exposing to
> userspace is good for their use case.

We would like to ask them once the interface specification is fixed to
some extent.

2021-11-18 07:10:58

by [email protected]

Subject: RE: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors

> Does it make sense to create sysfs directories as below?
>
> * For Intel MSR
> /.../index0/prefetcher/hwpf/enable
> /.../index0/prefetcher/ippf/enable
> /.../index2/prefetcher/hwpf/enable
> /.../index2/prefetcher/aclpf/enable
>
> * For A64FX
> /.../index[0,2]/prefetcher/hwpf/enable
> /.../index[0,2]/prefetcher/hwpf/dist
> /.../index[0,2]/prefetcher/hwpf/strong

There was a mistake in the attribute file name. The following is
correct.

* For Intel MSR
/.../index0/prefetcher/hwpf/disable
/.../index0/prefetcher/ippf/disable
/.../index2/prefetcher/hwpf/disable
/.../index2/prefetcher/aclpf/disable

* For A64FX
/.../index[0,2]/prefetcher/hwpf/disable
/.../index[0,2]/prefetcher/hwpf/dist
/.../index[0,2]/prefetcher/hwpf/strong
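
For illustration, the corrected layout can be written down as data and
expanded into full paths. This sketch assumes, as elsewhere in this
thread, that the L1d cache is index0 and the L2 cache is index2; the names
PROPOSED and proposed_paths are hypothetical, and the hierarchy itself is
only a proposal.

```python
# Proposed (not merged) layout: platform -> cache index -> prefetcher -> attrs.
PROPOSED = {
    "intel": {
        "index0": {"hwpf": ["disable"], "ippf": ["disable"]},
        "index2": {"hwpf": ["disable"], "aclpf": ["disable"]},
    },
    "a64fx": {
        "index0": {"hwpf": ["disable", "dist", "strong"]},
        "index2": {"hwpf": ["disable", "dist", "strong"]},
    },
}


def proposed_paths(platform, cpu=0):
    """Expand the proposed layout into full sysfs paths for one CPU."""
    base = f"/sys/devices/system/cpu/cpu{cpu}/cache"
    return [
        f"{base}/{index}/prefetcher/{prefetcher}/{attr}"
        for index, prefetchers in PROPOSED[platform].items()
        for prefetcher, attrs in prefetchers.items()
        for attr in attrs
    ]
```

Listing proposed_paths("intel") and proposed_paths("a64fx") side by side
shows that the two platforms share only the hwpf/disable attribute.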

2021-12-06 09:30:22

by [email protected]

Subject: RE: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors

>> Also, as dhansen points out, we have already
>>
>> /sys/devices/system/cpu/cpu*/cache
>>
>> so all those knobs belong there on x86.
>
> Intel MSR and A64FX have hardware prefetcher that affect L1d cache and
> L2 cache. Does it suit your intention to create a prefetcher directory
> under the cache directory as below?
>
> /sys/devices/system/cpu/cpu*/cache/
> index0/prefetcher/enable
> index2/prefetcher/enable
>
> The above example presumes that the L1d cache is at index0 (level: 1,
> type: Data) and the L2 cache is at index2 (level:2, type: Unified).

Any comment or suggestion would be much appreciated. In particular,
does our use of the cache/index directories above match your intent?