2022-04-21 22:48:37

by [email protected]

Subject: [PATCH v3 0/9] Add hardware prefetch control driver for A64FX and x86

This patch series adds a sysfs interface to control the CPU's hardware
prefetch behavior for performance tuning from userspace, for the A64FX
processor and for x86 (on supported CPUs).

Changes from v2:
  - move arm64 driver (arch/arm64) to A64FX only (drivers/soc/fujitsu)
  - prohibit writing 0 to stream_detect_prefetcher_dist
  - change the type of strongness state handled from bool to string
    (e.g. "strong"), and rename to stream_detect_prefetcher_strength
  - change x86 code to work correctly with resctrl's pseudo lock
    - read and write registers in one smp_call_function_single() to
      prevent context switch when writing registers in x86-pfctl.c
    - restore to original value when re-enabling the register in
      pseudo_lock.c
  - add prefix to driver's name for A64FX(a64fx-) and x86(x86-)
  - modify the document
    - split the description of pfctl into blocks for x86 and A64FX
    - remove unnecessary descriptions
https://lore.kernel.org/lkml/[email protected]/

[Background]
============
A64FX and some Intel processors have implementation-dependent registers
for controlling the CPU's hardware prefetch behavior. A64FX has
IMP_PF_STREAM_DETECT_CTRL_EL0[1], and Intel processors have MSR 0x1a4
(MSR_MISC_FEATURE_CONTROL)[2]. These registers cannot be accessed from
userspace.

[1]https://github.com/fujitsu/A64FX/tree/master/doc/
A64FX_Specification_HPC_Extension_v1_EN.pdf

[2]https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
Volume 4

The advantage of using this is improved performance. As an example of
performance improvements, the results of running the Stream benchmark
on the A64FX are described in section [Merit].

For MSR 0x1a4, it is also possible to change the value from userspace
via the MSR driver. However, using the MSR driver is not recommended,
so a proper kernel interface is needed[3].

[3]https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about/

For these reasons, we provide a new proper kernel interface to control
both IMP_PF_STREAM_DETECT_CTRL_EL0 and MSR 0x1a4.

[Overall design]
================
The source code for this driver is divided into common parts
(drivers/base/pfctl.c) and architecture parts (arch/XXX/XXX/pfctl.c).
The common parts implement architecture-independent processing, such as
creating the sysfs files.
The architecture parts implement architecture-dependent processing. They
must at least describe which types of hardware prefetcher are supported
and how to read and write the register. This information is set
through the registration function in the common parts.
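
The split between common and architecture parts can be pictured with a
small user-space mock-up. This is only a sketch: the field names mirror
those used by the x86 driver in this series (supported_l1d_prefetcher,
supported_l2_prefetcher, read_pfreg, write_pfreg), but the flag values,
the mock_* and demo_* names and the single-slot registration are
assumptions made for illustration, not the contents of
include/linux/pfctl.h.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Attribute identifiers; names follow the x86 driver, values are assumed. */
enum pfctl_attr { HWPF_ENABLE, IPPF_ENABLE, ACLPF_ENABLE };

/* Capability flags; the bit values are hypothetical. */
#define HWPF  (1u << 0)
#define IPPF  (1u << 1)
#define ACLPF (1u << 2)

/* What an architecture part hands to the common part. */
struct pfctl_driver {
	unsigned int supported_l1d_prefetcher;
	unsigned int supported_l2_prefetcher;
	int (*read_pfreg)(enum pfctl_attr pattr, unsigned int cpu,
			  unsigned int level, uint64_t *val);
	int (*write_pfreg)(enum pfctl_attr pattr, unsigned int cpu,
			   unsigned int level, uint64_t val);
};

/* Common part (mock): remembers the single registered driver. */
const struct pfctl_driver *mock_registered;

int mock_pfctl_register_driver(const struct pfctl_driver *drv)
{
	if (!drv || !drv->read_pfreg || !drv->write_pfreg)
		return -EINVAL;	/* both callbacks are mandatory */
	if (mock_registered)
		return -EBUSY;	/* only one driver at a time */
	mock_registered = drv;
	return 0;
}

void mock_pfctl_unregister_driver(const struct pfctl_driver *drv)
{
	if (mock_registered == drv)
		mock_registered = NULL;
}

/* Common part (mock): should an attribute file exist at this level? */
int mock_pfctl_attr_supported(unsigned int level, enum pfctl_attr pattr)
{
	unsigned int mask, flag;

	if (!mock_registered)
		return 0;
	if (level == 1)
		mask = mock_registered->supported_l1d_prefetcher;
	else if (level == 2)
		mask = mock_registered->supported_l2_prefetcher;
	else
		return 0;

	switch (pattr) {
	case HWPF_ENABLE:  flag = HWPF;  break;
	case IPPF_ENABLE:  flag = IPPF;  break;
	case ACLPF_ENABLE: flag = ACLPF; break;
	default:           return 0;
	}
	return (mask & flag) != 0;
}

/* Architecture part (mock): a plain variable stands in for the register. */
uint64_t demo_reg;

static int demo_read(enum pfctl_attr pattr, unsigned int cpu,
		     unsigned int level, uint64_t *val)
{
	(void)pattr; (void)cpu; (void)level;
	*val = demo_reg;
	return 0;
}

static int demo_write(enum pfctl_attr pattr, unsigned int cpu,
		      unsigned int level, uint64_t val)
{
	(void)pattr; (void)cpu; (void)level;
	demo_reg = val;
	return 0;
}

const struct pfctl_driver demo_driver = {
	.supported_l1d_prefetcher = HWPF | IPPF,
	.supported_l2_prefetcher  = HWPF | ACLPF,
	.read_pfreg  = demo_read,
	.write_pfreg = demo_write,
};
```

In this mock-up the common part never needs to know which register is
behind the callbacks; it only consults the capability masks to decide
which attribute files to create for each cache level.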

This driver creates a "prefetch_control" directory and some attribute
files in every CPU's cache/index[0,2] directory, if the CPU supports
hardware prefetch control. Each attribute file corresponds to
the cache level of the parent index directory.

A detailed description of this sysfs interface is in
Documentation/ABI/testing/sysfs-devices-system-cpu (patch9).

This driver needs the cache sysfs directory and cache level/type
information. On ARM processors, this information can be obtained
from registers even without ACPI PPTT.
In patch5, we add processing to create a cache/index directory using
only the information from the registers if the machine does not support
ACPI PPTT and the Kconfig option for hardware prefetch control
(CONFIG_HWPF_CONTROL) is enabled.
This change causes a problem, which is described in [Known problem].

[Examples]
==========
This section provides an example of using this sysfs interface on the
x86 model INTEL_FAM6_BROADWELL_X.

This model has the following register specifications:

[0] L2 Hardware Prefetcher Disable (R/W)
[1] L2 Adjacent Cache Line Prefetcher Disable (R/W)
[2] DCU Hardware Prefetcher Disable (R/W)
[3] DCU IP Prefetcher Disable (R/W)
[63:4] Reserved

In this case, index0 (L1d cache) corresponds to bits [2,3] and index2
(L2 cache) corresponds to bits [0,1]. The attribute files of index0
and index2 for CPU1 on BROADWELL_X are as follows:

```
# ls /sys/devices/system/cpu/cpu1/cache/index0/prefetch_control/

hardware_prefetcher_enable
ip_prefetcher_enable

# ls /sys/devices/system/cpu/cpu1/cache/index2/prefetch_control/

adjacent_cache_line_prefetcher_enable
hardware_prefetcher_enable
```

To disable the L2 adjacent cache line prefetcher on CPU1 (that is, to
set the "L2 Adjacent Cache Line Prefetcher Disable (R/W)" bit), do the
following:

```
# echo 0 > /sys/devices/system/cpu/cpu1/cache/index2/prefetch_control/adjacent_cache_line_prefetcher_enable
```
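
Because the attribute files are named *_enable while the MSR bits are
"Disable" bits, the value has to be inverted on the way in and out. The
following user-space sketch models that read-modify-write on a plain
64-bit variable, using the BROADWELL_X bit layout listed above. The
macro and function names are illustrative assumptions, and no real MSR
is touched.

```c
#include <stdint.h>

/* Bit positions of MSR 0x1a4 on BROADWELL_X, as listed above. */
#define L2_HWPF_DISABLE   (UINT64_C(1) << 0)
#define L2_ACLPF_DISABLE  (UINT64_C(1) << 1)
#define DCU_HWPF_DISABLE  (UINT64_C(1) << 2)
#define DCU_IPPF_DISABLE  (UINT64_C(1) << 3)

/* Value shown by an *_enable file: 1 when the disable bit is clear. */
int prefetcher_enabled(uint64_t reg, uint64_t disable_bit)
{
	return (reg & disable_bit) == 0;
}

/* Register value after writing `enable` (0 or 1) to an *_enable file. */
uint64_t set_prefetcher_enable(uint64_t reg, uint64_t disable_bit, int enable)
{
	if (enable)
		return reg & ~disable_bit;	/* enable: clear the disable bit */
	return reg | disable_bit;		/* disable: set the disable bit */
}
```

Starting from a register value of 0 (all prefetchers enabled), the
`echo 0` above corresponds to
set_prefetcher_enable(0, L2_ACLPF_DISABLE, 0), which yields 0x2 and
leaves the other three bits untouched.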

As another example, the attribute files of index0 on A64FX are as follows:

```
# ls /sys/devices/system/cpu/cpu1/cache/index0/prefetch_control/

stream_detect_prefetcher_dist
stream_detect_prefetcher_enable
stream_detect_prefetcher_strength
stream_detect_prefetcher_strength_available
```
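
How a write to stream_detect_prefetcher_dist might be interpreted can be
sketched in user-space C. The rules come from this series: the string
"auto" is accepted, writing 0 is rejected, and other values are rounded
up to a multiple of a hardware-specific value. The function name, the
`unit` parameter and the separate `is_auto` flag are assumptions for
illustration only; the actual granularity is defined by the A64FX
specification and is not taken from this sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a string written to the dist file.
 * Returns 0 on success and stores the distance in bytes in *dist,
 * or sets *is_auto when the string is "auto".
 * `unit` is the rounding granularity in bytes (hypothetical parameter).
 */
int parse_dist(const char *buf, uint64_t unit, uint64_t *dist, int *is_auto)
{
	unsigned long long val;
	char *end;

	*is_auto = 0;
	*dist = 0;
	if (unit == 0)
		return -EINVAL;

	if (strcmp(buf, "auto") == 0 || strcmp(buf, "auto\n") == 0) {
		*is_auto = 1;
		return 0;
	}

	/* Accept plain decimal digits only (no sign, no leading blanks). */
	if (buf[0] < '0' || buf[0] > '9')
		return -EINVAL;

	errno = 0;
	val = strtoull(buf, &end, 10);
	if (errno)
		return -EINVAL;
	if (*end == '\n')
		end++;			/* tolerate one trailing newline */
	if (*end != '\0')
		return -EINVAL;

	if (val == 0)
		return -EINVAL;		/* writing 0 is prohibited */
	if (val > UINT64_MAX - (unit - 1))
		return -EINVAL;		/* rounding up would overflow */

	/* Round up to the next multiple of unit. */
	*dist = (val + unit - 1) / unit * unit;
	return 0;
}
```

With an assumed unit of 256 bytes, parse_dist("1000", 256, ...) stores
1024, while "0" and "-1" are both rejected with -EINVAL.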

[Patch organizations]
=====================
This patch series adds the hardware prefetch control core driver, along
with support for A64FX and for BROADWELL_X on x86.

- patch1: Add hardware prefetch core driver

This driver provides a register/unregister function to create the
"prefetch_control" directory and some attribute files in every CPU's
cache/index[0,2] directory.
If the architecture can control the CPU's hardware prefetch
behavior, use this function to create the sysfs files. When registering,
it is necessary to provide which types of hardware prefetcher are
supported and how to read and write the register.

- patch2: Add Kconfig/Makefile to build hardware prefetch control core
driver

- patch3: Add support for A64FX

This adds module init/exit code, and creates the sysfs attribute files
"stream_detect_prefetcher_enable", "stream_detect_prefetcher_strength",
"stream_detect_prefetcher_strength_available" and
"stream_detect_prefetcher_dist" for A64FX. This driver works only
if the part number is FUJITSU_CPU_PART_A64FX at this point.

- patch4: Add Kconfig/Makefile to build driver for A64FX

- patch5: Create cache sysfs directory without ACPI PPTT for hardware
prefetch control

The hardware prefetch control driver needs the cache sysfs directory and
cache level/type information. On ARM processors, this information can be
obtained from registers (CLIDR_EL1) even without PPTT. Therefore, we
set cpu_map_populated to true to create the cache sysfs directory if
the machine doesn't have PPTT.

- patch6: Fix to restore to original value when re-enabling hardware
prefetch register in pseudo_lock.c

The current pseudo_lock.c code overwrites the value of
MSR_MISC_FEATURE_CONTROL with 0 even if the original value is not 0.
Therefore, modify it to save and restore the original value.

- patch7: Add support for x86

This adds module init/exit code, and creates sysfs attribute file
"hardware_prefetcher_enable", "ip_prefetcher_enable" and
"adjacent_cache_line_prefetcher_enable" for x86. This driver works
only if the model is INTEL_FAM6_BROADWELL_X at this point.

- patch8: Add Kconfig/Makefile to build driver for x86

- patch9: Add documentation for the new sysfs interface
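
The fix in patch6 follows the usual save/modify/restore pattern. A
minimal user-space model is shown below, with a plain variable standing
in for MSR_MISC_FEATURE_CONTROL. The names fake_msr,
run_with_prefetch_disabled and record_msr are hypothetical; the real
code in pseudo_lock.c reads and writes the MSR with interrupts disabled.

```c
#include <stdint.h>

/* Stand-in for MSR_MISC_FEATURE_CONTROL (hypothetical, user space only). */
uint64_t fake_msr;

typedef void (*critical_fn)(void *arg);

/*
 * Run fn() with the given prefetchers disabled, then put the register
 * back to whatever it held before, instead of unconditionally writing 0.
 */
void run_with_prefetch_disabled(uint64_t prefetch_disable_bits,
				critical_fn fn, void *arg)
{
	uint64_t saved = fake_msr;		/* save the original value */

	fake_msr = prefetch_disable_bits;	/* disable the prefetchers */
	fn(arg);
	fake_msr = saved;			/* restore, do not force 0x0 */
}

/* Example callback: records the register value seen inside the section. */
void record_msr(void *arg)
{
	*(uint64_t *)arg = fake_msr;
}
```

If a prefetcher had already been disabled through the new sysfs
interface (say the register held 0x2), it is still disabled after the
section finishes, whereas writing 0x0 would silently have re-enabled it.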

[Known problem]
===============
- The `lscpu` command terminates with -ENOENT because the cache/index
directory exists but the shared_cpu_map file does not. This is due to
patch5, which creates a cache/index directory containing only level
and type without ACPI PPTT.

[Merit]
=======
For reference, here are the results of STREAM Triad when tuning with
the "stream_detect_prefetcher_dist" attribute file in L1 and L2 cache
on A64FX.

| dist combination | Pattern A | Pattern B |
|-------------------|-------------|-------------|
| L1:256, L2:1024 | 234505.2144 | 114600.0801 |
| L1:1536, L2:1024 | 279172.8742 | 118979.4542 |
| L1:256, L2:10240 | 247716.7757 | 127364.1533 |
| L1:1536, L2:10240 | 283675.6625 | 125950.6847 |

In pattern A, we set the size of the array to 174720, which is about
half the size of the L1d cache. In pattern B, we set the size of the
array to 10485120, which is about twice the size of the L2 cache.

In pattern A, a change of dist at L1 has a larger effect. On the other
hand, in pattern B, the change of dist at L2 has a larger effect.
As described above, the optimal dist combination depends on the
characteristics of the application. Therefore, such a sysfs interface
is useful for performance tuning.

Best regards,
Kohei Tarumizu

Kohei Tarumizu (9):
drivers: base: Add hardware prefetch control core driver
drivers: base: Add Kconfig/Makefile to build hardware prefetch control
core driver
soc: fujitsu: Add hardware prefetch control support for A64FX
soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control
driver
arm64: Create cache sysfs directory without ACPI PPTT for hardware
prefetch control
x86: resctrl: pseudo_lock: Fix to restore to original value when
re-enabling hardware prefetch register
x86: Add hardware prefetch control support for x86
x86: Add Kconfig/Makefile to build hardware prefetch control driver
docs: ABI: Add sysfs documentation interface of hardware prefetch
control driver

.../ABI/testing/sysfs-devices-system-cpu | 98 ++++
MAINTAINERS | 8 +
arch/arm64/kernel/cacheinfo.c | 29 ++
arch/x86/Kconfig | 6 +
arch/x86/kernel/cpu/Makefile | 2 +
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 12 +-
arch/x86/kernel/cpu/x86-pfctl.c | 347 +++++++++++++
drivers/base/Kconfig | 9 +
drivers/base/Makefile | 1 +
drivers/base/pfctl.c | 458 ++++++++++++++++++
drivers/soc/Kconfig | 1 +
drivers/soc/Makefile | 1 +
drivers/soc/fujitsu/Kconfig | 11 +
drivers/soc/fujitsu/Makefile | 2 +
drivers/soc/fujitsu/a64fx-pfctl.c | 356 ++++++++++++++
include/linux/pfctl.h | 49 ++
16 files changed, 1387 insertions(+), 3 deletions(-)
create mode 100644 arch/x86/kernel/cpu/x86-pfctl.c
create mode 100644 drivers/base/pfctl.c
create mode 100644 drivers/soc/fujitsu/Kconfig
create mode 100644 drivers/soc/fujitsu/Makefile
create mode 100644 drivers/soc/fujitsu/a64fx-pfctl.c
create mode 100644 include/linux/pfctl.h

--
2.27.0


2022-04-21 23:51:04

by [email protected]

Subject: [PATCH v3 4/9] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver

This adds Kconfig/Makefile to build hardware prefetch control driver
for A64FX support. This also adds a MAINTAINERS entry.

Signed-off-by: Kohei Tarumizu <[email protected]>
---
MAINTAINERS | 1 +
drivers/soc/Kconfig | 1 +
drivers/soc/Makefile | 1 +
drivers/soc/fujitsu/Kconfig | 11 +++++++++++
drivers/soc/fujitsu/Makefile | 2 ++
5 files changed, 16 insertions(+)
create mode 100644 drivers/soc/fujitsu/Kconfig
create mode 100644 drivers/soc/fujitsu/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index f6640dc053c0..b359dcc38be3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8619,6 +8619,7 @@ HARDWARE PREFETCH CONTROL DRIVERS
M: Kohei Tarumizu <[email protected]>
S: Maintained
F: drivers/base/pfctl.c
+F: drivers/soc/fujitsu/a64fx-pfctl.c
F: include/linux/pfctl.h

HARDWARE RANDOM NUMBER GENERATOR CORE
diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig
index c5aae42673d3..d87754799d90 100644
--- a/drivers/soc/Kconfig
+++ b/drivers/soc/Kconfig
@@ -9,6 +9,7 @@ source "drivers/soc/atmel/Kconfig"
source "drivers/soc/bcm/Kconfig"
source "drivers/soc/canaan/Kconfig"
source "drivers/soc/fsl/Kconfig"
+source "drivers/soc/fujitsu/Kconfig"
source "drivers/soc/imx/Kconfig"
source "drivers/soc/ixp4xx/Kconfig"
source "drivers/soc/litex/Kconfig"
diff --git a/drivers/soc/Makefile b/drivers/soc/Makefile
index 904eec2a7871..6c8ff1792cda 100644
--- a/drivers/soc/Makefile
+++ b/drivers/soc/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_SOC_CANAAN) += canaan/
obj-$(CONFIG_ARCH_DOVE) += dove/
obj-$(CONFIG_MACH_DOVE) += dove/
obj-y += fsl/
+obj-y += fujitsu/
obj-$(CONFIG_ARCH_GEMINI) += gemini/
obj-y += imx/
obj-y += ixp4xx/
diff --git a/drivers/soc/fujitsu/Kconfig b/drivers/soc/fujitsu/Kconfig
new file mode 100644
index 000000000000..d9db05d5055d
--- /dev/null
+++ b/drivers/soc/fujitsu/Kconfig
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+menu "Fujitsu SoC drivers"
+
+config A64FX_HWPF_CONTROL
+ tristate "A64FX Hardware Prefetch Control driver"
+ depends on ARM64 || HWPF_CONTROL
+ help
+ This adds Hardware Prefetch driver control support for A64FX.
+
+endmenu
diff --git a/drivers/soc/fujitsu/Makefile b/drivers/soc/fujitsu/Makefile
new file mode 100644
index 000000000000..35e284a548bb
--- /dev/null
+++ b/drivers/soc/fujitsu/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_A64FX_HWPF_CONTROL) += a64fx-pfctl.o
--
2.27.0

2022-04-22 06:02:31

by [email protected]

Subject: [PATCH v3 9/9] docs: ABI: Add sysfs documentation interface of hardware prefetch control driver

This describes the sysfs interface implemented by the hardware prefetch
control driver.

Signed-off-by: Kohei Tarumizu <[email protected]>
---
.../ABI/testing/sysfs-devices-system-cpu | 98 +++++++++++++++++++
1 file changed, 98 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 2ad01cad7f1c..0da4c1bac51e 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -688,3 +688,101 @@ Description:
(RO) the list of CPUs that are isolated and don't
participate in load balancing. These CPUs are set by
boot parameter "isolcpus=".
+
+What: /sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/hardware_prefetcher_enable
+ /sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/ip_prefetcher_enable
+ /sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/adjacent_cache_line_prefetcher_enable
+Date: March 2022
+Contact: Linux kernel mailing list <[email protected]>
+Description: Parameters for some Intel CPU's hardware prefetch control
+
+ This sysfs interface provides Hardware Prefetch control
+ attribute for some Intel processors. Attributes are only
+ present if the particular cache implements the relevant
+ prefetcher controls.
+
+ *_prefetcher_enable:
+ (RW) control this prefetcher's enablement state.
+ Read returns current status:
+ 0: this prefetcher is disabled
+ 1: this prefetcher is enabled
+
+ - Attribute mapping
+
+ Some Intel processors have MSR 0x1a4. This register has several
+ specifications depending on the model. This interface provides
+ a one-to-one attribute file to control all the tunable
+ parameters the CPU provides of the following.
+
+ - "* Hardware Prefetcher Disable (R/W)"
+ corresponds to the "hardware_prefetcher_enable"
+
+ - "* Adjacent Cache Line Prefetcher Disable (R/W)"
+ corresponds to the "adjacent_cache_line_prefetcher_enable"
+
+ - "* IP Prefetcher Disable (R/W)"
+ corresponds to the "ip_prefetcher_enable"
+
+What: /sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/stream_detect_prefetcher_enable
+ /sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/stream_detect_prefetcher_strength
+ /sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/stream_detect_prefetcher_strength_available
+ /sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/stream_detect_prefetcher_dist
+Date: March 2022
+Contact: Linux kernel mailing list <[email protected]>
+Description: Parameters for A64FX's hardware prefetch control
+
+ This sysfs interface provides Hardware Prefetch control
+ attribute for the processor A64FX. Attributes are only
+ present if the particular cache implements the relevant
+ prefetcher controls.
+
+ stream_detect_prefetcher_enable:
+ (RW) control the prefetcher's enablement state.
+ Read returns current status:
+ 0: this prefetcher is disabled
+ 1: this prefetcher is enabled
+
+ stream_detect_prefetcher_strength:
+ (RW) control the prefetcher operation's strongness state.
+ Read returns current status:
+ weak: prefetch operation is weak
+ strong: prefetch operation is strong
+
+ Strong prefetch operation is surely executed, if there is
+ no corresponding data in cache.
+ Weak prefetch operation allows the hardware not to execute
+ operation depending on hardware state.
+
+
+ stream_detect_prefetcher_strength_available:
+ (RO) displays a space separated list of available strongness
+ state.
+
+ stream_detect_prefetcher_dist:
+ (RW) control the prefetcher distance value.
+ Read returns the current prefetcher distance value in bytes
+ or the string "auto".
+
+ Write either a value in bytes or the string "auto" to this
+ parameter. If you write a value that is not a multiple of a
+ specific value, it is rounded up.
+
+ The string "auto" has a special meaning. This means that
+ instead of setting dist to a user-specified value, it
+ operates using hardware-specific values.
+
+ - Attribute mapping
+
+ The processor A64FX has register IMP_PF_STREAM_DETECT_CTRL_EL0
+ for Hardware Prefetch Control. This attribute maps each
+ specification to the following.
+
+ - "L*PF_DIS": enablement of hardware prefetcher
+ corresponds to the "stream_detect_prefetcher_enable"
+
+ - "L*W": strongness of hardware prefetcher
+ corresponds to "stream_detect_prefetcher_strength"
+ and "stream_detect_prefetcher_strength_available"
+
+ - "L*_DIST": distance of hardware prefetcher
+ corresponds to the "stream_detect_prefetcher_dist"
--
2.27.0

2022-04-22 12:41:32

by [email protected]

Subject: [PATCH v3 7/9] x86: Add hardware prefetch control support for x86

This adds module init/exit code, and creates sysfs attribute file
"hardware_prefetcher_enable", "ip_prefetcher_enable" and
"adjacent_cache_line_prefetcher_enable" for x86. This driver works
only if the model is INTEL_FAM6_BROADWELL_X at this point.

If you would like to support a new model with the same register
specifications as INTEL_FAM6_BROADWELL_X, it is possible to add the
model settings to array of broadwell_cpu_ids[].

The details of the registers to be read and written in this patch are
described below:

"https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html"
Volume 4

Signed-off-by: Kohei Tarumizu <[email protected]>
---
arch/x86/kernel/cpu/x86-pfctl.c | 347 ++++++++++++++++++++++++++++++++
1 file changed, 347 insertions(+)
create mode 100644 arch/x86/kernel/cpu/x86-pfctl.c

diff --git a/arch/x86/kernel/cpu/x86-pfctl.c b/arch/x86/kernel/cpu/x86-pfctl.c
new file mode 100644
index 000000000000..153b7a46ba80
--- /dev/null
+++ b/arch/x86/kernel/cpu/x86-pfctl.c
@@ -0,0 +1,347 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2022 FUJITSU LIMITED
+ *
+ * x86 Hardware Prefetch Control support
+ */
+
+#include <linux/bitfield.h>
+#include <linux/cacheinfo.h>
+#include <linux/pfctl.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <asm/cpu_device_id.h>
+#include <asm/intel-family.h>
+#include <asm/msr.h>
+
+struct pfctl_driver x86_pfctl_driver;
+
+/**************************************
+ * Intel BROADWELL support
+ **************************************/
+
+/*
+ * The register specification for each bit of Intel BROADWELL is as
+ * follows:
+ *
+ * [0] L2 Hardware Prefetcher Disable (R/W)
+ * [1] L2 Adjacent Cache Line Prefetcher Disable (R/W)
+ * [2] DCU Hardware Prefetcher Disable (R/W)
+ * [3] DCU IP Prefetcher Disable (R/W)
+ * [63:4] Reserved
+ *
+ * See "Intel 64 and IA-32 Architectures Software Developer's Manual"
+ * (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html)
+ * for register specification details.
+ */
+#define BROADWELL_L2_HWPF_FIELD BIT_ULL(0)
+#define BROADWELL_L2_ACLPF_FIELD BIT_ULL(1)
+#define BROADWELL_DCU_HWPF_FIELD BIT_ULL(2)
+#define BROADWELL_DCU_IPPF_FIELD BIT_ULL(3)
+
+struct broadwell_read_info {
+ enum pfctl_attr pattr;
+ u64 val;
+ unsigned int level;
+ int ret;
+};
+
+struct broadwell_write_info {
+ enum pfctl_attr pattr;
+ u64 val;
+ unsigned int level;
+ int ret;
+};
+
+static int broadwell_get_hwpf_enable(u64 reg, unsigned int level)
+{
+ u64 val;
+
+ switch (level) {
+ case 1:
+ val = FIELD_GET(BROADWELL_DCU_HWPF_FIELD, reg);
+ break;
+ case 2:
+ val = FIELD_GET(BROADWELL_L2_HWPF_FIELD, reg);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (val == 0)
+ return PFCTL_ENABLE_VAL;
+ else if (val == 1)
+ return PFCTL_DISABLE_VAL;
+ else
+ return -EINVAL;
+}
+
+static int broadwell_modify_hwpf_enable(u64 *reg, unsigned int level, u64 val)
+{
+ if (val == PFCTL_ENABLE_VAL)
+ val = 0;
+ else
+ val = 1;
+
+ switch (level) {
+ case 1:
+ *reg &= ~BROADWELL_DCU_HWPF_FIELD;
+ *reg |= FIELD_PREP(BROADWELL_DCU_HWPF_FIELD, val);
+ break;
+ case 2:
+ *reg &= ~BROADWELL_L2_HWPF_FIELD;
+ *reg |= FIELD_PREP(BROADWELL_L2_HWPF_FIELD, val);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int broadwell_get_ippf_enable(u64 reg, unsigned int level)
+{
+ u64 val;
+
+ switch (level) {
+ case 1:
+ val = FIELD_GET(BROADWELL_DCU_IPPF_FIELD, reg);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (val == 0)
+ return PFCTL_ENABLE_VAL;
+ else if (val == 1)
+ return PFCTL_DISABLE_VAL;
+ else
+ return -EINVAL;
+}
+
+static int broadwell_modify_ippf_enable(u64 *reg, unsigned int level, u64 val)
+{
+ if (val == PFCTL_ENABLE_VAL)
+ val = 0;
+ else
+ val = 1;
+
+ switch (level) {
+ case 1:
+ *reg &= ~BROADWELL_DCU_IPPF_FIELD;
+ *reg |= FIELD_PREP(BROADWELL_DCU_IPPF_FIELD, val);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int broadwell_get_aclpf_enable(u64 reg, unsigned int level)
+{
+ u64 val;
+
+ switch (level) {
+ case 2:
+ val = FIELD_GET(BROADWELL_L2_ACLPF_FIELD, reg);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (val == 0)
+ return PFCTL_ENABLE_VAL;
+ else if (val == 1)
+ return PFCTL_DISABLE_VAL;
+ else
+ return -EINVAL;
+}
+
+static int broadwell_modify_aclpf_enable(u64 *reg, unsigned int level, u64 val)
+{
+ if (val == PFCTL_ENABLE_VAL)
+ val = 0;
+ else
+ val = 1;
+
+ switch (level) {
+ case 2:
+ *reg &= ~BROADWELL_L2_ACLPF_FIELD;
+ *reg |= FIELD_PREP(BROADWELL_L2_ACLPF_FIELD, val);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int broadwell_get_pfctl_params(enum pfctl_attr pattr, u64 reg,
+ unsigned int level, u64 *val)
+{
+ int ret;
+
+ switch (pattr) {
+ case HWPF_ENABLE:
+ ret = broadwell_get_hwpf_enable(reg, level);
+ break;
+ case IPPF_ENABLE:
+ ret = broadwell_get_ippf_enable(reg, level);
+ break;
+ case ACLPF_ENABLE:
+ ret = broadwell_get_aclpf_enable(reg, level);
+ break;
+ default:
+ return -ENOENT;
+ }
+
+ if (ret < 0)
+ return ret;
+ *val = ret;
+
+ return 0;
+}
+
+static int broadwell_modify_pfreg(enum pfctl_attr pattr, u64 *reg,
+ unsigned int level, u64 val)
+{
+ int ret;
+
+ switch (pattr) {
+ case HWPF_ENABLE:
+ ret = broadwell_modify_hwpf_enable(reg, level, val);
+ break;
+ case IPPF_ENABLE:
+ ret = broadwell_modify_ippf_enable(reg, level, val);
+ break;
+ case ACLPF_ENABLE:
+ ret = broadwell_modify_aclpf_enable(reg, level, val);
+ break;
+ default:
+ return -ENOENT;
+ }
+
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void _broadwell_read_pfreg(void *info)
+{
+ u64 reg;
+ struct broadwell_read_info *rinfo = info;
+
+ rdmsrl(MSR_MISC_FEATURE_CONTROL, reg);
+
+ rinfo->ret = broadwell_get_pfctl_params(rinfo->pattr, reg, rinfo->level,
+ &rinfo->val);
+ if (rinfo->ret < 0)
+ return;
+}
+
+static int broadwell_read_pfreg(enum pfctl_attr pattr, unsigned int cpu,
+ unsigned int level, u64 *val)
+{
+ struct broadwell_read_info info = {
+ .level = level,
+ .pattr = pattr,
+ };
+
+ smp_call_function_single(cpu, _broadwell_read_pfreg, &info, true);
+ if (info.ret < 0)
+ return info.ret;
+
+ *val = info.val;
+ return 0;
+}
+
+static void _broadwell_write_pfreg(void *info)
+{
+ u64 reg;
+ struct broadwell_write_info *winfo = info;
+
+ rdmsrl(MSR_MISC_FEATURE_CONTROL, reg);
+
+ winfo->ret = broadwell_modify_pfreg(winfo->pattr, &reg, winfo->level,
+ winfo->val);
+ if (winfo->ret < 0)
+ return;
+
+ wrmsrl(MSR_MISC_FEATURE_CONTROL, reg);
+}
+
+static int broadwell_write_pfreg(enum pfctl_attr pattr, unsigned int cpu,
+ unsigned int level, u64 val)
+{
+ struct broadwell_write_info info = {
+ .level = level,
+ .pattr = pattr,
+ .val = val,
+ };
+
+ smp_call_function_single(cpu, _broadwell_write_pfreg, &info, true);
+ return info.ret;
+}
+
+/*
+ * In addition to BROADWELL_X, NEHALEM and others have same register
+ * specifications as those represented by BROADWELL_XXX_FIELD.
+ * If you want to add support for these processors, add the new target
+ * model here.
+ */
+static const struct x86_cpu_id broadwell_cpu_ids[] = {
+ X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, NULL),
+ {}
+};
+
+/***** end of Intel BROADWELL support *****/
+
+/*
+ * This driver returns a negative value if it does not support the Hardware
+ * Prefetch Control or if it is running on a VM guest.
+ */
+static int __init setup_pfctl_driver_params(void)
+{
+ if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+ return -EINVAL;
+
+ if (x86_match_cpu(broadwell_cpu_ids)) {
+ x86_pfctl_driver.supported_l1d_prefetcher = HWPF|IPPF;
+ x86_pfctl_driver.supported_l2_prefetcher = HWPF|ACLPF;
+ x86_pfctl_driver.read_pfreg = broadwell_read_pfreg;
+ x86_pfctl_driver.write_pfreg = broadwell_write_pfreg;
+ } else {
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
+static int __init x86_pfctl_init(void)
+{
+ int ret;
+
+ ret = setup_pfctl_driver_params();
+ if (ret < 0)
+ return ret;
+
+ ret = pfctl_register_driver(&x86_pfctl_driver);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void __exit x86_pfctl_exit(void)
+{
+ pfctl_unregister_driver(&x86_pfctl_driver);
+}
+
+late_initcall(x86_pfctl_init);
+module_exit(x86_pfctl_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("FUJITSU LIMITED");
+MODULE_DESCRIPTION("x86 Hardware Prefetch Control Driver");
--
2.27.0

2022-04-22 17:23:53

by [email protected]

Subject: [PATCH v3 6/9] x86: resctrl: pseudo_lock: Fix to restore to original value when re-enabling hardware prefetch register

The current pseudo_lock.c code overwrites the value of
MSR_MISC_FEATURE_CONTROL with 0 even if the original value is not 0.
Therefore, modify it to save and restore the original value.

Signed-off-by: Kohei Tarumizu <[email protected]>
---
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index db813f819ad6..2d713c20f55f 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -420,6 +420,7 @@ static int pseudo_lock_fn(void *_rdtgrp)
struct pseudo_lock_region *plr = rdtgrp->plr;
u32 rmid_p, closid_p;
unsigned long i;
+ u64 saved_msr;
#ifdef CONFIG_KASAN
/*
* The registers used for local register variables are also used
@@ -463,6 +464,7 @@ static int pseudo_lock_fn(void *_rdtgrp)
* the buffer and evict pseudo-locked memory read earlier from the
* cache.
*/
+ saved_msr = __rdmsr(MSR_MISC_FEATURE_CONTROL);
__wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
closid_p = this_cpu_read(pqr_state.cur_closid);
rmid_p = this_cpu_read(pqr_state.cur_rmid);
@@ -514,7 +516,7 @@ static int pseudo_lock_fn(void *_rdtgrp)
__wrmsr(IA32_PQR_ASSOC, rmid_p, closid_p);

/* Re-enable the hardware prefetcher(s) */
- wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+ wrmsrl(MSR_MISC_FEATURE_CONTROL, saved_msr);
local_irq_enable();

plr->thread_done = 1;
@@ -873,12 +875,14 @@ static int measure_cycles_lat_fn(void *_plr)
struct pseudo_lock_region *plr = _plr;
unsigned long i;
u64 start, end;
+ u32 saved_low, saved_high;
void *mem_r;

local_irq_disable();
/*
* Disable hardware prefetchers.
*/
+ rdmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
mem_r = READ_ONCE(plr->kmem);
/*
@@ -895,7 +899,7 @@ static int measure_cycles_lat_fn(void *_plr)
end = rdtsc_ordered();
trace_pseudo_lock_mem_latency((u32)(end - start));
}
- wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+ wrmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
local_irq_enable();
plr->thread_done = 1;
wake_up_interruptible(&plr->lock_thread_wq);
@@ -945,6 +949,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
unsigned long i;
void *mem_r;
u64 tmp;
+ u32 saved_low, saved_high;

miss_event = perf_event_create_kernel_counter(miss_attr, plr->cpu,
NULL, NULL, NULL);
@@ -973,6 +978,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
/*
* Disable hardware prefetchers.
*/
+ rdmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);

/* Initialize rest of local variables */
@@ -1031,7 +1037,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
*/
rmb();
/* Re-enable hardware prefetchers */
- wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+ wrmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
local_irq_enable();
out_hit:
perf_event_release_kernel(hit_event);
--
2.27.0

2022-04-22 18:02:55

by [email protected]

Subject: [PATCH v3 2/9] drivers: base: Add Kconfig/Makefile to build hardware prefetch control core driver

This adds Kconfig/Makefile to build hardware prefetch control core
driver. This also adds a MAINTAINERS entry.

Signed-off-by: Kohei Tarumizu <[email protected]>
---
MAINTAINERS | 6 ++++++
drivers/base/Kconfig | 9 +++++++++
drivers/base/Makefile | 1 +
3 files changed, 16 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 40fa1955ca3f..f6640dc053c0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8615,6 +8615,12 @@ F: include/linux/hwmon*.h
F: include/trace/events/hwmon*.h
K: (devm_)?hwmon_device_(un)?register(|_with_groups|_with_info)

+HARDWARE PREFETCH CONTROL DRIVERS
+M: Kohei Tarumizu <[email protected]>
+S: Maintained
+F: drivers/base/pfctl.c
+F: include/linux/pfctl.h
+
HARDWARE RANDOM NUMBER GENERATOR CORE
M: Matt Mackall <[email protected]>
M: Herbert Xu <[email protected]>
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 6f04b831a5c0..8f8a69e7f645 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -230,4 +230,13 @@ config GENERIC_ARCH_NUMA
Enable support for generic NUMA implementation. Currently, RISC-V
and ARM64 use it.

+config HWPF_CONTROL
+ bool "Hardware Prefetch Control driver"
+ help
+ This driver allows user to control CPU's Hardware Prefetch behavior.
+ If the machine supports this behavior, it provides a sysfs interface.
+
+ See Documentation/ABI/testing/sysfs-devices-system-cpu for more
+ information.
+
endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 02f7f1358e86..13f3a0ddf3d1 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_DEV_COREDUMP) += devcoredump.o
obj-$(CONFIG_GENERIC_MSI_IRQ_DOMAIN) += platform-msi.o
obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o
obj-$(CONFIG_GENERIC_ARCH_NUMA) += arch_numa.o
+obj-$(CONFIG_HWPF_CONTROL) += pfctl.o

obj-y += test/

--
2.27.0

2022-04-22 18:51:31

by Thomas Gleixner

Subject: Re: [PATCH v3 4/9] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver

On Wed, Apr 20 2022 at 12:02, Kohei Tarumizu wrote:
> +
> +menu "Fujitsu SoC drivers"
> +
> +config A64FX_HWPF_CONTROL
> + tristate "A64FX Hardware Prefetch Control driver"
> + depends on ARM64 || HWPF_CONTROL

&& HWPF_CONTROL

No point in enabling this on x86.

Thanks,

tglx

2022-04-26 01:17:25

by Reinette Chatre

Subject: Re: [PATCH v3 6/9] x86: resctrl: pseudo_lock: Fix to restore to original value when re-enabling hardware prefetch register

Hi Kohei,

Thank you very much for catching this issue. This fix is not specific to
or required by the driver you are creating in this series so you could also
extract this patch and submit it separately as a fix to resctrl.

When you do resubmit there are a few style related points that I highlight here,
the fix itself looks good.

For the subject, please use "x86/resctrl:" prefix in the subject.

On 4/19/2022 8:02 PM, Kohei Tarumizu wrote:
> The current pseudo_lock.c code overwrittes the value of the

overwrittes -> overwrites

> MSR_MISC_FEATURE_CONTROL to 0 even if the original value is not 0.
> Therefore, modify it to save and restore the original values.
>

This needs a Fixes tag. A few patches are impacted by this fix:

Fixes: 018961ae5579 ("x86/intel_rdt: Pseudo-lock region creation/removal core")
Fixes: 443810fe6160 ("x86/intel_rdt: Create debugfs files for pseudo-locking testing")
Fixes: 8a2fc0e1bc0c ("x86/intel_rdt: More precise L2 hit/miss measurements")

> Signed-off-by: Kohei Tarumizu <[email protected]>
> ---
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> index db813f819ad6..2d713c20f55f 100644
> --- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> +++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> @@ -420,6 +420,7 @@ static int pseudo_lock_fn(void *_rdtgrp)
> struct pseudo_lock_region *plr = rdtgrp->plr;
> u32 rmid_p, closid_p;
> unsigned long i;
> + u64 saved_msr;
> #ifdef CONFIG_KASAN
> /*
> * The registers used for local register variables are also used
> @@ -463,6 +464,7 @@ static int pseudo_lock_fn(void *_rdtgrp)
> * the buffer and evict pseudo-locked memory read earlier from the
> * cache.
> */
> + saved_msr = __rdmsr(MSR_MISC_FEATURE_CONTROL);
> __wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
> closid_p = this_cpu_read(pqr_state.cur_closid);
> rmid_p = this_cpu_read(pqr_state.cur_rmid);
> @@ -514,7 +516,7 @@ static int pseudo_lock_fn(void *_rdtgrp)
> __wrmsr(IA32_PQR_ASSOC, rmid_p, closid_p);
>
> /* Re-enable the hardware prefetcher(s) */
> - wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
> + wrmsrl(MSR_MISC_FEATURE_CONTROL, saved_msr);
> local_irq_enable();
>
> plr->thread_done = 1;
> @@ -873,12 +875,14 @@ static int measure_cycles_lat_fn(void *_plr)
> struct pseudo_lock_region *plr = _plr;
> unsigned long i;
> u64 start, end;
> + u32 saved_low, saved_high;
> void *mem_r;

Please do follow the current style of using "reverse fir tree order".
More information in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/maintainer-tip.rst#n587
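
As an illustration only (not part of the posted patch), the declaration block above could be ordered longest line first, as the tip tree handbook describes. The sketch below is plain userspace C: `demo_region`, `demo_fn`, and the `u32`/`u64` typedefs are hypothetical stand-ins for the kernel types, and the loop body exists only to make the function do something observable.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;	/* userspace stand-ins for the kernel types */
typedef uint64_t u64;

/* Hypothetical stand-in for struct pseudo_lock_region. */
struct demo_region {
	void *kmem;
	unsigned long size;
};

static u64 demo_fn(void *_plr)
{
	/* Reverse fir tree order: longest declaration first, shortest last. */
	struct demo_region *plr = _plr;
	u32 saved_low, saved_high;
	unsigned long i;
	u64 start, end;
	void *mem_r;

	assert(plr);

	saved_low = 0;
	saved_high = 0;
	mem_r = plr->kmem;
	start = 0;
	end = start;

	/* Sum the bytes of the region so the result can be checked. */
	for (i = 0; i < plr->size; i++)
		end += ((unsigned char *)mem_r)[i];

	return end - start + saved_low + saved_high;
}
```

With this ordering the new `u32 saved_low, saved_high;` line sits directly under the `plr` declaration rather than at its original position in the hunk.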

>
> local_irq_disable();
> /*
> * Disable hardware prefetchers.
> */
> + rdmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
> wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
> mem_r = READ_ONCE(plr->kmem);
> /*
> @@ -895,7 +899,7 @@ static int measure_cycles_lat_fn(void *_plr)
> end = rdtsc_ordered();
> trace_pseudo_lock_mem_latency((u32)(end - start));
> }
> - wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
> + wrmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
> local_irq_enable();
> plr->thread_done = 1;
> wake_up_interruptible(&plr->lock_thread_wq);
> @@ -945,6 +949,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
> unsigned long i;
> void *mem_r;
> u64 tmp;
> + u32 saved_low, saved_high;

Same as above.

>
> miss_event = perf_event_create_kernel_counter(miss_attr, plr->cpu,
> NULL, NULL, NULL);
> @@ -973,6 +978,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
> /*
> * Disable hardware prefetchers.
> */
> + rdmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
> wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
>
> /* Initialize rest of local variables */
> @@ -1031,7 +1037,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
> */
> rmb();
> /* Re-enable hardware prefetchers */
> - wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
> + wrmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
> local_irq_enable();
> out_hit:
> perf_event_release_kernel(hit_event);

Thank you very much.

Reinette
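
The difference between the old and the fixed behavior discussed in this review can be modeled in userspace. This is a minimal sketch under stated assumptions, not kernel code: `fake_msr`, `fake_rdmsr()`, `fake_wrmsr()`, and the two `critical_section_*()` functions are hypothetical names that stand in for MSR_MISC_FEATURE_CONTROL and the rdmsr/wrmsr accessors on a single CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MSR_MISC_FEATURE_CONTROL on one CPU. */
static uint64_t fake_msr;

static uint64_t fake_rdmsr(void)
{
	return fake_msr;
}

static void fake_wrmsr(uint64_t val)
{
	fake_msr = val;
}

/* Old behavior: disable the prefetchers, then unconditionally write 0. */
static void critical_section_old(uint64_t disable_bits)
{
	fake_wrmsr(disable_bits);
	/* ... cache-sensitive work would run here ... */
	fake_wrmsr(0);
}

/* Fixed behavior: save the current value, disable, then restore it. */
static void critical_section_fixed(uint64_t disable_bits)
{
	uint64_t saved = fake_rdmsr();

	fake_wrmsr(disable_bits);
	/* The register holds exactly the disable bits while the work runs. */
	assert(fake_rdmsr() == disable_bits);
	fake_wrmsr(saved);
}
```

If the register held a non-zero value before the call (for example because a prefetcher had already been disabled through another interface), `critical_section_old()` leaves it at 0, while `critical_section_fixed()` leaves it unchanged.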

2022-04-27 09:25:17

by [email protected]

Subject: RE: [PATCH v3 6/9] x86: resctrl: pseudo_lock: Fix to restore to original value when re-enabling hardware prefetch register

Thanks for the comment.

> Thank you very much for catching this issue. This fix is not specific to or required
> by the driver you are creating in this series so you could also extract this patch and
> submit it separately as a fix to resctrl.

I will send this patch separately from this series next time.

> When you do resubmit there are a few style related points that I highlight here, the
> fix itself looks good.
>
> For the subject, please use "x86/resctrl:" prefix in the subject.

> This needs a Fixes tag. A few patches are impacted by this fix:
>
> Fixes: 018961ae5579 ("x86/intel_rdt: Pseudo-lock region creation/removal core")
> Fixes: 443810fe6160 ("x86/intel_rdt: Create debugfs files for pseudo-locking testing")
> Fixes: 8a2fc0e1bc0c ("x86/intel_rdt: More precise L2 hit/miss measurements")

I will use this prefix and add the Fixes tags in the next version of the patch.

> Please do follow the current style of using "reverse fir tree order".
> More information in:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/maintainer-tip.rst#n587

> Same as above.

I will check the URL and fix the style problems.