This patch series adds a sysfs interface to control the CPU's hardware
prefetch behavior for performance tuning from userspace on arm64 and
x86 (on supported CPUs).
[Background]
============
A64FX and some Intel processors have implementation-dependent registers
for controlling the CPU's hardware prefetch behavior. A64FX has
IMP_PF_STREAM_DETECT_CTRL_EL0[1], and Intel processors have MSR 0x1a4
(MSR_MISC_FEATURE_CONTROL)[2]. These registers cannot be accessed from
userspace.
[1]https://github.com/fujitsu/A64FX/tree/master/doc/
A64FX_Specification_HPC_Extension_v1_EN.pdf
[2]https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
Volume 4
Tuning these registers can improve performance. As an example, the
results of running the STREAM benchmark on the A64FX are described in
section [Merit].
For MSR 0x1a4, it is also possible to change the value from userspace
via the MSR driver. However, using the MSR driver is not recommended,
so a proper kernel interface is needed[3].
[3]https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about/
For these reasons, we provide a new proper kernel interface to control
both IMP_PF_STREAM_DETECT_CTRL_EL0 and MSR 0x1a4.
[Overall design]
================
The following changes have been made based on feedback received in
earlier RFC PATCH[4].
[4]https://lore.kernel.org/lkml/[email protected]/
- create attribute file under
/sys/devices/system/cpu/cpu*/cache/index[0,2]
- provide a one-to-one option to control all the tunable parameters
the CPU provides.
- use x86_match_cpu() to identify the kind of model
The source code for this driver is divided into a common part
(drivers/base/pfctl.c) and architecture parts (arch/XXX/XXX/pfctl.c).
The common part implements architecture-independent processing, such
as creating the sysfs files.
The architecture parts implement architecture-dependent processing.
Each must specify at least which types of hardware prefetcher are
supported and how to read and write the register. This information is
passed to the common part through its registration function.
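The registration contract described above can be sketched in userspace
as follows. This is a mock-up, not the kernel code: the struct layout
and the validation checks mirror struct pfctl_driver and
pfctl_register_driver() from patch1, while the mock_* and demo_* names
are hypothetical and the callbacks touch a plain variable rather than a
hardware register.

```c
#include <errno.h>
#include <stddef.h>

/* Prefetcher type bits, mirroring enum prefetcher in include/linux/pfctl.h. */
enum prefetcher { HWPF = 1 << 0, IPPF = 1 << 1, ACLPF = 1 << 2, SDPF = 1 << 3 };

/* Trimmed to one field for the sketch; the real struct has six. */
struct prefetcher_options { int hwpf_enable; };

struct pfctl_driver {
	unsigned int supported_l1d_prefetcher;
	unsigned int supported_l2_prefetcher;
	int (*read_pfreg)(unsigned int cpu, unsigned int level,
			  struct prefetcher_options *opt);
	int (*write_pfreg)(unsigned int cpu, unsigned int level,
			   struct prefetcher_options *opt);
};

static struct pfctl_driver *registered;	/* at most one driver at a time */

/* Hypothetical stand-in for pfctl_register_driver(): same checks, no sysfs. */
static int mock_register_driver(struct pfctl_driver *drv)
{
	if (registered)
		return -EEXIST;
	if (!drv->supported_l1d_prefetcher && !drv->supported_l2_prefetcher)
		return -EINVAL;
	if (!drv->read_pfreg || !drv->write_pfreg)
		return -EINVAL;
	registered = drv;
	return 0;
}

/* Hypothetical stand-in for pfctl_unregister_driver(). */
static void mock_unregister_driver(struct pfctl_driver *drv)
{
	if (registered == drv)
		registered = NULL;
}

/* Hypothetical architecture part: a fake register held in a variable. */
static int demo_reg;

static int demo_read(unsigned int cpu, unsigned int level,
		     struct prefetcher_options *opt)
{
	(void)cpu;
	(void)level;
	opt->hwpf_enable = demo_reg;
	return 0;
}

static int demo_write(unsigned int cpu, unsigned int level,
		      struct prefetcher_options *opt)
{
	(void)cpu;
	(void)level;
	demo_reg = opt->hwpf_enable;
	return 0;
}

/* Masks chosen to match the BROADWELL_X example output in [Examples]. */
static struct pfctl_driver demo_driver = {
	.supported_l1d_prefetcher = HWPF | IPPF,
	.supported_l2_prefetcher = HWPF | ACLPF,
	.read_pfreg = demo_read,
	.write_pfreg = demo_write,
};
```

In this sketch a driver that supplies no supported prefetcher or no
callbacks is rejected with -EINVAL, and a second registration is
rejected with -EEXIST, which is the behavior the core driver in patch1
implements.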
This driver creates an attribute file named "prefetch_control" in
every CPU's cache/index[0,2] directory if the CPU supports hardware
prefetch control. Each attribute file corresponds to the cache level
of the parent index directory. The attribute file can be used to
modify one or more options by writing a string.
Detailed description of this sysfs interface is in
Documentation/ABI/testing/sysfs-devices-system-cpu (patch8).
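To illustrate the "one or more options in one string" format, here is
a small userspace sketch that splits such a string into key/value
pairs. It is not the kernel parser: patch1 uses strsep() and
match_token(), whereas this sketch uses only ISO C string functions,
also accepts a trailing newline, and does not validate keys or values.
The names split_options, struct kv and MAX_OPTS are hypothetical.

```c
#include <string.h>

#define MAX_OPTS 8

struct kv {
	char key[48];	/* longest key in this series is 37 characters */
	char val[16];
};

/*
 * Split "key=value key=value ..." on spaces (and a trailing newline).
 * Returns the number of pairs stored in out[], or -1 if a token has no
 * '=', has an empty key or value, or does not fit.
 */
static int split_options(const char *buf, struct kv *out, int max)
{
	int n = 0;

	while (*buf) {
		size_t len = strcspn(buf, " \n");	/* length of this token */
		const char *eq;
		size_t klen, vlen;

		if (len == 0) {		/* skip a separator character */
			buf++;
			continue;
		}

		eq = memchr(buf, '=', len);
		if (!eq || n == max)
			return -1;

		klen = (size_t)(eq - buf);
		vlen = len - klen - 1;
		if (klen == 0 || klen >= sizeof(out[n].key) ||
		    vlen == 0 || vlen >= sizeof(out[n].val))
			return -1;

		memcpy(out[n].key, buf, klen);
		out[n].key[klen] = '\0';
		memcpy(out[n].val, eq + 1, vlen);
		out[n].val[vlen] = '\0';
		n++;
		buf += len;
	}

	return n;
}
```

For example, the string "hardware_prefetcher_enable=enable
adjacent_cache_line_prefetcher_enable=disable" yields two pairs, while
a token without "=" is rejected.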
This driver needs the cache sysfs directory and cache level/type
information. On ARM processors, this information can be obtained
from registers even without ACPI PPTT.
In patch5, we add processing to create a cache/index directory using
only the information from the registers if the machine does not
support ACPI PPTT and the Kconfig option for hardware prefetch control
(CONFIG_HWPF_CONTROL) is enabled.
This change causes a problem, which is described in [Known problem].
[Examples]
==========
As an example of dealing with multiple options, the x86 model
INTEL_FAM6_BROADWELL_X defines the bits of MSR 0x1a4 as follows:
[0] L2 Hardware Prefetcher Disable (R/W)
[1] L2 Adjacent Cache Line Prefetcher Disable (R/W)
[2] DCU Hardware Prefetcher Disable (R/W)
[3] DCU IP Prefetcher Disable (R/W)
[63:4] Reserved
In this case, index0 (L1d cache) corresponds to bits [2,3] and index2
(L2 cache) corresponds to bits [0,1]. To disable the L2 adjacent cache
line prefetcher on CPU1 (that is, to set the "L2 Adjacent Cache Line
Prefetcher Disable" bit), do the following:
```
# echo "adjacent_cache_line_prefetcher_enable=disable" > /sys/devices/system/cpu/cpu1/cache/index2/prefetch_control
```
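The same write can be issued from a C program. The following is a
minimal sketch, assuming the attribute path layout shown above; the
helper names pfctl_path and pfctl_write are hypothetical, and writing
the attribute requires root because it is created with
DEVICE_ATTR_ADMIN_RW.

```c
#include <stdio.h>

/*
 * Build the path of the prefetch_control attribute for a CPU and cache
 * index. Returns 0 on success, -1 if the buffer is too small.
 */
static int pfctl_path(char *buf, size_t len, unsigned int cpu,
		      unsigned int index)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/cpu/cpu%u/cache/index%u/prefetch_control",
			 cpu, index);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write one option string (for example
 * "adjacent_cache_line_prefetcher_enable=disable") to the attribute.
 * Returns 0 on success, -1 on any error.
 */
static int pfctl_write(const char *path, const char *option)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fputs(option, f) >= 0;
	/* fclose() flushes the buffer, so a failed store shows up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would build the path with pfctl_path(buf, sizeof(buf), 1, 2)
and then pass it to pfctl_write() together with the option string.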
When this attribute file is read, the current settings of the available
parameters are output as follows:
```
# cat /sys/devices/system/cpu/cpu1/cache/index2/prefetch_control
hardware_prefetcher_enable=enable
adjacent_cache_line_prefetcher_enable=disable
```
For index0 on BROADWELL_X:
```
# cat /sys/devices/system/cpu/cpu1/cache/index0/prefetch_control
hardware_prefetcher_enable=enable
ip_prefetcher_enable=enable
```
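Because the MSR bits are "disable" bits, writing "enable" clears a bit
and writing "disable" sets it. The sketch below illustrates that
inversion using the bit positions listed above for BROADWELL_X. It
only manipulates a 64-bit value in memory; the macro and function
names are hypothetical and are not taken from the x86 patch.

```c
#include <stdint.h>

/* Bit positions in MSR 0x1a4 as listed above for BROADWELL_X. */
#define L2_HWPF_DIS	(UINT64_C(1) << 0)	/* index2: hardware_prefetcher */
#define L2_ACLPF_DIS	(UINT64_C(1) << 1)	/* index2: adjacent_cache_line */
#define DCU_HWPF_DIS	(UINT64_C(1) << 2)	/* index0: hardware_prefetcher */
#define DCU_IPPF_DIS	(UINT64_C(1) << 3)	/* index0: ip_prefetcher */

/*
 * Return the new register value after enabling (enable != 0) or
 * disabling (enable == 0) the prefetcher controlled by dis_bit.
 */
static uint64_t set_prefetcher(uint64_t msr, uint64_t dis_bit, int enable)
{
	return enable ? (msr & ~dis_bit) : (msr | dis_bit);
}

/* Return 1 if the prefetcher controlled by dis_bit is enabled. */
static int prefetcher_enabled(uint64_t msr, uint64_t dis_bit)
{
	return !(msr & dis_bit);
}
```

Starting from a register value of 0 (all prefetchers enabled),
disabling the adjacent cache line prefetcher gives 0x2, which matches
the index2 output shown above: hardware_prefetcher_enable=enable and
adjacent_cache_line_prefetcher_enable=disable.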
For index0 on A64FX:
```
# cat /sys/devices/system/cpu/cpu1/cache/index0/prefetch_control
stream_detect_prefetcher_enable=enable
stream_detect_prefetcher_dist=auto
stream_detect_prefetcher_strong=strong
```
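For stream_detect_prefetcher_dist, the arm64 patch stores the distance
in a 4-bit register field in units of 256 (L1) or 1024 (L2), rounding
the requested value up, and a field value of 0 is shown as "auto". The
following userspace sketch reproduces that arithmetic; the function
and macro names are hypothetical.

```c
#include <errno.h>

#define MIN_DIST_L1	256	/* unit of the L1 dist field */
#define MIN_DIST_L2	1024	/* unit of the L2 dist field */
#define DIST_FIELD_MAX	15	/* largest value of a 4-bit field */

/* Return the dist unit for a cache level, or 0 for an unknown level. */
static unsigned int dist_unit(unsigned int level)
{
	switch (level) {
	case 1:
		return MIN_DIST_L1;
	case 2:
		return MIN_DIST_L2;
	default:
		return 0;
	}
}

/*
 * Convert a requested distance to the register field value, rounding
 * up to the next unit. Returns -EINVAL if the level is unknown or the
 * result does not fit in the field.
 */
static int dist_to_field(unsigned int level, unsigned int dist)
{
	unsigned int unit = dist_unit(level);
	unsigned int field;

	if (!unit)
		return -EINVAL;

	field = dist / unit + (dist % unit != 0);
	if (field > DIST_FIELD_MAX)
		return -EINVAL;

	return (int)field;
}

/* Convert a register field value back to the distance that is shown. */
static int field_to_dist(unsigned int level, unsigned int field)
{
	unsigned int unit = dist_unit(level);

	if (!unit || field > DIST_FIELD_MAX)
		return -EINVAL;

	return (int)(field * unit);
}
```

Under these rules a request of 300 at L1 is stored as field value 2
and read back as 512, and the largest accepted values are 3840 at L1
and 15360 at L2.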
[Patch organization]
====================
This patch series adds the hardware prefetch control core driver for
ARM64 and x86. We also add support for FUJITSU_CPU_PART_A64FX on ARM64
and BROADWELL_X on x86.
- patch1: Add hardware prefetch core driver
This driver provides a register/unregister function to create the
sysfs interface with attribute "prefetch_control".
If the architecture has control of the CPU's hardware prefetch
behavior, use this function to create sysfs. When registering, it
is necessary to provide what type of Hardware Prefetcher is
supported and how to read/write to the register.
- patch2: Add Kconfig/Makefile to build hardware prefetch control core driver
- patch3: Add support for ARM64
This adds module init/exit code, and creates sysfs attribute file
"prefetch_control" for ARM64. This driver works only if part number is
FUJITSU_CPU_PART_A64FX at this point.
- patch4: Add Kconfig/Makefile to build driver for arm64
- patch5: Create cache sysfs directory without ACPI PPTT for hardware prefetch control
The Hardware Prefetch control driver needs the cache sysfs directory
and cache level/type information. On ARM processors, this information
can be obtained from registers even without PPTT. Therefore, we set
cpu_map_populated to true to create the cache sysfs directory if the
machine doesn't have PPTT.
- patch6: Add support for x86
This adds module init/exit code, and creates sysfs attribute file
"prefetch_control" for x86. This driver works only if the model is
INTEL_FAM6_BROADWELL_X at this point.
- patch7: Add Kconfig/Makefile to build driver for x86
- patch8: Add documentation for the new sysfs interface
[Known problem]
===============
- The `lscpu` command terminates with -ENOENT because the cache/index
directory exists but the shared_cpu_map file does not. This is due to
patch5, which creates a cache/index directory containing only level
and type when ACPI PPTT is absent.
[Merit]
=======
For reference, here are the STREAM Triad results when tuning the
stream_detect_prefetcher_dist parameter for the L1 and L2 caches on
A64FX.
| dist combination | Pattern A | Pattern B |
|-------------------|-------------|-------------|
| L1:256, L2:1024 | 234505.2144 | 114600.0801 |
| L1:1536, L2:1024 | 279172.8742 | 118979.4542 |
| L1:256, L2:10240 | 247716.7757 | 127364.1533 |
| L1:1536, L2:10240 | 283675.6625 | 125950.6847 |
In pattern A, we set the size of the array to 174720, which is about
half the size of the L1d cache. In pattern B, we set the size of the
array to 10485120, which is about twice the size of the L2 cache.
In pattern A, a change of dist at L1 has a larger effect. On the other
hand, in pattern B, the change of dist at L2 has a larger effect.
As described above, the optimal dist combination depends on the
characteristics of the application. Therefore, such a sysfs interface
is useful for performance tuning.
Best regards,
Kohei Tarumizu
Kohei Tarumizu (8):
drivers: base: Add hardware prefetch control core driver
drivers: base: Add Kconfig/Makefile to build hardware prefetch control
core driver
arm64: Add hardware prefetch control support for ARM64
arm64: Add Kconfig/Makefile to build hardware prefetch control driver
arm64: Create cache sysfs directory without ACPI PPTT for hardware
prefetch control
x86: Add hardware prefetch control support for x86
x86: Add Kconfig/Makefile to build hardware prefetch control driver
docs: ABI: Add sysfs documentation interface of hardware prefetch
control driver
.../ABI/testing/sysfs-devices-system-cpu | 93 +++
MAINTAINERS | 8 +
arch/arm64/Kconfig | 8 +
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/cacheinfo.c | 29 +
arch/arm64/kernel/pfctl.c | 324 +++++++++++
arch/x86/Kconfig | 7 +
arch/x86/kernel/cpu/Makefile | 2 +
arch/x86/kernel/cpu/pfctl.c | 292 ++++++++++
drivers/base/Kconfig | 13 +
drivers/base/Makefile | 1 +
drivers/base/pfctl.c | 541 ++++++++++++++++++
include/linux/pfctl.h | 42 ++
13 files changed, 1361 insertions(+)
create mode 100644 arch/arm64/kernel/pfctl.c
create mode 100644 arch/x86/kernel/cpu/pfctl.c
create mode 100644 drivers/base/pfctl.c
create mode 100644 include/linux/pfctl.h
--
2.27.0
This driver provides a register/unregister function to create the
sysfs interface with an attribute file named "prefetch_control" in
every CPU's cache/index[0,2] directory.
If the architecture has control of the CPU's hardware prefetcher
behavior, use this function to create sysfs. When registering, it is
necessary to provide what type of hardware prefetcher is supported
and how to read/write to the register.
The following patches add support for ARM64 and x86.
Signed-off-by: Kohei Tarumizu <[email protected]>
---
drivers/base/pfctl.c | 541 ++++++++++++++++++++++++++++++++++++++++++
include/linux/pfctl.h | 42 ++++
2 files changed, 583 insertions(+)
create mode 100644 drivers/base/pfctl.c
create mode 100644 include/linux/pfctl.h
diff --git a/drivers/base/pfctl.c b/drivers/base/pfctl.c
new file mode 100644
index 000000000000..4bc3c2826d69
--- /dev/null
+++ b/drivers/base/pfctl.c
@@ -0,0 +1,541 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2022 FUJITSU LIMITED
+ *
+ * This driver provides tunable sysfs interface for Hardware Prefetch Control.
+ * See Documentation/ABI/testing/sysfs-devices-system-cpu for more information.
+ *
+ * This code provides architecture-independent functions such as create and
+ * remove attribute file.
+ * The implementation of reads and writes to the Hardware Prefetch Control
+ * register is architecture-dependent. Therefore, each architecture registers
+ * a callback to read and write the register via pfctl_register_driver().
+ */
+
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/device.h>
+#include <linux/pfctl.h>
+#include <linux/parser.h>
+#include <linux/slab.h>
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+enum {
+ OPT_HWPF_ENABLE,
+ OPT_IPPF_ENABLE,
+ OPT_ACLPF_ENABLE,
+ OPT_SDPF_ENABLE,
+ OPT_SDPF_DIST,
+ OPT_SDPF_DIST_AUTO,
+ OPT_SDPF_STRONG,
+ OPT_ERR,
+};
+
+static const char hwpf_enable_fmt[] = "hardware_prefetcher_enable=%s";
+static const char ippf_enable_fmt[] = "ip_prefetcher_enable=%s";
+static const char aclpf_enable_fmt[] =
+ "adjacent_cache_line_prefetcher_enable=%s";
+static const char sdpf_enable_fmt[] = "stream_detect_prefetcher_enable=%s";
+static const char sdpf_dist_fmt[] = "stream_detect_prefetcher_dist=%d";
+static const char sdpf_dist_auto_fmt[] = "stream_detect_prefetcher_dist=%s";
+static const char sdpf_strong_fmt[] = "stream_detect_prefetcher_strong=%s";
+
+static const match_table_t pfctl_opt_tokens = {
+ {OPT_HWPF_ENABLE, hwpf_enable_fmt},
+ {OPT_IPPF_ENABLE, ippf_enable_fmt},
+ {OPT_ACLPF_ENABLE, aclpf_enable_fmt},
+ {OPT_SDPF_ENABLE, sdpf_enable_fmt},
+ {OPT_SDPF_DIST, sdpf_dist_fmt},
+ {OPT_SDPF_DIST_AUTO, sdpf_dist_auto_fmt},
+ {OPT_SDPF_STRONG, sdpf_strong_fmt},
+ {OPT_ERR, NULL},
+};
+
+static DEFINE_PER_CPU(struct device *, cache_device_pcpu);
+#define per_cpu_cache_device(cpu) (per_cpu(cache_device_pcpu, cpu))
+
+static struct pfctl_driver *pdriver;
+static enum cpuhp_state hp_online;
+
+static bool prefetcher_is_available(unsigned int level, enum cache_type type,
+ int prefetcher)
+{
+ if ((level == 1) && (type == CACHE_TYPE_DATA))
+ if (pdriver->supported_l1d_prefetcher & prefetcher)
+ return true;
+ else
+ return false;
+ else if ((level == 2) &&
+ (type == CACHE_TYPE_UNIFIED))
+ if (pdriver->supported_l2_prefetcher & prefetcher)
+ return true;
+ else
+ return false;
+ else
+ return false;
+}
+
+static int parse_prefetch_options(struct prefetcher_options *opts,
+ const char *buf, unsigned int level,
+ enum cache_type type)
+{
+ char *options, *sep_opt, *p;
+ substring_t args[MAX_OPT_ARGS];
+ int opt_mask = 0;
+ int token;
+ int ret = 0;
+
+ options = kstrdup(buf, GFP_KERNEL);
+ if (!options)
+ return -ENOMEM;
+
+ sep_opt = strstrip(options);
+ while ((p = strsep(&sep_opt, " ")) != NULL) {
+ unsigned int val;
+
+ if (!*p)
+ continue;
+
+ token = match_token(p, pfctl_opt_tokens, args);
+ opt_mask |= token;
+
+ switch (token) {
+ case OPT_HWPF_ENABLE:
+ if (!prefetcher_is_available(level, type, HWPF)) {
+ pr_err("Unsupported parameter: '%s'\n", p);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ p = match_strdup(args);
+ if (!p) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ if (!strcmp(p, "enable")) {
+ opts->hwpf_enable = PFCTL_ENABLE_VAL;
+ } else if (!strcmp(p, "disable")) {
+ opts->hwpf_enable = PFCTL_DISABLE_VAL;
+ } else {
+ pr_err("Invalid value: '%s'\n", p);
+ ret = -EINVAL;
+ kfree(p);
+ goto out;
+ }
+
+ kfree(p);
+ break;
+ case OPT_IPPF_ENABLE:
+ if (!prefetcher_is_available(level, type, IPPF)) {
+ pr_err("Unsupported parameter: '%s'\n", p);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ p = match_strdup(args);
+ if (!p) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ if (!strcmp(p, "enable")) {
+ opts->ippf_enable = PFCTL_ENABLE_VAL;
+ } else if (!strcmp(p, "disable")) {
+ opts->ippf_enable = PFCTL_DISABLE_VAL;
+ } else {
+ pr_err("Invalid value: '%s'\n", p);
+ ret = -EINVAL;
+ kfree(p);
+ goto out;
+ }
+
+ kfree(p);
+ break;
+ case OPT_ACLPF_ENABLE:
+ if (!prefetcher_is_available(level, type, ACLPF)) {
+ pr_err("Unsupported parameter: '%s'\n", p);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ p = match_strdup(args);
+ if (!p) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ if (!strcmp(p, "enable")) {
+ opts->aclpf_enable = PFCTL_ENABLE_VAL;
+ } else if (!strcmp(p, "disable")) {
+ opts->aclpf_enable = PFCTL_DISABLE_VAL;
+ } else {
+ pr_err("Invalid value: '%s'\n", p);
+ ret = -EINVAL;
+ kfree(p);
+ goto out;
+ }
+
+ kfree(p);
+ break;
+ case OPT_SDPF_ENABLE:
+ if (!prefetcher_is_available(level, type, SDPF)) {
+ pr_err("Unsupported parameter: '%s'\n", p);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ p = match_strdup(args);
+ if (!p) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ if (!strcmp(p, "enable")) {
+ opts->sdpf_enable = PFCTL_ENABLE_VAL;
+ } else if (!strcmp(p, "disable")) {
+ opts->sdpf_enable = PFCTL_DISABLE_VAL;
+ } else {
+ pr_err("Invalid value: '%s'\n", p);
+ ret = -EINVAL;
+ kfree(p);
+ goto out;
+ }
+
+ kfree(p);
+ break;
+ case OPT_SDPF_DIST:
+ if (!prefetcher_is_available(level, type, SDPF)) {
+ pr_err("Unsupported parameter: '%s'\n", p);
+ ret = -EINVAL;
+ goto out;
+ }
+ ret = match_uint(args, &val);
+ if (ret < 0) {
+ pr_err("Invalid value: '%s'\n", p);
+ goto out;
+ }
+
+ opts->sdpf_dist = val;
+ break;
+ case OPT_SDPF_DIST_AUTO:
+ if (!prefetcher_is_available(level, type, SDPF)) {
+ pr_err("Unsupported parameter: '%s'\n", p);
+ ret = -EINVAL;
+ goto out;
+ }
+ p = match_strdup(args);
+ if (!p) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ if (!strcmp(p, "auto")) {
+ opts->sdpf_dist = PFCTL_DIST_AUTO_VAL;
+ } else {
+ pr_err("Invalid value: '%s'\n", p);
+ ret = -EINVAL;
+ kfree(p);
+ goto out;
+ }
+
+ kfree(p);
+ break;
+ case OPT_SDPF_STRONG:
+ if (!prefetcher_is_available(level, type, SDPF)) {
+ pr_err("Unsupported parameter: '%s'\n", p);
+ ret = -EINVAL;
+ goto out;
+ }
+ p = match_strdup(args);
+ if (!p) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ if (!strcmp(p, "strong")) {
+ opts->sdpf_strong = PFCTL_STRONG_VAL;
+ } else if (!strcmp(p, "weak")) {
+ opts->sdpf_strong = PFCTL_WEAK_VAL;
+ } else {
+ pr_err("Invalid value: '%s'\n", p);
+ ret = -EINVAL;
+ kfree(p);
+ goto out;
+ }
+
+ kfree(p);
+ break;
+ default:
+ pr_err("Unknown parameter or missing value '%s'\n", p);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+out:
+ kfree(options);
+ return ret;
+}
+
+static ssize_t prefetch_control_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ int ret;
+ unsigned int cpu;
+ ssize_t len = 0;
+ struct prefetcher_options opts;
+ struct cacheinfo *this_leaf = dev_get_drvdata(dev);
+
+ cpu = dev->parent->parent->id;
+
+ ret = pdriver->read_pfreg(cpu, this_leaf->level, &opts);
+ if (ret < 0)
+ return ret;
+
+ if (prefetcher_is_available(this_leaf->level, this_leaf->type, HWPF)) {
+ if (opts.hwpf_enable == PFCTL_ENABLE_VAL)
+ len += sysfs_emit_at(buf, len, hwpf_enable_fmt,
+ "enable");
+ else
+ len += sysfs_emit_at(buf, len, hwpf_enable_fmt,
+ "disable");
+ len += sysfs_emit_at(buf, len, "\n");
+ }
+
+ if (prefetcher_is_available(this_leaf->level, this_leaf->type, IPPF)) {
+ if (opts.ippf_enable == PFCTL_ENABLE_VAL)
+ len += sysfs_emit_at(buf, len, ippf_enable_fmt,
+ "enable");
+ else
+ len += sysfs_emit_at(buf, len, ippf_enable_fmt,
+ "disable");
+ len += sysfs_emit_at(buf, len, "\n");
+ }
+
+ if (prefetcher_is_available(this_leaf->level, this_leaf->type, ACLPF)) {
+ if (opts.aclpf_enable == PFCTL_ENABLE_VAL)
+ len += sysfs_emit_at(buf, len, aclpf_enable_fmt,
+ "enable");
+ else
+ len += sysfs_emit_at(buf, len, aclpf_enable_fmt,
+ "disable");
+ len += sysfs_emit_at(buf, len, "\n");
+ }
+
+ if (prefetcher_is_available(this_leaf->level, this_leaf->type, SDPF)) {
+ if (opts.sdpf_enable == PFCTL_ENABLE_VAL)
+ len += sysfs_emit_at(buf, len, sdpf_enable_fmt,
+ "enable");
+ else
+ len += sysfs_emit_at(buf, len, sdpf_enable_fmt,
+ "disable");
+ len += sysfs_emit_at(buf, len, "\n");
+
+ if (opts.sdpf_dist == PFCTL_DIST_AUTO_VAL)
+ len += sysfs_emit_at(buf, len, sdpf_dist_auto_fmt,
+ "auto");
+ else
+ len += sysfs_emit_at(buf, len, sdpf_dist_fmt,
+ opts.sdpf_dist);
+ len += sysfs_emit_at(buf, len, "\n");
+
+ if (opts.sdpf_strong == PFCTL_STRONG_VAL)
+ len += sysfs_emit_at(buf, len, sdpf_strong_fmt,
+ "strong");
+ else
+ len += sysfs_emit_at(buf, len, sdpf_strong_fmt,
+ "weak");
+ len += sysfs_emit_at(buf, len, "\n");
+ }
+
+ return len;
+}
+
+static ssize_t prefetch_control_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int ret;
+ unsigned int cpu;
+ struct cacheinfo *this_leaf = dev_get_drvdata(dev);
+ struct prefetcher_options opts = {
+ .hwpf_enable = PFCTL_PARAM_UNSET,
+ .ippf_enable = PFCTL_PARAM_UNSET,
+ .aclpf_enable = PFCTL_PARAM_UNSET,
+ .sdpf_enable = PFCTL_PARAM_UNSET,
+ .sdpf_dist = PFCTL_PARAM_UNSET,
+ .sdpf_strong = PFCTL_PARAM_UNSET,
+ };
+
+ cpu = dev->parent->parent->id;
+
+ ret = parse_prefetch_options(&opts, buf, this_leaf->level,
+ this_leaf->type);
+ if (ret < 0)
+ return ret;
+
+ ret = pdriver->write_pfreg(cpu, this_leaf->level, &opts);
+ if (ret < 0)
+ return ret;
+
+ return count;
+}
+
+static DEVICE_ATTR_ADMIN_RW(prefetch_control);
+
+static int find_cache_device(unsigned int cpu)
+{
+ struct device *cpu_dev = get_cpu_device(cpu);
+ struct device *cache_dev;
+
+ cache_dev = device_find_child_by_name(cpu_dev, "cache");
+ if (!cache_dev)
+ return -ENODEV;
+ per_cpu_cache_device(cpu) = cache_dev;
+
+ return 0;
+}
+
+static int _create_pfctl_attr(struct device *dev, void *data)
+{
+ int ret;
+ struct cacheinfo *leaf = dev_get_drvdata(dev);
+
+ if (!prefetcher_is_available(leaf->level, leaf->type, ANYPF))
+ return 0;
+
+ ret = sysfs_create_file(&dev->kobj, &dev_attr_prefetch_control.attr);
+ if (ret < 0) {
+ pr_err("sysfs_create_file failed: %d\n", ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int create_pfctl_attr(unsigned int cpu)
+{
+ int ret;
+ struct device *cache_dev = per_cpu_cache_device(cpu);
+
+ if (!cache_dev)
+ return -ENODEV;
+
+ ret = device_for_each_child(cache_dev, NULL, _create_pfctl_attr);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static int _remove_pfctl_attr(struct device *dev, void *data)
+{
+ struct cacheinfo *leaf = dev_get_drvdata(dev);
+
+ if (!prefetcher_is_available(leaf->level, leaf->type, ANYPF))
+ return 0;
+
+ sysfs_remove_file(&dev->kobj, &dev_attr_prefetch_control.attr);
+
+ return 0;
+}
+
+static void remove_pfctl_attr(unsigned int cpu)
+{
+ struct device *cache_dev = per_cpu_cache_device(cpu);
+
+ if (!cache_dev)
+ return;
+
+ device_for_each_child(cache_dev, NULL, _remove_pfctl_attr);
+}
+
+static int pfctl_online(unsigned int cpu)
+{
+ int ret;
+
+ ret = find_cache_device(cpu);
+ if (ret < 0)
+ return ret;
+
+ ret = create_pfctl_attr(cpu);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static int pfctl_prepare_down(unsigned int cpu)
+{
+ remove_pfctl_attr(cpu);
+
+ return 0;
+}
+
+/**
+ * pfctl_register_driver - register a Hardware Prefetch Control driver
+ * @driver_data: struct pfctl_driver must contain the supported prefetcher type
+ * and function pointer for reading and writing hardware prefetch
+ * register. If these are not defined this function return error.
+ *
+ * Note: This function must be called after the cache device is initialized
+ * because it requires access to the cache device.
+ * (e.g. Call at the late_initcall)
+ *
+ * Context: Process context. May sleep (takes the CPU hotplug lock).
+ * Return: 0 on success, negative error code on failure.
+ */
+int pfctl_register_driver(struct pfctl_driver *driver_data)
+{
+ int ret;
+
+ if (pdriver)
+ return -EEXIST;
+
+ if ((driver_data->supported_l1d_prefetcher == 0) &&
+ (driver_data->supported_l2_prefetcher == 0))
+ return -EINVAL;
+
+ if (!driver_data->read_pfreg || !driver_data->write_pfreg)
+ return -EINVAL;
+
+ pdriver = driver_data;
+
+ ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "base/pfctl:online",
+ pfctl_online, pfctl_prepare_down);
+ if (ret < 0) {
+ pr_err("failed to register hotplug callbacks\n");
+ pdriver = NULL;
+ return ret;
+ }
+
+ hp_online = ret;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pfctl_register_driver);
+
+/**
+ * pfctl_unregister_driver - unregister the Hardware Prefetch Control driver
+ * @driver_data: Used to verify that this function is called by the driver that
+ * called pfctl_register_driver by determining if driver_data is
+ * the same.
+ *
+ * Context: Any context.
+ * Return: nothing.
+ */
+void pfctl_unregister_driver(struct pfctl_driver *driver_data)
+{
+ if (!pdriver || (driver_data != pdriver))
+ return;
+
+ cpuhp_remove_state(hp_online);
+
+ pdriver = NULL;
+}
+EXPORT_SYMBOL_GPL(pfctl_unregister_driver);
diff --git a/include/linux/pfctl.h b/include/linux/pfctl.h
new file mode 100644
index 000000000000..7a05e2f4a4f7
--- /dev/null
+++ b/include/linux/pfctl.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PFCTL_H
+#define _LINUX_PFCTL_H
+
+#define PFCTL_ENABLE_VAL 0
+#define PFCTL_DISABLE_VAL 1
+#define PFCTL_DIST_AUTO_VAL 0
+#define PFCTL_STRONG_VAL 0
+#define PFCTL_WEAK_VAL 1
+#define PFCTL_PARAM_UNSET -1
+
+struct prefetcher_options {
+ int hwpf_enable;
+ int ippf_enable;
+ int aclpf_enable;
+ int sdpf_enable;
+ int sdpf_dist;
+ int sdpf_strong;
+};
+
+enum prefetcher {
+ HWPF = BIT(0), /* Hardware Prefetcher */
+ IPPF = BIT(1), /* IP Prefetcher */
+ ACLPF = BIT(2), /* Adjacent Cache Line Prefetcher */
+ SDPF = BIT(3), /* Stream Detect Prefetcher */
+ ANYPF = HWPF|IPPF|ACLPF|SDPF,
+};
+
+struct pfctl_driver {
+ unsigned int supported_l1d_prefetcher;
+ unsigned int supported_l2_prefetcher;
+
+ int (*read_pfreg)(unsigned int cpu, unsigned int level,
+ struct prefetcher_options *opt);
+ int (*write_pfreg)(unsigned int cpu, unsigned int level,
+ struct prefetcher_options *opt);
+};
+
+int pfctl_register_driver(struct pfctl_driver *driver_data);
+void pfctl_unregister_driver(struct pfctl_driver *driver_data);
+
+#endif
--
2.27.0
This adds Kconfig/Makefile to build hardware prefetch control driver
for arm64 support. This also adds a MAINTAINERS entry.
Signed-off-by: Kohei Tarumizu <[email protected]>
---
MAINTAINERS | 1 +
arch/arm64/Kconfig | 8 ++++++++
arch/arm64/kernel/Makefile | 1 +
3 files changed, 10 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index b474051c41e7..0eaee76438e9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8447,6 +8447,7 @@ K: (devm_)?hwmon_device_(un)?register(|_with_groups|_with_info)
HARDWARE PREFETCH CONTROL DRIVERS
M: Kohei Tarumizu <[email protected]>
S: Maintained
+F: arch/arm64/kernel/pfctl.c
F: drivers/base/pfctl.c
F: include/linux/pfctl.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6978140edfa4..c2256dbb0243 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -36,6 +36,7 @@ config ARM64
select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_SET_MEMORY
select ARCH_STACKWALK
+ select ARCH_HAS_HWPF_CONTROL
select ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_STRICT_MODULE_RWX
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
@@ -1941,6 +1942,13 @@ config STACKPROTECTOR_PER_TASK
def_bool y
depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_SYSREG
+config ARM64_HWPF_CONTROL
+ tristate "ARM64 Hardware Prefetch Control support"
+ depends on HWPF_CONTROL
+ default m
+ help
+ This adds Hardware Prefetch Control driver support for ARM64.
+
endmenu
menu "Boot options"
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 88b3e2a21408..d5eb1dc6bfa6 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -73,6 +73,7 @@ obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
obj-$(CONFIG_ARM64_MTE) += mte.o
obj-y += vdso-wrap.o
obj-$(CONFIG_COMPAT_VDSO) += vdso32-wrap.o
+obj-$(CONFIG_ARM64_HWPF_CONTROL) += pfctl.o
obj-y += probes/
head-y := head.o
--
2.27.0
This adds Kconfig/Makefile to build hardware prefetch control core
driver. This also adds a MAINTAINERS entry.
Signed-off-by: Kohei Tarumizu <[email protected]>
---
MAINTAINERS | 6 ++++++
drivers/base/Kconfig | 13 +++++++++++++
drivers/base/Makefile | 1 +
3 files changed, 20 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index ea3e6c914384..b474051c41e7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8444,6 +8444,12 @@ F: include/linux/hwmon*.h
F: include/trace/events/hwmon*.h
K: (devm_)?hwmon_device_(un)?register(|_with_groups|_with_info)
+HARDWARE PREFETCH CONTROL DRIVERS
+M: Kohei Tarumizu <[email protected]>
+S: Maintained
+F: drivers/base/pfctl.c
+F: include/linux/pfctl.h
+
HARDWARE RANDOM NUMBER GENERATOR CORE
M: Matt Mackall <[email protected]>
M: Herbert Xu <[email protected]>
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 6f04b831a5c0..d146604b5b3a 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -230,4 +230,17 @@ config GENERIC_ARCH_NUMA
Enable support for generic NUMA implementation. Currently, RISC-V
and ARM64 use it.
+config ARCH_HAS_HWPF_CONTROL
+ bool
+
+config HWPF_CONTROL
+ bool "Hardware Prefetch Control driver"
+ depends on ARCH_HAS_HWPF_CONTROL && SYSFS
+ help
+ This driver allows user to control CPU's Hardware Prefetch behavior.
+ If the machine supports this behavior, it provides a sysfs interface.
+
+ See Documentation/ABI/testing/sysfs-devices-system-cpu for more
+ information.
+
endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 02f7f1358e86..13f3a0ddf3d1 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_DEV_COREDUMP) += devcoredump.o
obj-$(CONFIG_GENERIC_MSI_IRQ_DOMAIN) += platform-msi.o
obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o
obj-$(CONFIG_GENERIC_ARCH_NUMA) += arch_numa.o
+obj-$(CONFIG_HWPF_CONTROL) += pfctl.o
obj-y += test/
--
2.27.0
This adds module init/exit code, and creates sysfs attribute files for
"prefetch_control". This driver works only if part number is
FUJITSU_CPU_PART_A64FX at this point. The details of the registers to
be read and written in this patch are described below.
"https://github.com/fujitsu/A64FX/tree/master/doc/"
A64FX_Specification_HPC_Extension_v1_EN.pdf
Signed-off-by: Kohei Tarumizu <[email protected]>
---
arch/arm64/kernel/pfctl.c | 324 ++++++++++++++++++++++++++++++++++++++
1 file changed, 324 insertions(+)
create mode 100644 arch/arm64/kernel/pfctl.c
diff --git a/arch/arm64/kernel/pfctl.c b/arch/arm64/kernel/pfctl.c
new file mode 100644
index 000000000000..14f4b8248280
--- /dev/null
+++ b/arch/arm64/kernel/pfctl.c
@@ -0,0 +1,324 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2022 FUJITSU LIMITED
+ *
+ * ARM64 Hardware Prefetch Control support
+ */
+
+#include <asm/cputype.h>
+#include <linux/bitfield.h>
+#include <linux/cacheinfo.h>
+#include <linux/module.h>
+#include <linux/pfctl.h>
+#include <linux/parser.h>
+
+struct pfctl_driver arm64_pfctl_driver;
+
+/**************************************
+ * FUJITSU A64FX support
+ **************************************/
+
+/*
+ * Constants for these add the "A64FX_SDPF" prefix to the name described in
+ * section "1.3.4.2. IMP_PF_STREAM_DETECT_CTRL_EL0" of "A64FX specification".
+ * (https://github.com/fujitsu/A64FX/tree/master/doc/A64FX_Specification_HPC_Extension_v1_EN.pdf)
+ * See this document for register specification details.
+ */
+#define A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0 sys_reg(3, 3, 11, 4, 0)
+#define A64FX_SDPF_V BIT_ULL(63)
+#define A64FX_SDPF_L1PF_DIS BIT_ULL(59)
+#define A64FX_SDPF_L2PF_DIS BIT_ULL(58)
+#define A64FX_SDPF_L1W BIT_ULL(55)
+#define A64FX_SDPF_L2W BIT_ULL(54)
+#define A64FX_SDPF_L1_DIST GENMASK_ULL(27, 24)
+#define A64FX_SDPF_L2_DIST GENMASK_ULL(19, 16)
+
+#define A64FX_SDPF_MIN_DIST_L1 256
+#define A64FX_SDPF_MIN_DIST_L2 1024
+
+struct a64fx_read_info {
+ struct prefetcher_options *opts;
+ unsigned int level;
+ int ret;
+};
+
+struct a64fx_write_info {
+ struct prefetcher_options *opts;
+ unsigned int level;
+ int ret;
+};
+
+static int a64fx_get_sdpf_enable(u64 reg, unsigned int level)
+{
+ switch (level) {
+ case 1:
+ return FIELD_GET(A64FX_SDPF_L1PF_DIS, reg);
+ case 2:
+ return FIELD_GET(A64FX_SDPF_L2PF_DIS, reg);
+ default:
+ return -EINVAL;
+ }
+}
+
+static int a64fx_modify_sdpf_enable(u64 *reg, unsigned int level, int val)
+{
+ switch (level) {
+ case 1:
+ *reg &= ~A64FX_SDPF_L1PF_DIS;
+ *reg |= FIELD_PREP(A64FX_SDPF_L1PF_DIS, val);
+ break;
+ case 2:
+ *reg &= ~A64FX_SDPF_L2PF_DIS;
+ *reg |= FIELD_PREP(A64FX_SDPF_L2PF_DIS, val);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int a64fx_get_sdpf_dist(u64 reg, unsigned int level)
+{
+ switch (level) {
+ case 1:
+ return FIELD_GET(A64FX_SDPF_L1_DIST, reg) *
+ A64FX_SDPF_MIN_DIST_L1;
+ case 2:
+ return FIELD_GET(A64FX_SDPF_L2_DIST, reg) *
+ A64FX_SDPF_MIN_DIST_L2;
+ default:
+ return -EINVAL;
+ }
+}
+
+static int a64fx_modify_sdpf_dist(u64 *reg, unsigned int level, int val)
+{
+ switch (level) {
+ case 1:
+ val = roundup(val, A64FX_SDPF_MIN_DIST_L1) /
+ A64FX_SDPF_MIN_DIST_L1;
+ if (!FIELD_FIT(A64FX_SDPF_L1_DIST, val))
+ return -EINVAL;
+ *reg &= ~A64FX_SDPF_L1_DIST;
+ *reg |= FIELD_PREP(A64FX_SDPF_L1_DIST, val);
+ break;
+ case 2:
+ val = roundup(val, A64FX_SDPF_MIN_DIST_L2) /
+ A64FX_SDPF_MIN_DIST_L2;
+ if (!FIELD_FIT(A64FX_SDPF_L2_DIST, val))
+ return -EINVAL;
+ *reg &= ~A64FX_SDPF_L2_DIST;
+ *reg |= FIELD_PREP(A64FX_SDPF_L2_DIST, val);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int a64fx_get_sdpf_strong(u64 reg, unsigned int level)
+{
+ switch (level) {
+ case 1:
+ return FIELD_GET(A64FX_SDPF_L1W, reg);
+ case 2:
+ return FIELD_GET(A64FX_SDPF_L2W, reg);
+ default:
+ return -EINVAL;
+ }
+}
+
+static int a64fx_modify_sdpf_strong(u64 *reg, unsigned int level, int val)
+{
+ switch (level) {
+ case 1:
+ *reg &= ~A64FX_SDPF_L1W;
+ *reg |= FIELD_PREP(A64FX_SDPF_L1W, val);
+ break;
+ case 2:
+ *reg &= ~A64FX_SDPF_L2W;
+ *reg |= FIELD_PREP(A64FX_SDPF_L2W, val);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void a64fx_enable_sdpf_verify(u64 *reg)
+{
+ *reg &= ~A64FX_SDPF_V;
+ *reg |= FIELD_PREP(A64FX_SDPF_V, 1);
+}
+
+static int a64fx_get_sdpf_params(struct prefetcher_options *opts, u64 reg,
+ unsigned int level)
+{
+ int ret;
+
+ ret = a64fx_get_sdpf_enable(reg, level);
+ if (ret < 0)
+ return ret;
+ opts->sdpf_enable = ret;
+
+ ret = a64fx_get_sdpf_dist(reg, level);
+ if (ret < 0)
+ return ret;
+ opts->sdpf_dist = ret;
+
+ ret = a64fx_get_sdpf_strong(reg, level);
+ if (ret < 0)
+ return ret;
+ opts->sdpf_strong = ret;
+
+ return 0;
+}
+
+static int a64fx_modify_pfreg_val(u64 *reg, struct prefetcher_options *opts,
+ unsigned int level)
+{
+ int ret;
+
+ if (opts->sdpf_enable != PFCTL_PARAM_UNSET) {
+ ret = a64fx_modify_sdpf_enable(reg, level, opts->sdpf_enable);
+ if (ret < 0)
+ return ret;
+ }
+
+ if (opts->sdpf_dist != PFCTL_PARAM_UNSET) {
+ ret = a64fx_modify_sdpf_dist(reg, level, opts->sdpf_dist);
+ if (ret < 0)
+ return ret;
+ }
+
+ if (opts->sdpf_strong != PFCTL_PARAM_UNSET) {
+ ret = a64fx_modify_sdpf_strong(reg, level, opts->sdpf_strong);
+ if (ret < 0)
+ return ret;
+ }
+
+ a64fx_enable_sdpf_verify(reg);
+
+ return 0;
+}
+
+static void _a64fx_read_pfreg(void *info)
+{
+ u64 reg;
+ struct a64fx_read_info *rinfo = info;
+
+ reg = read_sysreg_s(A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+ rinfo->ret = a64fx_get_sdpf_params(rinfo->opts, reg, rinfo->level);
+}
+
+static int a64fx_read_pfreg(unsigned int cpu, unsigned int level,
+ struct prefetcher_options *opt)
+{
+ struct a64fx_read_info info = {
+ .level = level,
+ .opts = opt,
+ };
+
+ smp_call_function_single(cpu, _a64fx_read_pfreg, &info, true);
+ return info.ret;
+}
+
+static void _a64fx_write_pfreg(void *info)
+{
+ int ret;
+ u64 reg;
+ struct a64fx_write_info *winfo = info;
+
+ reg = read_sysreg_s(A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+ ret = a64fx_modify_pfreg_val(&reg, winfo->opts, winfo->level);
+ if (ret < 0) {
+ winfo->ret = ret;
+ return;
+ }
+
+ write_sysreg_s(reg, A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+ winfo->ret = 0;
+}
+
+static int a64fx_write_pfreg(unsigned int cpu, unsigned int level,
+ struct prefetcher_options *opt)
+{
+ struct a64fx_write_info info = {
+ .level = level,
+ .opts = opt,
+ };
+
+ smp_call_function_single(cpu, _a64fx_write_pfreg, &info, true);
+ return info.ret;
+}
+
+/***** end of FUJITSU A64FX support *****/
+
+/*
+ * This driver returns a negative value if it does not support the Hardware
+ * Prefetch Control or if it is running on a VM guest.
+ */
+static int __init setup_pfctl_driver_params(void)
+{
+ unsigned long implementor = read_cpuid_implementor();
+ unsigned long part_number = read_cpuid_part_number();
+
+ if (!is_kernel_in_hyp_mode())
+ return -EINVAL;
+
+ switch (implementor) {
+ case ARM_CPU_IMP_FUJITSU:
+ switch (part_number) {
+ case FUJITSU_CPU_PART_A64FX:
+ /* A64FX register requires EL2 access */
+ if (!has_vhe())
+ return -EINVAL;
+
+ arm64_pfctl_driver.supported_l1d_prefetcher = SDPF;
+ arm64_pfctl_driver.supported_l2_prefetcher = SDPF;
+ arm64_pfctl_driver.read_pfreg = a64fx_read_pfreg;
+ arm64_pfctl_driver.write_pfreg = a64fx_write_pfreg;
+ break;
+ default:
+ return -ENODEV;
+ }
+ break;
+ default:
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
+static int __init arm64_pfctl_init(void)
+{
+ int ret;
+
+ ret = setup_pfctl_driver_params();
+ if (ret < 0)
+ return ret;
+
+ ret = pfctl_register_driver(&arm64_pfctl_driver);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void __exit arm64_pfctl_exit(void)
+{
+ pfctl_unregister_driver(&arm64_pfctl_driver);
+}
+
+late_initcall(arm64_pfctl_init);
+module_exit(arm64_pfctl_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("FUJITSU LIMITED");
+MODULE_DESCRIPTION("ARM64 Prefetch Control Driver");
--
2.27.0
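The distance handling in a64fx_modify_sdpf_dist() above (round up to the unit, divide, then check that the result fits the field) can be illustrated in userspace. This is only a sketch: EX_MIN_DIST and EX_DIST_BITS are hypothetical stand-ins, not the real A64FX_SDPF_MIN_DIST_L1/L2 values or field widths, which come from the A64FX specification.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical values for illustration only; not taken from the patch. */
#define EX_MIN_DIST  256u  /* assumed distance unit in bytes */
#define EX_DIST_BITS 4u    /* assumed width of the DIST field in bits */

/*
 * Convert a distance in bytes to a field value: round up to the next
 * multiple of the unit, divide by the unit, and reject results that do
 * not fit in the field (the role FIELD_FIT() plays in the driver).
 * Returns 0 on success, -EINVAL if the value does not fit.
 */
static int ex_dist_to_field(uint64_t bytes, uint64_t *field)
{
	/* Written this way to avoid overflow in "bytes + unit - 1". */
	uint64_t units = bytes / EX_MIN_DIST + (bytes % EX_MIN_DIST != 0);

	if (units >> EX_DIST_BITS)
		return -EINVAL;

	*field = units;
	return 0;
}
```

With these assumed values, a request of 257 bytes is stored as 2 units, and anything above 15 units is rejected.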
This describes the sysfs interface implemented by the hardware prefetch
control driver.
Signed-off-by: Kohei Tarumizu <[email protected]>
---
.../ABI/testing/sysfs-devices-system-cpu | 93 +++++++++++++++++++
1 file changed, 93 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 61f5676a7429..66b4023b2ed1 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -681,3 +681,96 @@ Description:
(RO) the list of CPUs that are isolated and don't
participate in load balancing. These CPUs are set by
boot parameter "isolcpus=".
+
+What: /sys/devices/system/cpu/cpu*/cache/index[0,2]/prefetch_control
+Date: January 2022
+Contact: Linux kernel mailing list <[email protected]>
+Description: Parameters for CPU's hardware prefetch control
+
+ This sysfs interface provides a hardware prefetch control
+ attribute file backed by implementation-defined registers.
+ The file exists in every CPU's cache/index[0,2] directory.
+ Each attribute file corresponds to the cache level of the
+ parent index directory.
+
+ prefetch_control: (RW) This file allows user to control
+ several options described below. Which options are available
+ depends on the CPU.
+
+ * hardware_prefetcher_enable:
+ The enablement status of prefetcher, "enable" or "disable".
+
+ * ip_prefetcher_enable:
+ The enablement status of prefetcher, "enable" or "disable".
+
+ * adjacent_cache_line_prefetcher_enable:
+ The enablement status of prefetcher, "enable" or "disable".
+
+ * stream_detect_prefetcher_enable:
+ The enablement status of prefetcher, "enable" or "disable".
+
+ * stream_detect_prefetcher_strong:
+ The strength of the prefetcher, "strong" or "weak".
+
+ * stream_detect_prefetcher_dist:
+ The current prefetch distance in bytes, or the string "auto".
+ The distance is always a multiple of a unit that depends on
+ the CPU.
+
+ Write either a value in bytes or the string "auto" to this
+ parameter. A value that is not a multiple of the unit is
+ rounded up to the next multiple.
+
+ - Supported processors
+
+ This sysfs interface is available on several processors, x86
+ and ARM64. Currently, the following processors are supported:
+
+ - x86 processor
+ - INTEL_FAM6_BROADWELL_X
+
+ - ARM64 processor
+ - FUJITSU_CPU_PART_A64FX
+
+ - Attribute mapping
+
+ Some Intel processors have MSR 0x1a4. The layout of this
+ register depends on the model. This interface provides one
+ attribute for each tunable parameter the CPU provides, mapped
+ as follows.
+
+ - "* Hardware Prefetcher Disable (R/W)"
+ corresponds to the attribute "hardware_prefetcher_enable"
+
+ - "* Adjacent Cache Line Prefetcher Disable (R/W)"
+ corresponds to the attribute "adjacent_cache_line_prefetcher_enable"
+
+ - "* IP Prefetcher Disable (R/W)"
+ corresponds to the attribute "ip_prefetcher_enable"
+
+ The processor A64FX has register IMP_PF_STREAM_DETECT_CTRL_EL0
+ for Hardware Prefetch Control. This attribute maps each
+ specification to the following.
+
+ - "L*PF_DIS": enablement of hardware prefetcher
+ corresponds to the attribute "stream_detect_prefetcher_enable"
+
+ - "L*W": strongness of hardware prefetcher
+ corresponds to the attribute "stream_detect_prefetcher_strong"
+
+ - "L*_DIST": distance of hardware prefetcher
+ corresponds to the attribute "stream_detect_prefetcher_dist"
+
+ - Example::
+
+ # cat /sys/devices/system/cpu/cpu0/cache/index0/prefetch_control
+
+ > hardware_prefetcher_enable=enable
+ > ip_prefetcher_enable=enable
+
+ # echo "hardware_prefetcher_enable=disable" > /sys/devices/system/cpu/cpu0/cache/index0/prefetch_control
+
+ # cat /sys/devices/system/cpu/cpu0/cache/index0/prefetch_control
+
+ > hardware_prefetcher_enable=disable
+ > ip_prefetcher_enable=enable
--
2.27.0
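For userspace consumers of the multi-line format shown in the ABI example above, one option can be picked out of the file contents roughly as follows. This is a sketch that assumes the file prints plain "name=value" lines without the "> " marker used in the example; ex_get_option() is a hypothetical helper, not part of the patch.

```c
#include <string.h>

/*
 * Find "name=value" in a newline-separated buffer and copy the value
 * into out as a NUL-terminated string.
 * Returns 0 on success, -1 if the name is absent or the value does not
 * fit in outlen bytes.
 */
static int ex_get_option(const char *buf, const char *name,
			 char *out, size_t outlen)
{
	size_t nlen = strlen(name);
	const char *line = buf;

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		/* Require an exact name match followed directly by '='. */
		if (len > nlen && strncmp(line, name, nlen) == 0 &&
		    line[nlen] == '=') {
			size_t vlen = len - nlen - 1;

			if (vlen >= outlen)
				return -1;
			memcpy(out, line + nlen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		line = eol ? eol + 1 : NULL;
	}

	return -1;
}
```

Requiring the '=' immediately after the name keeps a lookup such as "hardware_prefetcher" from matching "hardware_prefetcher_enable".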
This adds Kconfig/Makefile to build hardware prefetch control driver
for x86 support. This also adds a MAINTAINERS entry.
Signed-off-by: Kohei Tarumizu <[email protected]>
---
MAINTAINERS | 1 +
arch/x86/Kconfig | 7 +++++++
arch/x86/kernel/cpu/Makefile | 2 ++
3 files changed, 10 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 0eaee76438e9..ea049bddc4e6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8448,6 +8448,7 @@ HARDWARE PREFETCH CONTROL DRIVERS
M: Kohei Tarumizu <[email protected]>
S: Maintained
F: arch/arm64/kernel/pfctl.c
+F: arch/x86/kernel/cpu/pfctl.c
F: drivers/base/pfctl.c
F: include/linux/pfctl.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ebe8fc76949a..069aee252ba3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -26,6 +26,7 @@ config X86_64
depends on 64BIT
# Options that are inherently 64-bit kernel only:
select ARCH_HAS_GIGANTIC_PAGE
+ select ARCH_HAS_HWPF_CONTROL
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_USE_CMPXCHG_LOCKREF
select HAVE_ARCH_SOFT_DIRTY
@@ -1377,6 +1378,12 @@ config X86_CPUID
with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
/dev/cpu/31/cpuid.
+config X86_HWPF_CONTROL
+ tristate "x86 Hardware Prefetch Control support"
+ depends on HWPF_CONTROL
+ help
+ This adds Hardware Prefetch Control driver support for X86.
+
choice
prompt "High Memory Support"
default HIGHMEM4G
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 9661e3e802be..aec62a6b37d2 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -56,6 +56,8 @@ obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o
obj-$(CONFIG_HYPERVISOR_GUEST) += vmware.o hypervisor.o mshyperv.o
obj-$(CONFIG_ACRN_GUEST) += acrn.o
+obj-$(CONFIG_X86_HWPF_CONTROL) += pfctl.o
+
ifdef CONFIG_X86_FEATURE_NAMES
quiet_cmd_mkcapflags = MKCAP $@
cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $@ $^
--
2.27.0
This patch creates the cache sysfs directory without ACPI PPTT if
CONFIG_HWPF_CONTROL is enabled.
The hardware prefetch control driver needs the cache sysfs directory and
cache level/type information. On ARM processors, this information can be
obtained from registers even without PPTT. Therefore, we set
cpu_map_populated to true to create the cache sysfs directory if the
machine doesn't have PPTT.
Signed-off-by: Kohei Tarumizu <[email protected]>
---
arch/arm64/kernel/cacheinfo.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index 587543c6c51c..039ec32d0b3d 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -43,6 +43,21 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
this_leaf->type = type;
}
+#if defined(CONFIG_HWPF_CONTROL)
+static bool acpi_has_pptt(void)
+{
+ struct acpi_table_header *table;
+ acpi_status status;
+
+ status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+ if (ACPI_FAILURE(status))
+ return false;
+
+ acpi_put_table(table);
+ return true;
+}
+#endif
+
int init_cache_level(unsigned int cpu)
{
unsigned int ctype, level, leaves, fw_level;
@@ -95,5 +110,19 @@ int populate_cache_leaves(unsigned int cpu)
ci_leaf_init(this_leaf++, type, level);
}
}
+
+#if defined(CONFIG_HWPF_CONTROL)
+ /*
+ * Hardware prefetch functions need the cache sysfs directory and cache
+ * level/type information. On ARM processors, this information can be
+ * obtained from registers even without PPTT. Therefore, we set
+ * cpu_map_populated to true to create the cache sysfs directory if the
+ * machine doesn't have PPTT.
+ */
+ if (!acpi_disabled)
+ if (!acpi_has_pptt())
+ this_cpu_ci->cpu_map_populated = true;
+#endif
+
return 0;
}
--
2.27.0
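Whether the cache sysfs directory discussed above exists on a given machine can be checked from userspace by reading the standard cacheinfo "level" attribute of an index directory. The helper below is a hypothetical sketch for such a check, not part of the patch; the caller supplies the index directory path.

```c
#include <stdio.h>

/*
 * Read the cache level from <index_dir>/level, for example with
 * index_dir = "/sys/devices/system/cpu/cpu0/cache/index0".
 * Returns the level, or -1 if the file is missing or cannot be parsed
 * (which is what one would expect when no cache directory was created).
 */
static int ex_cache_level(const char *index_dir)
{
	char path[512];
	FILE *f;
	int n, level = -1;

	n = snprintf(path, sizeof(path), "%s/level", index_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (fscanf(f, "%d", &level) != 1)
		level = -1;

	fclose(f);
	return level;
}
```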
This adds module init/exit code and creates the sysfs attribute file
"prefetch_control" for x86. At this point the driver works only on
INTEL_FAM6_BROADWELL_X.
To support a new model with the same register specification as
INTEL_FAM6_BROADWELL_X, add the model to the broadwell_cpu_ids[]
array.
The details of the registers to be read and written in this patch are
described below:
"https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html"
Volume 4
Signed-off-by: Kohei Tarumizu <[email protected]>
---
arch/x86/kernel/cpu/pfctl.c | 292 ++++++++++++++++++++++++++++++++++++
1 file changed, 292 insertions(+)
create mode 100644 arch/x86/kernel/cpu/pfctl.c
diff --git a/arch/x86/kernel/cpu/pfctl.c b/arch/x86/kernel/cpu/pfctl.c
new file mode 100644
index 000000000000..02628f6d2c05
--- /dev/null
+++ b/arch/x86/kernel/cpu/pfctl.c
@@ -0,0 +1,292 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2022 FUJITSU LIMITED
+ *
+ * x86 Hardware Prefetch Control support
+ */
+
+#include <linux/bitfield.h>
+#include <linux/cacheinfo.h>
+#include <linux/pfctl.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <asm/cpu_device_id.h>
+#include <asm/intel-family.h>
+#include <asm/msr.h>
+
+struct pfctl_driver x86_pfctl_driver;
+
+/**************************************
+ * Intel BROADWELL support
+ **************************************/
+
+/*
+ * The register specification for each bit of Intel BROADWELL is as
+ * follows:
+ *
+ * [0] L2 Hardware Prefetcher Disable (R/W)
+ * [1] L2 Adjacent Cache Line Prefetcher Disable (R/W)
+ * [2] DCU Hardware Prefetcher Disable (R/W)
+ * [3] DCU IP Prefetcher Disable (R/W)
+ * [63:4] Reserved
+ *
+ * See "Intel 64 and IA-32 Architectures Software Developer's Manual"
+ * (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html)
+ * for register specification details.
+ */
+#define BROADWELL_L2_HWPF_FIELD BIT_ULL(0)
+#define BROADWELL_L2_ACLPF_FIELD BIT_ULL(1)
+#define BROADWELL_DCU_HWPF_FIELD BIT_ULL(2)
+#define BROADWELL_DCU_IPPF_FIELD BIT_ULL(3)
+
+static int broadwell_get_hwpf_enable(u64 reg, unsigned int level)
+{
+ switch (level) {
+ case 1:
+ return FIELD_GET(BROADWELL_DCU_HWPF_FIELD, reg);
+ case 2:
+ return FIELD_GET(BROADWELL_L2_HWPF_FIELD, reg);
+ default:
+ return -EINVAL;
+ }
+}
+
+static int broadwell_modify_hwpf_enable(u64 *reg, unsigned int level,
+ unsigned int val)
+{
+ switch (level) {
+ case 1:
+ *reg &= ~BROADWELL_DCU_HWPF_FIELD;
+ *reg |= FIELD_PREP(BROADWELL_DCU_HWPF_FIELD, val);
+ break;
+ case 2:
+ *reg &= ~BROADWELL_L2_HWPF_FIELD;
+ *reg |= FIELD_PREP(BROADWELL_L2_HWPF_FIELD, val);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int broadwell_get_ippf_enable(u64 reg, unsigned int level)
+{
+ switch (level) {
+ case 1:
+ return FIELD_GET(BROADWELL_DCU_IPPF_FIELD, reg);
+ default:
+ return -EINVAL;
+ }
+}
+
+static int broadwell_modify_ippf_enable(u64 *reg, unsigned int level,
+ unsigned int val)
+{
+ switch (level) {
+ case 1:
+ *reg &= ~BROADWELL_DCU_IPPF_FIELD;
+ *reg |= FIELD_PREP(BROADWELL_DCU_IPPF_FIELD, val);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int broadwell_get_aclpf_enable(u64 reg, unsigned int level)
+{
+ switch (level) {
+ case 2:
+ return FIELD_GET(BROADWELL_L2_ACLPF_FIELD, reg);
+ default:
+ return -EINVAL;
+ }
+}
+
+static int broadwell_modify_aclpf_enable(u64 *reg, unsigned int level,
+ unsigned int val)
+{
+ switch (level) {
+ case 2:
+ *reg &= ~BROADWELL_L2_ACLPF_FIELD;
+ *reg |= FIELD_PREP(BROADWELL_L2_ACLPF_FIELD, val);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int _broadwell_get_pfctl_params(struct prefetcher_options *opts, u64 reg,
+ unsigned int level, int supported_prefetcher)
+{
+ int ret;
+
+ if (supported_prefetcher & HWPF) {
+ ret = broadwell_get_hwpf_enable(reg, level);
+ if (ret < 0)
+ return ret;
+ opts->hwpf_enable = ret;
+ }
+
+ if (supported_prefetcher & IPPF) {
+ ret = broadwell_get_ippf_enable(reg, level);
+ if (ret < 0)
+ return ret;
+ opts->ippf_enable = ret;
+ }
+
+ if (supported_prefetcher & ACLPF) {
+ ret = broadwell_get_aclpf_enable(reg, level);
+ if (ret < 0)
+ return ret;
+ opts->aclpf_enable = ret;
+ }
+
+ return 0;
+}
+
+static int broadwell_get_pfctl_params(struct prefetcher_options *opts, u64 reg,
+ unsigned int level)
+{
+ int ret, supported_prefetcher;
+
+ if (level == 1)
+ supported_prefetcher =
+ x86_pfctl_driver.supported_l1d_prefetcher;
+ else if (level == 2)
+ supported_prefetcher =
+ x86_pfctl_driver.supported_l2_prefetcher;
+ else
+ return -EINVAL;
+
+ ret = _broadwell_get_pfctl_params(opts, reg, level, supported_prefetcher);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static int broadwell_modify_pfreg(u64 *reg, struct prefetcher_options *opts,
+ unsigned int level)
+{
+ int ret;
+
+ if (opts->hwpf_enable != PFCTL_PARAM_UNSET) {
+ ret = broadwell_modify_hwpf_enable(reg, level,
+ opts->hwpf_enable);
+ if (ret < 0)
+ return ret;
+ }
+
+ if (opts->ippf_enable != PFCTL_PARAM_UNSET) {
+ ret = broadwell_modify_ippf_enable(reg, level,
+ opts->ippf_enable);
+ if (ret < 0)
+ return ret;
+ }
+
+ if (opts->aclpf_enable != PFCTL_PARAM_UNSET) {
+ ret = broadwell_modify_aclpf_enable(reg, level,
+ opts->aclpf_enable);
+ if (ret < 0)
+ return ret;
+ }
+
+ return 0;
+}
+
+static int broadwell_read_pfreg(unsigned int cpu, unsigned int level,
+ struct prefetcher_options *opts)
+{
+ int ret;
+ u64 reg;
+
+ ret = rdmsrl_on_cpu(cpu, MSR_MISC_FEATURE_CONTROL, &reg);
+ if (ret)
+ return ret;
+
+ ret = broadwell_get_pfctl_params(opts, reg, level);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static int broadwell_write_pfreg(unsigned int cpu, unsigned int level,
+ struct prefetcher_options *opts)
+{
+ int ret;
+ u64 reg;
+
+ ret = rdmsrl_on_cpu(cpu, MSR_MISC_FEATURE_CONTROL, &reg);
+ if (ret)
+ return ret;
+
+ ret = broadwell_modify_pfreg(&reg, opts, level);
+ if (ret < 0)
+ return ret;
+
+ ret = wrmsrl_on_cpu(cpu, MSR_MISC_FEATURE_CONTROL, reg);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+/*
+ * In addition to BROADWELL_X, NEHALEM and others have the same register
+ * specifications as those represented by BROADWELL_XXX_FIELD.
+ * If you want to add support for these processors, add the new target
+ * model here.
+ */
+static const struct x86_cpu_id broadwell_cpu_ids[] = {
+ X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, NULL),
+ {}
+};
+
+/***** end of Intel BROADWELL support *****/
+
+static int __init setup_pfctl_driver_params(void)
+{
+ if (x86_match_cpu(broadwell_cpu_ids)) {
+ x86_pfctl_driver.supported_l1d_prefetcher = HWPF|IPPF;
+ x86_pfctl_driver.supported_l2_prefetcher = HWPF|ACLPF;
+ x86_pfctl_driver.read_pfreg = broadwell_read_pfreg;
+ x86_pfctl_driver.write_pfreg = broadwell_write_pfreg;
+ } else {
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
+static int __init x86_pfctl_init(void)
+{
+ int ret;
+
+ ret = setup_pfctl_driver_params();
+ if (ret < 0)
+ return ret;
+
+ ret = pfctl_register_driver(&x86_pfctl_driver);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void __exit x86_pfctl_exit(void)
+{
+ pfctl_unregister_driver(&x86_pfctl_driver);
+}
+
+late_initcall(x86_pfctl_init);
+module_exit(x86_pfctl_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("FUJITSU LIMITED");
+MODULE_DESCRIPTION("x86 Hardware Prefetch Control Driver");
--
2.27.0
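The partial-update pattern used by broadwell_modify_pfreg() above (touch only the options the caller set, leave the other bits alone) can be sketched in userspace as follows. The bit positions are the ones listed in the patch comment for MSR 0x1a4. EX_PARAM_UNSET is a hypothetical stand-in for PFCTL_PARAM_UNSET, whose real value is not shown in this patch, and the option fields here carry raw register bit values rather than the "enable"/"disable" strings of the sysfs file.

```c
#include <stdint.h>

#define EX_PARAM_UNSET (-1)  /* hypothetical sentinel, see note above */

/* Bit layout of MSR 0x1a4 as documented in the patch comment. */
#define EX_L2_HWPF_DIS  (1ULL << 0)  /* L2 Hardware Prefetcher Disable */
#define EX_L2_ACLPF_DIS (1ULL << 1)  /* L2 Adjacent Cache Line Prefetcher Disable */
#define EX_DCU_HWPF_DIS (1ULL << 2)  /* DCU Hardware Prefetcher Disable */
#define EX_DCU_IPPF_DIS (1ULL << 3)  /* DCU IP Prefetcher Disable */

struct ex_l2_opts {
	int hwpf_bit;   /* 0, 1, or EX_PARAM_UNSET */
	int aclpf_bit;  /* 0, 1, or EX_PARAM_UNSET */
};

/* Clear the bit selected by mask, then set it again if val is non-zero. */
static uint64_t ex_set_bit(uint64_t reg, uint64_t mask, int val)
{
	reg &= ~mask;
	if (val)
		reg |= mask;
	return reg;
}

/* Apply only the L2 options the caller set; other bits keep their value. */
static uint64_t ex_apply_l2(uint64_t reg, const struct ex_l2_opts *o)
{
	if (o->hwpf_bit != EX_PARAM_UNSET)
		reg = ex_set_bit(reg, EX_L2_HWPF_DIS, o->hwpf_bit);
	if (o->aclpf_bit != EX_PARAM_UNSET)
		reg = ex_set_bit(reg, EX_L2_ACLPF_DIS, o->aclpf_bit);
	return reg;
}
```

In the driver the same steps run between rdmsrl_on_cpu() and wrmsrl_on_cpu(), so a write of one option does not disturb the others.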
On 1/24/22 23:14, Kohei Tarumizu wrote:
> # cat /sys/devices/system/cpu/cpu1/cache/index2/prefetch_control
>
> hardware_prefetcher_enable=enable
> adjacent_cache_line_prefetcher_enable=disable
Doesn't this break the one-value-per-file sysfs rules?
> Doesn't this break the one-value-per-file sysfs rules?
Sorry, I forgot the sysfs rules.
The current interface specification was modeled on
/sys/class/rnbd-client/ctl/map_device. However, I now think the
current prefetch_control specification is inappropriate because it
falls under "Mixing types, expressing multiple lines of data" in
Documentation/filesystems/sysfs.rst.
The next version of the patch splits the attribute file into one
file per option, as described below.
* Current version interface
```
/sys/devices/system/cpu/cpu0/cache/index0/
prefetch_control (attribute file)
```
* The next version interface looks like this
```
/sys/devices/system/cpu/cpu0/cache/index0/prefetch_control/
hardware_prefetcher_enable (attribute file)
ip_prefetcher_enable (attribute file)
```
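With one file per option, a userspace write becomes a single-value write to a single file. A minimal sketch of that, assuming the directory layout proposed above; the helper name is hypothetical, and the accepted value strings (for example "disable") are an assumption, since the next version has not been posted yet:

```c
#include <stdio.h>

/*
 * Write one value to one attribute file.
 * base is the directory holding the attribute, for example the proposed
 * ".../cache/index0/prefetch_control" directory; attr is the file name.
 * Returns 0 on success, -1 on error.
 */
static int ex_write_attr(const char *base, const char *attr,
			 const char *value)
{
	char path[512];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", base, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(value, f) >= 0;
	/* Buffered data is written out at fclose(), so a rejected write
	 * may only show up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A call such as ex_write_attr(dir, "hardware_prefetcher_enable", "disable") would then replace the "name=value" string written to the single prefetch_control file in the current version.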
On Tue, Jan 25, 2022 at 04:14:11PM +0900, Kohei Tarumizu wrote:
> This patch will create a cache sysfs directory without ACPI PPTT if
> the CONFIG_HWPF_CONTROL is true.
>
> Hardware prefetch control driver need cache sysfs directory and cache
> level/type information. In ARM processor, these information can be
> obtained from the register even without PPTT. Therefore, we set the
> cpu_map_populated to true to create cache sysfs directory if the
> machine doesn't have PPTT.
I am assuming this is an ACPI-enabled system.
This looks a bit hacky in my opinion. Before I explore a better way of adding it,
I would like to check whether you have explored ways to add a PPTT, reading these
registers from UEFI/EDK2, as PPTT has other topology information which you will
need anyway. That would simplify handling of the cacheinfo sysfs in the
kernel. Let me know your thoughts.
--
Regards,
Sudeep
Hi--
On 1/24/22 23:14, Kohei Tarumizu wrote:
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 6978140edfa4..c2256dbb0243 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> def_bool y
> depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_SYSREG
>
> +config ARM64_HWPF_CONTROL
> + tristate "ARM64 Hardware Prefetch Control support"
> + depends on HWPF_CONTROL
> + default m
Don't enable random drivers unless they are required for booting etc.
So can you justify having this driver enabled by default?
I see that the X86 driver is not enabled by default.
> + help
> + This adds Hardware Prefetch Control driver support for ARM64.
thanks.
--
~Randy
> Don't enable random drivers unless they are required for booting etc.
> So can you justify having this driver enabled by default?
>
> I see that the X86 driver is not enabled by default.
In the next version of the patch, I will remove the "default m" line,
as in the x86 driver.
> I am assuming this is ACPI enabled system.
Yes, it is an ACPI-enabled system.
> This looks bit hacky in my opinion. Before I explore better way of adding it, I would
> like to check if you have explored ways to add PPTT reading these registers from
> UEFI/EDK2 as PPTT has other topology information which you will need anyways.
> That would simplify handling of these cacheinfo sysfs in the kernel. Let me know
> what are your thoughts ?
The latest firmware for the ARM64 machine FX700, which has the A64FX
processor, does not support PPTT.
I think adding PPTT is the best way to generate cacheinfo sysfs.
However, it is difficult to modify the firmware to add PPTT, so
it is not clear when that will be possible.
Therefore, I would like to implement the function in the kernel,
conditional on the firmware not supporting PPTT.
On Thu, Jul 07, 2022 at 09:37:57AM -0500, Jeremy Linton wrote:
> Hi,
>
> On 2/1/22 05:56, [email protected] wrote:
> > > I am assuming this is ACPI enabled system.
> >
> > Yes, it is ACPI enabled system.
> >
> > > This looks bit hacky in my opinion. Before I explore better way of adding it, I would
> > > like to check if you have explored ways to add PPTT reading these registers from
> > > UEFI/EDK2 as PPTT has other topology information which you will need anyways.
> > > That would simplify handling of these cacheinfo sysfs in the kernel. Let me know
> > > what are your thoughts ?
> >
> > The latest firmware of ARM64 machine, FX700 with the A64FX processor
> > does not support PPTT.
> > I think adding PPTT is the best way to generate cacheinfo sysfs.
> > However, it is difficult to modify the firmware to add PPTT, so
> > it is not clear when it will be possible.
> > Therefore, I would like to implement the function in the kernel on
> > the condition that firmware does not support PPTT.
>
> As a bit of a late comment here, I assume you tried injecting the PPTT via
> the initrd (directions in admin-guide/acpi/initrd_table_override.txt) then?
> That is one of the usual kernel workarounds for broken/missing ACPI tables.
>
> As mentioned above, besides not providing appropriate topology information
> to userspace, not having the PPTT is also possibly causing suboptimal
> scheduling decisions in the kernel itself.
>
Thanks a lot for the suggestion, Jeremy. For some reason, I failed to follow
up on this after my initial response. Anyway, I agree that injecting PPTT via
initrd is a good compromise on systems that are shipped without a PPTT or
with a broken one. I have tested that to be fully functional on v5.19-rc* on
one of the servers shipped with a broken PPTT.
--
Regards,
Sudeep
Hi,
On 2/1/22 05:56, [email protected] wrote:
>> I am assuming this is ACPI enabled system.
>
> Yes, it is ACPI enabled system.
>
>> This looks bit hacky in my opinion. Before I explore better way of adding it, I would
>> like to check if you have explored ways to add PPTT reading these registers from
>> UEFI/EDK2 as PPTT has other topology information which you will need anyways.
>> That would simplify handling of these cacheinfo sysfs in the kernel. Let me know
>> what are your thoughts ?
>
> The latest firmware of ARM64 machine, FX700 with the A64FX processor
> does not support PPTT.
> I think adding PPTT is the best way to generate cacheinfo sysfs.
> However, it is difficult to modify the firmware to add PPTT, so
> it is not clear when it will be possible.
> Therefore, I would like to implement the function in the kernel on
> the condition that firmware does not support PPTT.
As a bit of a late comment here, I assume you tried injecting the PPTT
via the initrd (directions in
admin-guide/acpi/initrd_table_override.txt) then? That is one of the
usual kernel workarounds for broken/missing ACPI tables.
As mentioned above, besides not providing appropriate topology
information to userspace, not having the PPTT is also possibly causing
suboptimal scheduling decisions in the kernel itself.
Hi,
Thanks for the comment.
> > > > I am assuming this is ACPI enabled system.
> > >
> > > Yes, it is ACPI enabled system.
> > >
> > > > This looks bit hacky in my opinion. Before I explore better way of
> > > > adding it, I would like to check if you have explored ways to add
> > > > PPTT reading these registers from
> > > > UEFI/EDK2 as PPTT has other topology information which you will need
> > > > anyways.
> > > > That would simplify handling of these cacheinfo sysfs in the
> > > > kernel. Let me know what are your thoughts ?
> > >
> > > The latest firmware of ARM64 machine, FX700 with the A64FX processor
> > > does not support PPTT.
> > > I think adding PPTT is the best way to generate cacheinfo sysfs.
> > > However, it is difficult to modify the firmware to add PPTT, so it
> > > is not clear when it will be possible.
> > > Therefore, I would like to implement the function in the kernel on
> > > the condition that firmware does not support PPTT.
> >
> > As a bit of a late comment here, I assume you tried injecting the PPTT
> > via the initrd (directions in admin-guide/acpi/initrd_table_override.txt) then?
> > That is one of the usual kernel workarounds for broken/missing ACPI tables.
> >
> > As mentioned above, besides not providing appropriate topology
> > information to userspace, not having the PPTT is also possibly causing
> > suboptimal scheduling decisions in the kernel itself.
> >
>
> Thanks a lot for the suggestion Jeremy. For some reason, I missed to follow up on
> this after my initial response. Anyways I agree injecting PPTT via initrd is a good
> compromise on systems that are shipped without or broken PPTT. I have tested
> that to be fully functional on v5.19-rc* on one of the servers shipped with broken
> PPTT.
I accept this proposal and will test injecting PPTT via initrd
on my machine. If there are no problems, I will drop this patch in the
next version.