From: Shiju Jose <[email protected]>
1. Add support for CXL feature mailbox commands.
2. Add CXL device scrub driver supporting patrol scrub control and ECS
control features.
3. Add scrub subsystem driver supports configuring memory scrubs in the system.
4. Register CXL device patrol scrub and ECS with scrub subsystem.
5. Add common library for RASF and RAS2 PCC interfaces.
6. Add driver for ACPI RAS2 feature table (RAS2).
7. Add memory RAS2 driver and register with scrub subsystem.
The QEMU series to support the CXL specific features is available here,
https://lore.kernel.org/qemu-devel/[email protected]/T/#t
Changes
v5 -> v6:
1. Changes for comments from Davidlohr, Thanks.
- Update CXL feature code based on spec 3.1.
- Rename attrb -> attr
- Use enums with default counting.
2. Rebased to the recent kernel.
v4 -> v5:
1. Following are the main changes made based on the feedback from Dan Williams on v4.
1.1. In the scrub subsystem the common scrub control attributes are statically defined
instead of dynamically created.
1.2. Add scrub subsystem support externally defined attribute group.
Add CXL ECS driver define ECS specific attribute group and pass to
the scrub subsystem.
1.3. Move cxl_mem_ecs_init() to cxl/core/region.c so that the CXL region_id
is used in the registration with the scrub subsystem.
1.4. Add previously posted RASF common and RAS2 patches to this scrub series.
2. Add support for the 'enable_background_scrub' attribute
for RAS2, on request from Bill Schwartz([email protected]).
v3 -> v4:
1. Fixes for the warnings/errors reported by kernel test robot.
2. Add support for reading the 'enable' attribute of CXL patrol scrub.
Changes
v2 -> v3:
1. Changes for comments from Davidlohr, Thanks.
- Updated cxl scrub kconfig
- removed usage of the flag is_support_feature from
the function cxl_mem_get_supported_feature_entry().
- corrected spelling error.
- removed unnecessary debug message.
- removed export feature commands to the userspace.
2. Possible fix for the warnings/errors reported by kernel
test robot.
3. Add documentation for the common scrub configure atrributes.
v1 -> v2:
1. Changes for comments from Dave Jiang, Thanks.
- Split patches.
- reversed xmas tree declarations.
- declared flags as enums.
- removed few unnecessary variable initializations.
- replaced PTR_ERR_OR_ZERO() with IS_ERR() and PTR_ERR().
- add auto clean declarations.
- replaced while loop with for loop.
- Removed allocation from cxl_get_supported_features() and
cxl_get_feature() and make change to take allocated memory
pointer from the caller.
- replaced if/else with switch case.
- replaced sprintf() with sysfs_emit() in 2 places.
- replaced goto label with return in few functions.
2. removed unused code for supported attributes from ecs.
3. Included following common patch for scrub configure driver
to this series.
"memory: scrub: Add scrub driver supports configuring memory scrubbers
in the system"
A Somasundaram (1):
ACPI:RASF: Add common library for RASF and RAS2 PCC interfaces
Shiju Jose (11):
cxl/mbox: Add GET_SUPPORTED_FEATURES mailbox command
cxl/mbox: Add GET_FEATURE mailbox command
cxl/mbox: Add SET_FEATURE mailbox command
cxl/memscrub: Add CXL device patrol scrub control feature
cxl/memscrub: Add CXL device ECS control feature
memory: scrub: Add scrub subsystem driver supports configuring memory
scrubs in the system
cxl/memscrub: Register CXL device patrol scrub with scrub configure
driver
cxl/memscrub: Register CXL device ECS with scrub configure driver
ACPICA: ACPI 6.5: Add support for RAS2 table
ACPI:RAS2: Add driver for ACPI RAS2 feature table (RAS2)
memory: RAS2: Add memory RAS2 driver
.../ABI/testing/sysfs-class-scrub-configure | 91 ++
drivers/acpi/Kconfig | 15 +
drivers/acpi/Makefile | 1 +
drivers/acpi/ras2_acpi.c | 97 ++
drivers/acpi/rasf_acpi_common.c | 272 +++++
drivers/cxl/Kconfig | 23 +
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/mbox.c | 59 +
drivers/cxl/core/memscrub.c | 1009 +++++++++++++++++
drivers/cxl/core/region.c | 1 +
drivers/cxl/cxlmem.h | 123 ++
drivers/cxl/pci.c | 5 +
drivers/memory/Kconfig | 15 +
drivers/memory/Makefile | 3 +
drivers/memory/ras2.c | 354 ++++++
drivers/memory/rasf_common.c | 269 +++++
drivers/memory/scrub/Kconfig | 11 +
drivers/memory/scrub/Makefile | 6 +
drivers/memory/scrub/memory-scrub.c | 367 ++++++
include/acpi/actbl2.h | 137 +++
include/acpi/rasf_acpi.h | 58 +
include/memory/memory-scrub.h | 78 ++
include/memory/rasf.h | 88 ++
23 files changed, 3083 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
create mode 100755 drivers/acpi/ras2_acpi.c
create mode 100755 drivers/acpi/rasf_acpi_common.c
create mode 100644 drivers/cxl/core/memscrub.c
create mode 100644 drivers/memory/ras2.c
create mode 100644 drivers/memory/rasf_common.c
create mode 100644 drivers/memory/scrub/Kconfig
create mode 100644 drivers/memory/scrub/Makefile
create mode 100755 drivers/memory/scrub/memory-scrub.c
create mode 100644 include/acpi/rasf_acpi.h
create mode 100755 include/memory/memory-scrub.h
create mode 100755 include/memory/rasf.h
--
2.34.1
From: Shiju Jose <[email protected]>
CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control
feature. The device patrol scrub proactively locates and makes corrections
to errors in regular cycle. The patrol scrub control allows the request to
configure patrol scrub input configurations.
The patrol scrub control allows the requester to specify the number of
hours for which the patrol scrub cycles must be completed, provided that
the requested number is not less than the minimum number of hours for the
patrol scrub cycle that the device is capable of. In addition, the patrol
scrub controls allow the host to disable and enable the feature in case
disabling of the feature is needed for other purposes such as
performance-aware operations which require the background operations to be
turned off.
Signed-off-by: Shiju Jose <[email protected]>
---
drivers/cxl/Kconfig | 17 +++
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/memscrub.c | 266 ++++++++++++++++++++++++++++++++++++
drivers/cxl/cxlmem.h | 8 ++
drivers/cxl/pci.c | 5 +
5 files changed, 297 insertions(+)
create mode 100644 drivers/cxl/core/memscrub.c
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 67998dbd1d46..873bdda5db32 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -157,4 +157,21 @@ config CXL_PMU
monitoring units and provide standard perf based interfaces.
If unsure say 'm'.
+
+config CXL_SCRUB
+ bool "CXL: Memory scrub feature"
+ depends on CXL_PCI
+ depends on CXL_MEM
+ help
+ The CXL memory scrub control is an optional feature allows host to
+ control the scrub configurations of CXL Type 3 devices, which
+ support patrol scrub and/or DDR5 ECS(Error Check Scrub).
+
+ Say 'y/n' to enable/disable the CXL memory scrub driver that will
+ attach to CXL.mem devices for memory scrub control feature. See
+ sections 8.2.9.9.11.1 and 8.2.9.9.11.2 in the CXL 3.1 specification
+ for a detailed description of CXL memory scrub control features.
+
+ If unsure say 'n'.
+
endif
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 9259bcc6773c..e0fc814c3983 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -16,3 +16,4 @@ cxl_core-y += pmu.o
cxl_core-y += cdat.o
cxl_core-$(CONFIG_TRACING) += trace.o
cxl_core-$(CONFIG_CXL_REGION) += region.o
+cxl_core-$(CONFIG_CXL_SCRUB) += memscrub.o
diff --git a/drivers/cxl/core/memscrub.c b/drivers/cxl/core/memscrub.c
new file mode 100644
index 000000000000..be8d9a9743eb
--- /dev/null
+++ b/drivers/cxl/core/memscrub.c
@@ -0,0 +1,266 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * cxl_memscrub.c - CXL memory scrub driver
+ *
+ * Copyright (c) 2023 HiSilicon Limited.
+ *
+ * - Provides functions to configure patrol scrub
+ * feature of the CXL memory devices.
+ */
+
+#define pr_fmt(fmt) "CXL_MEM_SCRUB: " fmt
+
+#include <cxlmem.h>
+
+/* CXL memory scrub feature common definitions */
+#define CXL_SCRUB_MAX_ATTR_RANGE_LENGTH 128
+
+static int cxl_mem_get_supported_feature_entry(struct cxl_memdev *cxlmd, const uuid_t *feat_uuid,
+ struct cxl_mbox_supp_feat_entry *feat_entry_out)
+{
+ struct cxl_mbox_get_supp_feats_out *feats_out __free(kvfree) = NULL;
+ struct cxl_mbox_supp_feat_entry *feat_entry;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+ struct cxl_mbox_get_supp_feats_in pi;
+ int feat_index, count;
+ int nentries;
+ int ret;
+
+ feat_index = 0;
+ pi.count = sizeof(struct cxl_mbox_get_supp_feats_out) +
+ sizeof(struct cxl_mbox_supp_feat_entry);
+ feats_out = kvmalloc(pi.count, GFP_KERNEL);
+ if (!feats_out)
+ return -ENOMEM;
+
+ do {
+ pi.start_index = feat_index;
+ memset(feats_out, 0, pi.count);
+ ret = cxl_get_supported_features(mds, &pi, feats_out);
+ if (ret)
+ return ret;
+
+ nentries = feats_out->entries;
+ if (!nentries)
+ break;
+
+ /* Check CXL memdev supports the feature */
+ feat_entry = (void *)feats_out->feat_entries;
+ for (count = 0; count < nentries; count++, feat_entry++) {
+ if (uuid_equal(&feat_entry->uuid, feat_uuid)) {
+ memcpy(feat_entry_out, feat_entry, sizeof(*feat_entry_out));
+ return 0;
+ }
+ }
+ feat_index += nentries;
+ } while (nentries);
+
+ return -ENOTSUPP;
+}
+
+/* CXL memory patrol scrub control definitions */
+#define CXL_MEMDEV_PS_GET_FEAT_VERSION 0x01
+#define CXL_MEMDEV_PS_SET_FEAT_VERSION 0x01
+
+static const uuid_t cxl_patrol_scrub_uuid =
+ UUID_INIT(0x96dad7d6, 0xfde8, 0x482b, 0xa7, 0x33, 0x75, 0x77, 0x4e, \
+ 0x06, 0xdb, 0x8a);
+
+/* CXL memory patrol scrub control functions */
+struct cxl_patrol_scrub_context {
+ struct device *dev;
+ u16 get_feat_size;
+ u16 set_feat_size;
+ bool scrub_cycle_changeable;
+};
+
+/**
+ * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data structure.
+ * @enable: [IN] enable(1)/disable(0) patrol scrub.
+ * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is changeable.
+ * @rate: [IN] Requested patrol scrub cycle in hours.
+ * [OUT] Current patrol scrub cycle in hours.
+ * @min_rate:[OUT] minimum patrol scrub cycle, in hours, supported.
+ * @rate_avail:[OUT] Supported patrol scrub cycle in hours.
+ */
+struct cxl_memdev_ps_params {
+ bool enable;
+ bool scrub_cycle_changeable;
+ u16 rate;
+ u16 min_rate;
+ char rate_avail[CXL_SCRUB_MAX_ATTR_RANGE_LENGTH];
+};
+
+enum {
+ CXL_MEMDEV_PS_PARAM_ENABLE,
+ CXL_MEMDEV_PS_PARAM_RATE,
+};
+
+#define CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK BIT(0)
+#define CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK BIT(1)
+#define CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK GENMASK(7, 0)
+#define CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK GENMASK(15, 8)
+#define CXL_MEMDEV_PS_FLAG_ENABLED_MASK BIT(0)
+
+struct cxl_memdev_ps_feat_read_attrs {
+ u8 scrub_cycle_cap;
+ __le16 scrub_cycle;
+ u8 scrub_flags;
+} __packed;
+
+struct cxl_memdev_ps_set_feat_pi {
+ struct cxl_mbox_set_feat_in pi;
+ u8 scrub_cycle_hr;
+ u8 scrub_flags;
+} __packed;
+
+static int cxl_mem_ps_get_attrs(struct device *dev,
+ struct cxl_memdev_ps_params *params)
+{
+ struct cxl_memdev_ps_feat_read_attrs *rd_attrs __free(kvfree) = NULL;
+ struct cxl_mbox_get_feat_in pi = {
+ .uuid = cxl_patrol_scrub_uuid,
+ .offset = 0,
+ .count = sizeof(struct cxl_memdev_ps_feat_read_attrs),
+ .selection = CXL_GET_FEAT_SEL_CURRENT_VALUE,
+ };
+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+ int ret;
+
+ if (!mds)
+ return -EFAULT;
+
+ rd_attrs = kvmalloc(pi.count, GFP_KERNEL);
+ if (!rd_attrs)
+ return -ENOMEM;
+
+ ret = cxl_get_feature(mds, &pi, rd_attrs);
+ if (ret) {
+ params->enable = 0;
+ params->rate = 0;
+ snprintf(params->rate_avail, CXL_SCRUB_MAX_ATTR_RANGE_LENGTH,
+ "Unavailable");
+ return ret;
+ }
+ params->scrub_cycle_changeable = FIELD_GET(CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK,
+ rd_attrs->scrub_cycle_cap);
+ params->enable = FIELD_GET(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
+ rd_attrs->scrub_flags);
+ params->rate = FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
+ rd_attrs->scrub_cycle);
+ params->min_rate = FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK,
+ rd_attrs->scrub_cycle);
+ snprintf(params->rate_avail, CXL_SCRUB_MAX_ATTR_RANGE_LENGTH,
+ "Minimum scrub cycle = %d hour", params->min_rate);
+
+ return 0;
+}
+
+static int __maybe_unused
+cxl_mem_ps_set_attrs(struct device *dev, struct cxl_memdev_ps_params *params,
+ u8 param_type)
+{
+ struct cxl_memdev_ps_set_feat_pi set_pi = {
+ .pi.uuid = cxl_patrol_scrub_uuid,
+ .pi.flags = CXL_SET_FEAT_FLAG_MOD_VALUE_SAVED_ACROSS_RESET |
+ CXL_SET_FEAT_FLAG_FULL_DATA_TRANSFER,
+ .pi.offset = 0,
+ .pi.version = CXL_MEMDEV_PS_SET_FEAT_VERSION,
+ };
+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+ struct cxl_memdev_ps_params rd_params;
+ int ret;
+
+ if (!mds)
+ return -EFAULT;
+
+ ret = cxl_mem_ps_get_attrs(dev, &rd_params);
+ if (ret) {
+ dev_err(dev, "Get cxlmemdev patrol scrub params fail ret=%d\n",
+ ret);
+ return ret;
+ }
+
+ switch (param_type) {
+ case CXL_MEMDEV_PS_PARAM_ENABLE:
+ set_pi.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
+ params->enable);
+ set_pi.scrub_cycle_hr = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
+ rd_params.rate);
+ break;
+ case CXL_MEMDEV_PS_PARAM_RATE:
+ if (params->rate < rd_params.min_rate) {
+ dev_err(dev, "Invalid CXL patrol scrub cycle(%d) to set\n",
+ params->rate);
+ dev_err(dev, "Minimum supported CXL patrol scrub cycle in hour %d\n",
+ params->min_rate);
+ return -EINVAL;
+ }
+ set_pi.scrub_cycle_hr = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
+ params->rate);
+ set_pi.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
+ rd_params.enable);
+ break;
+ default:
+ dev_err(dev, "Invalid CXL patrol scrub parameter to set\n");
+ return -EINVAL;
+ }
+
+ ret = cxl_set_feature(mds, &set_pi, sizeof(set_pi));
+ if (ret) {
+ dev_err(dev, "CXL patrol scrub set feature fail ret=%d\n",
+ ret);
+ return ret;
+ }
+
+ /* Verify attribute set successfully */
+ if (param_type == CXL_MEMDEV_PS_PARAM_RATE) {
+ ret = cxl_mem_ps_get_attrs(dev, &rd_params);
+ if (ret) {
+ dev_err(dev, "Get cxlmemdev patrol scrub params fail ret=%d\n", ret);
+ return ret;
+ }
+ if (rd_params.rate != params->rate)
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
+{
+ struct cxl_patrol_scrub_context *cxl_ps_ctx;
+ struct cxl_mbox_supp_feat_entry feat_entry;
+ struct cxl_memdev_ps_params params;
+ int ret;
+
+ ret = cxl_mem_get_supported_feature_entry(cxlmd, &cxl_patrol_scrub_uuid,
+ &feat_entry);
+ if (ret < 0)
+ return ret;
+
+ if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
+ return -ENOTSUPP;
+
+ cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
+ if (!cxl_ps_ctx)
+ return -ENOMEM;
+
+ cxl_ps_ctx->get_feat_size = feat_entry.get_feat_size;
+ cxl_ps_ctx->set_feat_size = feat_entry.set_feat_size;
+ ret = cxl_mem_ps_get_attrs(&cxlmd->dev, ¶ms);
+ if (ret) {
+ dev_err(&cxlmd->dev, "Get CXL patrol scrub params fail ret=%d\n",
+ ret);
+ return ret;
+ }
+ cxl_ps_ctx->scrub_cycle_changeable = params.scrub_cycle_changeable;
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_patrol_scrub_init, CXL);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 2223ef3d3140..7025c4fd66f3 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -948,6 +948,14 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
+/* cxl memory scrub functions */
+#ifdef CONFIG_CXL_SCRUB
+int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd);
+#else
+static inline int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
+{ return -ENOTSUPP; }
+#endif
+
#ifdef CONFIG_CXL_SUSPEND
void cxl_mem_active_inc(void);
void cxl_mem_active_dec(void);
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 233e7c42c161..d2d734d22461 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -886,6 +886,11 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
return rc;
+ /*
+ * Initialize optional CXL scrub features
+ */
+ cxl_mem_patrol_scrub_init(cxlmd);
+
rc = devm_cxl_sanitize_setup_notifier(&pdev->dev, cxlmd);
if (rc)
return rc;
--
2.34.1
From: Shiju Jose <[email protected]>
CXL spec 3.1 section 8.2.9.9.11.2 describes the DDR5 Error Check
Scrub (ECS) control feature.
The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM
Specification (JESD79-5) and allows the DRAM to internally read, correct
single-bit errors, and write back corrected data bits to the DRAM array
while providing transparency to error counts. The ECS control feature
allows the request to configure ECS input configurations during system
boot or at run-time.
The ECS control allows the requester to change the log entry type, the ECS
threshold count provided that the request is within the definition
specified in DDR5 mode registers, change mode between codeword mode and
row count mode, and reset the ECS counter.
Open Question:
Is cxl_mem_ecs_init() invoked in the right function in cxl/core/region.c?
Signed-off-by: Shiju Jose <[email protected]>
---
drivers/cxl/core/memscrub.c | 303 +++++++++++++++++++++++++++++++++++-
drivers/cxl/core/region.c | 1 +
drivers/cxl/cxlmem.h | 3 +
3 files changed, 306 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/memscrub.c b/drivers/cxl/core/memscrub.c
index be8d9a9743eb..a3a371c5aa7b 100644
--- a/drivers/cxl/core/memscrub.c
+++ b/drivers/cxl/core/memscrub.c
@@ -5,7 +5,7 @@
* Copyright (c) 2023 HiSilicon Limited.
*
* - Provides functions to configure patrol scrub
- * feature of the CXL memory devices.
+ * and DDR5 ECS features of the CXL memory devices.
*/
#define pr_fmt(fmt) "CXL_MEM_SCRUB: " fmt
@@ -264,3 +264,304 @@ int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_patrol_scrub_init, CXL);
+
+/* CXL DDR5 ECS control definitions */
+#define CXL_MEMDEV_ECS_GET_FEAT_VERSION 0x01
+#define CXL_MEMDEV_ECS_SET_FEAT_VERSION 0x01
+
+static const uuid_t cxl_ecs_uuid =
+ UUID_INIT(0xe5b13f22, 0x2328, 0x4a14, 0xb8, 0xba, 0xb9, 0x69, 0x1e, \
+ 0x89, 0x33, 0x86);
+
+struct cxl_ecs_context {
+ struct device *dev;
+ u16 nregions;
+ int region_id;
+ u16 get_feat_size;
+ u16 set_feat_size;
+};
+
+/**
+ * struct cxl_memdev_ecs_params - CXL memory DDR5 ECS parameter data structure.
+ * @log_entry_type: ECS log entry type, per DRAM or per memory media FRU.
+ * @threshold: ECS threshold count per GB of memory cells.
+ * @mode: codeword/row count mode
+ * 0 : ECS counts rows with errors
+ * 1 : ECS counts codeword with errors
+ * @reset_counter: [IN] reset ECC counter to default value.
+ */
+struct cxl_memdev_ecs_params {
+ u8 log_entry_type;
+ u16 threshold;
+ u8 mode;
+ bool reset_counter;
+};
+
+enum {
+ CXL_MEMDEV_ECS_PARAM_LOG_ENTRY_TYPE,
+ CXL_MEMDEV_ECS_PARAM_THRESHOLD,
+ CXL_MEMDEV_ECS_PARAM_MODE,
+ CXL_MEMDEV_ECS_PARAM_RESET_COUNTER,
+};
+
+#define CXL_MEMDEV_ECS_LOG_ENTRY_TYPE_MASK GENMASK(1, 0)
+#define CXL_MEMDEV_ECS_REALTIME_REPORT_CAP_MASK BIT(0)
+#define CXL_MEMDEV_ECS_THRESHOLD_COUNT_MASK GENMASK(2, 0)
+#define CXL_MEMDEV_ECS_MODE_MASK BIT(3)
+#define CXL_MEMDEV_ECS_RESET_COUNTER_MASK BIT(4)
+
+static const u16 ecs_supp_threshold[] = { 0, 0, 0, 256, 1024, 4096 };
+
+enum {
+ ECS_LOG_ENTRY_TYPE_DRAM = 0x0,
+ ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU = 0x1,
+};
+
+enum {
+ ECS_THRESHOLD_256 = 3,
+ ECS_THRESHOLD_1024 = 4,
+ ECS_THRESHOLD_4096 = 5,
+};
+
+enum {
+ ECS_MODE_COUNTS_ROWS = 0,
+ ECS_MODE_COUNTS_CODEWORDS = 1,
+};
+
+struct cxl_memdev_ecs_feat_read_attrs {
+ u8 ecs_log_cap;
+ u8 ecs_cap;
+ __le16 ecs_config;
+ u8 ecs_flags;
+} __packed;
+
+struct cxl_memdev_ecs_set_feat_pi {
+ struct cxl_mbox_set_feat_in pi;
+ struct cxl_memdev_ecs_feat_wr_attrs {
+ u8 ecs_log_cap;
+ __le16 ecs_config;
+ } __packed wr_attrs[];
+} __packed;
+
+/* CXL DDR5 ECS control functions */
+static int cxl_mem_ecs_get_attrs(struct device *dev, int fru_id,
+ struct cxl_memdev_ecs_params *params)
+{
+ struct cxl_memdev_ecs_feat_read_attrs *rd_attrs __free(kvfree) = NULL;
+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev->parent);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+ struct cxl_mbox_get_feat_in pi = {
+ .uuid = cxl_ecs_uuid,
+ .offset = 0,
+ .selection = CXL_GET_FEAT_SEL_CURRENT_VALUE,
+ };
+ struct cxl_ecs_context *cxl_ecs_ctx;
+ u8 threshold_index;
+ int ret;
+
+ if (!mds)
+ return -EFAULT;
+ cxl_ecs_ctx = dev_get_drvdata(dev);
+
+ pi.count = cxl_ecs_ctx->get_feat_size;
+ rd_attrs = kvmalloc(pi.count, GFP_KERNEL);
+ if (!rd_attrs)
+ return -ENOMEM;
+
+ ret = cxl_get_feature(mds, &pi, rd_attrs);
+ if (ret) {
+ params->log_entry_type = 0;
+ params->threshold = 0;
+ params->mode = 0;
+ return ret;
+ }
+ params->log_entry_type = FIELD_GET(CXL_MEMDEV_ECS_LOG_ENTRY_TYPE_MASK,
+ rd_attrs[fru_id].ecs_log_cap);
+ threshold_index = FIELD_GET(CXL_MEMDEV_ECS_THRESHOLD_COUNT_MASK,
+ rd_attrs[fru_id].ecs_config);
+ params->threshold = ecs_supp_threshold[threshold_index];
+ params->mode = FIELD_GET(CXL_MEMDEV_ECS_MODE_MASK,
+ rd_attrs[fru_id].ecs_config);
+
+ return 0;
+}
+
+static int __maybe_unused
+cxl_mem_ecs_set_attrs(struct device *dev, int fru_id,
+ struct cxl_memdev_ecs_params *params, u8 param_type)
+{
+ struct cxl_memdev_ecs_feat_read_attrs *rd_attrs __free(kvfree) = NULL;
+ struct cxl_memdev_ecs_set_feat_pi *set_pi __free(kvfree) = NULL;
+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev->parent);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+ struct cxl_mbox_get_feat_in pi = {
+ .uuid = cxl_ecs_uuid,
+ .offset = 0,
+ .selection = CXL_GET_FEAT_SEL_CURRENT_VALUE,
+ };
+ struct cxl_memdev_ecs_feat_wr_attrs *wr_attrs;
+ struct cxl_memdev_ecs_params rd_params;
+ struct cxl_ecs_context *cxl_ecs_ctx;
+ u16 nmedia_frus, count;
+ u32 set_pi_size;
+ int ret;
+
+ if (!mds)
+ return -EFAULT;
+
+ cxl_ecs_ctx = dev_get_drvdata(dev);
+ nmedia_frus = cxl_ecs_ctx->nregions;
+
+ rd_attrs = kvmalloc(cxl_ecs_ctx->get_feat_size, GFP_KERNEL);
+ if (!rd_attrs)
+ return -ENOMEM;
+
+ pi.count = cxl_ecs_ctx->get_feat_size;
+ ret = cxl_get_feature(mds, &pi, rd_attrs);
+ if (ret)
+ return ret;
+ set_pi_size = sizeof(struct cxl_mbox_set_feat_in) +
+ cxl_ecs_ctx->set_feat_size;
+ set_pi = kvmalloc(set_pi_size, GFP_KERNEL);
+ if (!set_pi)
+ return -ENOMEM;
+
+ set_pi->pi.uuid = cxl_ecs_uuid;
+ set_pi->pi.flags = CXL_SET_FEAT_FLAG_MOD_VALUE_SAVED_ACROSS_RESET |
+ CXL_SET_FEAT_FLAG_FULL_DATA_TRANSFER;
+ set_pi->pi.offset = 0;
+ set_pi->pi.version = CXL_MEMDEV_ECS_SET_FEAT_VERSION;
+ /* Fill writable attributes from the current attributes read for all the media FRUs */
+ wr_attrs = set_pi->wr_attrs;
+ for (count = 0; count < nmedia_frus; count++) {
+ wr_attrs[count].ecs_log_cap = rd_attrs[count].ecs_log_cap;
+ wr_attrs[count].ecs_config = rd_attrs[count].ecs_config;
+ }
+
+ /* Fill attribute to be set for the media FRU */
+ switch (param_type) {
+ case CXL_MEMDEV_ECS_PARAM_LOG_ENTRY_TYPE:
+ if (params->log_entry_type != ECS_LOG_ENTRY_TYPE_DRAM &&
+ params->log_entry_type != ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU) {
+ dev_err(dev->parent,
+ "Invalid CXL ECS scrub log entry type(%d) to set\n",
+ params->log_entry_type);
+ dev_err(dev->parent,
+ "Log Entry Type 0: per DRAM 1: per Memory Media FRU\n");
+ return -EINVAL;
+ }
+ wr_attrs[fru_id].ecs_log_cap = FIELD_PREP(CXL_MEMDEV_ECS_LOG_ENTRY_TYPE_MASK,
+ params->log_entry_type);
+ break;
+ case CXL_MEMDEV_ECS_PARAM_THRESHOLD:
+ wr_attrs[fru_id].ecs_config &= ~CXL_MEMDEV_ECS_THRESHOLD_COUNT_MASK;
+ switch (params->threshold) {
+ case 256:
+ wr_attrs[fru_id].ecs_config |= FIELD_PREP(
+ CXL_MEMDEV_ECS_THRESHOLD_COUNT_MASK,
+ ECS_THRESHOLD_256);
+ break;
+ case 1024:
+ wr_attrs[fru_id].ecs_config |= FIELD_PREP(
+ CXL_MEMDEV_ECS_THRESHOLD_COUNT_MASK,
+ ECS_THRESHOLD_1024);
+ break;
+ case 4096:
+ wr_attrs[fru_id].ecs_config |= FIELD_PREP(
+ CXL_MEMDEV_ECS_THRESHOLD_COUNT_MASK,
+ ECS_THRESHOLD_4096);
+ break;
+ default:
+ dev_err(dev->parent,
+ "Invalid CXL ECS scrub threshold count(%d) to set\n",
+ params->threshold);
+ dev_err(dev->parent,
+ "Supported scrub threshold count: 256,1024,4096\n");
+ return -EINVAL;
+ }
+ break;
+ case CXL_MEMDEV_ECS_PARAM_MODE:
+ if (params->mode != ECS_MODE_COUNTS_ROWS &&
+ params->mode != ECS_MODE_COUNTS_CODEWORDS) {
+ dev_err(dev->parent,
+ "Invalid CXL ECS scrub mode(%d) to set\n",
+ params->mode);
+ dev_err(dev->parent,
+ "Mode 0: ECS counts rows with errors"
+ " 1: ECS counts codewords with errors\n");
+ return -EINVAL;
+ }
+ wr_attrs[fru_id].ecs_config &= ~CXL_MEMDEV_ECS_MODE_MASK;
+ wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_MEMDEV_ECS_MODE_MASK,
+ params->mode);
+ break;
+ case CXL_MEMDEV_ECS_PARAM_RESET_COUNTER:
+ wr_attrs[fru_id].ecs_config &= ~CXL_MEMDEV_ECS_RESET_COUNTER_MASK;
+ wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_MEMDEV_ECS_RESET_COUNTER_MASK,
+ params->reset_counter);
+ break;
+ default:
+ dev_err(dev->parent, "Invalid CXL ECS parameter to set\n");
+ return -EINVAL;
+ }
+ ret = cxl_set_feature(mds, set_pi, set_pi_size);
+ if (ret) {
+ dev_err(dev->parent, "CXL ECS set feature fail ret=%d\n", ret);
+ return ret;
+ }
+
+ /* Verify attribute is set successfully */
+ ret = cxl_mem_ecs_get_attrs(dev, fru_id, &rd_params);
+ if (ret) {
+ dev_err(dev->parent, "Get cxlmemdev ECS params fail ret=%d\n", ret);
+ return ret;
+ }
+ switch (param_type) {
+ case CXL_MEMDEV_ECS_PARAM_LOG_ENTRY_TYPE:
+ if (rd_params.log_entry_type != params->log_entry_type)
+ return -EFAULT;
+ break;
+ case CXL_MEMDEV_ECS_PARAM_THRESHOLD:
+ if (rd_params.threshold != params->threshold)
+ return -EFAULT;
+ break;
+ case CXL_MEMDEV_ECS_PARAM_MODE:
+ if (rd_params.mode != params->mode)
+ return -EFAULT;
+ break;
+ }
+
+ return 0;
+}
+
+int cxl_mem_ecs_init(struct cxl_memdev *cxlmd, int region_id)
+{
+ struct cxl_mbox_supp_feat_entry feat_entry;
+ struct cxl_ecs_context *cxl_ecs_ctx;
+ int nmedia_frus;
+ int ret;
+
+ ret = cxl_mem_get_supported_feature_entry(cxlmd, &cxl_ecs_uuid, &feat_entry);
+ if (ret < 0)
+ return ret;
+
+ if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
+ return -ENOTSUPP;
+ nmedia_frus = feat_entry.get_feat_size/
+ sizeof(struct cxl_memdev_ecs_feat_read_attrs);
+ if (nmedia_frus) {
+ cxl_ecs_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ecs_ctx), GFP_KERNEL);
+ if (!cxl_ecs_ctx)
+ return -ENOMEM;
+
+ cxl_ecs_ctx->nregions = nmedia_frus;
+ cxl_ecs_ctx->get_feat_size = feat_entry.get_feat_size;
+ cxl_ecs_ctx->set_feat_size = feat_entry.set_feat_size;
+ cxl_ecs_ctx->region_id = region_id;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_ecs_init, CXL);
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index ce0e2d82bb2b..35b57f0d85fa 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2913,6 +2913,7 @@ int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
dev_err(&cxlr->dev, "failed to enable, range: %pr\n",
p->res);
}
+ cxl_mem_ecs_init(cxlmd, atomic_read(&cxlrd->region_id));
put_device(region_dev);
out:
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 7025c4fd66f3..06965ba89085 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -951,9 +951,12 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
/* cxl memory scrub functions */
#ifdef CONFIG_CXL_SCRUB
int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd);
+int cxl_mem_ecs_init(struct cxl_memdev *cxlmd, int region_id);
#else
static inline int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
{ return -ENOTSUPP; }
+static inline int cxl_mem_ecs_init(struct cxl_memdev *cxlmd, int region_id)
+{ return -ENOTSUPP; }
#endif
#ifdef CONFIG_CXL_SUSPEND
--
2.34.1
From: A Somasundaram <[email protected]>
The code contains PCC interfaces for RASF and RAS2 table, functions to send
RASF commands as per ACPI 5.1 and RAS2 commands as per ACPI 6.5 & upwards
revision.
References for this implementation,
ACPI specification 6.5, section 5.2.20 for RASF table, section 5.2.21 for RAS2
table and chapter 14 for PCC (Platform Communication Channel).
Driver uses PCC interfaces to communicate to the ACPI HW.
This code implements PCC interfaces and the functions to send the RASF/RAS2
commands to be used by OSPM.
Signed-off-by: A Somasundaram <[email protected]>
Co-developed-by: Shiju Jose <[email protected]>
Signed-off-by: Shiju Jose <[email protected]>
---
drivers/acpi/Kconfig | 15 ++
drivers/acpi/Makefile | 1 +
drivers/acpi/rasf_acpi_common.c | 272 ++++++++++++++++++++++++++++++++
include/acpi/rasf_acpi.h | 58 +++++++
4 files changed, 346 insertions(+)
create mode 100755 drivers/acpi/rasf_acpi_common.c
create mode 100644 include/acpi/rasf_acpi.h
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 3c3f8037ebed..4b7ebfede625 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -284,6 +284,21 @@ config ACPI_CPPC_LIB
If your platform does not support CPPC in firmware,
leave this option disabled.
+config ACPI_RASF
+ bool "ACPI RASF driver"
+ depends on ACPI_PROCESSOR
+ select MAILBOX
+ select PCC
+ help
+ The driver adds support for PCC (platform communication
+ channel) interfaces to communicate with the ACPI complaint
+ hardware platform supports RASF(RAS Feature table) or
+ and RAS2(RAS2 Feature table).
+ The driver adds support for RASF/RAS2(extraction of RASF/RAS2
+ tables from OS system table), PCC interfaces and OSPM interfaces
+ to send RASF & RAS2 commands. Driver adds platform device which
+ binds to the RASF/RAS2 memory driver.
+
config ACPI_PROCESSOR
tristate "Processor"
depends on X86 || ARM64 || LOONGARCH
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 12ef8180d272..5c984c13de78 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -105,6 +105,7 @@ obj-$(CONFIG_ACPI_CUSTOM_METHOD)+= custom_method.o
obj-$(CONFIG_ACPI_BGRT) += bgrt.o
obj-$(CONFIG_ACPI_CPPC_LIB) += cppc_acpi.o
obj-$(CONFIG_ACPI_SPCR_TABLE) += spcr.o
+obj-$(CONFIG_ACPI_RASF) += rasf_acpi_common.o
obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
obj-$(CONFIG_ACPI_PPTT) += pptt.o
obj-$(CONFIG_ACPI_PFRUT) += pfr_update.o pfr_telemetry.o
diff --git a/drivers/acpi/rasf_acpi_common.c b/drivers/acpi/rasf_acpi_common.c
new file mode 100755
index 000000000000..3ee34f5d12d3
--- /dev/null
+++ b/drivers/acpi/rasf_acpi_common.c
@@ -0,0 +1,272 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * rasf_acpi_common.c - ACPI RASF table processing common functions
+ *
+ * (C) Copyright 2014, 2015 Hewlett-Packard Enterprises.
+ *
+ * Copyright (c) 2023 HiSilicon Limited.
+ *
+ * Support for
+ * RASF - ACPI 6.5 Specification, section 5.2.20
+ * RAS2 - ACPI 6.5 Specification, section 5.2.21
+ * PCC(Platform Communications Channel) - ACPI 6.5 Specification,
+ * chapter 14.
+ *
+ * Code contains common functions for RASF.
+ * PCC(Platform communication channel) interfaces for the RASF & RAS2
+ * and the functions for sending RASF & RAS2 commands to the ACPI HW.
+ */
+
+#define pr_fmt(fmt) "ACPI RASF COMMON: " fmt
+
+#include <linux/export.h>
+#include <linux/delay.h>
+#include <linux/ktime.h>
+#include <linux/platform_device.h>
+#include <acpi/rasf_acpi.h>
+#include <acpi/acpixf.h>
+
+static int rasf_check_pcc_chan(struct rasf_context *rasf_ctx)
+{
+ int ret = -EIO;
+ struct acpi_rasf_shared_memory __iomem *generic_comm_base = rasf_ctx->pcc_comm_addr;
+ ktime_t next_deadline = ktime_add(ktime_get(), rasf_ctx->deadline);
+
+ while (!ktime_after(ktime_get(), next_deadline)) {
+ /*
+ * As per ACPI spec, the PCC space wil be initialized by
+ * platform and should have set the command completion bit when
+ * PCC can be used by OSPM
+ */
+ if (readw_relaxed(&generic_comm_base->status) & RASF_PCC_CMD_COMPLETE) {
+ ret = 0;
+ break;
+ }
+ /*
+ * Reducing the bus traffic in case this loop takes longer than
+ * a few retries.
+ */
+ udelay(10);
+ }
+
+ return ret;
+}
+
+/**
+ * rasf_send_pcc_cmd() - Send RASF command via PCC channel
+ * @rasf_ctx: pointer to the rasf context structure
+ * @cmd: command to send
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+int rasf_send_pcc_cmd(struct rasf_context *rasf_ctx, u16 cmd)
+{
+ int ret = -EIO;
+ struct acpi_rasf_shared_memory *generic_comm_base =
+ (struct acpi_rasf_shared_memory *)rasf_ctx->pcc_comm_addr;
+ static ktime_t last_cmd_cmpl_time, last_mpar_reset;
+ static int mpar_count;
+ unsigned int time_delta;
+
+ if (cmd == RASF_PCC_CMD_EXEC) {
+ ret = rasf_check_pcc_chan(rasf_ctx);
+ if (ret)
+ return ret;
+ }
+
+ /*
+ * Handle the Minimum Request Turnaround Time(MRTT)
+ * "The minimum amount of time that OSPM must wait after the completion
+ * of a command before issuing the next command, in microseconds"
+ */
+ if (rasf_ctx->pcc_mrtt) {
+ time_delta = ktime_us_delta(ktime_get(), last_cmd_cmpl_time);
+ if (rasf_ctx->pcc_mrtt > time_delta)
+ udelay(rasf_ctx->pcc_mrtt - time_delta);
+ }
+
+ /*
+ * Handle the non-zero Maximum Periodic Access Rate(MPAR)
+ * "The maximum number of periodic requests that the subspace channel can
+ * support, reported in commands per minute. 0 indicates no limitation."
+ *
+ * This parameter should be ideally zero or large enough so that it can
+ * handle maximum number of requests that all the cores in the system can
+ * collectively generate. If it is not, we will follow the spec and just
+ * not send the request to the platform after hitting the MPAR limit in
+ * any 60s window
+ */
+ if (rasf_ctx->pcc_mpar) {
+ if (mpar_count == 0) {
+ time_delta = ktime_ms_delta(ktime_get(), last_mpar_reset);
+ if (time_delta < 60 * MSEC_PER_SEC) {
+ pr_debug("PCC cmd not sent due to MPAR limit");
+ return -EIO;
+ }
+ last_mpar_reset = ktime_get();
+ mpar_count = rasf_ctx->pcc_mpar;
+ }
+ mpar_count--;
+ }
+
+ /* Write to the shared comm region. */
+ writew_relaxed(cmd, &generic_comm_base->command);
+
+ /* Flip CMD COMPLETE bit */
+ writew_relaxed(0, &generic_comm_base->status);
+
+ /* Ring doorbell */
+ ret = mbox_send_message(rasf_ctx->pcc_channel, &cmd);
+ if (ret < 0) {
+ pr_err("Err sending PCC mbox message. cmd:%d, ret:%d\n",
+ cmd, ret);
+ return ret;
+ }
+
+ /*
+ * For READs we need to ensure the cmd completed to ensure
+ * the ensuing read()s can proceed. For WRITEs we dont care
+ * because the actual write()s are done before coming here
+ * and the next READ or WRITE will check if the channel
+ * is busy/free at the entry of this call.
+ *
+ * If Minimum Request Turnaround Time is non-zero, we need
+ * to record the completion time of both READ and WRITE
+ * command for proper handling of MRTT, so we need to check
+ * for pcc_mrtt in addition to CMD_READ
+ */
+ if (cmd == RASF_PCC_CMD_EXEC || rasf_ctx->pcc_mrtt) {
+ ret = rasf_check_pcc_chan(rasf_ctx);
+ if (rasf_ctx->pcc_mrtt)
+ last_cmd_cmpl_time = ktime_get();
+ }
+
+ if (rasf_ctx->pcc_channel->mbox->txdone_irq)
+ mbox_chan_txdone(rasf_ctx->pcc_channel, ret);
+ else
+ mbox_client_txdone(rasf_ctx->pcc_channel, ret);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(rasf_send_pcc_cmd);
+
+/**
+ * rasf_register_pcc_channel() - Register PCC channel
+ * @rasf_ctx: pointer to the rasf context structure
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+int rasf_register_pcc_channel(struct rasf_context *rasf_ctx)
+{
+ u64 usecs_lat;
+ unsigned int len;
+ struct pcc_mbox_chan *pcc_chan;
+ struct mbox_client *rasf_mbox_cl;
+ struct acpi_pcct_hw_reduced *rasf_ss;
+
+ rasf_mbox_cl = &rasf_ctx->mbox_client;
+ if (!rasf_mbox_cl || rasf_ctx->pcc_subspace_idx < 0)
+ return -EINVAL;
+
+ pcc_chan = pcc_mbox_request_channel(rasf_mbox_cl,
+ rasf_ctx->pcc_subspace_idx);
+
+ if (IS_ERR(pcc_chan)) {
+ pr_err("Failed to find PCC channel for subspace %d\n",
+ rasf_ctx->pcc_subspace_idx);
+ return -ENODEV;
+ }
+ rasf_ctx->pcc_chan = pcc_chan;
+ rasf_ctx->pcc_channel = pcc_chan->mchan;
+ /*
+ * The PCC mailbox controller driver should
+ * have parsed the PCCT (global table of all
+ * PCC channels) and stored pointers to the
+ * subspace communication region in con_priv.
+ */
+ rasf_ss = rasf_ctx->pcc_channel->con_priv;
+
+ if (!rasf_ss) {
+ pr_err("No PCC subspace found for RASF\n");
+ pcc_mbox_free_channel(rasf_ctx->pcc_chan);
+ return -ENODEV;
+ }
+
+ /*
+ * This is the shared communication region
+ * for the OS and Platform to communicate over.
+ */
+ rasf_ctx->comm_base_addr = rasf_ss->base_address;
+ len = rasf_ss->length;
+ pr_debug("PCC subspace for RASF=0x%llx len=%d\n",
+ rasf_ctx->comm_base_addr, len);
+
+ /*
+ * rasf_ss->latency is just a Nominal value. In reality
+ * the remote processor could be much slower to reply.
+ * So add an arbitrary amount of wait on top of Nominal.
+ */
+ usecs_lat = RASF_NUM_RETRIES * rasf_ss->latency;
+ rasf_ctx->deadline = ns_to_ktime(usecs_lat * NSEC_PER_USEC);
+ rasf_ctx->pcc_mrtt = rasf_ss->min_turnaround_time;
+ rasf_ctx->pcc_mpar = rasf_ss->max_access_rate;
+ rasf_ctx->pcc_comm_addr = acpi_os_ioremap(rasf_ctx->comm_base_addr, len);
+ pr_debug("pcc_comm_addr=%p\n", rasf_ctx->pcc_comm_addr);
+
+ /* Set flag so that we dont come here for each CPU. */
+ rasf_ctx->pcc_channel_acquired = true;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(rasf_register_pcc_channel);
+
+/**
+ * rasf_unregister_pcc_channel() - Unregister PCC channel
+ * @rasf_ctx: pointer to the rasf context structure
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+int rasf_unregister_pcc_channel(struct rasf_context *rasf_ctx)
+{
+ if (!rasf_ctx->pcc_chan)
+ return -EINVAL;
+
+ pcc_mbox_free_channel(rasf_ctx->pcc_chan);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(rasf_unregister_pcc_channel);
+
+/**
+ * rasf_add_platform_device() - Add a platform device for RASF
+ * @name: name of the device we're adding
+ * @data: platform specific data for this platform device
+ * @size: size of platform specific data
+ *
+ * Returns: pointer to platform device on success, an error otherwise
+ */
+struct platform_device *rasf_add_platform_device(char *name, const void *data,
+ size_t size)
+{
+ int ret;
+ struct platform_device *pdev;
+
+ pdev = platform_device_alloc(name, PLATFORM_DEVID_AUTO);
+ if (!pdev)
+ return NULL;
+
+ ret = platform_device_add_data(pdev, data, size);
+ if (ret)
+ goto dev_put;
+
+ ret = platform_device_add(pdev);
+ if (ret)
+ goto dev_put;
+
+ return pdev;
+
+dev_put:
+ platform_device_put(pdev);
+
+ return NULL;
+}
diff --git a/include/acpi/rasf_acpi.h b/include/acpi/rasf_acpi.h
new file mode 100644
index 000000000000..aa4f935b28cf
--- /dev/null
+++ b/include/acpi/rasf_acpi.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * RASF driver header file
+ *
+ * (C) Copyright 2014, 2015 Hewlett-Packard Enterprises
+ *
+ * Copyright (c) 2023 HiSilicon Limited
+ */
+
+#ifndef _RASF_ACPI_H
+#define _RASF_ACPI_H
+
+#include <linux/acpi.h>
+#include <linux/mailbox_client.h>
+#include <linux/mailbox_controller.h>
+#include <linux/types.h>
+#include <acpi/pcc.h>
+
+#define RASF_PCC_CMD_COMPLETE 1
+
+/* RASF specific PCC commands */
+#define RASF_PCC_CMD_EXEC 0x01
+
+#define RASF_FAILURE 0
+#define RASF_SUCCESS 1
+
+/*
+ * Arbitrary Retries for PCC commands.
+ */
+#define RASF_NUM_RETRIES 600
+
+/*
+ * Data structures for PCC communication and RASF table
+ */
+struct rasf_context {
+ struct device *dev;
+ int id;
+ struct mbox_client mbox_client;
+ struct mbox_chan *pcc_channel;
+ struct pcc_mbox_chan *pcc_chan;
+ void __iomem *pcc_comm_addr;
+ u64 comm_base_addr;
+ int pcc_subspace_idx;
+ bool pcc_channel_acquired;
+ ktime_t deadline;
+ unsigned int pcc_mpar;
+ unsigned int pcc_mrtt;
+ spinlock_t spinlock; /* Lock to provide mutually exclusive access to PCC channel */
+ struct device *scrub_dev;
+ const struct rasf_hw_scrub_ops *ops;
+};
+
+struct platform_device *rasf_add_platform_device(char *name, const void *data,
+ size_t size);
+int rasf_send_pcc_cmd(struct rasf_context *rasf_ctx, u16 cmd);
+int rasf_register_pcc_channel(struct rasf_context *rasf_ctx);
+int rasf_unregister_pcc_channel(struct rasf_context *rasf_ctx);
+#endif /* _RASF_ACPI_H */
--
2.34.1
From: Shiju Jose <[email protected]>
Add support for ACPI RAS2 feature table (RAS2) defined in the ACPI 6.5
Specification, section 5.2.21.
This driver contains RAS2 Init, which extracts the RAS2 table.
Driver adds platform device, for each memory feature, which binds
to the RAS2 memory driver.
Signed-off-by: Shiju Jose <[email protected]>
---
drivers/acpi/Makefile | 2 +-
drivers/acpi/ras2_acpi.c | 97 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 98 insertions(+), 1 deletion(-)
create mode 100755 drivers/acpi/ras2_acpi.c
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 5c984c13de78..b2baf189ea0e 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -105,7 +105,7 @@ obj-$(CONFIG_ACPI_CUSTOM_METHOD)+= custom_method.o
obj-$(CONFIG_ACPI_BGRT) += bgrt.o
obj-$(CONFIG_ACPI_CPPC_LIB) += cppc_acpi.o
obj-$(CONFIG_ACPI_SPCR_TABLE) += spcr.o
-obj-$(CONFIG_ACPI_RASF) += rasf_acpi_common.o
+obj-$(CONFIG_ACPI_RASF) += rasf_acpi_common.o ras2_acpi.o
obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
obj-$(CONFIG_ACPI_PPTT) += pptt.o
obj-$(CONFIG_ACPI_PFRUT) += pfr_update.o pfr_telemetry.o
diff --git a/drivers/acpi/ras2_acpi.c b/drivers/acpi/ras2_acpi.c
new file mode 100755
index 000000000000..b8a7740355a8
--- /dev/null
+++ b/drivers/acpi/ras2_acpi.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * ras2_acpi.c - Implementation of ACPI RAS2 feature table processing
+ * functions.
+ *
+ * Copyright (c) 2023 HiSilicon Limited.
+ *
+ * Support for
+ * RAS2 - ACPI 6.5 Specification, section 5.2.21
+ *
+ * Driver contains RAS2 init, which extracts the RAS2 table and
+ * registers the PCC channel for communicating with the ACPI compliant
+ * platform that contains RAS2 command support in hardware.Driver adds
+ * platform device which binds to the RAS2 memory driver.
+ */
+
+#define pr_fmt(fmt) "ACPI RAS2: " fmt
+
+#include <linux/export.h>
+#include <linux/delay.h>
+#include <linux/ktime.h>
+#include <linux/platform_device.h>
+#include <acpi/rasf_acpi.h>
+#include <acpi/acpixf.h>
+
+#define RAS2_FEATURE_TYPE_MEMORY 0x00
+
+int __init ras2_acpi_init(void)
+{
+ u8 count;
+ acpi_status status;
+ acpi_size ras2_size;
+ int pcc_subspace_idx;
+ struct platform_device *pdev;
+ struct acpi_table_ras2 *pRas2Table;
+ struct acpi_ras2_pcc_desc *pcc_desc_list;
+ struct platform_device **pdev_list = NULL;
+ struct acpi_table_header *pAcpiTable = NULL;
+
+ status = acpi_get_table("RAS2", 0, &pAcpiTable);
+ if (ACPI_FAILURE(status) || !pAcpiTable) {
+ pr_err("ACPI RAS2 driver failed to initialize, get table failed\n");
+ return RASF_FAILURE;
+ }
+
+ ras2_size = pAcpiTable->length;
+ if (ras2_size < sizeof(struct acpi_table_ras2)) {
+ pr_err("ACPI RAS2 table present but broken (too short #1)\n");
+ goto free_ras2_table;
+ }
+
+ pRas2Table = (struct acpi_table_ras2 *)pAcpiTable;
+
+ if (pRas2Table->num_pcc_descs <= 0) {
+ pr_err("ACPI RAS2 table does not contain PCC descriptors\n");
+ goto free_ras2_table;
+ }
+
+ pdev_list = kzalloc((pRas2Table->num_pcc_descs * sizeof(struct platform_device *)),
+ GFP_KERNEL);
+ if (!pdev_list)
+ goto free_ras2_table;
+
+ pcc_desc_list = (struct acpi_ras2_pcc_desc *)
+ ((void *)pRas2Table + sizeof(struct acpi_table_ras2));
+ count = 0;
+ while (count < pRas2Table->num_pcc_descs) {
+ if (pcc_desc_list->feature_type == RAS2_FEATURE_TYPE_MEMORY) {
+ pcc_subspace_idx = pcc_desc_list->channel_id;
+ /* Add the platform device and bind ras2 memory driver */
+ pdev = rasf_add_platform_device("ras2", &pcc_subspace_idx,
+ sizeof(pcc_subspace_idx));
+ if (!pdev)
+ goto free_ras2_pdev;
+ pdev_list[count] = pdev;
+ }
+ count++;
+ pcc_desc_list = pcc_desc_list + sizeof(struct acpi_ras2_pcc_desc);
+ }
+
+ acpi_put_table(pAcpiTable);
+ return RASF_SUCCESS;
+
+free_ras2_pdev:
+ count = 0;
+ while (count < pRas2Table->num_pcc_descs) {
+ if (pcc_desc_list->feature_type ==
+ RAS2_FEATURE_TYPE_MEMORY)
+ platform_device_put(pdev_list[count++]);
+ }
+ kfree(pdev_list);
+
+free_ras2_table:
+ acpi_put_table(pAcpiTable);
+ return RASF_FAILURE;
+}
+late_initcall(ras2_acpi_init)
--
2.34.1
From: Shiju Jose <[email protected]>
Add scrub driver supports configuring the memory scrubs in the system.
The scrub driver provides the interface for registering the scrub devices
and supports configuring memory scrubs in the system.
Driver exposes the sysfs scrub control attributes to the user in
/sys/class/scrub/scrubX/regionN/
Signed-off-by: Shiju Jose <[email protected]>
---
.../ABI/testing/sysfs-class-scrub-configure | 91 +++++
drivers/memory/Kconfig | 1 +
drivers/memory/Makefile | 1 +
drivers/memory/scrub/Kconfig | 11 +
drivers/memory/scrub/Makefile | 6 +
drivers/memory/scrub/memory-scrub.c | 367 ++++++++++++++++++
include/memory/memory-scrub.h | 78 ++++
7 files changed, 555 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
create mode 100644 drivers/memory/scrub/Kconfig
create mode 100644 drivers/memory/scrub/Makefile
create mode 100755 drivers/memory/scrub/memory-scrub.c
create mode 100755 include/memory/memory-scrub.h
diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure b/Documentation/ABI/testing/sysfs-class-scrub-configure
new file mode 100644
index 000000000000..d2d422b667cf
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
@@ -0,0 +1,91 @@
+What: /sys/class/scrub/
+Date: January 2024
+KernelVersion: 6.8
+Contact: [email protected]
+Description:
+ The scrub/ class subdirectory belongs to the
+ scrubber subsystem.
+
+What: /sys/class/scrub/scrubX/
+Date: January 2024
+KernelVersion: 6.8
+Contact: [email protected]
+Description:
+ The /sys/class/scrub/scrub{0,1,2,3,...} directories
+ correspond to each scrub device.
+
+What: /sys/class/scrub/scrubX/name
+Date: January 2024
+KernelVersion: 6.8
+Contact: [email protected]
+Description:
+ (RO) name of the memory scrub device
+
+What: /sys/class/scrub/scrubX/regionN/
+Date: January 2024
+KernelVersion: 6.8
+Contact: [email protected]
+Description:
+ The /sys/class/scrub/scrubX/region{0,1,2,3,...}
+ directories correspond to each scrub region under a scrub device.
+ Scrub region is a physical address range for which scrub may be
+ separately controlled. Regions may overlap in which case the
+ scrubbing rate of the overlapped memory will be at least that
+ expected due to each overlapping region.
+
+What: /sys/class/scrub/scrubX/regionN/addr_base
+Date: January 2024
+KernelVersion: 6.8
+Contact: [email protected]
+Description:
+ (RW) The base of the address range of the memory region
+ to be scrubbed.
+ On reading, returns the base of the memory region for
+ the actual address range(The platform calculates
+ the nearest patrol scrub boundary address from where
+ it can start scrub).
+
+What: /sys/class/scrub/scrubX/regionN/addr_size
+Date: January 2024
+KernelVersion: 6.8
+Contact: [email protected]
+Description:
+ (RW) The size of the address range to be scrubbed.
+ On reading, returns the size of the memory region for
+ the actual address range.
+
+What: /sys/class/scrub/scrubX/regionN/enable
+Date: January 2024
+KernelVersion: 6.8
+Contact: [email protected]
+Description:
+ (WO) Enable/Disable scrub the memory region.
+ 1 - enable the memory scrub.
+ 0 - disable the memory scrub.
+
+What: /sys/class/scrub/scrubX/regionN/enable_background_scrub
+Date: January 2024
+KernelVersion: 6.8
+Contact: [email protected]
+Description:
+ (WO) Enable/Disable background scrubbing if supported.
+ 1 - enable background scrubbing.
+ 0 - disable background scrubbing.
+
+What: /sys/class/scrub/scrubX/regionN/rate_available
+Date: January 2024
+KernelVersion: 6.8
+Contact: [email protected]
+Description:
+ (RO) Supported range for the scrub rate)
+ by the scrubber for a memory region.
+ The unit of the scrub rate vary depends on the scrub.
+
+What: /sys/class/scrub/scrubX/regionN/rate
+Date: January 2024
+KernelVersion: 6.8
+Contact: [email protected]
+Description:
+ (RW) The scrub rate in the memory region specified
+ and it must be with in the supported range by the scrub.
+ The unit of the scrub rate vary depends on the scrub.
diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig
index 8efdd1f97139..d2e015c09d83 100644
--- a/drivers/memory/Kconfig
+++ b/drivers/memory/Kconfig
@@ -227,5 +227,6 @@ config STM32_FMC2_EBI
source "drivers/memory/samsung/Kconfig"
source "drivers/memory/tegra/Kconfig"
+source "drivers/memory/scrub/Kconfig"
endif
diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile
index d2e6ca9abbe0..4b37312cb342 100644
--- a/drivers/memory/Makefile
+++ b/drivers/memory/Makefile
@@ -27,6 +27,7 @@ obj-$(CONFIG_STM32_FMC2_EBI) += stm32-fmc2-ebi.o
obj-$(CONFIG_SAMSUNG_MC) += samsung/
obj-$(CONFIG_TEGRA_MC) += tegra/
+obj-$(CONFIG_SCRUB) += scrub/
obj-$(CONFIG_TI_EMIF_SRAM) += ti-emif-sram.o
obj-$(CONFIG_FPGA_DFL_EMIF) += dfl-emif.o
diff --git a/drivers/memory/scrub/Kconfig b/drivers/memory/scrub/Kconfig
new file mode 100644
index 000000000000..fa7d68f53a69
--- /dev/null
+++ b/drivers/memory/scrub/Kconfig
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Memory scrub driver configurations
+#
+
+config SCRUB
+ bool "Memory scrub driver"
+ help
+ This option selects the memory scrub subsystem, supports
+ configuring the parameters of underlying scrubbers in the
+ system for the DRAM memories.
diff --git a/drivers/memory/scrub/Makefile b/drivers/memory/scrub/Makefile
new file mode 100644
index 000000000000..1b677132ca13
--- /dev/null
+++ b/drivers/memory/scrub/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for memory scrub drivers
+#
+
+obj-$(CONFIG_SCRUB) += memory-scrub.o
diff --git a/drivers/memory/scrub/memory-scrub.c b/drivers/memory/scrub/memory-scrub.c
new file mode 100755
index 000000000000..a160b7a047e4
--- /dev/null
+++ b/drivers/memory/scrub/memory-scrub.c
@@ -0,0 +1,367 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory scrub driver supports configuring
+ * the memory scrubs.
+ *
+ * Copyright (c) 2023 HiSilicon Limited.
+ */
+
+#define pr_fmt(fmt) "MEM SCRUB: " fmt
+
+#include <linux/acpi.h>
+#include <linux/bitops.h>
+#include <linux/delay.h>
+#include <linux/platform_device.h>
+#include <linux/kfifo.h>
+#include <linux/spinlock.h>
+#include <memory/memory-scrub.h>
+
+/* memory scrubber config definitions */
+#define SCRUB_ID_PREFIX "scrub"
+#define SCRUB_ID_FORMAT SCRUB_ID_PREFIX "%d"
+#define SCRUB_DEV_MAX_NAME_LENGTH 128
+#define SCRUB_MAX_SYSFS_ATTR_NAME_LENGTH 64
+
+static DEFINE_IDA(scrub_ida);
+
+struct scrub_device {
+ char name[SCRUB_DEV_MAX_NAME_LENGTH];
+ int id;
+ struct device dev;
+ char region_name[SCRUB_MAX_SYSFS_ATTR_NAME_LENGTH];
+ int region_id;
+ struct attribute_group group;
+ const struct attribute_group *groups[2];
+ const struct scrub_ops *ops;
+};
+
+#define to_scrub_device(d) container_of(d, struct scrub_device, dev)
+
+static ssize_t name_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%s\n", to_scrub_device(dev)->name);
+}
+static DEVICE_ATTR_RO(name);
+
+static struct attribute *scrub_dev_attrs[] = {
+ &dev_attr_name.attr,
+ NULL
+};
+
+static umode_t scrub_dev_attr_is_visible(struct kobject *kobj,
+ struct attribute *attr, int n)
+{
+ if (attr != &dev_attr_name.attr)
+ return 0;
+
+ return attr->mode;
+}
+
+static const struct attribute_group scrub_dev_attr_group = {
+ .attrs = scrub_dev_attrs,
+ .is_visible = scrub_dev_attr_is_visible,
+};
+
+static const struct attribute_group *scrub_dev_attr_groups[] = {
+ &scrub_dev_attr_group,
+ NULL
+};
+
+static void scrub_dev_release(struct device *dev)
+{
+ struct scrub_device *scrub_dev = to_scrub_device(dev);
+
+ ida_free(&scrub_ida, scrub_dev->id);
+ kfree(scrub_dev);
+}
+
+static struct class scrub_class = {
+ .name = "scrub",
+ .dev_groups = scrub_dev_attr_groups,
+ .dev_release = scrub_dev_release,
+};
+
+static umode_t scrub_attr_visible(struct kobject *kobj,
+ struct attribute *a, int attr_id)
+{
+ struct device *dev = kobj_to_dev(kobj);
+ struct scrub_device *scrub_dev = to_scrub_device(dev);
+ int region_id = scrub_dev->region_id;
+
+ if (!scrub_dev->ops)
+ return 0;
+
+ return scrub_dev->ops->is_visible(dev, attr_id, region_id);
+}
+
+static ssize_t scrub_attr_show(struct device *dev, int attr_id,
+ char *buf)
+{
+ struct scrub_device *scrub_dev = to_scrub_device(dev);
+ int region_id = scrub_dev->region_id;
+ int ret;
+ u64 val;
+
+ ret = scrub_dev->ops->read(dev, attr_id, region_id, &val);
+ if (ret < 0)
+ return ret;
+
+ return sprintf(buf, "%lld\n", val);
+}
+
+static ssize_t scrub_attr_show_hex(struct device *dev, int attr_id,
+ char *buf)
+{
+ struct scrub_device *scrub_dev = to_scrub_device(dev);
+ int region_id = scrub_dev->region_id;
+ int ret;
+ u64 val;
+
+ ret = scrub_dev->ops->read(dev, attr_id, region_id, &val);
+ if (ret < 0)
+ return ret;
+
+ return sprintf(buf, "0x%llx\n", val);
+}
+
+static ssize_t scrub_attr_show_string(struct device *dev, int attr_id,
+ char *buf)
+{
+ struct scrub_device *scrub_dev = to_scrub_device(dev);
+ int region_id = scrub_dev->region_id;
+ int ret;
+
+ ret = scrub_dev->ops->read_string(dev, attr_id, region_id, buf);
+ if (ret < 0)
+ return ret;
+
+ return strlen(buf);
+}
+
+static ssize_t scrub_attr_store(struct device *dev, int attr_id,
+ const char *buf, size_t count)
+{
+ struct scrub_device *scrub_dev = to_scrub_device(dev);
+ int region_id = scrub_dev->region_id;
+ long val;
+ int ret;
+
+ ret = kstrtol(buf, 10, &val);
+ if (ret < 0)
+ return ret;
+
+ ret = scrub_dev->ops->write(dev, attr_id, region_id, val);
+ if (ret < 0)
+ return ret;
+
+ return count;
+}
+
+static ssize_t scrub_attr_store_hex(struct device *dev, int attr_id,
+ const char *buf, size_t count)
+{
+ struct scrub_device *scrub_dev = to_scrub_device(dev);
+ int region_id = scrub_dev->region_id;
+ int ret;
+ u64 val;
+
+ ret = kstrtou64(buf, 16, &val);
+ if (ret < 0)
+ return ret;
+
+ ret = scrub_dev->ops->write(dev, attr_id, region_id, val);
+ if (ret < 0)
+ return ret;
+
+ return count;
+}
+
+static ssize_t show_scrub_attr(struct device *dev, char *buf, int attr_id)
+{
+ switch (attr_id) {
+ case scrub_addr_base:
+ case scrub_addr_size:
+ return scrub_attr_show_hex(dev, attr_id, buf);
+ case scrub_enable:
+ case scrub_rate:
+ return scrub_attr_show(dev, attr_id, buf);
+ case scrub_rate_available:
+ return scrub_attr_show_string(dev, attr_id, buf);
+ }
+
+ return -ENOTSUPP;
+}
+
+static ssize_t store_scrub_attr(struct device *dev, const char *buf,
+ size_t count, int attr_id)
+{
+ switch (attr_id) {
+ case scrub_addr_base:
+ case scrub_addr_size:
+ return scrub_attr_store_hex(dev, attr_id, buf, count);
+ case scrub_enable:
+ case scrub_enable_background_scrub:
+ case scrub_rate:
+ return scrub_attr_store(dev, attr_id, buf, count);
+ }
+
+ return -ENOTSUPP;
+}
+
+#define SCRUB_ATTR_RW(attr) \
+static ssize_t attr##_show(struct device *dev, \
+ struct device_attribute *attr, char *buf) \
+{ \
+ return show_scrub_attr(dev, buf, (scrub_##attr)); \
+} \
+static ssize_t attr##_store(struct device *dev, \
+ struct device_attribute *attr, \
+ const char *buf, size_t count) \
+{ \
+ return store_scrub_attr(dev, buf, count, (scrub_##attr));\
+} \
+static DEVICE_ATTR_RW(attr)
+
+#define SCRUB_ATTR_RO(attr) \
+static ssize_t attr##_show(struct device *dev, \
+ struct device_attribute *attr, char *buf) \
+{ \
+ return show_scrub_attr(dev, buf, (scrub_##attr)); \
+} \
+static DEVICE_ATTR_RO(attr)
+
+#define SCRUB_ATTR_WO(attr) \
+static ssize_t attr##_store(struct device *dev, \
+ struct device_attribute *attr, \
+ const char *buf, size_t count) \
+{ \
+ return store_scrub_attr(dev, buf, count, (scrub_##attr));\
+} \
+static DEVICE_ATTR_WO(attr)
+
+SCRUB_ATTR_RW(addr_base);
+SCRUB_ATTR_RW(addr_size);
+SCRUB_ATTR_RW(enable);
+SCRUB_ATTR_RW(enable_background_scrub);
+SCRUB_ATTR_RW(rate);
+SCRUB_ATTR_RO(rate_available);
+
+static struct attribute *scrub_attrs[] = {
+ &dev_attr_addr_base.attr,
+ &dev_attr_addr_size.attr,
+ &dev_attr_enable.attr,
+ &dev_attr_enable_background_scrub.attr,
+ &dev_attr_rate.attr,
+ &dev_attr_rate_available.attr,
+ NULL,
+};
+
+static struct device *
+scrub_device_register(struct device *dev, const char *name, void *drvdata,
+ const struct scrub_ops *ops,
+ int region_id,
+ struct attribute_group *attr_group)
+{
+ struct scrub_device *scrub_dev;
+ struct device *hdev;
+ int err;
+
+ scrub_dev = kzalloc(sizeof(*scrub_dev), GFP_KERNEL);
+ if (!scrub_dev)
+ return ERR_PTR(-ENOMEM);
+ hdev = &scrub_dev->dev;
+
+ scrub_dev->id = ida_alloc(&scrub_ida, GFP_KERNEL);
+ if (scrub_dev->id < 0) {
+ kfree(scrub_dev);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ snprintf((char *)scrub_dev->region_name, SCRUB_MAX_SYSFS_ATTR_NAME_LENGTH,
+ "region%d", region_id);
+ if (attr_group) {
+ attr_group->name = (char *)scrub_dev->region_name;
+ scrub_dev->groups[0] = attr_group;
+ scrub_dev->region_id = region_id;
+ } else {
+ scrub_dev->group.name = (char *)scrub_dev->region_name;
+ scrub_dev->group.attrs = scrub_attrs;
+ scrub_dev->group.is_visible = scrub_attr_visible;
+ scrub_dev->groups[0] = &scrub_dev->group;
+ scrub_dev->ops = ops;
+ scrub_dev->region_id = region_id;
+ }
+
+ hdev->groups = scrub_dev->groups;
+ hdev->class = &scrub_class;
+ hdev->parent = dev;
+ dev_set_drvdata(hdev, drvdata);
+ dev_set_name(hdev, SCRUB_ID_FORMAT, scrub_dev->id);
+ snprintf(scrub_dev->name, SCRUB_DEV_MAX_NAME_LENGTH, "%s", name);
+ err = device_register(hdev);
+ if (err) {
+ put_device(hdev);
+ return ERR_PTR(err);
+ }
+
+ return hdev;
+}
+
+static void devm_scrub_release(void *dev)
+{
+ struct device *hdev = dev;
+
+ device_unregister(hdev);
+}
+
+/**
+ * devm_scrub_device_register - register hw scrubber device
+ * @dev: the parent device
+ * @name: hw scrubber name attribute
+ * @drvdata: driver data to attach to created device
+ * @ops: pointer to scrub_ops structure (optional)
+ * @region_id: region ID
+ * @attr_group: input attribute group (optional)
+ *
+ * Returns the pointer to the new device. The new device is automatically
+ * unregistered with the parent device.
+ */
+struct device *
+devm_scrub_device_register(struct device *dev, const char *name,
+ void *drvdata,
+ const struct scrub_ops *ops,
+ int region_id,
+ struct attribute_group *attr_group)
+{
+ struct device *hdev;
+ int ret;
+
+ if (!dev || !name)
+ return ERR_PTR(-EINVAL);
+
+ hdev = scrub_device_register(dev, name, drvdata, ops,
+ region_id, attr_group);
+ if (IS_ERR(hdev))
+ return hdev;
+
+ ret = devm_add_action_or_reset(dev, devm_scrub_release, hdev);
+ if (ret)
+ return ERR_PTR(ret);
+
+ return hdev;
+}
+EXPORT_SYMBOL_GPL(devm_scrub_device_register);
+
+static int __init memory_scrub_control_init(void)
+{
+ int err;
+
+ err = class_register(&scrub_class);
+ if (err) {
+ pr_err("couldn't register memory scrub control sysfs class\n");
+ return err;
+ }
+
+ return 0;
+}
+subsys_initcall(memory_scrub_control_init);
diff --git a/include/memory/memory-scrub.h b/include/memory/memory-scrub.h
new file mode 100755
index 000000000000..3d7054e98b9a
--- /dev/null
+++ b/include/memory/memory-scrub.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Memory scrub controller driver support to configure
+ * the controls of the memory scrub and enable.
+ *
+ * Copyright (c) 2023 HiSilicon Limited.
+ */
+
+#ifndef __MEMORY_SCRUB_H
+#define __MEMORY_SCRUB_H
+
+#include <linux/types.h>
+
+enum scrub_types {
+ scrub_common,
+ scrub_max,
+};
+
+enum scrub_attributes {
+ scrub_addr_base,
+ scrub_addr_size,
+ scrub_enable,
+ scrub_enable_background_scrub,
+ scrub_rate,
+ scrub_rate_available,
+ max_attrs,
+};
+
+/**
+ * struct scrub_ops - scrub device operations
+ * @is_visible: Callback to return attribute visibility. Mandatory.
+ * Parameters are:
+ * @dev: pointer to hardware scrub device
+ * @attr: scrub attribute
+ * @region_id: memory region id
+ * The function returns the file permissions.
+ * If the return value is 0, no attribute will be created.
+ * @read: Read callback for data attributes. Mandatory if readable
+ * data attributes are present.
+ * Parameters are:
+ * @dev: pointer to hardware scrub device
+ * @attr: scrub attribute
+ * @region_id:
+ * memory region id
+ * @val: pointer to returned value
+ * The function returns 0 on success or a negative error number.
+ * @read_string: Read callback for string attributes. Mandatory if string
+ * attributes are present.
+ * Parameters are:
+ * @dev: pointer to hardware scrub device
+ * @attr: scrub attribute
+ * @region_id:
+ * memory region id
+ * @buf: pointer to buffer to copy string
+ * The function returns 0 on success or a negative error number.
+ * @write: Write callback for data attributes. Mandatory if writeable
+ * data attributes are present.
+ * Parameters are:
+ * @dev: pointer to hardware scrub device
+ * @attr: scrub attribute
+ * @region_id:
+ * memory region id
+ * @val: value to write
+ * The function returns 0 on success or a negative error number.
+ */
+struct scrub_ops {
+ umode_t (*is_visible)(struct device *dev, u32 attr, int region_id);
+ int (*read)(struct device *dev, u32 attr, int region_id, u64 *val);
+ int (*read_string)(struct device *dev, u32 attr, int region_id, char *buf);
+ int (*write)(struct device *dev, u32 attr, int region_id, u64 val);
+};
+
+struct device *
+devm_scrub_device_register(struct device *dev, const char *name,
+ void *drvdata, const struct scrub_ops *ops,
+ int region_id,
+ struct attribute_group *attr_group);
+#endif /* __MEMORY_SCRUB_H */
--
2.34.1
From: Shiju Jose <[email protected]>
Memory RAS2 driver binds to the platform device add by the ACPI RAS2
driver.
Driver registers the PCC channel for communicating with the ACPI compliant
platform that contains RAS2 command support in the hardware.
Add interface functions to support configuring the parameters of HW patrol
scrubs in the system, which exposed to the kernel via the RAS2 and PCC,
using the RAS2 commands.
Add support for RAS2 platform devices to register with scrub subsystem
driver. This enables user to configure the parameters of HW patrol scrubs,
which exposed to the kernel via the RAS2 table, through the scrub sysfs
attributes.
Open Question:
Sysfs scrub control attribute "enable_background_scrub" is added for RAS2,
based on the feedback from Bill Schwartz <[email protected]
on v4 to enable/disable the background_scrubbing in the platform as defined in the
“Configure Scrub Parameters [INPUT]“ field in RAS2 Table 5.87: Parameter Block
Structure for PATROL_SCRUB.
Is it a right approach to support "enable_background_scrub" in the sysfs
scrub control?
Signed-off-by: Shiju Jose <[email protected]>
---
drivers/memory/Kconfig | 14 ++
drivers/memory/Makefile | 2 +
drivers/memory/ras2.c | 354 +++++++++++++++++++++++++++++++++++
drivers/memory/rasf_common.c | 269 ++++++++++++++++++++++++++
include/memory/rasf.h | 88 +++++++++
5 files changed, 727 insertions(+)
create mode 100644 drivers/memory/ras2.c
create mode 100644 drivers/memory/rasf_common.c
create mode 100755 include/memory/rasf.h
diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig
index d2e015c09d83..5fff18fcd3e2 100644
--- a/drivers/memory/Kconfig
+++ b/drivers/memory/Kconfig
@@ -225,6 +225,20 @@ config STM32_FMC2_EBI
devices (like SRAM, ethernet adapters, FPGAs, LCD displays, ...) on
SOCs containing the FMC2 External Bus Interface.
+config MEM_RASF
+ bool "Memory RASF driver"
+ depends on ACPI_RASF
+ depends on SCRUB
+ help
+ The driver bound to the platform device added by the ACPI RAS2
+ driver. Driver registers the PCC channel for communicating with
+ the ACPI compliant platform that contains RAS2 command support
+ in the hardware.
+ Registers with the scrub configure driver to provide sysfs interfaces
+ for configuring the hw patrol scrubber in the system, which exposed
+ via the ACPI RAS2 table and PCC. Provides the interface functions
+ support configuring the HW patrol scrubbers in the system.
+
source "drivers/memory/samsung/Kconfig"
source "drivers/memory/tegra/Kconfig"
source "drivers/memory/scrub/Kconfig"
diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile
index 4b37312cb342..4bd7653e1dce 100644
--- a/drivers/memory/Makefile
+++ b/drivers/memory/Makefile
@@ -7,6 +7,8 @@ obj-$(CONFIG_DDR) += jedec_ddr_data.o
ifeq ($(CONFIG_DDR),y)
obj-$(CONFIG_OF) += of_memory.o
endif
+obj-$(CONFIG_MEM_RASF) += rasf_common.o ras2.o
+
obj-$(CONFIG_ARM_PL172_MPMC) += pl172.o
obj-$(CONFIG_ATMEL_EBI) += atmel-ebi.o
obj-$(CONFIG_BRCMSTB_DPFE) += brcmstb_dpfe.o
diff --git a/drivers/memory/ras2.c b/drivers/memory/ras2.c
new file mode 100644
index 000000000000..046e4ae7eaf0
--- /dev/null
+++ b/drivers/memory/ras2.c
@@ -0,0 +1,354 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * ras2.c - ACPI RAS2 memory driver
+ *
+ * Copyright (c) 2023 HiSilicon Limited.
+ *
+ * - Registers the PCC channel for communicating with the
+ * ACPI compliant platform that contains RAS2 command
+ * support in the hardware.
+ * - Provides functions to configure HW patrol scrubs
+ * in the system.
+ * - Registers with the scrub configure driver for the
+ * hw patrol scrub in the system, which exposed via
+ * the ACPI RAS2 table and PCC.
+ */
+
+#define pr_fmt(fmt) "MEMORY RAS2: " fmt
+
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/cleanup.h>
+
+#include <acpi/rasf_acpi.h>
+#include <memory/rasf.h>
+
+/* RAS2 specific definitions. */
+#define RAS2_SCRUB "ras2_scrub"
+#define RAS2_ID_FORMAT RAS2_SCRUB "%d"
+#define RAS2_SUPPORT_HW_PARTOL_SCRUB BIT(0)
+#define RAS2_TYPE_PATROL_SCRUB 0x0000
+
+#define RAS2_GET_PATROL_PARAMETERS 0x01
+#define RAS2_START_PATROL_SCRUBBER 0x02
+#define RAS2_STOP_PATROL_SCRUBBER 0x03
+
+#define RAS2_PATROL_SCRUB_RATE_VALID BIT(0)
+#define RAS2_PATROL_SCRUB_RATE_IN_MASK GENMASK(15, 8)
+#define RAS2_PATROL_SCRUB_EN_BACKGROUND BIT(0)
+#define RAS2_PATROL_SCRUB_RATE_OUT_MASK GENMASK(7, 0)
+#define RAS2_PATROL_SCRUB_MIN_RATE_OUT_MASK GENMASK(15, 8)
+#define RAS2_PATROL_SCRUB_MAX_RATE_OUT_MASK GENMASK(23, 16)
+
+static void ras2_tx_done(struct mbox_client *cl, void *msg, int ret)
+{
+ if (ret) {
+ dev_dbg(cl->dev, "TX did not complete: CMD sent:%x, ret:%d\n",
+ *(u16 *)msg, ret);
+ } else {
+ dev_dbg(cl->dev, "TX completed. CMD sent:%x, ret:%d\n",
+ *(u16 *)msg, ret);
+ }
+}
+
+/*
+ * The below functions are exposed to OSPM, to query, configure and
+ * initiate memory patrol scrub.
+ */
+static int ras2_is_patrol_scrub_support(struct rasf_context *ras2_ctx)
+{
+ int ret;
+ struct acpi_ras2_shared_memory __iomem *generic_comm_base;
+
+ if (!ras2_ctx || !ras2_ctx->pcc_comm_addr)
+ return -EFAULT;
+
+ generic_comm_base = ras2_ctx->pcc_comm_addr;
+ guard(spinlock_irqsave)(&ras2_ctx->spinlock);
+ generic_comm_base->set_capabilities[0] = 0;
+
+ /* send command for reading RAS2 capabilities */
+ ret = rasf_send_pcc_cmd(ras2_ctx, RASF_PCC_CMD_EXEC);
+ if (ret) {
+ pr_err("%s: rasf_send_pcc_cmd failed\n", __func__);
+ return ret;
+ }
+
+ return generic_comm_base->features[0] & RAS2_SUPPORT_HW_PARTOL_SCRUB;
+}
+
+static int ras2_get_patrol_scrub_params(struct rasf_context *ras2_ctx,
+ struct rasf_scrub_params *params)
+{
+ int ret = 0;
+ u8 min_supp_scrub_rate, max_supp_scrub_rate;
+ struct acpi_ras2_shared_memory __iomem *generic_comm_base;
+ struct acpi_ras2_patrol_scrub_parameter __iomem *patrol_scrub_params;
+
+ if (!ras2_ctx || !ras2_ctx->pcc_comm_addr)
+ return -EFAULT;
+
+ generic_comm_base = ras2_ctx->pcc_comm_addr;
+ patrol_scrub_params = ras2_ctx->pcc_comm_addr + sizeof(*generic_comm_base);
+
+ guard(spinlock_irqsave)(&ras2_ctx->spinlock);
+ generic_comm_base->set_capabilities[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+ /* send command for reading RASF capabilities */
+ ret = rasf_send_pcc_cmd(ras2_ctx, RASF_PCC_CMD_EXEC);
+ if (ret) {
+ pr_err("%s: rasf_send_pcc_cmd failed\n", __func__);
+ return ret;
+ }
+
+ if (!(generic_comm_base->features[0] & RAS2_SUPPORT_HW_PARTOL_SCRUB) ||
+ !(generic_comm_base->num_parameter_blocks)) {
+ pr_err("%s: Platform does not support HW Patrol Scrubber\n", __func__);
+ return -ENOTSUPP;
+ }
+
+ if (!patrol_scrub_params->requested_address_range[1]) {
+ pr_err("%s: Invalid requested address range, \
+ requested_address_range[0]=0x%llx \
+ requested_address_range[1]=0x%llx\n",
+ __func__,
+ patrol_scrub_params->requested_address_range[0],
+ patrol_scrub_params->requested_address_range[1]);
+ return -ENOTSUPP;
+ }
+
+ generic_comm_base->set_capabilities[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+ patrol_scrub_params->header.type = RAS2_TYPE_PATROL_SCRUB;
+ patrol_scrub_params->patrol_scrub_command = RAS2_GET_PATROL_PARAMETERS;
+
+ /* send command for reading the HW patrol scrub parameters */
+ ret = rasf_send_pcc_cmd(ras2_ctx, RASF_PCC_CMD_EXEC);
+ if (ret) {
+ pr_err("%s: failed to read HW patrol scrub parameters\n", __func__);
+ return ret;
+ }
+
+ /* copy output scrub parameters */
+ params->addr_base = patrol_scrub_params->actual_address_range[0];
+ params->addr_size = patrol_scrub_params->actual_address_range[1];
+ params->flags = patrol_scrub_params->flags;
+ params->rate = FIELD_GET(RAS2_PATROL_SCRUB_RATE_OUT_MASK,
+ patrol_scrub_params->scrub_params_out);
+ min_supp_scrub_rate = FIELD_GET(RAS2_PATROL_SCRUB_MIN_RATE_OUT_MASK,
+ patrol_scrub_params->scrub_params_out);
+ max_supp_scrub_rate = FIELD_GET(RAS2_PATROL_SCRUB_MAX_RATE_OUT_MASK,
+ patrol_scrub_params->scrub_params_out);
+ snprintf(params->rate_avail, RASF_MAX_RATE_RANGE_LENGTH,
+ "%d-%d", min_supp_scrub_rate, max_supp_scrub_rate);
+
+ return 0;
+}
+
+static int ras2_enable_patrol_scrub(struct rasf_context *ras2_ctx, bool enable)
+{
+ int ret = 0;
+ struct rasf_scrub_params params;
+ struct acpi_ras2_shared_memory __iomem *generic_comm_base;
+ u8 scrub_rate_to_set, min_supp_scrub_rate, max_supp_scrub_rate;
+ struct acpi_ras2_patrol_scrub_parameter __iomem *patrol_scrub_params;
+
+ if (!ras2_ctx || !ras2_ctx->pcc_comm_addr)
+ return -EFAULT;
+
+ generic_comm_base = ras2_ctx->pcc_comm_addr;
+ patrol_scrub_params = ras2_ctx->pcc_comm_addr + sizeof(*generic_comm_base);
+
+ if (enable) {
+ ret = ras2_get_patrol_scrub_params(ras2_ctx, ¶ms);
+ if (ret)
+ return ret;
+ }
+
+ guard(spinlock_irqsave)(&ras2_ctx->spinlock);
+ generic_comm_base->set_capabilities[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+ patrol_scrub_params->header.type = RAS2_TYPE_PATROL_SCRUB;
+
+ if (enable) {
+ patrol_scrub_params->patrol_scrub_command = RAS2_START_PATROL_SCRUBBER;
+ patrol_scrub_params->requested_address_range[0] = params.addr_base;
+ patrol_scrub_params->requested_address_range[1] = params.addr_size;
+
+ scrub_rate_to_set = FIELD_GET(RAS2_PATROL_SCRUB_RATE_IN_MASK,
+ patrol_scrub_params->scrub_params_in);
+ min_supp_scrub_rate = FIELD_GET(RAS2_PATROL_SCRUB_MIN_RATE_OUT_MASK,
+ patrol_scrub_params->scrub_params_out);
+ max_supp_scrub_rate = FIELD_GET(RAS2_PATROL_SCRUB_MAX_RATE_OUT_MASK,
+ patrol_scrub_params->scrub_params_out);
+ if (scrub_rate_to_set < min_supp_scrub_rate ||
+ scrub_rate_to_set > max_supp_scrub_rate) {
+ pr_warn("patrol scrub rate to set is out of the supported range\n");
+ pr_warn("min_supp_scrub_rate=%d max_supp_scrub_rate=%d\n",
+ min_supp_scrub_rate, max_supp_scrub_rate);
+ return -EINVAL;
+ }
+ } else {
+ patrol_scrub_params->patrol_scrub_command = RAS2_STOP_PATROL_SCRUBBER;
+ }
+
+ /* send command for enable/disable HW patrol scrub */
+ ret = rasf_send_pcc_cmd(ras2_ctx, RASF_PCC_CMD_EXEC);
+ if (ret) {
+ pr_err("%s: failed to enable/disable the HW patrol scrub\n", __func__);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int ras2_enable_background_scrub(struct rasf_context *ras2_ctx, bool enable)
+{
+ int ret;
+ struct acpi_ras2_shared_memory __iomem *generic_comm_base;
+ struct acpi_ras2_patrol_scrub_parameter __iomem *patrol_scrub_params;
+
+ if (!ras2_ctx || !ras2_ctx->pcc_comm_addr)
+ return -EFAULT;
+
+ generic_comm_base = ras2_ctx->pcc_comm_addr;
+ patrol_scrub_params = ras2_ctx->pcc_comm_addr + sizeof(*generic_comm_base);
+
+ guard(spinlock_irqsave)(&ras2_ctx->spinlock);
+ generic_comm_base->set_capabilities[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+ patrol_scrub_params->header.type = RAS2_TYPE_PATROL_SCRUB;
+ patrol_scrub_params->patrol_scrub_command = RAS2_START_PATROL_SCRUBBER;
+
+ patrol_scrub_params->scrub_params_in &= ~RAS2_PATROL_SCRUB_EN_BACKGROUND;
+ patrol_scrub_params->scrub_params_in |= FIELD_PREP(RAS2_PATROL_SCRUB_EN_BACKGROUND,
+ enable);
+
+ /* send command for enable/disable HW patrol scrub */
+ ret = rasf_send_pcc_cmd(ras2_ctx, RASF_PCC_CMD_EXEC);
+ if (ret) {
+ pr_err("%s: failed to enable/disable background patrol scrubbing\n", __func__);
+ return ret;
+ }
+
+ return 0;
+}
+static int ras2_set_patrol_scrub_params(struct rasf_context *ras2_ctx,
+ struct rasf_scrub_params *params, u8 param_type)
+{
+ struct acpi_ras2_shared_memory __iomem *generic_comm_base;
+ struct acpi_ras2_patrol_scrub_parameter __iomem *patrol_scrub_params;
+
+ if (!ras2_ctx || !ras2_ctx->pcc_comm_addr)
+ return -EFAULT;
+
+ generic_comm_base = ras2_ctx->pcc_comm_addr;
+ patrol_scrub_params = ras2_ctx->pcc_comm_addr + sizeof(*generic_comm_base);
+
+ guard(spinlock_irqsave)(&ras2_ctx->spinlock);
+ patrol_scrub_params->header.type = RAS2_TYPE_PATROL_SCRUB;
+ if (param_type == RASF_MEM_SCRUB_PARAM_ADDR_BASE && params->addr_base) {
+ patrol_scrub_params->requested_address_range[0] = params->addr_base;
+ } else if (param_type == RASF_MEM_SCRUB_PARAM_ADDR_SIZE && params->addr_size) {
+ patrol_scrub_params->requested_address_range[1] = params->addr_size;
+ } else if (param_type == RASF_MEM_SCRUB_PARAM_RATE) {
+ patrol_scrub_params->scrub_params_in &= ~RAS2_PATROL_SCRUB_RATE_IN_MASK;
+ patrol_scrub_params->scrub_params_in |= FIELD_PREP(RAS2_PATROL_SCRUB_RATE_IN_MASK,
+ params->rate);
+ } else {
+ pr_err("Invalid patrol scrub parameter to set\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static const struct rasf_hw_scrub_ops ras2_hw_ops = {
+ .enable_scrub = ras2_enable_patrol_scrub,
+ .enable_background_scrub = ras2_enable_background_scrub,
+ .get_scrub_params = ras2_get_patrol_scrub_params,
+ .set_scrub_params = ras2_set_patrol_scrub_params,
+};
+
+static const struct scrub_ops ras2_scrub_ops = {
+ .is_visible = rasf_hw_scrub_is_visible,
+ .read = rasf_hw_scrub_read,
+ .write = rasf_hw_scrub_write,
+ .read_string = rasf_hw_scrub_read_strings,
+};
+
+static DEFINE_IDA(ras2_ida);
+
+static void devm_ras2_release(void *ctx)
+{
+ struct rasf_context *ras2_ctx = ctx;
+
+ ida_free(&ras2_ida, ras2_ctx->id);
+ rasf_unregister_pcc_channel(ras2_ctx);
+}
+
+static int ras2_probe(struct platform_device *pdev)
+{
+ int ret, id;
+ struct mbox_client *cl;
+ struct device *hw_scrub_dev;
+ struct rasf_context *ras2_ctx;
+ char scrub_name[RASF_MAX_NAME_LENGTH];
+
+ ras2_ctx = devm_kzalloc(&pdev->dev, sizeof(*ras2_ctx), GFP_KERNEL);
+ if (!ras2_ctx)
+ return -ENOMEM;
+
+ ras2_ctx->dev = &pdev->dev;
+ ras2_ctx->ops = &ras2_hw_ops;
+ spin_lock_init(&ras2_ctx->spinlock);
+ platform_set_drvdata(pdev, ras2_ctx);
+
+ cl = &ras2_ctx->mbox_client;
+ /* Request mailbox channel */
+ cl->dev = &pdev->dev;
+ cl->tx_done = ras2_tx_done;
+ cl->knows_txdone = true;
+ ras2_ctx->pcc_subspace_idx = *((int *)pdev->dev.platform_data);
+ dev_dbg(&pdev->dev, "pcc-subspace-id=%d\n", ras2_ctx->pcc_subspace_idx);
+ ret = rasf_register_pcc_channel(ras2_ctx);
+ if (ret < 0)
+ return ret;
+
+ ret = devm_add_action_or_reset(&pdev->dev, devm_ras2_release, ras2_ctx);
+ if (ret < 0)
+ return ret;
+
+ if (ras2_is_patrol_scrub_support(ras2_ctx)) {
+ id = ida_alloc(&ras2_ida, GFP_KERNEL);
+ if (id < 0)
+ return id;
+ ras2_ctx->id = id;
+ snprintf(scrub_name, sizeof(scrub_name), "%s%d", RAS2_SCRUB, id);
+ dev_set_name(&pdev->dev, RAS2_ID_FORMAT, id);
+ hw_scrub_dev = devm_scrub_device_register(&pdev->dev, scrub_name,
+ ras2_ctx, &ras2_scrub_ops,
+ 0, NULL);
+ if (PTR_ERR_OR_ZERO(hw_scrub_dev))
+ return PTR_ERR_OR_ZERO(hw_scrub_dev);
+ }
+ ras2_ctx->scrub_dev = hw_scrub_dev;
+
+ return 0;
+}
+
+static const struct platform_device_id ras2_id_table[] = {
+ { .name = "ras2", },
+ { }
+};
+MODULE_DEVICE_TABLE(platform, ras2_id_table);
+
+static struct platform_driver ras2_driver = {
+ .probe = ras2_probe,
+ .driver = {
+ .name = "ras2",
+ .suppress_bind_attrs = true,
+ },
+ .id_table = ras2_id_table,
+};
+module_driver(ras2_driver, platform_driver_register, platform_driver_unregister);
+
+MODULE_DESCRIPTION("ras2 memory driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/memory/rasf_common.c b/drivers/memory/rasf_common.c
new file mode 100644
index 000000000000..85f67308698d
--- /dev/null
+++ b/drivers/memory/rasf_common.c
@@ -0,0 +1,269 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * rasf_common.c - Common functions for memory RASF driver
+ *
+ * Copyright (c) 2023 HiSilicon Limited.
+ *
+ * This driver implements call back functions for the scrub
+ * configure driver to configure the parameters of the hw patrol
+ * scrubbers in the system, which exposed via the ACPI RASF/RAS2
+ * table and PCC.
+ */
+
+#define pr_fmt(fmt) "MEMORY RASF COMMON: " fmt
+
+#include <linux/acpi.h>
+#include <linux/io.h>
+#include <linux/interrupt.h>
+#include <linux/mailbox_controller.h>
+#include <linux/mailbox_client.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+
+#include <acpi/rasf_acpi.h>
+#include <memory/rasf.h>
+
+static int enable_write(struct rasf_context *rasf_ctx, long val)
+{
+ int ret;
+ bool enable = val;
+
+ ret = rasf_ctx->ops->enable_scrub(rasf_ctx, enable);
+ if (ret) {
+ pr_err("enable patrol scrub fail, enable=%d ret=%d\n",
+ enable, ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int enable_background_scrub_write(struct rasf_context *rasf_ctx, long val)
+{
+ int ret;
+ bool enable = val;
+
+ ret = rasf_ctx->ops->enable_background_scrub(rasf_ctx, enable);
+ if (ret) {
+ pr_err("enable background patrol scrub fail, enable=%d ret=%d\n",
+ enable, ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int addr_base_read(struct rasf_context *rasf_ctx, u64 *val)
+{
+ int ret;
+ struct rasf_scrub_params params;
+
+ ret = rasf_ctx->ops->get_scrub_params(rasf_ctx, ¶ms);
+ if (ret) {
+ pr_err("get patrol scrub params fail ret=%d\n", ret);
+ return ret;
+ }
+ *val = params.addr_base;
+
+ return 0;
+}
+
+static int addr_base_write(struct rasf_context *rasf_ctx, u64 val)
+{
+ int ret;
+ struct rasf_scrub_params params;
+
+ params.addr_base = val;
+ ret = rasf_ctx->ops->set_scrub_params(rasf_ctx, ¶ms, RASF_MEM_SCRUB_PARAM_ADDR_BASE);
+ if (ret) {
+ pr_err("set patrol scrub params for addr_base fail ret=%d\n", ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int addr_size_read(struct rasf_context *rasf_ctx, u64 *val)
+{
+ int ret;
+ struct rasf_scrub_params params;
+
+ ret = rasf_ctx->ops->get_scrub_params(rasf_ctx, ¶ms);
+ if (ret) {
+ pr_err("get patrol scrub params fail ret=%d\n", ret);
+ return ret;
+ }
+ *val = params.addr_size;
+
+ return 0;
+}
+
+static int addr_size_write(struct rasf_context *rasf_ctx, u64 val)
+{
+ int ret;
+ struct rasf_scrub_params params;
+
+ params.addr_size = val;
+ ret = rasf_ctx->ops->set_scrub_params(rasf_ctx, ¶ms, RASF_MEM_SCRUB_PARAM_ADDR_SIZE);
+ if (ret) {
+ pr_err("set patrol scrub params for addr_size fail ret=%d\n", ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int rate_read(struct rasf_context *rasf_ctx, u64 *val)
+{
+ int ret;
+ struct rasf_scrub_params params;
+
+ ret = rasf_ctx->ops->get_scrub_params(rasf_ctx, ¶ms);
+ if (ret) {
+ pr_err("get patrol scrub params fail ret=%d\n", ret);
+ return ret;
+ }
+ *val = params.rate;
+
+ return 0;
+}
+
+static int rate_write(struct rasf_context *rasf_ctx, long val)
+{
+ int ret;
+ struct rasf_scrub_params params;
+
+ params.rate = val;
+ ret = rasf_ctx->ops->set_scrub_params(rasf_ctx, ¶ms, RASF_MEM_SCRUB_PARAM_RATE);
+ if (ret) {
+ pr_err("set patrol scrub params for rate fail ret=%d\n", ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int rate_available_read(struct rasf_context *rasf_ctx, char *buf)
+{
+ int ret;
+ struct rasf_scrub_params params;
+
+ ret = rasf_ctx->ops->get_scrub_params(rasf_ctx, ¶ms);
+ if (ret) {
+ pr_err("get patrol scrub params fail ret=%d\n", ret);
+ return ret;
+ }
+
+ sprintf(buf, "%s\n", params.rate_avail);
+
+ return 0;
+}
+
+/**
+ * rasf_hw_scrub_is_visible() - Callback to return attribute visibility
+ * @drv_data: Pointer to driver-private data structure passed
+ * as argument to devm_scrub_device_register().
+ * @attr_id: Scrub attribute
+ * @region_id: ID of the memory region
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+umode_t rasf_hw_scrub_is_visible(struct device *dev, u32 attr_id, int region_id)
+{
+ switch (attr_id) {
+ case scrub_rate_available:
+ return 0444;
+ case scrub_enable:
+ case scrub_enable_background_scrub:
+ return 0200;
+ case scrub_addr_base:
+ case scrub_addr_size:
+ case scrub_rate:
+ return 0644;
+ default:
+ return 0;
+ }
+}
+
+/**
+ * rasf_hw_scrub_read() - Read callback for data attributes
+ * @device: Pointer to scrub device
+ * @attr_id: Scrub attribute
+ * @region_id: ID of the memory region
+ * @val: Pointer to the returned data
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+int rasf_hw_scrub_read(struct device *device, u32 attr_id, int region_id, u64 *val)
+{
+ struct rasf_context *rasf_ctx;
+
+ rasf_ctx = dev_get_drvdata(device);
+
+ switch (attr_id) {
+ case scrub_addr_base:
+ return addr_base_read(rasf_ctx, val);
+ case scrub_addr_size:
+ return addr_size_read(rasf_ctx, val);
+ case scrub_rate:
+ return rate_read(rasf_ctx, val);
+ default:
+ return -ENOTSUPP;
+ }
+}
+
+/**
+ * rasf_hw_scrub_write() - Write callback for data attributes
+ * @device: Pointer to scrub device
+ * @attr_id: Scrub attribute
+ * @region_id: ID of the memory region
+ * @val: Value to write
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+int rasf_hw_scrub_write(struct device *device, u32 attr_id, int region_id, u64 val)
+{
+ struct rasf_context *rasf_ctx;
+
+ rasf_ctx = dev_get_drvdata(device);
+
+ switch (attr_id) {
+ case scrub_addr_base:
+ return addr_base_write(rasf_ctx, val);
+ case scrub_addr_size:
+ return addr_size_write(rasf_ctx, val);
+ case scrub_enable:
+ return enable_write(rasf_ctx, val);
+ case scrub_enable_background_scrub:
+ return enable_background_scrub_write(rasf_ctx, val);
+ case scrub_rate:
+ return rate_write(rasf_ctx, val);
+ default:
+ return -ENOTSUPP;
+ }
+}
+
+/**
+ * rasf_hw_scrub_read_strings() - Read callback for string attributes
+ * @device: Pointer to scrub device
+ * @attr_id: Scrub attribute
+ * @region_id: ID of the memory region
+ * @buf: Pointer to the buffer for copying returned string
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+int rasf_hw_scrub_read_strings(struct device *device, u32 attr_id, int region_id,
+ char *buf)
+{
+ struct rasf_context *rasf_ctx;
+
+ rasf_ctx = dev_get_drvdata(device);
+
+ switch (attr_id) {
+ case scrub_rate_available:
+ return rate_available_read(rasf_ctx, buf);
+ default:
+ return -ENOTSUPP;
+ }
+}
diff --git a/include/memory/rasf.h b/include/memory/rasf.h
new file mode 100755
index 000000000000..aacfa84dcc6f
--- /dev/null
+++ b/include/memory/rasf.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0 */
+/*
+ * Memory RASF driver header file
+ *
+ * Copyright (c) 2023 HiSilicon Limited
+ */
+
+#ifndef _RASF_H
+#define _RASF_H
+
+#include <memory/memory-scrub.h>
+
+#define RASF_MAX_NAME_LENGTH 64
+#define RASF_MAX_RATE_RANGE_LENGTH 64
+
+/*
+ * Data structures RASF
+ */
+
+/**
+ * struct rasf_scrub_params- RASF scrub parameter data structure.
+ * @addr_base: [IN] Base address of the address range to be patrol scrubbed.
+ * [OUT] Base address of the actual address range.
+ * @addr_size: [IN] Size of the address range to be patrol scrubbed.
+ * [OUT] Size of the actual address range.
+ * @flags: [OUT] The platform returns this value in response to
+ * GET_PATROL_PARAMETERS.
+ * For RASF and RAS2:
+ * Bit [0]: Will be set if memory scrubber is already
+ * running for address range specified in “Actual Address Range”.
+ * For RASF:
+ * Bits [3:1]: Current Patrol rate, if Bit [0] is set.
+ * @rate: [IN] Requested patrol scrub rate.
+ * [OUT] Current patrol scrub rate.
+ * @rate_avail:[OUT] Supported patrol rates.
+ */
+struct rasf_scrub_params {
+ u64 addr_base;
+ u64 addr_size;
+ u16 flags;
+ u32 rate;
+ char rate_avail[RASF_MAX_RATE_RANGE_LENGTH];
+};
+
+enum {
+ RASF_MEM_SCRUB_PARAM_ADDR_BASE = 0,
+ RASF_MEM_SCRUB_PARAM_ADDR_SIZE,
+ RASF_MEM_SCRUB_PARAM_RATE,
+};
+
+/**
+ * struct rasf_hw_scrub_ops - rasf hw scrub device operations
+ * @enable_scrub: Function to enable/disable RASF/RAS2 scrubber.
+ * Parameters are:
+ * @rasf_ctx: Pointer to RASF/RAS2 context structure.
+ * @enable: enable/disable RASF/RAS2 patrol scrubber.
+ * The function returns 0 on success or a negative error number.
+ * @enable_background_scrub: Function to enable/disable background scrubbing.
+ * Parameters are:
+ * @rasf_ctx: Pointer to RASF/RAS2 context structure.
+ * @enable: enable/disable background patrol scrubbing.
+ * The function returns 0 on success or a negative error number.
+ * @get_scrub_params: Read scrubber parameters. Mandatory
+ * Parameters are:
+ * @rasf_ctx: Pointer to RASF/RAS2 context structure.
+ * @params: Pointer to scrub params data structure.
+ * The function returns 0 on success or a negative error number.
+ * @set_scrub_params: Set scrubber parameters. Mandatory.
+ * Parameters are:
+ * @rasf_ctx: Pointer to RASF/RAS2 context structure.
+ * @params: Pointer to scrub params data structure.
+ * @param_type: Scrub parameter type to set.
+ * The function returns 0 on success or a negative error number.
+ */
+struct rasf_hw_scrub_ops {
+ int (*enable_scrub)(struct rasf_context *rasf_ctx, bool enable);
+ int (*enable_background_scrub)(struct rasf_context *rasf_ctx, bool enable);
+ int (*get_scrub_params)(struct rasf_context *rasf_ctx,
+ struct rasf_scrub_params *params);
+ int (*set_scrub_params)(struct rasf_context *rasf_ctx,
+ struct rasf_scrub_params *params, u8 param_type);
+};
+
+umode_t rasf_hw_scrub_is_visible(struct device *dev, u32 attr_id, int region_id);
+int rasf_hw_scrub_read(struct device *dev, u32 attr_id, int region_id, u64 *val);
+int rasf_hw_scrub_write(struct device *dev, u32 attr_id, int region_id, u64 val);
+int rasf_hw_scrub_read_strings(struct device *dev, u32 attr_id, int region_id, char *buf);
+#endif /* _RASF_H */
--
2.34.1
From: Shiju Jose <[email protected]>
Add support for GET_SUPPORTED_FEATURES mailbox command.
CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
CXL devices supports features with changeable attributes.
Get Supported Features retrieves the list of supported device specific
features. The settings of a feature can be retrieved using Get Feature
and optionally modified using Set Feature.
Signed-off-by: Shiju Jose <[email protected]>
---
drivers/cxl/core/mbox.c | 23 +++++++++++++++
drivers/cxl/cxlmem.h | 62 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 85 insertions(+)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 27166a411705..191f51f3df0e 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1290,6 +1290,29 @@ int cxl_set_timestamp(struct cxl_memdev_state *mds)
}
EXPORT_SYMBOL_NS_GPL(cxl_set_timestamp, CXL);
+int cxl_get_supported_features(struct cxl_memdev_state *mds,
+ struct cxl_mbox_get_supp_feats_in *pi,
+ void *feats_out)
+{
+ struct cxl_mbox_cmd mbox_cmd;
+ int rc;
+
+ mbox_cmd = (struct cxl_mbox_cmd) {
+ .opcode = CXL_MBOX_OP_GET_SUPPORTED_FEATURES,
+ .size_in = sizeof(*pi),
+ .payload_in = pi,
+ .size_out = le32_to_cpu(pi->count),
+ .payload_out = feats_out,
+ .min_out = sizeof(struct cxl_mbox_get_supp_feats_out),
+ };
+ rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ if (rc < 0)
+ return rc;
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_supported_features, CXL);
+
int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
struct cxl_region *cxlr)
{
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 5303d6942b88..23e4d98b9bae 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -529,6 +529,7 @@ enum cxl_opcode {
CXL_MBOX_OP_SET_TIMESTAMP = 0x0301,
CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
CXL_MBOX_OP_GET_LOG = 0x0401,
+ CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
CXL_MBOX_OP_IDENTIFY = 0x4000,
CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
@@ -698,6 +699,64 @@ struct cxl_mbox_set_timestamp_in {
} __packed;
+/* Get Supported Features CXL 3.1 Spec 8.2.9.6.1 */
+/*
+ * Get Supported Features input payload
+ * CXL rev 3.1 section 8.2.9.6.1 Table 8-95
+ */
+struct cxl_mbox_get_supp_feats_in {
+ __le32 count;
+ __le16 start_index;
+ u16 reserved;
+} __packed;
+
+/*
+ * Get Supported Features Supported Feature Entry
+ * CXL rev 3.1 section 8.2.9.6.1 Table 8-97
+ */
+/* Supported Feature Entry : Payload out attribute flags */
+#define CXL_FEAT_ENTRY_FLAG_CHANGABLE BIT(0)
+#define CXL_FEAT_ENTRY_FLAG_DEEPEST_RESET_PERSISTENCE_MASK GENMASK(3, 1)
+#define CXL_FEAT_ENTRY_FLAG_PERSIST_ACROSS_FIRMWARE_UPDATE BIT(4)
+#define CXL_FEAT_ENTRY_FLAG_SUPPORT_DEFAULT_SELECTION BIT(5)
+#define CXL_FEAT_ENTRY_FLAG_SUPPORT_SAVED_SELECTION BIT(6)
+
+enum cxl_feat_attr_value_persistence {
+ CXL_FEAT_ATTR_VALUE_PERSISTENCE_NONE,
+ CXL_FEAT_ATTR_VALUE_PERSISTENCE_CXL_RESET,
+ CXL_FEAT_ATTR_VALUE_PERSISTENCE_HOT_RESET,
+ CXL_FEAT_ATTR_VALUE_PERSISTENCE_WARM_RESET,
+ CXL_FEAT_ATTR_VALUE_PERSISTENCE_COLD_RESET,
+ CXL_FEAT_ATTR_VALUE_PERSISTENCE_MAX
+};
+
+#define CXL_FEAT_ENTRY_FLAG_PERSISTENCE_ACROSS_FW_UPDATE_MASK BIT(4)
+#define CXL_FEAT_ENTRY_FLAG_PERSISTENCE_DEFAULT_SEL_SUPPORT_MASK BIT(5)
+#define CXL_FEAT_ENTRY_FLAG_PERSISTENCE_SAVED_SEL_SUPPORT_MASK BIT(6)
+
+struct cxl_mbox_supp_feat_entry {
+ uuid_t uuid;
+ __le16 feat_index;
+ __le16 get_feat_size;
+ __le16 set_feat_size;
+ __le32 attr_flags;
+ u8 get_feat_version;
+ u8 set_feat_version;
+ __le16 set_feat_effects;
+ u8 rsvd[18];
+} __packed;
+
+/*
+ * Get Supported Features output payload
+ * CXL rev 3.1 section 8.2.9.6.1 Table 8-96
+ */
+struct cxl_mbox_get_supp_feats_out {
+ __le16 entries;
+ __le16 nsuppfeats_dev;
+ u32 reserved;
+ struct cxl_mbox_supp_feat_entry feat_entries[];
+} __packed;
+
/* Get Poison List CXL 3.0 Spec 8.2.9.8.4.1 */
struct cxl_mbox_poison_in {
__le64 offset;
@@ -829,6 +888,9 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_type event_type,
const uuid_t *uuid, union cxl_event *evt);
int cxl_set_timestamp(struct cxl_memdev_state *mds);
+int cxl_get_supported_features(struct cxl_memdev_state *mds,
+ struct cxl_mbox_get_supp_feats_in *pi,
+ void *feats_out);
int cxl_poison_state_init(struct cxl_memdev_state *mds);
int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
struct cxl_region *cxlr);
--
2.34.1
From: Shiju Jose <[email protected]>
Add support for ACPI RAS2 feature table(RAS2) defined in the ACPI 6.5
Specification & upwards revision, section 5.2.21.
The RAS2 table provides interfaces for platform RAS features. RAS2 offers
the same services as RASF, but is more scalable than the latter.
RAS2 supports independent RAS controls and capabilities for a given RAS
feature for multiple instances of the same component in a given system.
The platform can support either RAS2 or RASF but not both.
Link: https://github.com/acpica/acpica/pull/899
Signed-off-by: Shiju Jose <[email protected]>
---
include/acpi/actbl2.h | 137 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 137 insertions(+)
diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index 9775384d61c6..15c271657f9f 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -47,6 +47,7 @@
#define ACPI_SIG_PPTT "PPTT" /* Processor Properties Topology Table */
#define ACPI_SIG_PRMT "PRMT" /* Platform Runtime Mechanism Table */
#define ACPI_SIG_RASF "RASF" /* RAS Feature table */
+#define ACPI_SIG_RAS2 "RAS2" /* RAS2 Feature table */
#define ACPI_SIG_RGRT "RGRT" /* Regulatory Graphics Resource Table */
#define ACPI_SIG_RHCT "RHCT" /* RISC-V Hart Capabilities Table */
#define ACPI_SIG_SBST "SBST" /* Smart Battery Specification Table */
@@ -2751,6 +2752,142 @@ enum acpi_rasf_status {
#define ACPI_RASF_ERROR (1<<2)
#define ACPI_RASF_STATUS (0x1F<<3)
+/*******************************************************************************
+ *
+ * RAS2 - RAS2 Feature Table (ACPI 6.5)
+ * Version 2
+ *
+ *
+ ******************************************************************************/
+
+struct acpi_table_ras2 {
+ struct acpi_table_header header; /* Common ACPI table header */
+ u16 reserved;
+ u16 num_pcc_descs;
+};
+
+/*
+ * RAS2 Platform Communication Channel Descriptor
+ */
+
+struct acpi_ras2_pcc_desc {
+ u8 channel_id;
+ u16 reserved;
+ u8 feature_type;
+ u32 instance;
+};
+
+/*
+ * RAS2 Platform Communication Channel Shared Memory Region
+ */
+
+struct acpi_ras2_shared_memory {
+ u32 signature;
+ u16 command;
+ u16 status;
+ u16 version;
+ u8 features[16];
+ u8 set_capabilities[16];
+ u16 num_parameter_blocks;
+ u32 set_capabilities_status;
+};
+
+/* RAS2 Parameter Block Structure Header */
+
+struct acpi_ras2_parameter_block {
+ u16 type;
+ u16 version;
+ u16 length;
+};
+
+/*
+ * RAS2 Parameter Block Structure for PATROL_SCRUB
+ */
+
+struct acpi_ras2_patrol_scrub_parameter {
+ struct acpi_ras2_parameter_block header;
+ u16 patrol_scrub_command;
+ u64 requested_address_range[2];
+ u64 actual_address_range[2];
+ u32 flags;
+ u32 scrub_params_out;
+ u32 scrub_params_in;
+};
+
+/* Masks for Flags field above */
+
+#define ACPI_RAS2_SCRUBBER_RUNNING 1
+
+/*
+ * RAS2 Parameter Block Structure for LA2PA_TRANSLATION
+ */
+
+struct acpi_ras2_la2pa_translation_parameter {
+ struct acpi_ras2_parameter_block header;
+ u16 addr_translation_command;
+ u64 sub_instance_id;
+ u64 logical_address;
+ u64 physical_address;
+ u32 status;
+};
+
+/* Channel Commands */
+
+enum acpi_ras2_commands {
+ ACPI_RAS2_EXECUTE_RAS2_COMMAND = 1
+};
+
+/* Platform RAS2 Features */
+
+enum acpi_ras2_features {
+ ACPI_RAS2_PATROL_SCRUB_SUPPORTED = 0,
+ ACPI_RAS2_LA2PA_TRANSLATION = 1
+};
+
+/* RAS2 Patrol Scrub Commands */
+
+enum acpi_ras2_patrol_scrub_commands {
+ ACPI_RAS2_GET_PATROL_PARAMETERS = 1,
+ ACPI_RAS2_START_PATROL_SCRUBBER = 2,
+ ACPI_RAS2_STOP_PATROL_SCRUBBER = 3
+};
+
+/* RAS2 LA2PA Translation Commands */
+
+enum acpi_ras2_la2pa_translation_commands {
+ ACPI_RAS2_GET_LA2PA_TRANSLATION = 1
+};
+
+/* RAS2 LA2PA Translation Status values */
+
+enum acpi_ras2_la2pa_translation_status {
+ ACPI_RAS2_LA2PA_TRANSLATION_SUCCESS = 0,
+ ACPI_RAS2_LA2PA_TRANSLATION_FAIL = 1
+};
+
+/* Channel Command flags */
+
+#define ACPI_RAS2_GENERATE_SCI (1<<15)
+
+/* Status values */
+
+enum acpi_ras2_status {
+ ACPI_RAS2_SUCCESS = 0,
+ ACPI_RAS2_NOT_VALID = 1,
+ ACPI_RAS2_NOT_SUPPORTED = 2,
+ ACPI_RAS2_BUSY = 3,
+ ACPI_RAS2_FAILED = 4,
+ ACPI_RAS2_ABORTED = 5,
+ ACPI_RAS2_INVALID_DATA = 6
+};
+
+/* Status flags */
+
+#define ACPI_RAS2_COMMAND_COMPLETE (1)
+#define ACPI_RAS2_SCI_DOORBELL (1<<1)
+#define ACPI_RAS2_ERROR (1<<2)
+#define ACPI_RAS2_STATUS (0x1F<<3)
+
/*******************************************************************************
*
* RGRT - Regulatory Graphics Resource Table
--
2.34.1
From: Shiju Jose <[email protected]>
Register with the scrub configure driver to expose the sysfs attributes
to the user for configuring the CXL device memory patrol scrub. Add the
callback functions to support configuring the CXL memory device patrol
scrub.
Signed-off-by: Shiju Jose <[email protected]>
---
drivers/cxl/Kconfig | 6 ++
drivers/cxl/core/memscrub.c | 201 +++++++++++++++++++++++++++++++++++-
2 files changed, 204 insertions(+), 3 deletions(-)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 873bdda5db32..ec9a1877b663 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -162,11 +162,17 @@ config CXL_SCRUB
bool "CXL: Memory scrub feature"
depends on CXL_PCI
depends on CXL_MEM
+ depends on SCRUB
help
The CXL memory scrub control is an optional feature allows host to
control the scrub configurations of CXL Type 3 devices, which
support patrol scrub and/or DDR5 ECS(Error Check Scrub).
+ Register with the scrub configure driver to expose sysfs attributes
+ to the user for configuring the CXL device memory patrol and DDR5 ECS
+ scrubs. Provides the interface functions to support configuring the
+ CXL memory device patrol and ECS scrubs.
+
Say 'y/n' to enable/disable the CXL memory scrub driver that will
attach to CXL.mem devices for memory scrub control feature. See
sections 8.2.9.9.11.1 and 8.2.9.9.11.2 in the CXL 3.1 specification
diff --git a/drivers/cxl/core/memscrub.c b/drivers/cxl/core/memscrub.c
index a3a371c5aa7b..a1fb40f8307f 100644
--- a/drivers/cxl/core/memscrub.c
+++ b/drivers/cxl/core/memscrub.c
@@ -6,14 +6,19 @@
*
* - Provides functions to configure patrol scrub
* and DDR5 ECS features of the CXL memory devices.
+ * - Registers with the scrub driver to expose
+ * the sysfs attributes to the user for configuring
+ * the memory patrol scrub and DDR5 ECS features.
*/
#define pr_fmt(fmt) "CXL_MEM_SCRUB: " fmt
#include <cxlmem.h>
+#include <memory/memory-scrub.h>
/* CXL memory scrub feature common definitions */
#define CXL_SCRUB_MAX_ATTR_RANGE_LENGTH 128
+#define CXL_MEMDEV_MAX_NAME_LENGTH 128
static int cxl_mem_get_supported_feature_entry(struct cxl_memdev *cxlmd, const uuid_t *feat_uuid,
struct cxl_mbox_supp_feat_entry *feat_entry_out)
@@ -63,6 +68,8 @@ static int cxl_mem_get_supported_feature_entry(struct cxl_memdev *cxlmd, const u
#define CXL_MEMDEV_PS_GET_FEAT_VERSION 0x01
#define CXL_MEMDEV_PS_SET_FEAT_VERSION 0x01
+#define CXL_PATROL_SCRUB "cxl_patrol_scrub"
+
static const uuid_t cxl_patrol_scrub_uuid =
UUID_INIT(0x96dad7d6, 0xfde8, 0x482b, 0xa7, 0x33, 0x75, 0x77, 0x4e, \
0x06, 0xdb, 0x8a);
@@ -159,9 +166,8 @@ static int cxl_mem_ps_get_attrs(struct device *dev,
return 0;
}
-static int __maybe_unused
-cxl_mem_ps_set_attrs(struct device *dev, struct cxl_memdev_ps_params *params,
- u8 param_type)
+static int cxl_mem_ps_set_attrs(struct device *dev,
+ struct cxl_memdev_ps_params *params, u8 param_type)
{
struct cxl_memdev_ps_set_feat_pi set_pi = {
.pi.uuid = cxl_patrol_scrub_uuid,
@@ -232,11 +238,192 @@ cxl_mem_ps_set_attrs(struct device *dev, struct cxl_memdev_ps_params *params,
return 0;
}
+static int cxl_mem_ps_enable_read(struct device *dev, u64 *val)
+{
+ struct cxl_memdev_ps_params params;
+ int ret;
+
+ ret = cxl_mem_ps_get_attrs(dev, ¶ms);
+ if (ret) {
+ dev_err(dev, "Get CXL patrol scrub params fail ret=%d\n", ret);
+ return ret;
+ }
+ *val = params.enable;
+
+ return 0;
+}
+
+static int cxl_mem_ps_enable_write(struct device *dev, long val)
+{
+ struct cxl_memdev_ps_params params;
+ int ret;
+
+ params.enable = val;
+ ret = cxl_mem_ps_set_attrs(dev, ¶ms, CXL_MEMDEV_PS_PARAM_ENABLE);
+ if (ret) {
+ dev_err(dev, "CXL patrol scrub enable fail, enable=%d ret=%d\n",
+ params.enable, ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int cxl_mem_ps_rate_read(struct device *dev, u64 *val)
+{
+ struct cxl_memdev_ps_params params;
+ int ret;
+
+ ret = cxl_mem_ps_get_attrs(dev, ¶ms);
+ if (ret) {
+ dev_err(dev, "Get CXL patrol scrub params fail ret=%d\n", ret);
+ return ret;
+ }
+ *val = params.rate;
+
+ return 0;
+}
+
+static int cxl_mem_ps_rate_write(struct device *dev, long val)
+{
+ struct cxl_memdev_ps_params params;
+ int ret;
+
+ params.rate = val;
+ ret = cxl_mem_ps_set_attrs(dev, ¶ms, CXL_MEMDEV_PS_PARAM_RATE);
+ if (ret) {
+ dev_err(dev, "Set CXL patrol scrub params for rate fail ret=%d\n", ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int cxl_mem_ps_rate_available_read(struct device *dev, char *buf)
+{
+ struct cxl_memdev_ps_params params;
+ int ret;
+
+ ret = cxl_mem_ps_get_attrs(dev, ¶ms);
+ if (ret) {
+ dev_err(dev, "Get CXL patrol scrub params fail ret=%d\n", ret);
+ return ret;
+ }
+
+ sysfs_emit(buf, "%s\n", params.rate_avail);
+
+ return 0;
+}
+
+/**
+ * cxl_mem_patrol_scrub_is_visible() - Callback to return attribute visibility
+ * @dev: Pointer to scrub device
+ * @attr: Scrub attribute
+ * @region_id: ID of the memory region
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+static umode_t cxl_mem_patrol_scrub_is_visible(struct device *dev,
+ u32 attr_id, int region_id)
+{
+ const struct cxl_patrol_scrub_context *cxl_ps_ctx = dev_get_drvdata(dev);
+
+ if (attr_id == scrub_rate_available ||
+ attr_id == scrub_rate) {
+ if (!cxl_ps_ctx->scrub_cycle_changeable)
+ return 0;
+ }
+
+ switch (attr_id) {
+ case scrub_rate_available:
+ return 0444;
+ case scrub_enable:
+ case scrub_rate:
+ return 0644;
+ default:
+ return 0;
+ }
+}
+
+/**
+ * cxl_mem_patrol_scrub_read() - Read callback for data attributes
+ * @dev: Pointer to scrub device
+ * @attr: Scrub attribute
+ * @region_id: ID of the memory region
+ * @val: Pointer to the returned data
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+static int cxl_mem_patrol_scrub_read(struct device *dev, u32 attr,
+ int region_id, u64 *val)
+{
+
+ switch (attr) {
+ case scrub_enable:
+ return cxl_mem_ps_enable_read(dev->parent, val);
+ case scrub_rate:
+ return cxl_mem_ps_rate_read(dev->parent, val);
+ default:
+ return -ENOTSUPP;
+ }
+}
+
+/**
+ * cxl_mem_patrol_scrub_write() - Write callback for data attributes
+ * @dev: Pointer to scrub device
+ * @attr: Scrub attribute
+ * @region_id: ID of the memory region
+ * @val: Value to write
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+static int cxl_mem_patrol_scrub_write(struct device *dev, u32 attr,
+ int region_id, u64 val)
+{
+ switch (attr) {
+ case scrub_enable:
+ return cxl_mem_ps_enable_write(dev->parent, val);
+ case scrub_rate:
+ return cxl_mem_ps_rate_write(dev->parent, val);
+ default:
+ return -ENOTSUPP;
+ }
+}
+
+/**
+ * cxl_mem_patrol_scrub_read_strings() - Read callback for string attributes
+ * @dev: Pointer to scrub device
+ * @attr: Scrub attribute
+ * @region_id: ID of the memory region
+ * @buf: Pointer to the buffer for copying returned string
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+static int cxl_mem_patrol_scrub_read_strings(struct device *dev, u32 attr,
+ int region_id, char *buf)
+{
+ switch (attr) {
+ case scrub_rate_available:
+ return cxl_mem_ps_rate_available_read(dev->parent, buf);
+ default:
+ return -ENOTSUPP;
+ }
+}
+
+static const struct scrub_ops cxl_ps_scrub_ops = {
+ .is_visible = cxl_mem_patrol_scrub_is_visible,
+ .read = cxl_mem_patrol_scrub_read,
+ .write = cxl_mem_patrol_scrub_write,
+ .read_string = cxl_mem_patrol_scrub_read_strings,
+};
+
int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
{
+ char scrub_name[CXL_MEMDEV_MAX_NAME_LENGTH];
struct cxl_patrol_scrub_context *cxl_ps_ctx;
struct cxl_mbox_supp_feat_entry feat_entry;
struct cxl_memdev_ps_params params;
+ struct device *cxl_scrub_dev;
int ret;
ret = cxl_mem_get_supported_feature_entry(cxlmd, &cxl_patrol_scrub_uuid,
@@ -261,6 +448,14 @@ int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
}
cxl_ps_ctx->scrub_cycle_changeable = params.scrub_cycle_changeable;
+ snprintf(scrub_name, sizeof(scrub_name), "%s_%s",
+ CXL_PATROL_SCRUB, dev_name(&cxlmd->dev));
+ cxl_scrub_dev = devm_scrub_device_register(&cxlmd->dev, scrub_name,
+ cxl_ps_ctx, &cxl_ps_scrub_ops,
+ 0, NULL);
+ if (IS_ERR(cxl_scrub_dev))
+ return PTR_ERR(cxl_scrub_dev);
+
return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_patrol_scrub_init, CXL);
--
2.34.1
From: Shiju Jose <[email protected]>
Add support for GET_FEATURE mailbox command.
CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
The settings of a feature can be retrieved using Get Feature command.
Signed-off-by: Shiju Jose <[email protected]>
---
drivers/cxl/core/mbox.c | 22 ++++++++++++++++++++++
drivers/cxl/cxlmem.h | 23 +++++++++++++++++++++++
2 files changed, 45 insertions(+)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 191f51f3df0e..f43189b6859a 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1313,6 +1313,28 @@ int cxl_get_supported_features(struct cxl_memdev_state *mds,
}
EXPORT_SYMBOL_NS_GPL(cxl_get_supported_features, CXL);
+int cxl_get_feature(struct cxl_memdev_state *mds,
+ struct cxl_mbox_get_feat_in *pi, void *feat_out)
+{
+ struct cxl_mbox_cmd mbox_cmd;
+ int rc;
+
+ mbox_cmd = (struct cxl_mbox_cmd) {
+ .opcode = CXL_MBOX_OP_GET_FEATURE,
+ .size_in = sizeof(*pi),
+ .payload_in = pi,
+ .size_out = le16_to_cpu(pi->count),
+ .payload_out = feat_out,
+ .min_out = le16_to_cpu(pi->count),
+ };
+ rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ if (rc < 0)
+ return rc;
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_feature, CXL);
+
int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
struct cxl_region *cxlr)
{
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 23e4d98b9bae..eaecc3234cfd 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -530,6 +530,7 @@ enum cxl_opcode {
CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
CXL_MBOX_OP_GET_LOG = 0x0401,
CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
+ CXL_MBOX_OP_GET_FEATURE = 0x0501,
CXL_MBOX_OP_IDENTIFY = 0x4000,
CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
@@ -757,6 +758,26 @@ struct cxl_mbox_get_supp_feats_out {
struct cxl_mbox_supp_feat_entry feat_entries[];
} __packed;
+/* Get Feature CXL 3.1 Spec 8.2.9.6.2 */
+/*
+ * Get Feature input payload
+ * CXL rev 3.1 section 8.2.9.6.2 Table 8-99
+ */
+/* Get Feature : Payload in selection */
+enum cxl_get_feat_selection {
+ CXL_GET_FEAT_SEL_CURRENT_VALUE,
+ CXL_GET_FEAT_SEL_DEFAULT_VALUE,
+ CXL_GET_FEAT_SEL_SAVED_VALUE,
+ CXL_GET_FEAT_SEL_MAX
+};
+
+struct cxl_mbox_get_feat_in {
+ uuid_t uuid;
+ __le16 offset;
+ __le16 count;
+ u8 selection;
+} __packed;
+
/* Get Poison List CXL 3.0 Spec 8.2.9.8.4.1 */
struct cxl_mbox_poison_in {
__le64 offset;
@@ -891,6 +912,8 @@ int cxl_set_timestamp(struct cxl_memdev_state *mds);
int cxl_get_supported_features(struct cxl_memdev_state *mds,
struct cxl_mbox_get_supp_feats_in *pi,
void *feats_out);
+int cxl_get_feature(struct cxl_memdev_state *mds,
+ struct cxl_mbox_get_feat_in *pi, void *feat_out);
int cxl_poison_state_init(struct cxl_memdev_state *mds);
int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
struct cxl_region *cxlr);
--
2.34.1
From: Shiju Jose <[email protected]>
Register with the scrub configure driver to expose the sysfs attributes
to the user for configuring the CXL memory device's ECS feature.
Add the static CXL ECS specific attributes to support configuring the
CXL memory device ECS feature.
Signed-off-by: Shiju Jose <[email protected]>
---
drivers/cxl/core/memscrub.c | 253 +++++++++++++++++++++++++++++++++++-
1 file changed, 250 insertions(+), 3 deletions(-)
diff --git a/drivers/cxl/core/memscrub.c b/drivers/cxl/core/memscrub.c
index a1fb40f8307f..325084b22e7a 100644
--- a/drivers/cxl/core/memscrub.c
+++ b/drivers/cxl/core/memscrub.c
@@ -464,6 +464,8 @@ EXPORT_SYMBOL_NS_GPL(cxl_mem_patrol_scrub_init, CXL);
#define CXL_MEMDEV_ECS_GET_FEAT_VERSION 0x01
#define CXL_MEMDEV_ECS_SET_FEAT_VERSION 0x01
+#define CXL_DDR5_ECS "cxl_ecs"
+
static const uuid_t cxl_ecs_uuid =
UUID_INIT(0xe5b13f22, 0x2328, 0x4a14, 0xb8, 0xba, 0xb9, 0x69, 0x1e, \
0x89, 0x33, 0x86);
@@ -582,9 +584,8 @@ static int cxl_mem_ecs_get_attrs(struct device *dev, int fru_id,
return 0;
}
-static int __maybe_unused
-cxl_mem_ecs_set_attrs(struct device *dev, int fru_id,
- struct cxl_memdev_ecs_params *params, u8 param_type)
+static int cxl_mem_ecs_set_attrs(struct device *dev, int fru_id,
+ struct cxl_memdev_ecs_params *params, u8 param_type)
{
struct cxl_memdev_ecs_feat_read_attrs *rd_attrs __free(kvfree) = NULL;
struct cxl_memdev_ecs_set_feat_pi *set_pi __free(kvfree) = NULL;
@@ -731,10 +732,247 @@ cxl_mem_ecs_set_attrs(struct device *dev, int fru_id,
return 0;
}
+static int cxl_mem_ecs_log_entry_type_write(struct device *dev, int region_id, long val)
+{
+ struct cxl_memdev_ecs_params params;
+ int ret;
+
+ params.log_entry_type = val;
+ ret = cxl_mem_ecs_set_attrs(dev, region_id, ¶ms,
+ CXL_MEMDEV_ECS_PARAM_LOG_ENTRY_TYPE);
+ if (ret) {
+ dev_err(dev->parent, "Set CXL ECS params for log entry type fail ret=%d\n",
+ ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int cxl_mem_ecs_threshold_write(struct device *dev, int region_id, long val)
+{
+ struct cxl_memdev_ecs_params params;
+ int ret;
+
+ params.threshold = val;
+ ret = cxl_mem_ecs_set_attrs(dev, region_id, ¶ms,
+ CXL_MEMDEV_ECS_PARAM_THRESHOLD);
+ if (ret) {
+ dev_err(dev->parent, "Set CXL ECS params for threshold fail ret=%d\n",
+ ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int cxl_mem_ecs_mode_write(struct device *dev, int region_id, long val)
+{
+ struct cxl_memdev_ecs_params params;
+ int ret;
+
+ params.mode = val;
+ ret = cxl_mem_ecs_set_attrs(dev, region_id, ¶ms,
+ CXL_MEMDEV_ECS_PARAM_MODE);
+ if (ret) {
+ dev_err(dev->parent, "Set CXL ECS params for mode fail ret=%d\n",
+ ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int cxl_mem_ecs_reset_counter_write(struct device *dev, int region_id, long val)
+{
+ struct cxl_memdev_ecs_params params;
+ int ret;
+
+ params.reset_counter = val;
+ ret = cxl_mem_ecs_set_attrs(dev, region_id, ¶ms,
+ CXL_MEMDEV_ECS_PARAM_RESET_COUNTER);
+ if (ret) {
+ dev_err(dev->parent, "Set CXL ECS params for reset ECC counter fail ret=%d\n",
+ ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+enum cxl_mem_ecs_scrub_attributes {
+ cxl_ecs_log_entry_type,
+ cxl_ecs_log_entry_type_per_dram,
+ cxl_ecs_log_entry_type_per_memory_media,
+ cxl_ecs_mode,
+ cxl_ecs_mode_counts_codewords,
+ cxl_ecs_mode_counts_rows,
+ cxl_ecs_reset,
+ cxl_ecs_threshold,
+ cxl_ecs_threshold_available,
+ cxl_ecs_max_attrs,
+};
+
+static ssize_t cxl_mem_ecs_show_scrub_attr(struct device *dev, char *buf,
+ int attr_id)
+{
+ struct cxl_ecs_context *cxl_ecs_ctx = dev_get_drvdata(dev);
+ int region_id = cxl_ecs_ctx->region_id;
+ struct cxl_memdev_ecs_params params;
+ int ret;
+
+ if (attr_id == cxl_ecs_log_entry_type ||
+ attr_id == cxl_ecs_log_entry_type_per_dram ||
+ attr_id == cxl_ecs_log_entry_type_per_memory_media ||
+ attr_id == cxl_ecs_mode ||
+ attr_id == cxl_ecs_mode_counts_codewords ||
+ attr_id == cxl_ecs_mode_counts_rows ||
+ attr_id == cxl_ecs_threshold) {
+ ret = cxl_mem_ecs_get_attrs(dev, region_id, ¶ms);
+ if (ret) {
+ dev_err(dev->parent, "Get CXL ECS params fail ret=%d\n", ret);
+ return ret;
+ }
+ }
+
+ switch (attr_id) {
+ case cxl_ecs_log_entry_type:
+ return sprintf(buf, "%d\n", params.log_entry_type);
+ case cxl_ecs_log_entry_type_per_dram:
+ if (params.log_entry_type == ECS_LOG_ENTRY_TYPE_DRAM)
+ return sysfs_emit(buf, "1\n");
+ else
+ return sysfs_emit(buf, "0\n");
+ case cxl_ecs_log_entry_type_per_memory_media:
+ if (params.log_entry_type == ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU)
+ return sysfs_emit(buf, "1\n");
+ else
+ return sysfs_emit(buf, "0\n");
+ case cxl_ecs_mode:
+ return sprintf(buf, "%d\n", params.mode);
+ case cxl_ecs_mode_counts_codewords:
+ if (params.mode == ECS_MODE_COUNTS_CODEWORDS)
+ return sysfs_emit(buf, "1\n");
+ else
+ return sysfs_emit(buf, "0\n");
+ case cxl_ecs_mode_counts_rows:
+ if (params.mode == ECS_MODE_COUNTS_ROWS)
+ return sysfs_emit(buf, "1\n");
+ else
+ return sysfs_emit(buf, "0\n");
+ case cxl_ecs_threshold:
+ return sprintf(buf, "%d\n", params.threshold);
+ case cxl_ecs_threshold_available:
+ return sysfs_emit(buf, "256,1024,4096\n");
+ }
+
+ return -ENOTSUPP;
+}
+
+static ssize_t cxl_mem_ecs_store_scrub_attr(struct device *dev, const char *buf,
+ size_t count, int attr_id)
+{
+ struct cxl_ecs_context *cxl_ecs_ctx = dev_get_drvdata(dev);
+ int region_id = cxl_ecs_ctx->region_id;
+ long val;
+ int ret;
+
+ ret = kstrtol(buf, 10, &val);
+ if (ret < 0)
+ return ret;
+
+ switch (attr_id) {
+ case cxl_ecs_log_entry_type:
+ ret = cxl_mem_ecs_log_entry_type_write(dev, region_id, val);
+ if (ret)
+ return -ENOTSUPP;
+ break;
+ case cxl_ecs_mode:
+ ret = cxl_mem_ecs_mode_write(dev, region_id, val);
+ if (ret)
+ return -ENOTSUPP;
+ break;
+ case cxl_ecs_reset:
+ ret = cxl_mem_ecs_reset_counter_write(dev, region_id, val);
+ if (ret)
+ return -ENOTSUPP;
+ break;
+ case cxl_ecs_threshold:
+ ret = cxl_mem_ecs_threshold_write(dev, region_id, val);
+ if (ret)
+ return -ENOTSUPP;
+ break;
+ default:
+ return -ENOTSUPP;
+ }
+
+ return count;
+}
+
+#define CXL_ECS_SCRUB_ATTR_RW(attr) \
+static ssize_t attr##_show(struct device *dev, \
+ struct device_attribute *attr, char *buf) \
+{ \
+ return cxl_mem_ecs_show_scrub_attr(dev, buf, (cxl_ecs_##attr)); \
+} \
+static ssize_t attr##_store(struct device *dev, \
+ struct device_attribute *attr, \
+ const char *buf, size_t count) \
+{ \
+ return cxl_mem_ecs_store_scrub_attr(dev, buf, count, (cxl_ecs_##attr));\
+} \
+static DEVICE_ATTR_RW(attr)
+
+#define CXL_ECS_SCRUB_ATTR_RO(attr) \
+static ssize_t attr##_show(struct device *dev, \
+ struct device_attribute *attr, char *buf) \
+{ \
+ return cxl_mem_ecs_show_scrub_attr(dev, buf, (cxl_ecs_##attr)); \
+} \
+static DEVICE_ATTR_RO(attr)
+
+#define CXL_ECS_SCRUB_ATTR_WO(attr) \
+static ssize_t attr##_store(struct device *dev, \
+ struct device_attribute *attr, \
+ const char *buf, size_t count) \
+{ \
+ return cxl_mem_ecs_store_scrub_attr(dev, buf, count, (cxl_ecs_##attr));\
+} \
+static DEVICE_ATTR_WO(attr)
+
+CXL_ECS_SCRUB_ATTR_RW(log_entry_type);
+CXL_ECS_SCRUB_ATTR_RO(log_entry_type_per_dram);
+CXL_ECS_SCRUB_ATTR_RO(log_entry_type_per_memory_media);
+CXL_ECS_SCRUB_ATTR_RW(mode);
+CXL_ECS_SCRUB_ATTR_RO(mode_counts_codewords);
+CXL_ECS_SCRUB_ATTR_RO(mode_counts_rows);
+CXL_ECS_SCRUB_ATTR_WO(reset);
+CXL_ECS_SCRUB_ATTR_RW(threshold);
+CXL_ECS_SCRUB_ATTR_RO(threshold_available);
+
+static struct attribute *cxl_mem_ecs_scrub_attrs[] = {
+ &dev_attr_log_entry_type.attr,
+ &dev_attr_log_entry_type_per_dram.attr,
+ &dev_attr_log_entry_type_per_memory_media.attr,
+ &dev_attr_mode.attr,
+ &dev_attr_mode_counts_codewords.attr,
+ &dev_attr_mode_counts_rows.attr,
+ &dev_attr_reset.attr,
+ &dev_attr_threshold.attr,
+ &dev_attr_threshold_available.attr,
+ NULL,
+};
+
+static struct attribute_group cxl_mem_ecs_attr_group = {
+ .attrs = cxl_mem_ecs_scrub_attrs,
+};
+
int cxl_mem_ecs_init(struct cxl_memdev *cxlmd, int region_id)
{
+ char scrub_name[CXL_MEMDEV_MAX_NAME_LENGTH];
struct cxl_mbox_supp_feat_entry feat_entry;
struct cxl_ecs_context *cxl_ecs_ctx;
+ struct device *cxl_scrub_dev;
int nmedia_frus;
int ret;
@@ -755,6 +993,15 @@ int cxl_mem_ecs_init(struct cxl_memdev *cxlmd, int region_id)
cxl_ecs_ctx->get_feat_size = feat_entry.get_feat_size;
cxl_ecs_ctx->set_feat_size = feat_entry.set_feat_size;
cxl_ecs_ctx->region_id = region_id;
+
+ snprintf(scrub_name, sizeof(scrub_name), "%s_%s_region%d",
+ CXL_DDR5_ECS, dev_name(&cxlmd->dev), cxl_ecs_ctx->region_id);
+ cxl_scrub_dev = devm_scrub_device_register(&cxlmd->dev, scrub_name,
+ cxl_ecs_ctx, NULL,
+ cxl_ecs_ctx->region_id,
+ &cxl_mem_ecs_attr_group);
+ if (IS_ERR(cxl_scrub_dev))
+ return PTR_ERR(cxl_scrub_dev);
}
return 0;
--
2.34.1
On Thu, 15 Feb 2024 19:14:43 +0800
<[email protected]> wrote:
> From: Shiju Jose <[email protected]>
>
> Add support for GET_SUPPORTED_FEATURES mailbox command.
>
> CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
> CXL devices supports features with changeable attributes.
> Get Supported Features retrieves the list of supported device specific
> features. The settings of a feature can be retrieved using Get Feature
> and optionally modified using Set Feature.
>
> Signed-off-by: Shiju Jose <[email protected]>
Hi Shiju,
Some comment inline.
Mostly just naming suggestions. Actual functionality looks good to me.
> ---
> drivers/cxl/core/mbox.c | 23 +++++++++++++++
> drivers/cxl/cxlmem.h | 62 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 85 insertions(+)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 27166a411705..191f51f3df0e 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1290,6 +1290,29 @@ int cxl_set_timestamp(struct cxl_memdev_state *mds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_set_timestamp, CXL);
>
> +int cxl_get_supported_features(struct cxl_memdev_state *mds,
> + struct cxl_mbox_get_supp_feats_in *pi,
> + void *feats_out)
Odd indent. Align the later lines with s of struct
Comments on input types in header below.
> +{
> + struct cxl_mbox_cmd mbox_cmd;
> + int rc;
> +
> + mbox_cmd = (struct cxl_mbox_cmd) {
> + .opcode = CXL_MBOX_OP_GET_SUPPORTED_FEATURES,
> + .size_in = sizeof(*pi),
> + .payload_in = pi,
> + .size_out = le32_to_cpu(pi->count),
> + .payload_out = feats_out,
> + .min_out = sizeof(struct cxl_mbox_get_supp_feats_out),
feats_out should be typed, in which case this becomes
sizeof(*feats_out)
> + };
> + rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> + if (rc < 0)
> + return rc;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_supported_features, CXL);
> +
> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> struct cxl_region *cxlr)
> {
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 5303d6942b88..23e4d98b9bae 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -529,6 +529,7 @@ enum cxl_opcode {
> CXL_MBOX_OP_SET_TIMESTAMP = 0x0301,
> CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
> CXL_MBOX_OP_GET_LOG = 0x0401,
> + CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
> CXL_MBOX_OP_IDENTIFY = 0x4000,
> CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
> CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
> @@ -698,6 +699,64 @@ struct cxl_mbox_set_timestamp_in {
>
> } __packed;
>
> +/* Get Supported Features CXL 3.1 Spec 8.2.9.6.1 */
> +/*
> + * Get Supported Features input payload
> + * CXL rev 3.1 section 8.2.9.6.1 Table 8-95
> + */
> +struct cxl_mbox_get_supp_feats_in {
> + __le32 count;
> + __le16 start_index;
> + u16 reserved;
From a local style point of view using a u16 for reserved is
a new style choice - best to avoid that - most common option looks to
be
u8 rsvd[2];
> +} __packed;
> +
> +/*
> + * Get Supported Features Supported Feature Entry
> + * CXL rev 3.1 section 8.2.9.6.1 Table 8-97
> + */
> +/* Supported Feature Entry : Payload out attribute flags */
> +#define CXL_FEAT_ENTRY_FLAG_CHANGABLE BIT(0)
> +#define CXL_FEAT_ENTRY_FLAG_DEEPEST_RESET_PERSISTENCE_MASK GENMASK(3, 1)
> +#define CXL_FEAT_ENTRY_FLAG_PERSIST_ACROSS_FIRMWARE_UPDATE BIT(4)
> +#define CXL_FEAT_ENTRY_FLAG_SUPPORT_DEFAULT_SELECTION BIT(5)
> +#define CXL_FEAT_ENTRY_FLAG_SUPPORT_SAVED_SELECTION BIT(6)
> +
> +enum cxl_feat_attr_value_persistence {
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_NONE,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_CXL_RESET,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_HOT_RESET,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_WARM_RESET,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_COLD_RESET,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_MAX
> +};
> +
> +#define CXL_FEAT_ENTRY_FLAG_PERSISTENCE_ACROSS_FW_UPDATE_MASK BIT(4)
Not sure there is benefit in defining mask for single bit fields.
Or if there is don't define the value above.
> +#define CXL_FEAT_ENTRY_FLAG_PERSIST_ACROSS_FIRMWARE_UPDATE BIT(4)
Either is probably fine, just not both!
> +#define CXL_FEAT_ENTRY_FLAG_PERSISTENCE_DEFAULT_SEL_SUPPORT_MASK BIT(5)
> +#define CXL_FEAT_ENTRY_FLAG_PERSISTENCE_SAVED_SEL_SUPPORT_MASK BIT(6)
> +
> +struct cxl_mbox_supp_feat_entry {
> + uuid_t uuid;
> + __le16 feat_index;
Given it's in a feat entry, could drop 'feat' as redundant info.
__le16 index;
__le16 get_size;
etc
> + __le16 get_feat_size;
> + __le16 set_feat_size;
> + __le32 attr_flags;
> + u8 get_feat_version;
> + u8 set_feat_version;
> + __le16 set_feat_effects;
> + u8 rsvd[18];
> +} __packed;
> +
> +/*
> + * Get Supported Features output payload
> + * CXL rev 3.1 section 8.2.9.6.1 Table 8-96
> + */
> +struct cxl_mbox_get_supp_feats_out {
> + __le16 entries;
> + __le16 nsuppfeats_dev;
Probably don't need the _dev postfix. Command being sent to a device
so that doesn't add much.
I looked at naming in similar cases. For mailbox clear we have nr_recs,
so perhaps nr_supported ?
> + u32 reserved;
u8 rsvd[4];
as above - match the local syle.
> + struct cxl_mbox_supp_feat_entry feat_entries[];
> +} __packed;
> +
> /* Get Poison List CXL 3.0 Spec 8.2.9.8.4.1 */
> struct cxl_mbox_poison_in {
> __le64 offset;
> @@ -829,6 +888,9 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> enum cxl_event_type event_type,
> const uuid_t *uuid, union cxl_event *evt);
> int cxl_set_timestamp(struct cxl_memdev_state *mds);
> +int cxl_get_supported_features(struct cxl_memdev_state *mds,
> + struct cxl_mbox_get_supp_feats_in *pi,
> + void *feats_out);
Don't use a void * for that output data. It should be a
struct cxl_mbox_get_supp_feats_out *
For the input, the other similar functions are providing parameters
directly, not wrapped up in a structure. Easy enough to do that here
as well as we only need
u32 count, u16 start_index
instead of pi
> int cxl_poison_state_init(struct cxl_memdev_state *mds);
> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> struct cxl_region *cxlr);
On Thu, 15 Feb 2024 19:14:44 +0800
<[email protected]> wrote:
> From: Shiju Jose <[email protected]>
>
> Add support for GET_FEATURE mailbox command.
>
> CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
> The settings of a feature can be retrieved using Get Feature command.
Hi Shiju
I think this needs to be more complex so that this utility function gets
the whole feature, not just a section of it (subject to big enough buffer
being available etc). We don't want the higher level code to have to
deal with the complexity of small mailboxes.
A few other things inline.
Jonathan
>
> Signed-off-by: Shiju Jose <[email protected]>
> ---
> drivers/cxl/core/mbox.c | 22 ++++++++++++++++++++++
> drivers/cxl/cxlmem.h | 23 +++++++++++++++++++++++
> 2 files changed, 45 insertions(+)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 191f51f3df0e..f43189b6859a 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1313,6 +1313,28 @@ int cxl_get_supported_features(struct cxl_memdev_state *mds,
> }
> EXPORT_SYMBOL_NS_GPL(cxl_get_supported_features, CXL);
>
> +int cxl_get_feature(struct cxl_memdev_state *mds,
> + struct cxl_mbox_get_feat_in *pi, void *feat_out)
Comments below on this signature. Key is I'd expect this function to deal
with potential need for multiple requests (small mailbox size compared to
the size of the output data being read).
To test that we'd probably have to tweak the qemu code to use a smaller mailbox.
Or fake that in here so that we do multiple smaller reads.
> +{
> + struct cxl_mbox_cmd mbox_cmd;
> + int rc;
> +
> + mbox_cmd = (struct cxl_mbox_cmd) {
> + .opcode = CXL_MBOX_OP_GET_FEATURE,
> + .size_in = sizeof(*pi),
> + .payload_in = pi,
> + .size_out = le16_to_cpu(pi->count),
> + .payload_out = feat_out,
> + .min_out = le16_to_cpu(pi->count),
Are there feature with variable responses sizes? I think there will be.
size_out should be the size of the buffer, but min_out should be
the size of the particular feature data header - note these will change
as we iterate over multiple messages.
> + };
> + rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> + if (rc < 0)
> + return rc;
> +
> + return 0;
I think this should return the size to the caller, rather than 0 on success.
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_feature, CXL);
> +
> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> struct cxl_region *cxlr)
> {
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 23e4d98b9bae..eaecc3234cfd 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -530,6 +530,7 @@ enum cxl_opcode {
> CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
> CXL_MBOX_OP_GET_LOG = 0x0401,
> CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
> + CXL_MBOX_OP_GET_FEATURE = 0x0501,
> CXL_MBOX_OP_IDENTIFY = 0x4000,
> CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
> CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
> @@ -757,6 +758,26 @@ struct cxl_mbox_get_supp_feats_out {
> struct cxl_mbox_supp_feat_entry feat_entries[];
> } __packed;
>
> +/* Get Feature CXL 3.1 Spec 8.2.9.6.2 */
> +/*
> + * Get Feature input payload
> + * CXL rev 3.1 section 8.2.9.6.2 Table 8-99
> + */
> +/* Get Feature : Payload in selection */
Naming of enum is good enough that I don't think we need
this particular comment.
> +enum cxl_get_feat_selection {
> + CXL_GET_FEAT_SEL_CURRENT_VALUE,
> + CXL_GET_FEAT_SEL_DEFAULT_VALUE,
> + CXL_GET_FEAT_SEL_SAVED_VALUE,
> + CXL_GET_FEAT_SEL_MAX
> +};
> +
> +struct cxl_mbox_get_feat_in {
> + uuid_t uuid;
> + __le16 offset;
> + __le16 count;
> + u8 selection;
> +} __packed;
> +
> /* Get Poison List CXL 3.0 Spec 8.2.9.8.4.1 */
> struct cxl_mbox_poison_in {
> __le64 offset;
> @@ -891,6 +912,8 @@ int cxl_set_timestamp(struct cxl_memdev_state *mds);
> int cxl_get_supported_features(struct cxl_memdev_state *mds,
> struct cxl_mbox_get_supp_feats_in *pi,
> void *feats_out);
> +int cxl_get_feature(struct cxl_memdev_state *mds,
> + struct cxl_mbox_get_feat_in *pi, void *feat_out);
For this I'd expect us to wrap up the need for multi messages inside this.
So this would then just take the feature index, a size for the output buffer
overall size, plus min acceptable response size and a selection enum value.
int cxl_get_feature(struct cxl_memdev_state *mds,
uuid_t feat,
void *feat_out, size_t feat_out_min_size,
size_t feat_out_size);
> int cxl_poison_state_init(struct cxl_memdev_state *mds);
> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> struct cxl_region *cxlr);
On Thu, 15 Feb 2024 19:14:46 +0800
<[email protected]> wrote:
> From: Shiju Jose <[email protected]>
>
> CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control
> feature. The device patrol scrub proactively locates and makes corrections
> to errors in regular cycle. The patrol scrub control allows the request to
> configure patrol scrub input configurations.
>
> The patrol scrub control allows the requester to specify the number of
> hours for which the patrol scrub cycles must be completed, provided that
> the requested number is not less than the minimum number of hours for the
> patrol scrub cycle that the device is capable of. In addition, the patrol
> scrub controls allow the host to disable and enable the feature in case
> disabling of the feature is needed for other purposes such as
> performance-aware operations which require the background operations to be
> turned off.
>
> Signed-off-by: Shiju Jose <[email protected]>
Hi Shiju
Various comments inline. Sorry I didn't get to this on earlier versions!
Jonathan
> ---
> drivers/cxl/Kconfig | 17 +++
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/memscrub.c | 266 ++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxlmem.h | 8 ++
> drivers/cxl/pci.c | 5 +
> 5 files changed, 297 insertions(+)
> create mode 100644 drivers/cxl/core/memscrub.c
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 67998dbd1d46..873bdda5db32 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -157,4 +157,21 @@ config CXL_PMU
> monitoring units and provide standard perf based interfaces.
>
> If unsure say 'm'.
> +
> +config CXL_SCRUB
> + bool "CXL: Memory scrub feature"
> + depends on CXL_PCI
> + depends on CXL_MEM
> + help
> + The CXL memory scrub control is an optional feature allows host to
> + control the scrub configurations of CXL Type 3 devices, which
> + support patrol scrub and/or DDR5 ECS(Error Check Scrub).
> +
> + Say 'y/n' to enable/disable the CXL memory scrub driver that will
> + attach to CXL.mem devices for memory scrub control feature. See
> + sections 8.2.9.9.11.1 and 8.2.9.9.11.2 in the CXL 3.1 specification
> + for a detailed description of CXL memory scrub control features.
> +
> + If unsure say 'n'.
No need for negative here I think. It's a reasonable thing to turn on
and hardware should provide minimum guarantees that stop it being dangerous.
> +
> endif
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 9259bcc6773c..e0fc814c3983 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -16,3 +16,4 @@ cxl_core-y += pmu.o
> cxl_core-y += cdat.o
> cxl_core-$(CONFIG_TRACING) += trace.o
> cxl_core-$(CONFIG_CXL_REGION) += region.o
> +cxl_core-$(CONFIG_CXL_SCRUB) += memscrub.o
> diff --git a/drivers/cxl/core/memscrub.c b/drivers/cxl/core/memscrub.c
> new file mode 100644
> index 000000000000..be8d9a9743eb
> --- /dev/null
> +++ b/drivers/cxl/core/memscrub.c
> @@ -0,0 +1,266 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * cxl_memscrub.c - CXL memory scrub driver
No point in a filename (which incidentally is wrong ;)
in a file (because they are often wrong and add nothing)
> + *
> + * Copyright (c) 2023 HiSilicon Limited.
2024 probably appropriate now.
> + *
> + * - Provides functions to configure patrol scrub
> + * feature of the CXL memory devices.
Very short line wrap.
> + */
> +
> +#define pr_fmt(fmt) "CXL_MEM_SCRUB: " fmt
> +
> +#include <cxlmem.h>
> +
> +/* CXL memory scrub feature common definitions */
> +#define CXL_SCRUB_MAX_ATTR_RANGE_LENGTH 128
> +
> +static int cxl_mem_get_supported_feature_entry(struct cxl_memdev *cxlmd, const uuid_t *feat_uuid,
> + struct cxl_mbox_supp_feat_entry *feat_entry_out)
> +{
> + struct cxl_mbox_get_supp_feats_out *feats_out __free(kvfree) = NULL;
> + struct cxl_mbox_supp_feat_entry *feat_entry;
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> + struct cxl_mbox_get_supp_feats_in pi;
> + int feat_index, count;
> + int nentries;
> + int ret;
> +
> + feat_index = 0;
> + pi.count = sizeof(struct cxl_mbox_get_supp_feats_out) +
> + sizeof(struct cxl_mbox_supp_feat_entry);
> + feats_out = kvmalloc(pi.count, GFP_KERNEL);
Not very big. kmalloc should be fine I think.
> + if (!feats_out)
> + return -ENOMEM;
> +
> + do {
> + pi.start_index = feat_index;
> + memset(feats_out, 0, pi.count);
> + ret = cxl_get_supported_features(mds, &pi, feats_out);
> + if (ret)
> + return ret;
> +
> + nentries = feats_out->entries;
> + if (!nentries)
> + break;
I'd return here.
> +
> + /* Check CXL memdev supports the feature */
> + feat_entry = (void *)feats_out->feat_entries;
Cast is odd. I think type is correct already.
> + for (count = 0; count < nentries; count++, feat_entry++) {
> + if (uuid_equal(&feat_entry->uuid, feat_uuid)) {
> + memcpy(feat_entry_out, feat_entry, sizeof(*feat_entry_out));
Long line. add a break after feat_entry
> + return 0;
> + }
> + }
> + feat_index += nentries;
> + } while (nentries);
Given exit on !entries don't think you can exit via normal while condition path. So
make this while (true)
> +
> + return -ENOTSUPP;
> +}
> +
> +/* CXL memory patrol scrub control definitions */
> +#define CXL_MEMDEV_PS_GET_FEAT_VERSION 0x01
> +#define CXL_MEMDEV_PS_SET_FEAT_VERSION 0x01
> +
> +static const uuid_t cxl_patrol_scrub_uuid =
> + UUID_INIT(0x96dad7d6, 0xfde8, 0x482b, 0xa7, 0x33, 0x75, 0x77, 0x4e, \
> + 0x06, 0xdb, 0x8a);
> +
> +/* CXL memory patrol scrub control functions */
> +struct cxl_patrol_scrub_context {
> + struct device *dev;
> + u16 get_feat_size;
> + u16 set_feat_size;
> + bool scrub_cycle_changeable;
> +};
> +
> +/**
> + * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data structure.
> + * @enable: [IN] enable(1)/disable(0) patrol scrub.
In and out I think.
> + * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is changeable.
> + * @rate: [IN] Requested patrol scrub cycle in hours.
> + * [OUT] Current patrol scrub cycle in hours.
> + * @min_rate:[OUT] minimum patrol scrub cycle, in hours, supported.
> + * @rate_avail:[OUT] Supported patrol scrub cycle in hours.
> + */
> +struct cxl_memdev_ps_params {
> + bool enable;
> + bool scrub_cycle_changeable;
> + u16 rate;
> + u16 min_rate;
> + char rate_avail[CXL_SCRUB_MAX_ATTR_RANGE_LENGTH];
> +};
> +
> +enum {
> + CXL_MEMDEV_PS_PARAM_ENABLE,
> + CXL_MEMDEV_PS_PARAM_RATE,
> +};
> +
> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK BIT(0)
> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK BIT(1)
> +#define CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK GENMASK(7, 0)
> +#define CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK GENMASK(15, 8)
> +#define CXL_MEMDEV_PS_FLAG_ENABLED_MASK BIT(0)
> +
> +struct cxl_memdev_ps_feat_read_attrs {
> + u8 scrub_cycle_cap;
> + __le16 scrub_cycle;
> + u8 scrub_flags;
> +} __packed;
> +
> +struct cxl_memdev_ps_set_feat_pi {
> + struct cxl_mbox_set_feat_in pi;
Maybe rename this in earlier patch to make it clear it is a header.
Not sure why it is called pi vs attrs term used for read.
> + u8 scrub_cycle_hr;
> + u8 scrub_flags;
> +} __packed;
> +
> +static int cxl_mem_ps_get_attrs(struct device *dev,
> + struct cxl_memdev_ps_params *params)
> +{
> + struct cxl_memdev_ps_feat_read_attrs *rd_attrs __free(kvfree) = NULL;
> + struct cxl_mbox_get_feat_in pi = {
> + .uuid = cxl_patrol_scrub_uuid,
> + .offset = 0,
> + .count = sizeof(struct cxl_memdev_ps_feat_read_attrs),
> + .selection = CXL_GET_FEAT_SEL_CURRENT_VALUE,
> + };
> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> + int ret;
> +
> + if (!mds)
> + return -EFAULT;
> +
> + rd_attrs = kvmalloc(pi.count, GFP_KERNEL);
Small so I don't see need for kvmalloc.
In general that might not be true for a feature, but in this case
we know it is.
> + if (!rd_attrs)
> + return -ENOMEM;
> +
> + ret = cxl_get_feature(mds, &pi, rd_attrs);
> + if (ret) {
> + params->enable = 0;
> + params->rate = 0;
The cxl_get_feature() should not have side effects on failure to read.
As such, these parameters should be left in original state if there is
a problem. Initialize them to these states and we should be fine unless
a read succeeds in updating them.
> + snprintf(params->rate_avail, CXL_SCRUB_MAX_ATTR_RANGE_LENGTH,
> + "Unavailable");
> + return ret;
> + }
> + params->scrub_cycle_changeable = FIELD_GET(CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK,
> + rd_attrs->scrub_cycle_cap);
> + params->enable = FIELD_GET(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
> + rd_attrs->scrub_flags);
> + params->rate = FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
> + rd_attrs->scrub_cycle);
> + params->min_rate = FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK,
> + rd_attrs->scrub_cycle);
> + snprintf(params->rate_avail, CXL_SCRUB_MAX_ATTR_RANGE_LENGTH,
> + "Minimum scrub cycle = %d hour", params->min_rate);
> +
> + return 0;
> +}
> +
> +static int __maybe_unused
> +cxl_mem_ps_set_attrs(struct device *dev, struct cxl_memdev_ps_params *params,
> + u8 param_type)
> +{
> + struct cxl_memdev_ps_set_feat_pi set_pi = {
> + .pi.uuid = cxl_patrol_scrub_uuid,
> + .pi.flags = CXL_SET_FEAT_FLAG_MOD_VALUE_SAVED_ACROSS_RESET |
> + CXL_SET_FEAT_FLAG_FULL_DATA_TRANSFER,
> + .pi.offset = 0,
> + .pi.version = CXL_MEMDEV_PS_SET_FEAT_VERSION,
> + };
> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> + struct cxl_memdev_ps_params rd_params;
> + int ret;
> +
> + if (!mds)
> + return -EFAULT;
> +
> + ret = cxl_mem_ps_get_attrs(dev, &rd_params);
> + if (ret) {
> + dev_err(dev, "Get cxlmemdev patrol scrub params fail ret=%d\n",
> + ret);
> + return ret;
> + }
> +
> + switch (param_type) {
> + case CXL_MEMDEV_PS_PARAM_ENABLE:
> + set_pi.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
> + params->enable);
> + set_pi.scrub_cycle_hr = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
> + rd_params.rate);
> + break;
> + case CXL_MEMDEV_PS_PARAM_RATE:
> + if (params->rate < rd_params.min_rate) {
> + dev_err(dev, "Invalid CXL patrol scrub cycle(%d) to set\n",
> + params->rate);
> + dev_err(dev, "Minimum supported CXL patrol scrub cycle in hour %d\n",
> + params->min_rate);
> + return -EINVAL;
> + }
> + set_pi.scrub_cycle_hr = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
> + params->rate);
> + set_pi.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
> + rd_params.enable);
> + break;
> + default:
> + dev_err(dev, "Invalid CXL patrol scrub parameter to set\n");
> + return -EINVAL;
> + }
> +
> + ret = cxl_set_feature(mds, &set_pi, sizeof(set_pi));
> + if (ret) {
> + dev_err(dev, "CXL patrol scrub set feature fail ret=%d\n",
> + ret);
> + return ret;
> + }
> +
> + /* Verify attribute set successfully */
Why? Is there a specification defined reason it might not give an error return
but still fail to set the attribute? (rounding or similar perhaps?)
If so add a comment here. If not drop this check.
> + if (param_type == CXL_MEMDEV_PS_PARAM_RATE) {
> + ret = cxl_mem_ps_get_attrs(dev, &rd_params);
> + if (ret) {
> + dev_err(dev, "Get cxlmemdev patrol scrub params fail ret=%d\n", ret);
> + return ret;
> + }
> + if (rd_params.rate != params->rate)
> + return -EFAULT;
> + }
> +
> + return 0;
> +}
> +
> +int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
> +{
> + struct cxl_patrol_scrub_context *cxl_ps_ctx;
> + struct cxl_mbox_supp_feat_entry feat_entry;
> + struct cxl_memdev_ps_params params;
> + int ret;
> +
> + ret = cxl_mem_get_supported_feature_entry(cxlmd, &cxl_patrol_scrub_uuid,
> + &feat_entry);
> + if (ret < 0)
> + return ret;
> +
> + if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
> + return -ENOTSUPP;
> +
> + cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
> + if (!cxl_ps_ctx)
> + return -ENOMEM;
> +
> + cxl_ps_ctx->get_feat_size = feat_entry.get_feat_size;
> + cxl_ps_ctx->set_feat_size = feat_entry.set_feat_size;
> + ret = cxl_mem_ps_get_attrs(&cxlmd->dev, ¶ms);
> + if (ret) {
> + dev_err(&cxlmd->dev, "Get CXL patrol scrub params fail ret=%d\n",
> + ret);
> + return ret;
Called from probe so
return dev_err_probe(&cxlmd->dev, ret,
"Get CXL patrol scrub params failed\n");
If you do hit this path, convention is cleanup and devm resources so we don't
waste memory that will never be used. Rare case where devm_kfree() makes sense.
Or reorganize so you've gotten all the data before doing that allocation.
> + }
> + cxl_ps_ctx->scrub_cycle_changeable = params.scrub_cycle_changeable;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_patrol_scrub_init, CXL);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 2223ef3d3140..7025c4fd66f3 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -948,6 +948,14 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
> int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
> int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
>
> +/* cxl memory scrub functions */
> +#ifdef CONFIG_CXL_SCRUB
> +int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd);
> +#else
> +static inline int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
> +{ return -ENOTSUPP; }
That's a really obscure and little used return code + arch specific.
Probably EONOTSUPP
> +#endif
> +
> #ifdef CONFIG_CXL_SUSPEND
> void cxl_mem_active_inc(void);
> void cxl_mem_active_dec(void);
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 233e7c42c161..d2d734d22461 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -886,6 +886,11 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (rc)
> return rc;
>
> + /*
> + * Initialize optional CXL scrub features
> + */
Single line comment fine, but given naming is obvious, no comment needed. However do
log a dev_dbg() if it fails (probably not for -ENOTSUPP
> + cxl_mem_patrol_scrub_init(cxlmd);
> +
> rc = devm_cxl_sanitize_setup_notifier(&pdev->dev, cxlmd);
> if (rc)
> return rc;
On Thu, 15 Feb 2024 19:14:47 +0800
<[email protected]> wrote:
> From: Shiju Jose <[email protected]>
>
> CXL spec 3.1 section 8.2.9.9.11.2 describes the DDR5 Error Check
> Scrub (ECS) control feature.
>
> The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM
> Specification (JESD79-5) and allows the DRAM to internally read, correct
> single-bit errors, and write back corrected data bits to the DRAM array
> while providing transparency to error counts. The ECS control feature
> allows the request to configure ECS input configurations during system
> boot or at run-time.
>
> The ECS control allows the requester to change the log entry type, the ECS
> threshold count provided that the request is within the definition
> specified in DDR5 mode registers, change mode between codeword mode and
> row count mode, and reset the ECS counter.
>
> Open Question:
> Is cxl_mem_ecs_init() invoked in the right function in cxl/core/region.c?
>
> Signed-off-by: Shiju Jose <[email protected]>
Hi Shiju,
I'd missed the placement of declarations with __free in the previous code.
For these general agreement is just put the declaration where if is first
assigned. 2 reasons:
1) Makes it obvious right unwind is being provided.
2) Avoids ordering issues as cleanup entries run in reverse order when we
leave the scope. There aren't problems with that here, but we want
to make it as easy as possible for reviewers to see that.
> +/* CXL DDR5 ECS control functions */
> +static int cxl_mem_ecs_get_attrs(struct device *dev, int fru_id,
> + struct cxl_memdev_ecs_params *params)
> +{
> + struct cxl_memdev_ecs_feat_read_attrs *rd_attrs __free(kvfree) = NULL;
See below.
> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev->parent);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> + struct cxl_mbox_get_feat_in pi = {
> + .uuid = cxl_ecs_uuid,
> + .offset = 0,
> + .selection = CXL_GET_FEAT_SEL_CURRENT_VALUE,
> + };
> + struct cxl_ecs_context *cxl_ecs_ctx;
> + u8 threshold_index;
> + int ret;
> +
> + if (!mds)
> + return -EFAULT;
> + cxl_ecs_ctx = dev_get_drvdata(dev);
> +
> + pi.count = cxl_ecs_ctx->get_feat_size;
> + rd_attrs = kvmalloc(pi.count, GFP_KERNEL);
> + if (!rd_attrs)
> + return -ENOMEM;
> +
> + ret = cxl_get_feature(mds, &pi, rd_attrs);
> + if (ret) {
> + params->log_entry_type = 0;
> + params->threshold = 0;
As in previous, I'd expect this to be side effect free so leave these alone.
> + params->mode = 0;
> + return ret;
> + }
> + params->log_entry_type = FIELD_GET(CXL_MEMDEV_ECS_LOG_ENTRY_TYPE_MASK,
> + rd_attrs[fru_id].ecs_log_cap);
> + threshold_index = FIELD_GET(CXL_MEMDEV_ECS_THRESHOLD_COUNT_MASK,
> + rd_attrs[fru_id].ecs_config);
> + params->threshold = ecs_supp_threshold[threshold_index];
> + params->mode = FIELD_GET(CXL_MEMDEV_ECS_MODE_MASK,
> + rd_attrs[fru_id].ecs_config);
> +
> + return 0;
> +}
> +
> +static int __maybe_unused
> +cxl_mem_ecs_set_attrs(struct device *dev, int fru_id,
> + struct cxl_memdev_ecs_params *params, u8 param_type)
> +{
> + struct cxl_memdev_ecs_feat_read_attrs *rd_attrs __free(kvfree) = NULL;
> + struct cxl_memdev_ecs_set_feat_pi *set_pi __free(kvfree) = NULL;
Linus Torvalds is very much against this pattern for __free.
The declaration should be at the point of allocation.
> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev->parent);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> + struct cxl_mbox_get_feat_in pi = {
> + .uuid = cxl_ecs_uuid,
> + .offset = 0,
> + .selection = CXL_GET_FEAT_SEL_CURRENT_VALUE,
> + };
> + struct cxl_memdev_ecs_feat_wr_attrs *wr_attrs;
> + struct cxl_memdev_ecs_params rd_params;
> + struct cxl_ecs_context *cxl_ecs_ctx;
> + u16 nmedia_frus, count;
> + u32 set_pi_size;
> + int ret;
> +
> + if (!mds)
> + return -EFAULT;
> +
> + cxl_ecs_ctx = dev_get_drvdata(dev);
I'm not sure exactly which dev this is, but using drvdata
probably not appropriate here. Embed the struct in the memdev.
> + nmedia_frus = cxl_ecs_ctx->nregions;
> +
Have this here and similar cases.
stuct cxl_memdev_ecs_feat_read_attrs *rd_attrs __free(kvfree) =
kvmalloc();
Though some of these are small and if they are just use malloc() and
kfree.
> + rd_attrs = kvmalloc(cxl_ecs_ctx->get_feat_size, GFP_KERNEL);
> + if (!rd_attrs)
> + return -ENOMEM;
> +
> + pi.count = cxl_ecs_ctx->get_feat_size;
> + ret = cxl_get_feature(mds, &pi, rd_attrs);
> + if (ret)
> + return ret;
> + set_pi_size = sizeof(struct cxl_mbox_set_feat_in) +
> + cxl_ecs_ctx->set_feat_size;
> + set_pi = kvmalloc(set_pi_size, GFP_KERNEL);
As aboe. Drag the declaration and free logic down here.
> + if (!set_pi)
> + return -ENOMEM;
> +
> + set_pi->pi.uuid = cxl_ecs_uuid;
> + set_pi->pi.flags = CXL_SET_FEAT_FLAG_MOD_VALUE_SAVED_ACROSS_RESET |
> + CXL_SET_FEAT_FLAG_FULL_DATA_TRANSFER;
> + set_pi->pi.offset = 0;
> + set_pi->pi.version = CXL_MEMDEV_ECS_SET_FEAT_VERSION;
> + /* Fill writable attributes from the current attributes read for all the media FRUs */
> + wr_attrs = set_pi->wr_attrs;
> + for (count = 0; count < nmedia_frus; count++) {
> + wr_attrs[count].ecs_log_cap = rd_attrs[count].ecs_log_cap;
> + wr_attrs[count].ecs_config = rd_attrs[count].ecs_config;
> + }
> +
> + /* Fill attribute to be set for the media FRU */
> + switch (param_type) {
> + case CXL_MEMDEV_ECS_PARAM_LOG_ENTRY_TYPE:
> + if (params->log_entry_type != ECS_LOG_ENTRY_TYPE_DRAM &&
> + params->log_entry_type != ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU) {
> + dev_err(dev->parent,
> + "Invalid CXL ECS scrub log entry type(%d) to set\n",
> + params->log_entry_type);
> + dev_err(dev->parent,
> + "Log Entry Type 0: per DRAM 1: per Memory Media FRU\n");
> + return -EINVAL;
> + }
> + wr_attrs[fru_id].ecs_log_cap = FIELD_PREP(CXL_MEMDEV_ECS_LOG_ENTRY_TYPE_MASK,
> + params->log_entry_type);
> + break;
> + case CXL_MEMDEV_ECS_PARAM_THRESHOLD:
> + wr_attrs[fru_id].ecs_config &= ~CXL_MEMDEV_ECS_THRESHOLD_COUNT_MASK;
> + switch (params->threshold) {
> + case 256:
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(
> + CXL_MEMDEV_ECS_THRESHOLD_COUNT_MASK,
> + ECS_THRESHOLD_256);
> + break;
> + case 1024:
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(
> + CXL_MEMDEV_ECS_THRESHOLD_COUNT_MASK,
> + ECS_THRESHOLD_1024);
> + break;
> + case 4096:
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(
> + CXL_MEMDEV_ECS_THRESHOLD_COUNT_MASK,
> + ECS_THRESHOLD_4096);
> + break;
> + default:
> + dev_err(dev->parent,
> + "Invalid CXL ECS scrub threshold count(%d) to set\n",
> + params->threshold);
> + dev_err(dev->parent,
> + "Supported scrub threshold count: 256,1024,4096\n");
> + return -EINVAL;
> + }
> + break;
> + case CXL_MEMDEV_ECS_PARAM_MODE:
> + if (params->mode != ECS_MODE_COUNTS_ROWS &&
> + params->mode != ECS_MODE_COUNTS_CODEWORDS) {
> + dev_err(dev->parent,
> + "Invalid CXL ECS scrub mode(%d) to set\n",
> + params->mode);
> + dev_err(dev->parent,
> + "Mode 0: ECS counts rows with errors"
> + " 1: ECS counts codewords with errors\n");
> + return -EINVAL;
> + }
> + wr_attrs[fru_id].ecs_config &= ~CXL_MEMDEV_ECS_MODE_MASK;
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_MEMDEV_ECS_MODE_MASK,
> + params->mode);
> + break;
> + case CXL_MEMDEV_ECS_PARAM_RESET_COUNTER:
> + wr_attrs[fru_id].ecs_config &= ~CXL_MEMDEV_ECS_RESET_COUNTER_MASK;
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_MEMDEV_ECS_RESET_COUNTER_MASK,
> + params->reset_counter);
> + break;
> + default:
> + dev_err(dev->parent, "Invalid CXL ECS parameter to set\n");
> + return -EINVAL;
> + }
> + ret = cxl_set_feature(mds, set_pi, set_pi_size);
> + if (ret) {
> + dev_err(dev->parent, "CXL ECS set feature fail ret=%d\n", ret);
> + return ret;
> + }
> +
> + /* Verify attribute is set successfully */
> + ret = cxl_mem_ecs_get_attrs(dev, fru_id, &rd_params);
> + if (ret) {
> + dev_err(dev->parent, "Get cxlmemdev ECS params fail ret=%d\n", ret);
> + return ret;
> + }
> + switch (param_type) {
> + case CXL_MEMDEV_ECS_PARAM_LOG_ENTRY_TYPE:
> + if (rd_params.log_entry_type != params->log_entry_type)
> + return -EFAULT;
> + break;
> + case CXL_MEMDEV_ECS_PARAM_THRESHOLD:
> + if (rd_params.threshold != params->threshold)
> + return -EFAULT;
> + break;
> + case CXL_MEMDEV_ECS_PARAM_MODE:
> + if (rd_params.mode != params->mode)
> + return -EFAULT;
> + break;
> + }
> +
> + return 0;
> +}
> +
> +int cxl_mem_ecs_init(struct cxl_memdev *cxlmd, int region_id)
> +{
> + struct cxl_mbox_supp_feat_entry feat_entry;
> + struct cxl_ecs_context *cxl_ecs_ctx;
> + int nmedia_frus;
> + int ret;
> +
> + ret = cxl_mem_get_supported_feature_entry(cxlmd, &cxl_ecs_uuid, &feat_entry);
> + if (ret < 0)
> + return ret;
> +
> + if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
> + return -ENOTSUPP;
> + nmedia_frus = feat_entry.get_feat_size/
> + sizeof(struct cxl_memdev_ecs_feat_read_attrs);
> + if (nmedia_frus) {
Flip logic and this ends up simpler (don't think this changes much in later patches).
if (!nmedia_frus)
return -ENODEV; or similar.
> + cxl_ecs_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ecs_ctx), GFP_KERNEL);
> + if (!cxl_ecs_ctx)
> + return -ENOMEM;
> +
> + cxl_ecs_ctx->nregions = nmedia_frus;
> + cxl_ecs_ctx->get_feat_size = feat_entry.get_feat_size;
> + cxl_ecs_ctx->set_feat_size = feat_entry.set_feat_size;
> + cxl_ecs_ctx->region_id = region_id;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_ecs_init, CXL);
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index ce0e2d82bb2b..35b57f0d85fa 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2913,6 +2913,7 @@ int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
> dev_err(&cxlr->dev, "failed to enable, range: %pr\n",
> p->res);
> }
> + cxl_mem_ecs_init(cxlmd, atomic_read(&cxlrd->region_id));
Add debug message her eif it fails.
>
> put_device(region_dev);
> out:
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 7025c4fd66f3..06965ba89085 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -951,9 +951,12 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
> /* cxl memory scrub functions */
> #ifdef CONFIG_CXL_SCRUB
> int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd);
> +int cxl_mem_ecs_init(struct cxl_memdev *cxlmd, int region_id);
> #else
> static inline int cxl_mem_patrol_scrub_init(struct cxl_memdev *cxlmd)
> { return -ENOTSUPP; }
> +static inline int cxl_mem_ecs_init(struct cxl_memdev *cxlmd, int region_id)
> +{ return -ENOTSUPP; }
As in previous - don't use this error code.
> #endif
>
> #ifdef CONFIG_CXL_SUSPEND
On Thu, 15 Feb 2024 19:14:49 +0800
<[email protected]> wrote:
> From: Shiju Jose <[email protected]>
>
> Register with the scrub configure driver to expose the sysfs attributes
> to the user for configuring the CXL device memory patrol scrub. Add the
> callback functions to support configuring the CXL memory device patrol
> scrub.
>
> Signed-off-by: Shiju Jose <[email protected]>
Trivial comment inline.
> diff --git a/drivers/cxl/core/memscrub.c b/drivers/cxl/core/memscrub.c
> index a3a371c5aa7b..a1fb40f8307f 100644
> --- a/drivers/cxl/core/memscrub.c
> +++ b/drivers/cxl/core/memscrub.c
> @@ -6,14 +6,19 @@
> +
> +/**
> + * cxl_mem_patrol_scrub_is_visible() - Callback to return attribute visibility
> + * @dev: Pointer to scrub device
> + * @attr: Scrub attribute
> + * @region_id: ID of the memory region
> + *
> + * Returns: 0 on success, an error otherwise
> + */
> +static umode_t cxl_mem_patrol_scrub_is_visible(struct device *dev,
> + u32 attr_id, int region_id)
> +{
> + const struct cxl_patrol_scrub_context *cxl_ps_ctx = dev_get_drvdata(dev);
> +
> + if (attr_id == scrub_rate_available ||
> + attr_id == scrub_rate) {
> + if (!cxl_ps_ctx->scrub_cycle_changeable)
> + return 0;
> + }
> +
> + switch (attr_id) {
> + case scrub_rate_available:
> + return 0444;
Usual trick on these is to write back their default values if we support them.
If we can make this function take that as well then this bcomes
return mode;
for all those supported.
> + case scrub_enable:
> + case scrub_rate:
> + return 0644;
> + default:
> + return 0;
> + }
> +}
> +
> From: Shiju Jose <[email protected]>
>
> Add scrub driver supports configuring the memory scrubs in the system.
> The scrub driver provides the interface for registering the scrub devices
> and supports configuring memory scrubs in the system.
> Driver exposes the sysfs scrub control attributes to the user in
> /sys/class/scrub/scrubX/regionN/
>
> Signed-off-by: Shiju Jose <[email protected]>
Hi Shiju,
A few minor things inline. Given I reviewed this internally I don't
have that much to add!
Jonathan
> ---
> .../ABI/testing/sysfs-class-scrub-configure | 91 +++++
> drivers/memory/Kconfig | 1 +
> drivers/memory/Makefile | 1 +
> drivers/memory/scrub/Kconfig | 11 +
> drivers/memory/scrub/Makefile | 6 +
> drivers/memory/scrub/memory-scrub.c | 367 ++++++++++++++++++
> include/memory/memory-scrub.h | 78 ++++
> 7 files changed, 555 insertions(+)
> create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
> create mode 100644 drivers/memory/scrub/Kconfig
> create mode 100644 drivers/memory/scrub/Makefile
> create mode 100755 drivers/memory/scrub/memory-scrub.c
> create mode 100755 include/memory/memory-scrub.h
>
> diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure b/Documentation/ABI/testing/sysfs-class-scrub-configure
> new file mode 100644
> index 000000000000..d2d422b667cf
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
> +What: /sys/class/scrub/scrubX/regionN/rate_available
> +Date: January 2024
> +KernelVersion: 6.8
> +Contact: [email protected]
> +Description:
> + (RO) Supported range for the scrub rate)
> + by the scrubber for a memory region.
> + The unit of the scrub rate vary depends on the scrub.
Not good to have a unit that is dependent on scrub. We need to figure
out how to either define that, or provide an interface to expose it
to userspace and make it a userspace tool problem.
> diff --git a/drivers/memory/scrub/memory-scrub.c b/drivers/memory/scrub/memory-scrub.c
> new file mode 100755
> index 000000000000..a160b7a047e4
> --- /dev/null
> +SCRUB_ATTR_RW(addr_base);
> +SCRUB_ATTR_RW(addr_size);
> +SCRUB_ATTR_RW(enable);
> +SCRUB_ATTR_RW(enable_background_scrub);
> +SCRUB_ATTR_RW(rate);
> +SCRUB_ATTR_RO(rate_available);
> +
> +static struct attribute *scrub_attrs[] = {
> + &dev_attr_addr_base.attr,
> + &dev_attr_addr_size.attr,
> + &dev_attr_enable.attr,
> + &dev_attr_enable_background_scrub.attr,
> + &dev_attr_rate.attr,
> + &dev_attr_rate_available.attr,
> + NULL,
no comma
> +};
> +
> +static struct device *
> +scrub_device_register(struct device *dev, const char *name, void *drvdata,
> + const struct scrub_ops *ops,
> + int region_id,
> + struct attribute_group *attr_group)
> +{
> + struct scrub_device *scrub_dev;
> + struct device *hdev;
> + int err;
> +
> + scrub_dev = kzalloc(sizeof(*scrub_dev), GFP_KERNEL);
> + if (!scrub_dev)
> + return ERR_PTR(-ENOMEM);
> + hdev = &scrub_dev->dev;
> +
> + scrub_dev->id = ida_alloc(&scrub_ida, GFP_KERNEL);
> + if (scrub_dev->id < 0) {
> + kfree(scrub_dev);
> + return ERR_PTR(-ENOMEM);
> + }
> +
> + snprintf((char *)scrub_dev->region_name, SCRUB_MAX_SYSFS_ATTR_NAME_LENGTH,
> + "region%d", region_id);
> + if (attr_group) {
I'd like a comment on this. Not immediately obvious what this parameter is to me,
or when we would and wouldn't have one.
> + attr_group->name = (char *)scrub_dev->region_name;
> + scrub_dev->groups[0] = attr_group;
> + scrub_dev->region_id = region_id;
> + } else {
> + scrub_dev->group.name = (char *)scrub_dev->region_name;
In both paths, drop out of if / else
> + scrub_dev->group.attrs = scrub_attrs;
> + scrub_dev->group.is_visible = scrub_attr_visible;
> + scrub_dev->groups[0] = &scrub_dev->group;
> + scrub_dev->ops = ops;
> + scrub_dev->region_id = region_id;
Set in both paths, so drop out of the if / else;
> + }
> +
> + hdev->groups = scrub_dev->groups;
> + hdev->class = &scrub_class;
> + hdev->parent = dev;
> + dev_set_drvdata(hdev, drvdata);
> + dev_set_name(hdev, SCRUB_ID_FORMAT, scrub_dev->id);
> + snprintf(scrub_dev->name, SCRUB_DEV_MAX_NAME_LENGTH, "%s", name);
> + err = device_register(hdev);
> + if (err) {
> + put_device(hdev);
> + return ERR_PTR(err);
> + }
> +
> + return hdev;
> +}
> +
> +static void devm_scrub_release(void *dev)
> +{
> + struct device *hdev = dev;
> +
> + device_unregister(hdev);
Trivial but local variable doesn't really add anything.
deivce_unregister(dev);
is pretty clear on types!
> +}
> diff --git a/include/memory/memory-scrub.h b/include/memory/memory-scrub.h
> new file mode 100755
> index 000000000000..3d7054e98b9a
> --- /dev/null
> +++ b/include/memory/memory-scrub.h
> @@ -0,0 +1,78 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Memory scrub controller driver support to configure
> + * the controls of the memory scrub and enable.
> + *
> + * Copyright (c) 2023 HiSilicon Limited.
> + */
> +
> +#ifndef __MEMORY_SCRUB_H
> +#define __MEMORY_SCRUB_H
> +
> +#include <linux/types.h>
> +
> +enum scrub_types {
> + scrub_common,
> + scrub_max,
No comma on a terminating entry like this.
> +};
On Thu, 15 Feb 2024 19:14:50 +0800
<[email protected]> wrote:
> From: Shiju Jose <[email protected]>
>
> Register with the scrub configure driver to expose the sysfs attributes
> to the user for configuring the CXL memory device's ECS feature.
> Add the static CXL ECS specific attributes to support configuring the
> CXL memory device ECS feature.
>
> Signed-off-by: Shiju Jose <[email protected]>
The ABI in here needs documentation. My key takeaway is that
it is very ECS specific. I think one of the big challenges of a common scrub
control system is going to be trying to come up with some meaningful
common ABI.
> ---
> drivers/cxl/core/memscrub.c | 253 +++++++++++++++++++++++++++++++++++-
> 1 file changed, 250 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/memscrub.c b/drivers/cxl/core/memscrub.c
> index a1fb40f8307f..325084b22e7a 100644
> --- a/drivers/cxl/core/memscrub.c
> +++ b/drivers/cxl/core/memscrub.c
> @@ -464,6 +464,8 @@ EXPORT_SYMBOL_NS_GPL(cxl_mem_patrol_scrub_init, CXL);
> #define CXL_MEMDEV_ECS_GET_FEAT_VERSION 0x01
> #define CXL_MEMDEV_ECS_SET_FEAT_VERSION 0x01
>
> +#define CXL_DDR5_ECS "cxl_ecs"
I would just put these name defines inline.
> +enum cxl_mem_ecs_scrub_attributes {
> + cxl_ecs_log_entry_type,
> + cxl_ecs_log_entry_type_per_dram,
> + cxl_ecs_log_entry_type_per_memory_media,
> + cxl_ecs_mode,
> + cxl_ecs_mode_counts_codewords,
> + cxl_ecs_mode_counts_rows,
> + cxl_ecs_reset,
> + cxl_ecs_threshold,
> + cxl_ecs_threshold_available,
> + cxl_ecs_max_attrs,
This is pretty much all custom ABI. Challenging to make it common with
the main scrub and RASF controls, but I think we do need to see if we can
come up with something that is at least vaguely consistent across
different forms of scrub control.
What the user cares about is how likely an error is to get past the
scrubbing that is running (I think - RAS folk speak up if I have
this wrong!)
So how do we go from the ECS parameters to that sort of info?
I think ECS is effectively scrubbing at a fixed rate (google suggests
all ram every 24 hours). We are really controlling what info is
reported rather than what scrub is carried out.
Useful stuff to potentially control but different from the
other cases.
> +};
> +
> int cxl_mem_ecs_init(struct cxl_memdev *cxlmd, int region_id)
> {
> + char scrub_name[CXL_MEMDEV_MAX_NAME_LENGTH];
> struct cxl_mbox_supp_feat_entry feat_entry;
> struct cxl_ecs_context *cxl_ecs_ctx;
> + struct device *cxl_scrub_dev;
Make this more local as we don't need it out here?
> int nmedia_frus;
> int ret;
>
> @@ -755,6 +993,15 @@ int cxl_mem_ecs_init(struct cxl_memdev *cxlmd, int region_id)
> cxl_ecs_ctx->get_feat_size = feat_entry.get_feat_size;
> cxl_ecs_ctx->set_feat_size = feat_entry.set_feat_size;
> cxl_ecs_ctx->region_id = region_id;
> +
> + snprintf(scrub_name, sizeof(scrub_name), "%s_%s_region%d",
> + CXL_DDR5_ECS, dev_name(&cxlmd->dev), cxl_ecs_ctx->region_id);
> + cxl_scrub_dev = devm_scrub_device_register(&cxlmd->dev, scrub_name,
> + cxl_ecs_ctx, NULL,
> + cxl_ecs_ctx->region_id,
> + &cxl_mem_ecs_attr_group);
> + if (IS_ERR(cxl_scrub_dev))
> + return PTR_ERR(cxl_scrub_dev);
> }
>
> return 0;
On Thu, 15 Feb 2024 19:14:52 +0800
<[email protected]> wrote:
> From: Shiju Jose <[email protected]>
>
> Add support for ACPI RAS2 feature table(RAS2) defined in the ACPI 6.5
> Specification & upwards revision, section 5.2.21.
>
> The RAS2 table provides interfaces for platform RAS features. RAS2 offers
> the same services as RASF, but is more scalable than the latter.
> RAS2 supports independent RAS controls and capabilities for a given RAS
> feature for multiple instances of the same component in a given system.
> The platform can support either RAS2 or RASF but not both.
>
> Link: https://github.com/acpica/acpica/pull/899
It merged in October. Rafael, can we get this into the kernel version
so we don't have a dependency in this patch set?
Thanks,
Jonathan
> Signed-off-by: Shiju Jose <[email protected]>
> ---
> include/acpi/actbl2.h | 137 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 137 insertions(+)
>
> diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
> index 9775384d61c6..15c271657f9f 100644
> --- a/include/acpi/actbl2.h
> +++ b/include/acpi/actbl2.h
> @@ -47,6 +47,7 @@
> #define ACPI_SIG_PPTT "PPTT" /* Processor Properties Topology Table */
> #define ACPI_SIG_PRMT "PRMT" /* Platform Runtime Mechanism Table */
> #define ACPI_SIG_RASF "RASF" /* RAS Feature table */
> +#define ACPI_SIG_RAS2 "RAS2" /* RAS2 Feature table */
> #define ACPI_SIG_RGRT "RGRT" /* Regulatory Graphics Resource Table */
> #define ACPI_SIG_RHCT "RHCT" /* RISC-V Hart Capabilities Table */
> #define ACPI_SIG_SBST "SBST" /* Smart Battery Specification Table */
> @@ -2751,6 +2752,142 @@ enum acpi_rasf_status {
> #define ACPI_RASF_ERROR (1<<2)
> #define ACPI_RASF_STATUS (0x1F<<3)
>
> +/*******************************************************************************
> + *
> + * RAS2 - RAS2 Feature Table (ACPI 6.5)
> + * Version 2
> + *
> + *
> + ******************************************************************************/
> +
> +struct acpi_table_ras2 {
> + struct acpi_table_header header; /* Common ACPI table header */
> + u16 reserved;
> + u16 num_pcc_descs;
> +};
> +
> +/*
> + * RAS2 Platform Communication Channel Descriptor
> + */
> +
> +struct acpi_ras2_pcc_desc {
> + u8 channel_id;
> + u16 reserved;
> + u8 feature_type;
> + u32 instance;
> +};
> +
> +/*
> + * RAS2 Platform Communication Channel Shared Memory Region
> + */
> +
> +struct acpi_ras2_shared_memory {
> + u32 signature;
> + u16 command;
> + u16 status;
> + u16 version;
> + u8 features[16];
> + u8 set_capabilities[16];
> + u16 num_parameter_blocks;
> + u32 set_capabilities_status;
> +};
> +
> +/* RAS2 Parameter Block Structure Header */
> +
> +struct acpi_ras2_parameter_block {
> + u16 type;
> + u16 version;
> + u16 length;
> +};
> +
> +/*
> + * RAS2 Parameter Block Structure for PATROL_SCRUB
> + */
> +
> +struct acpi_ras2_patrol_scrub_parameter {
> + struct acpi_ras2_parameter_block header;
> + u16 patrol_scrub_command;
> + u64 requested_address_range[2];
> + u64 actual_address_range[2];
> + u32 flags;
> + u32 scrub_params_out;
> + u32 scrub_params_in;
> +};
> +
> +/* Masks for Flags field above */
> +
> +#define ACPI_RAS2_SCRUBBER_RUNNING 1
> +
> +/*
> + * RAS2 Parameter Block Structure for LA2PA_TRANSLATION
> + */
> +
> +struct acpi_ras2_la2pa_translation_parameter {
> + struct acpi_ras2_parameter_block header;
> + u16 addr_translation_command;
> + u64 sub_instance_id;
> + u64 logical_address;
> + u64 physical_address;
> + u32 status;
> +};
> +
> +/* Channel Commands */
> +
> +enum acpi_ras2_commands {
> + ACPI_RAS2_EXECUTE_RAS2_COMMAND = 1
> +};
> +
> +/* Platform RAS2 Features */
> +
> +enum acpi_ras2_features {
> + ACPI_RAS2_PATROL_SCRUB_SUPPORTED = 0,
> + ACPI_RAS2_LA2PA_TRANSLATION = 1
> +};
> +
> +/* RAS2 Patrol Scrub Commands */
> +
> +enum acpi_ras2_patrol_scrub_commands {
> + ACPI_RAS2_GET_PATROL_PARAMETERS = 1,
> + ACPI_RAS2_START_PATROL_SCRUBBER = 2,
> + ACPI_RAS2_STOP_PATROL_SCRUBBER = 3
> +};
> +
> +/* RAS2 LA2PA Translation Commands */
> +
> +enum acpi_ras2_la2pa_translation_commands {
> + ACPI_RAS2_GET_LA2PA_TRANSLATION = 1
> +};
> +
> +/* RAS2 LA2PA Translation Status values */
> +
> +enum acpi_ras2_la2pa_translation_status {
> + ACPI_RAS2_LA2PA_TRANSLATION_SUCCESS = 0,
> + ACPI_RAS2_LA2PA_TRANSLATION_FAIL = 1
> +};
> +
> +/* Channel Command flags */
> +
> +#define ACPI_RAS2_GENERATE_SCI (1<<15)
> +
> +/* Status values */
> +
> +enum acpi_ras2_status {
> + ACPI_RAS2_SUCCESS = 0,
> + ACPI_RAS2_NOT_VALID = 1,
> + ACPI_RAS2_NOT_SUPPORTED = 2,
> + ACPI_RAS2_BUSY = 3,
> + ACPI_RAS2_FAILED = 4,
> + ACPI_RAS2_ABORTED = 5,
> + ACPI_RAS2_INVALID_DATA = 6
> +};
> +
> +/* Status flags */
> +
> +#define ACPI_RAS2_COMMAND_COMPLETE (1)
> +#define ACPI_RAS2_SCI_DOORBELL (1<<1)
> +#define ACPI_RAS2_ERROR (1<<2)
> +#define ACPI_RAS2_STATUS (0x1F<<3)
> +
> /*******************************************************************************
> *
> * RGRT - Regulatory Graphics Resource Table
On Thu, 15 Feb 2024 19:14:51 +0800
<[email protected]> wrote:
> From: A Somasundaram <[email protected]>
>
> The code contains PCC interfaces for RASF and RAS2 table, functions to send
> RASF commands as per ACPI 5.1 and RAS2 commands as per ACPI 6.5 & upwards
> revision.
>
> References for this implementation,
> ACPI specification 6.5, section 5.2.20 for RASF table, section 5.2.21 for RAS2
> table and chapter 14 for PCC (Platform Communication Channel).
>
> Driver uses PCC interfaces to communicate to the ACPI HW.
> This code implements PCC interfaces and the functions to send the RASF/RAS2
> commands to be used by OSPM.
>
> Signed-off-by: A Somasundaram <[email protected]>
> Co-developed-by: Shiju Jose <[email protected]>
> Signed-off-by: Shiju Jose <[email protected]>
I looked at this in depth a while back so this time a quicker review. Just some
trivial stuff inline.
> diff --git a/drivers/acpi/rasf_acpi_common.c b/drivers/acpi/rasf_acpi_common.c
> new file mode 100755
> index 000000000000..3ee34f5d12d3
> --- /dev/null
> +++ b/drivers/acpi/rasf_acpi_common.c
> @@ -0,0 +1,272 @@
..
> +
> +#include <linux/export.h>
> +#include <linux/delay.h>
> +#include <linux/ktime.h>
> +#include <linux/platform_device.h>
> +#include <acpi/rasf_acpi.h>
> +#include <acpi/acpixf.h>
> +
> +static int rasf_check_pcc_chan(struct rasf_context *rasf_ctx)
> +{
> + int ret = -EIO;
> + struct acpi_rasf_shared_memory __iomem *generic_comm_base = rasf_ctx->pcc_comm_addr;
> + ktime_t next_deadline = ktime_add(ktime_get(), rasf_ctx->deadline);
> +
> + while (!ktime_after(ktime_get(), next_deadline)) {
> + /*
> + * As per ACPI spec, the PCC space wil be initialized by
> + * platform and should have set the command completion bit when
> + * PCC can be used by OSPM
> + */
> + if (readw_relaxed(&generic_comm_base->status) & RASF_PCC_CMD_COMPLETE) {
> + ret = 0;
return 0;
> + break;
> + }
> + /*
> + * Reducing the bus traffic in case this loop takes longer than
> + * a few retries.
> + */
> + udelay(10);
> + }
> +
> + return ret;
return -EIO;
> +}
> +
> +/**
> + * rasf_send_pcc_cmd() - Send RASF command via PCC channel
> + * @rasf_ctx: pointer to the rasf context structure
> + * @cmd: command to send
> + *
> + * Returns: 0 on success, an error otherwise
> + */
> +int rasf_send_pcc_cmd(struct rasf_context *rasf_ctx, u16 cmd)
> +{
> + int ret = -EIO;
Looks like it's overwritten in all paths where ret is used.
> + struct acpi_rasf_shared_memory *generic_comm_base =
> + (struct acpi_rasf_shared_memory *)rasf_ctx->pcc_comm_addr;
> + static ktime_t last_cmd_cmpl_time, last_mpar_reset;
> + static int mpar_count;
> + unsigned int time_delta;
> +
> + if (cmd == RASF_PCC_CMD_EXEC) {
> + ret = rasf_check_pcc_chan(rasf_ctx);
> + if (ret)
> + return ret;
> + }
> +
> + /*
> + * Handle the Minimum Request Turnaround Time(MRTT)
> + * "The minimum amount of time that OSPM must wait after the completion
> + * of a command before issuing the next command, in microseconds"
> + */
> + if (rasf_ctx->pcc_mrtt) {
> + time_delta = ktime_us_delta(ktime_get(), last_cmd_cmpl_time);
> + if (rasf_ctx->pcc_mrtt > time_delta)
> + udelay(rasf_ctx->pcc_mrtt - time_delta);
> + }
> +
> + /*
> + * Handle the non-zero Maximum Periodic Access Rate(MPAR)
> + * "The maximum number of periodic requests that the subspace channel can
> + * support, reported in commands per minute. 0 indicates no limitation."
> + *
> + * This parameter should be ideally zero or large enough so that it can
> + * handle maximum number of requests that all the cores in the system can
> + * collectively generate. If it is not, we will follow the spec and just
> + * not send the request to the platform after hitting the MPAR limit in
> + * any 60s window
> + */
> + if (rasf_ctx->pcc_mpar) {
> + if (mpar_count == 0) {
> + time_delta = ktime_ms_delta(ktime_get(), last_mpar_reset);
> + if (time_delta < 60 * MSEC_PER_SEC) {
> + pr_debug("PCC cmd not sent due to MPAR limit");
> + return -EIO;
> + }
> + last_mpar_reset = ktime_get();
> + mpar_count = rasf_ctx->pcc_mpar;
> + }
> + mpar_count--;
> + }
> +
> + /* Write to the shared comm region. */
> + writew_relaxed(cmd, &generic_comm_base->command);
> +
> + /* Flip CMD COMPLETE bit */
> + writew_relaxed(0, &generic_comm_base->status);
> +
> + /* Ring doorbell */
> + ret = mbox_send_message(rasf_ctx->pcc_channel, &cmd);
> + if (ret < 0) {
> + pr_err("Err sending PCC mbox message. cmd:%d, ret:%d\n",
> + cmd, ret);
> + return ret;
> + }
> +
> + /*
> + * For READs we need to ensure the cmd completed to ensure
> + * the ensuing read()s can proceed. For WRITEs we dont care
> + * because the actual write()s are done before coming here
> + * and the next READ or WRITE will check if the channel
> + * is busy/free at the entry of this call.
> + *
> + * If Minimum Request Turnaround Time is non-zero, we need
> + * to record the completion time of both READ and WRITE
> + * command for proper handling of MRTT, so we need to check
> + * for pcc_mrtt in addition to CMD_READ
> + */
> + if (cmd == RASF_PCC_CMD_EXEC || rasf_ctx->pcc_mrtt) {
> + ret = rasf_check_pcc_chan(rasf_ctx);
> + if (rasf_ctx->pcc_mrtt)
> + last_cmd_cmpl_time = ktime_get();
> + }
> +
> + if (rasf_ctx->pcc_channel->mbox->txdone_irq)
> + mbox_chan_txdone(rasf_ctx->pcc_channel, ret);
> + else
> + mbox_client_txdone(rasf_ctx->pcc_channel, ret);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(rasf_send_pcc_cmd);
> +
> +/**
> + * rasf_register_pcc_channel() - Register PCC channel
> + * @rasf_ctx: pointer to the rasf context structure
> + *
> + * Returns: 0 on success, an error otherwise
> + */
> +int rasf_register_pcc_channel(struct rasf_context *rasf_ctx)
> +{
> + u64 usecs_lat;
> + unsigned int len;
> + struct pcc_mbox_chan *pcc_chan;
> + struct mbox_client *rasf_mbox_cl;
> + struct acpi_pcct_hw_reduced *rasf_ss;
> +
> + rasf_mbox_cl = &rasf_ctx->mbox_client;
> + if (!rasf_mbox_cl || rasf_ctx->pcc_subspace_idx < 0)
> + return -EINVAL;
> +
> + pcc_chan = pcc_mbox_request_channel(rasf_mbox_cl,
> + rasf_ctx->pcc_subspace_idx);
> +
> + if (IS_ERR(pcc_chan)) {
> + pr_err("Failed to find PCC channel for subspace %d\n",
> + rasf_ctx->pcc_subspace_idx);
> + return -ENODEV;
> + }
> + rasf_ctx->pcc_chan = pcc_chan;
If you are storing the chan, why do we need to sparately
store mchan?
> + rasf_ctx->pcc_channel = pcc_chan->mchan;
> + /*
> + * The PCC mailbox controller driver should
> + * have parsed the PCCT (global table of all
> + * PCC channels) and stored pointers to the
> + * subspace communication region in con_priv.
> + */
> + rasf_ss = rasf_ctx->pcc_channel->con_priv;
> +
> + if (!rasf_ss) {
> + pr_err("No PCC subspace found for RASF\n");
> + pcc_mbox_free_channel(rasf_ctx->pcc_chan);
> + return -ENODEV;
> + }
> +
> + /*
> + * This is the shared communication region
> + * for the OS and Platform to communicate over.
> + */
> + rasf_ctx->comm_base_addr = rasf_ss->base_address;
> + len = rasf_ss->length;
> + pr_debug("PCC subspace for RASF=0x%llx len=%d\n",
> + rasf_ctx->comm_base_addr, len);
dev_dbg(rasf_ctx->dev ...
throughout probably better.
> +
> + /*
> + * rasf_ss->latency is just a Nominal value. In reality
> + * the remote processor could be much slower to reply.
> + * So add an arbitrary amount of wait on top of Nominal.
> + */
> + usecs_lat = RASF_NUM_RETRIES * rasf_ss->latency;
> + rasf_ctx->deadline = ns_to_ktime(usecs_lat * NSEC_PER_USEC);
> + rasf_ctx->pcc_mrtt = rasf_ss->min_turnaround_time;
> + rasf_ctx->pcc_mpar = rasf_ss->max_access_rate;
> + rasf_ctx->pcc_comm_addr = acpi_os_ioremap(rasf_ctx->comm_base_addr, len);
> + pr_debug("pcc_comm_addr=%p\n", rasf_ctx->pcc_comm_addr);
> +
> + /* Set flag so that we dont come here for each CPU. */
> + rasf_ctx->pcc_channel_acquired = true;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(rasf_register_pcc_channel);
> +/**
> + * rasf_add_platform_device() - Add a platform device for RASF
> + * @name: name of the device we're adding
> + * @data: platform specific data for this platform device
> + * @size: size of platform specific data
> + *
> + * Returns: pointer to platform device on success, an error otherwise
I wonder if we should just rename this to ras2 and ignore the fact
it came form rasf?
> + */
> +struct platform_device *rasf_add_platform_device(char *name, const void *data,
> + size_t size)
> +{
> + int ret;
> + struct platform_device *pdev;
> +
> + pdev = platform_device_alloc(name, PLATFORM_DEVID_AUTO);
> + if (!pdev)
> + return NULL;
> +
> + ret = platform_device_add_data(pdev, data, size);
> + if (ret)
> + goto dev_put;
> +
> + ret = platform_device_add(pdev);
> + if (ret)
> + goto dev_put;
> +
> + return pdev;
> +
> +dev_put:
> + platform_device_put(pdev);
> +
> + return NULL;
Could return an error pointer to provide more info from ret?
> +}
> diff --git a/include/acpi/rasf_acpi.h b/include/acpi/rasf_acpi.h
> new file mode 100644
> index 000000000000..aa4f935b28cf
> --- /dev/null
> +++ b/include/acpi/rasf_acpi.h
> @@ -0,0 +1,58 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * RASF driver header file
> + *
> + * (C) Copyright 2014, 2015 Hewlett-Packard Enterprises
> + *
> + * Copyright (c) 2023 HiSilicon Limited
> + */
> +
> +#ifndef _RASF_ACPI_H
> +#define _RASF_ACPI_H
> +
> +#include <linux/acpi.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mailbox_controller.h>
> +#include <linux/types.h>
> +#include <acpi/pcc.h>
> +
> +#define RASF_PCC_CMD_COMPLETE 1
> +
> +/* RASF specific PCC commands */
> +#define RASF_PCC_CMD_EXEC 0x01
> +
> +#define RASF_FAILURE 0
> +#define RASF_SUCCESS 1
> +
> +/*
> + * Arbitrary Retries for PCC commands.
Perhaps a comment on why PCC retry might be needed?
> + */
> +#define RASF_NUM_RETRIES 600
> +
> +/*
> + * Data structures for PCC communication and RASF table
> + */
> +struct rasf_context {
> + struct device *dev;
> + int id;
> + struct mbox_client mbox_client;
> + struct mbox_chan *pcc_channel;
> + struct pcc_mbox_chan *pcc_chan;
> + void __iomem *pcc_comm_addr;
> + u64 comm_base_addr;
> + int pcc_subspace_idx;
> + bool pcc_channel_acquired;
> + ktime_t deadline;
Perhaps move all the pcc channel specific stuff to a named struct
struct {
unsigned int mpar;
unsigned int mrtt;
} pcc;
> + unsigned int pcc_mpar;
> + unsigned int pcc_mrtt;
> + spinlock_t spinlock; /* Lock to provide mutually exclusive access to PCC channel */
Move comment to line above. Saves on long line without loss of readability.
> + struct device *scrub_dev;
> + const struct rasf_hw_scrub_ops *ops;
> +};
On Thu, 15 Feb 2024 19:14:54 +0800
<[email protected]> wrote:
> From: Shiju Jose <[email protected]>
>
> Memory RAS2 driver binds to the platform device add by the ACPI RAS2
> driver.
> Driver registers the PCC channel for communicating with the ACPI compliant
> platform that contains RAS2 command support in the hardware.
>
> Add interface functions to support configuring the parameters of HW patrol
> scrubs in the system, which exposed to the kernel via the RAS2 and PCC,
> using the RAS2 commands.
>
> Add support for RAS2 platform devices to register with scrub subsystem
> driver. This enables user to configure the parameters of HW patrol scrubs,
> which exposed to the kernel via the RAS2 table, through the scrub sysfs
> attributes.
>
> Open Question:
> Sysfs scrub control attribute "enable_background_scrub" is added for RAS2,
> based on the feedback from Bill Schwartz <[email protected]
> on v4 to enable/disable the background_scrubbing in the platform as defined in the
> “Configure Scrub Parameters [INPUT]“ field in RAS2 Table 5.87: Parameter Block
> Structure for PATROL_SCRUB.
> Is it a right approach to support "enable_background_scrub" in the sysfs
> scrub control?
Does anyone know what this means? IIUC patrol scrub is always background...
>
> Signed-off-by: Shiju Jose <[email protected]>
A few minor comments inline. Rushed review as out of time for today though
so may have missed stuff.
> diff --git a/drivers/memory/rasf_common.c b/drivers/memory/rasf_common.c
> new file mode 100644
> index 000000000000..85f67308698d
> --- /dev/null
> +++ b/drivers/memory/rasf_common.c
> @@ -0,0 +1,269 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * rasf_common.c - Common functions for memory RASF driver
> + *
> + * Copyright (c) 2023 HiSilicon Limited.
> + *
> + * This driver implements call back functions for the scrub
> + * configure driver to configure the parameters of the hw patrol
> + * scrubbers in the system, which exposed via the ACPI RASF/RAS2
> + * table and PCC.
> + */
> +
> +#define pr_fmt(fmt) "MEMORY RASF COMMON: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/io.h>
> +#include <linux/interrupt.h>
> +#include <linux/mailbox_controller.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
of.h Shouldn't be here in an ACPI driver!
> +#include <linux/platform_device.h>
> +
> +#include <acpi/rasf_acpi.h>
> +#include <memory/rasf.h>
> +
> +static int enable_write(struct rasf_context *rasf_ctx, long val)
> +{
> + int ret;
> + bool enable = val;
> +
> + ret = rasf_ctx->ops->enable_scrub(rasf_ctx, enable);
> + if (ret) {
> + pr_err("enable patrol scrub fail, enable=%d ret=%d\n",
> + enable, ret);
dev_err(rasf_ctx->dev,...
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * rasf_hw_scrub_is_visible() - Callback to return attribute visibility
> + * @drv_data: Pointer to driver-private data structure passed
> + * as argument to devm_scrub_device_register().
> + * @attr_id: Scrub attribute
> + * @region_id: ID of the memory region
> + *
> + * Returns: 0 on success, an error otherwise
> + */
> +umode_t rasf_hw_scrub_is_visible(struct device *dev, u32 attr_id, int region_id)
> +{
> + switch (attr_id) {
> + case scrub_rate_available:
> + return 0444;
> + case scrub_enable:
> + case scrub_enable_background_scrub:
> + return 0200;
> + case scrub_addr_base:
> + case scrub_addr_size:
> + case scrub_rate:
> + return 0644;
As before, I'd prefer to see this passed the current permissions then just
return those rather than encoding them here and in the attributes where they
may end up out of sync
> + default:
> + return 0;
> + }
> +}
> +
> +/**
> + * rasf_hw_scrub_read_strings() - Read callback for string attributes
> + * @device: Pointer to scrub device
> + * @attr_id: Scrub attribute
> + * @region_id: ID of the memory region
> + * @buf: Pointer to the buffer for copying returned string
> + *
> + * Returns: 0 on success, an error otherwise
> + */
> +int rasf_hw_scrub_read_strings(struct device *device, u32 attr_id, int region_id,
> + char *buf)
dev maybe instead of device. Shorter lines and it's very common shorthand.
> +{
> + struct rasf_context *rasf_ctx;
> +
> + rasf_ctx = dev_get_drvdata(device);
struct rasf_context *rasf_ctx = dev_get_drvdata(dev);
Same throughout.
> +
> + switch (attr_id) {
> + case scrub_rate_available:
> + return rate_available_read(rasf_ctx, buf);
> + default:
> + return -ENOTSUPP;
> + }
> +}
shiju.jose@ wrote:
> From: Shiju Jose <[email protected]>
>
> 1. Add support for CXL feature mailbox commands.
> 2. Add CXL device scrub driver supporting patrol scrub control and ECS
> control features.
> 3. Add scrub subsystem driver supports configuring memory scrubs in the system.
> 4. Register CXL device patrol scrub and ECS with scrub subsystem.
> 5. Add common library for RASF and RAS2 PCC interfaces.
> 6. Add driver for ACPI RAS2 feature table (RAS2).
> 7. Add memory RAS2 driver and register with scrub subsystem.
I stepped away from this patch set to focus on the changes that landed
for v6.8 and the follow-on regression fixups. Now that v6.8 CXL work has
quieted down and I circle back to this set for v6.9 I find the lack of
story in this cover letter to be unsettling. As a reviewer I should not
have to put together the story on why Linux should care about this
feature and independently build up the maintainence-burden vs benefit
tradeoff analysis.
Maybe it is self evident to others, but for me there is little in these
changelogs besides "mechanism exists, enable it". There are plenty of
platform or device mechanisms that get specified that Linux does not
enable for one reason or another.
The cover letter needs to answer why it matters, and what are the
tradeoffs. Mind you, in my submissions I do not always get this right in
the cover letter [1], but hopefully at least one of the patches tells
the story [2].
In other words, imagine you are writing the pull request to Linus or
someone else with limited time who needs to make a risk decision on a
pull request with a diffstat of:
23 files changed, 3083 insertions(+)
..where the easiest decision is to just decline. As is, these
changelogs are not close to tipping the scale to "accept".
[sidebar: how did this manage to implement a new subsystem with 2
consumers (CXL + ACPI), without modifying a single existing line? Zero
deletions? That is either an indication that Linux perfectly anticipated
this future use case (unlikely), or more work needs to be done to digest
an integrate these concepts into existing code paths]
One of the first questions for me is why CXL and RAS2 as the first
consumers and not NVDIMM-ARS and/or RASF Patrol Scrub? Part of the
maintenance burden tradeoff is providing a migration path for legacy on
the way to adding the new thing. If old scrub implementations could be
deprecated / deleted on the way to supporting new scrub use cases that
becomes interesting.
[1]: http://lore.kernel.org/r/20240208220909.GA975234@bhelgaas
[2]: http://lore.kernel.org/r/20240208221305.GA975512@bhelgaas
Hi Dan,
Thanks for the feedback.
Please find reply inline.
>-----Original Message-----
>From: Dan Williams <[email protected]>
>Sent: 22 February 2024 00:21
>To: Shiju Jose <[email protected]>; [email protected]; linux-
>[email protected]; [email protected]; [email protected];
>[email protected]; Jonathan Cameron <[email protected]>;
>[email protected]; [email protected]; [email protected];
>[email protected]
>Cc: [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected];
>[email protected]; [email protected];
>tanxiaofei <[email protected]>; Zengtao (B) <[email protected]>;
>[email protected]; wanghuiqiang <[email protected]>;
>Linuxarm <[email protected]>; Shiju Jose <[email protected]>
>Subject: RE: [RFC PATCH v6 00/12] cxl: Add support for CXL feature commands,
>CXL device patrol scrub control and DDR5 ECS control features
>
>shiju.jose@ wrote:
>> From: Shiju Jose <[email protected]>
>>
>> 1. Add support for CXL feature mailbox commands.
>> 2. Add CXL device scrub driver supporting patrol scrub control and ECS
>> control features.
>> 3. Add scrub subsystem driver supports configuring memory scrubs in the
>system.
>> 4. Register CXL device patrol scrub and ECS with scrub subsystem.
>> 5. Add common library for RASF and RAS2 PCC interfaces.
>> 6. Add driver for ACPI RAS2 feature table (RAS2).
>> 7. Add memory RAS2 driver and register with scrub subsystem.
>
>I stepped away from this patch set to focus on the changes that landed for v6.8
>and the follow-on regression fixups. Now that v6.8 CXL work has quieted down
>and I circle back to this set for v6.9 I find the lack of story in this cover letter to
>be unsettling. As a reviewer I should not have to put together the story on why
>Linux should care about this feature and independently build up the
>maintainence-burden vs benefit tradeoff analysis.
I will add more details to the cover letter.
>
>Maybe it is self evident to others, but for me there is little in these changelogs
>besides "mechanism exists, enable it". There are plenty of platform or device
>mechanisms that get specified that Linux does not enable for one reason or
>another.
>
>The cover letter needs to answer why it matters, and what are the tradeoffs.
>Mind you, in my submissions I do not always get this right in the cover letter [1],
>but hopefully at least one of the patches tells the story [2].
>
>In other words, imagine you are writing the pull request to Linus or someone
>else with limited time who needs to make a risk decision on a pull request with a
>diffstat of:
>
> 23 files changed, 3083 insertions(+)
>
>...where the easiest decision is to just decline. As is, these changelogs are not
>close to tipping the scale to "accept".
>
>[sidebar: how did this manage to implement a new subsystem with 2 consumers
>(CXL + ACPI), without modifying a single existing line? Zero deletions? That is
>either an indication that Linux perfectly anticipated this future use case
>(unlikely), or more work needs to be done to digest an integrate these concepts
>into existing code paths]
>
>One of the first questions for me is why CXL and RAS2 as the first consumers and
>not NVDIMM-ARS and/or RASF Patrol Scrub? Part of the maintenance burden
We don't personally care about NVDIMMS but would welcome drivers from others.
Regarding RASF patrol scrub no one cared about it as it's useless and
any new implementation should be RAS2.
Previous discussions in the community about RASF and scrub could be find here.
https://lore.kernel.org/lkml/[email protected]/#r
and some old ones,
https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
https://lore.kernel.org/all/[email protected]/
>tradeoff is providing a migration path for legacy on the way to adding the new
>thing. If old scrub implementations could be deprecated / deleted on the way to
>supporting new scrub use cases that becomes interesting.
>
>[1]: http://lore.kernel.org/r/20240208220909.GA975234@bhelgaas
>[2]: http://lore.kernel.org/r/20240208221305.GA975512@bhelgaas
Thanks,
Shiju
Hi Jonathan,
Thanks for the comments.
I will post the updated version incorporated with your suggestions in the series.
>-----Original Message-----
>From: Jonathan Cameron <[email protected]>
>Sent: 20 February 2024 11:14
>To: Shiju Jose <[email protected]>
>Cc: [email protected]; [email protected]; linux-
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; linux-
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected];
>tanxiaofei <[email protected]>; Zengtao (B) <[email protected]>;
>[email protected]; wanghuiqiang <[email protected]>;
>Linuxarm <[email protected]>
>Subject: Re: [RFC PATCH v6 02/12] cxl/mbox: Add GET_FEATURE mailbox
>command
>
>On Thu, 15 Feb 2024 19:14:44 +0800
><[email protected]> wrote:
>
>> From: Shiju Jose <[email protected]>
>>
>> Add support for GET_FEATURE mailbox command.
>>
>> CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
>> The settings of a feature can be retrieved using Get Feature command.
>Hi Shiju
>
>I think this needs to be more complex so that this utility function gets the whole
>feature, not just a section of it (subject to big enough buffer being available etc).
>We don't want the higher level code to have to deal with the complexity of small
>mailboxes.
Sure.
>
>A few other things inline.
>
>Jonathan
>
>>
>> Signed-off-by: Shiju Jose <[email protected]>
>> ---
>> drivers/cxl/core/mbox.c | 22 ++++++++++++++++++++++
>> drivers/cxl/cxlmem.h | 23 +++++++++++++++++++++++
>> 2 files changed, 45 insertions(+)
>>
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index
>> 191f51f3df0e..f43189b6859a 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -1313,6 +1313,28 @@ int cxl_get_supported_features(struct
>> cxl_memdev_state *mds, }
>> EXPORT_SYMBOL_NS_GPL(cxl_get_supported_features, CXL);
>>
>> +int cxl_get_feature(struct cxl_memdev_state *mds,
>> + struct cxl_mbox_get_feat_in *pi, void *feat_out)
>
>Comments below on this signature. Key is I'd expect this function to deal with
>potential need for multiple requests (small mailbox size compared to the size of
>the output data being read).
>
>To test that we'd probably have to tweak the qemu code to use a smaller
>mailbox.
>Or fake that in here so that we do multiple smaller reads.
Sure.
>
>> +{
>> + struct cxl_mbox_cmd mbox_cmd;
>> + int rc;
>> +
>> + mbox_cmd = (struct cxl_mbox_cmd) {
>> + .opcode = CXL_MBOX_OP_GET_FEATURE,
>> + .size_in = sizeof(*pi),
>> + .payload_in = pi,
>> + .size_out = le16_to_cpu(pi->count),
>> + .payload_out = feat_out,
>> + .min_out = le16_to_cpu(pi->count),
>
>Are there feature with variable responses sizes? I think there will be.
>size_out should be the size of the buffer, but min_out should be the size of the
>particular feature data header - note these will change as we iterate over
>multiple messages.
Sure.
>
>
>> + };
>> + rc = cxl_internal_send_cmd(mds, &mbox_cmd);
>> + if (rc < 0)
>> + return rc;
>> +
>> + return 0;
>I think this should return the size to the caller, rather than 0 on success.
I will change.
>
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_get_feature, CXL);
>> +
>> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>> struct cxl_region *cxlr)
>> {
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index
>> 23e4d98b9bae..eaecc3234cfd 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -530,6 +530,7 @@ enum cxl_opcode {
>> CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
>> CXL_MBOX_OP_GET_LOG = 0x0401,
>> CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
>> + CXL_MBOX_OP_GET_FEATURE = 0x0501,
>> CXL_MBOX_OP_IDENTIFY = 0x4000,
>> CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
>> CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
>> @@ -757,6 +758,26 @@ struct cxl_mbox_get_supp_feats_out {
>> struct cxl_mbox_supp_feat_entry feat_entries[]; } __packed;
>>
>> +/* Get Feature CXL 3.1 Spec 8.2.9.6.2 */
>> +/*
>> + * Get Feature input payload
>> + * CXL rev 3.1 section 8.2.9.6.2 Table 8-99 */
>> +/* Get Feature : Payload in selection */
>
>Naming of enum is good enough that I don't think we need this particular
>comment.
I will change.
>
>> +enum cxl_get_feat_selection {
>> + CXL_GET_FEAT_SEL_CURRENT_VALUE,
>> + CXL_GET_FEAT_SEL_DEFAULT_VALUE,
>> + CXL_GET_FEAT_SEL_SAVED_VALUE,
>> + CXL_GET_FEAT_SEL_MAX
>> +};
>> +
>> +struct cxl_mbox_get_feat_in {
>> + uuid_t uuid;
>> + __le16 offset;
>> + __le16 count;
>> + u8 selection;
>> +} __packed;
>> +
>> /* Get Poison List CXL 3.0 Spec 8.2.9.8.4.1 */ struct
>> cxl_mbox_poison_in {
>> __le64 offset;
>> @@ -891,6 +912,8 @@ int cxl_set_timestamp(struct cxl_memdev_state
>> *mds); int cxl_get_supported_features(struct cxl_memdev_state *mds,
>> struct cxl_mbox_get_supp_feats_in *pi,
>> void *feats_out);
>> +int cxl_get_feature(struct cxl_memdev_state *mds,
>> + struct cxl_mbox_get_feat_in *pi, void *feat_out);
>
>For this I'd expect us to wrap up the need for multi messages inside this.
>So this would then just take the feature index, a size for the output buffer overall
>size, plus min acceptable response size and a selection enum value.
Sure.
>
>int cxl_get_feature(struct cxl_memdev_state *mds,
> uuid_t feat,
> void *feat_out, size_t feat_out_min_size,
> size_t feat_out_size);
>
>> int cxl_poison_state_init(struct cxl_memdev_state *mds); int
>> cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>> struct cxl_region *cxlr);
Thanks,
Shiju
Shiju Jose wrote:
> Hi Dan,
>
> Thanks for the feedback.
>
> Please find reply inline.
>
> >-----Original Message-----
> >From: Dan Williams <[email protected]>
> >Sent: 22 February 2024 00:21
> >To: Shiju Jose <[email protected]>; [email protected]; linux-
> >[email protected]; [email protected]; [email protected];
> >[email protected]; Jonathan Cameron <[email protected]>;
> >[email protected]; [email protected]; [email protected];
> >[email protected]
> >Cc: [email protected]; [email protected];
> >[email protected]; [email protected]; [email protected];
> >[email protected]; [email protected]; [email protected];
> >[email protected]; [email protected]; [email protected];
> >[email protected]; [email protected]; [email protected];
> >[email protected]; [email protected]; [email protected];
> >[email protected]; [email protected]; [email protected];
> >[email protected]; [email protected];
> >[email protected]; [email protected];
> >tanxiaofei <[email protected]>; Zengtao (B) <[email protected]>;
> >[email protected]; wanghuiqiang <[email protected]>;
> >Linuxarm <[email protected]>; Shiju Jose <[email protected]>
> >Subject: RE: [RFC PATCH v6 00/12] cxl: Add support for CXL feature commands,
> >CXL device patrol scrub control and DDR5 ECS control features
> >
> >shiju.jose@ wrote:
> >> From: Shiju Jose <[email protected]>
> >>
> >> 1. Add support for CXL feature mailbox commands.
> >> 2. Add CXL device scrub driver supporting patrol scrub control and ECS
> >> control features.
> >> 3. Add scrub subsystem driver supports configuring memory scrubs in the
> >system.
> >> 4. Register CXL device patrol scrub and ECS with scrub subsystem.
> >> 5. Add common library for RASF and RAS2 PCC interfaces.
> >> 6. Add driver for ACPI RAS2 feature table (RAS2).
> >> 7. Add memory RAS2 driver and register with scrub subsystem.
> >
> >I stepped away from this patch set to focus on the changes that landed for v6.8
> >and the follow-on regression fixups. Now that v6.8 CXL work has quieted down
> >and I circle back to this set for v6.9 I find the lack of story in this cover letter to
> >be unsettling. As a reviewer I should not have to put together the story on why
> >Linux should care about this feature and independently build up the
> >maintainence-burden vs benefit tradeoff analysis.
> I will add more details to the cover letter.
>
> >
> >Maybe it is self evident to others, but for me there is little in these changelogs
> >besides "mechanism exists, enable it". There are plenty of platform or device
> >mechanisms that get specified that Linux does not enable for one reason or
> >another.
> >
> >The cover letter needs to answer why it matters, and what are the tradeoffs.
> >Mind you, in my submissions I do not always get this right in the cover letter [1],
> >but hopefully at least one of the patches tells the story [2].
> >
> >In other words, imagine you are writing the pull request to Linus or someone
> >else with limited time who needs to make a risk decision on a pull request with a
> >diffstat of:
> >
> > 23 files changed, 3083 insertions(+)
> >
> >...where the easiest decision is to just decline. As is, these changelogs are not
> >close to tipping the scale to "accept".
> >
> >[sidebar: how did this manage to implement a new subsystem with 2 consumers
> >(CXL + ACPI), without modifying a single existing line? Zero deletions? That is
> >either an indication that Linux perfectly anticipated this future use case
> >(unlikely), or more work needs to be done to digest an integrate these concepts
> >into existing code paths]
> >
> >One of the first questions for me is why CXL and RAS2 as the first consumers and
> >not NVDIMM-ARS and/or RASF Patrol Scrub? Part of the maintenance burden
> We don't personally care about NVDIMMS but would welcome drivers from others.
Upstream would also welcome consideration of maintenance burden
reduction before piling on, at least include *some* consideration of the
implications vs this response that comes off as "that's somebody else's
problem".
> Regarding RASF patrol scrub no one cared about it as it's useless and
> any new implementation should be RAS2.
The assertion that "RASF patrol scrub no one cared about it as it's
useless and any new implementation should be RAS2" needs evidence.
For example, what platforms are going to ship with RAS2 support, what
are the implications of Linux not having RAS2 scrub support in a month,
or in year? There are parts of the ACPI spec that have never been
implemented what is the evidence that RAS2 is not going to suffer the
same fate as RASF? There are parts of the CXL specification that have
never been implemented in mass market products.
> Previous discussions in the community about RASF and scrub could be find here.
> https://lore.kernel.org/lkml/[email protected]/#r
> and some old ones,
> https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
>
Do not make people hunt for old discussions, if there are useful points
in that discussion that make the case for the patch set include those in
the next submission, don't make people hunt for the latest state of the
story.
> https://lore.kernel.org/all/[email protected]/
Yes, now that is a useful changelog, thank you for highlighting it,
please follow its example.
On Fri, 23 Feb 2024 11:42:24 -0800
Dan Williams <[email protected]> wrote:
> Shiju Jose wrote:
> > Hi Dan,
> >
> > Thanks for the feedback.
> >
> > Please find reply inline.
> >
> > >-----Original Message-----
> > >From: Dan Williams <[email protected]>
> > >Sent: 22 February 2024 00:21
> > >To: Shiju Jose <[email protected]>; [email protected]; linux-
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; Jonathan Cameron <[email protected]>;
> > >[email protected]; [email protected]; vishal.l.verma@intelcom;
> > >[email protected]
> > >Cc: [email protected]; [email protected];
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; [email protected];
> > >[email protected]; [email protected];
> > >tanxiaofei <[email protected]>; Zengtao (B) <[email protected]>;
> > >[email protected]; wanghuiqiang <[email protected]>;
> > >Linuxarm <[email protected]>; Shiju Jose <[email protected]>
> > >Subject: RE: [RFC PATCH v6 00/12] cxl: Add support for CXL feature commands,
> > >CXL device patrol scrub control and DDR5 ECS control features
> > >
> > >shiju.jose@ wrote:
> > >> From: Shiju Jose <[email protected]>
> > >>
> > >> 1. Add support for CXL feature mailbox commands.
> > >> 2. Add CXL device scrub driver supporting patrol scrub control and ECS
> > >> control features.
> > >> 3. Add scrub subsystem driver supports configuring memory scrubs in the
> > >system.
> > >> 4. Register CXL device patrol scrub and ECS with scrub subsystem.
> > >> 5. Add common library for RASF and RAS2 PCC interfaces.
> > >> 6. Add driver for ACPI RAS2 feature table (RAS2).
> > >> 7. Add memory RAS2 driver and register with scrub subsystem.
> > >
> > >I stepped away from this patch set to focus on the changes that landed for v6.8
> > >and the follow-on regression fixups. Now that v6.8 CXL work has quieted down
> > >and I circle back to this set for v6.9 I find the lack of story in this cover letter to
> > >be unsettling. As a reviewer I should not have to put together the story on why
> > >Linux should care about this feature and independently build up the
> > >maintainence-burden vs benefit tradeoff analysis.
> > I will add more details to the cover letter.
> >
> > >
> > >Maybe it is self evident to others, but for me there is little in these changelogs
> > >besides "mechanism exists, enable it". There are plenty of platform or device
> > >mechanisms that get specified that Linux does not enable for one reason or
> > >another.
> > >
> > >The cover letter needs to answer why it matters, and what are the tradeoffs.
> > >Mind you, in my submissions I do not always get this right in the cover letter [1],
> > >but hopefully at least one of the patches tells the story [2].
> > >
> > >In other words, imagine you are writing the pull request to Linus or someone
> > >else with limited time who needs to make a risk decision on a pull request with a
> > >diffstat of:
> > >
> > > 23 files changed, 3083 insertions(+)
> > >
> > >...where the easiest decision is to just decline. As is, these changelogs are not
> > >close to tipping the scale to "accept".
> > >
> > >[sidebar: how did this manage to implement a new subsystem with 2 consumers
> > >(CXL + ACPI), without modifying a single existing line? Zero deletions? That is
> > >either an indication that Linux perfectly anticipated this future use case
> > >(unlikely), or more work needs to be done to digest an integrate these concepts
> > >into existing code paths]
> > >
> > >One of the first questions for me is why CXL and RAS2 as the first consumers and
> > >not NVDIMM-ARS and/or RASF Patrol Scrub? Part of the maintenance burden
> > We don't personally care about NVDIMMS but would welcome drivers from others.
>
> Upstream would also welcome consideration of maintenance burden
> reduction before piling on, at least include *some* consideration of the
> implications vs this response that comes off as "that's somebody else's
> problem".
We can do analysis of whether the interfaces are suitable etc but
have no access to test hardware or emulation. I guess I can hack something
together easily enough. Today ndctl has some support. Interestingly the model
is different from typical volatile scrubbing as it's all on demand - that
could be easily wrapped up in a software scrub scheduler though, but we'd need
input from you and other Intel people on how this is actually used.
The use model is a lot less obvious than autonomous scrubbers - I assume because
the persistence means you need to do this rarely if at all (though ARS does
support scrubbing volatile memory on nvdimms)
So initial conclusion is it would need a few more controls or it needs
some software handling of scan scheduling to map it to the interface type
that is common to CXL and RAS2 scrub controls.
Intent of the comment was to keep scope somewhat confined, and to
invite others to get involved, not to rule out doing some light weight
analysis of whether this feature would work for another potential user
which we weren't even aware of until you mentioned it (thanks!).
>
> > Regarding RASF patrol scrub no one cared about it as it's useless and
> > any new implementation should be RAS2.
>
> The assertion that "RASF patrol scrub no one cared about it as it's
> useless and any new implementation should be RAS2" needs evidence.
>
> For example, what platforms are going to ship with RAS2 support, what
> are the implications of Linux not having RAS2 scrub support in a month,
> or in year? There are parts of the ACPI spec that have never been
> implemented what is the evidence that RAS2 is not going to suffer the
> same fate as RASF?
From discussions with various firmware folk we have a chicken and egg
situation on RAS2. They will stick to their custom solutions unless there is
plausible support in Linux for it - so right now it's a question mark
on roadmaps. Trying to get rid of that question mark is why Shiju and I
started looking at this in the first place. To get rid of that question
mark we don't necessarily need to have it upstream, but we do need
to be able to make the argument that there will be a solution ready
soon after they release the BIOS image. (Some distros will take years
to catch up though).
If anyone else an speak up on this point please do. Discussions and
feedback communicated to Shiju and I off list aren't going to
convince people :(
Negatives perhaps easier to give than positives given this is seen as
a potential feature for future platforms so may be confidential.
> There are parts of the CXL specification that have
> never been implemented in mass market products.
Obviously can't talk about who was involved in this feature
in it's definition, but I have strong confidence it will get implemented
for reasons I can point at on a public list.
a) There will be scrubbing on devices.
b) It will need control (evidence for this is the BIOS controls mentioned below
for equivalent main memory).
c) Hotplug means that control must be done by OS driver (or via very fiddly
pre hotplug hacks that I think we can all agree should not be necessary
and aren't even an option on all platforms)
d) No one likes custom solutions.
This isn't a fancy feature with a high level of complexity which helps.
Today there is the option for main memory of leaving it to BIOS parameters.
A quick google gave me some examples (to make sure they are public):
Dell: PowerEdge R640 BIOS and UEFI Reference Guide
- Memory patrol scrub - Sets the memory patrol scrub frequency.
HP UEFI System Utilities for HPE ProLiant Gen 11 SErvers
- Enabling or disable patrol scrub
Spec list of flags for lenovo systems (tells you that turning patrol scrub
off is a good idea ;)
Huawei Kunpeng 920 RAS config menu.
- Active Scrub, Active Scrub interval etc.
>
> > Previous discussions in the community about RASF and scrub could be find here.
> > https://lore.kernel.org/lkml/[email protected]/#r
> > and some old ones,
> > https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
> >
>
> Do not make people hunt for old discussions, if there are useful points
> in that discussion that make the case for the patch set include those in
> the next submission, don't make people hunt for the latest state of the
> story.
Sure, more of an essay needed along with links given we are talking
about the views of others.
Quick summary from a reread of the linked threads.
AMD not implemented RASF/RAS2 yet - looking at it last year, but worried
about inflexibility of RAS2 spec today. They were looking at some spec
changes to improve this + other functions to be added to RAS2.
I agree with it being limited, but think extending with backwards
compatibility isn't a problem (and ACPI spec rules in theory guarantee
it won't break). I'm keen on working with current version
so that we can ensure the ABI design for CXL encompasses it.
Intel folk were cc'd but not said anything on that thread, but Tony Luck
did comment in Jiaqi Yan's software scrubbing discussion linked below.
He observed that a hardware implementation can be complex if doing range
based scrubbing due to interleave etc. RAS2 and CXL both side step this
somewhat by making it someone elses problem. In RAS2 the firmware gets
to program multiple scrubbers to cover the range requested. In CXL
for now this leaves the problem for userspace, but we can definitely
consider a region interface if it makes sense.
I'd also like to see inputs from a wider range of systems folk + other
CPU companies. How easy this is to implement is heavily dependent on
what entity in your system is responsible for this sort of runtime
service and that varies a lot.
>
> > https://lore.kernel.org/all/[email protected]/
>
> Yes, now that is a useful changelog, thank you for highlighting it,
> please follow its example.
It's not a changelog as such but a RFC in text only form.
However indeed lots of good info in there.
Jonathan
On 24/02/23 11:42AM, Dan Williams wrote:
> Shiju Jose wrote:
> > Hi Dan,
> >
> > Thanks for the feedback.
> >
> > Please find reply inline.
> >
> > >-----Original Message-----
> > >From: Dan Williams <[email protected]>
> > >Sent: 22 February 2024 00:21
> > >To: Shiju Jose <[email protected]>; [email protected]; linux-
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; Jonathan Cameron <[email protected]>;
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]
> > >Cc: [email protected]; [email protected];
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; [email protected]; [email protected];
> > >[email protected]; [email protected];
> > >[email protected]; [email protected];
> > >tanxiaofei <[email protected]>; Zengtao (B) <[email protected]>;
> > >[email protected]; wanghuiqiang <[email protected]>;
> > >Linuxarm <[email protected]>; Shiju Jose <[email protected]>
> > >Subject: RE: [RFC PATCH v6 00/12] cxl: Add support for CXL feature commands,
> > >CXL device patrol scrub control and DDR5 ECS control features
> > >
> > >shiju.jose@ wrote:
> > >> From: Shiju Jose <[email protected]>
> > >>
> > >> 1. Add support for CXL feature mailbox commands.
> > >> 2. Add CXL device scrub driver supporting patrol scrub control and ECS
> > >> control features.
> > >> 3. Add scrub subsystem driver supports configuring memory scrubs in the
> > >system.
> > >> 4. Register CXL device patrol scrub and ECS with scrub subsystem.
> > >> 5. Add common library for RASF and RAS2 PCC interfaces.
> > >> 6. Add driver for ACPI RAS2 feature table (RAS2).
> > >> 7. Add memory RAS2 driver and register with scrub subsystem.
> > >
> > >I stepped away from this patch set to focus on the changes that landed for v6.8
> > >and the follow-on regression fixups. Now that v6.8 CXL work has quieted down
> > >and I circle back to this set for v6.9 I find the lack of story in this cover letter to
> > >be unsettling. As a reviewer I should not have to put together the story on why
> > >Linux should care about this feature and independently build up the
> > >maintainence-burden vs benefit tradeoff analysis.
> > I will add more details to the cover letter.
> >
> > >
> > >Maybe it is self evident to others, but for me there is little in these changelogs
> > >besides "mechanism exists, enable it". There are plenty of platform or device
> > >mechanisms that get specified that Linux does not enable for one reason or
> > >another.
> > >
> > >The cover letter needs to answer why it matters, and what are the tradeoffs.
> > >Mind you, in my submissions I do not always get this right in the cover letter [1],
> > >but hopefully at least one of the patches tells the story [2].
> > >
> > >In other words, imagine you are writing the pull request to Linus or someone
> > >else with limited time who needs to make a risk decision on a pull request with a
> > >diffstat of:
> > >
> > > 23 files changed, 3083 insertions(+)
> > >
> > >...where the easiest decision is to just decline. As is, these changelogs are not
> > >close to tipping the scale to "accept".
> > >
> > >[sidebar: how did this manage to implement a new subsystem with 2 consumers
> > >(CXL + ACPI), without modifying a single existing line? Zero deletions? That is
> > >either an indication that Linux perfectly anticipated this future use case
> > >(unlikely), or more work needs to be done to digest an integrate these concepts
> > >into existing code paths]
> > >
> > >One of the first questions for me is why CXL and RAS2 as the first consumers and
> > >not NVDIMM-ARS and/or RASF Patrol Scrub? Part of the maintenance burden
> > We don't personally care about NVDIMMS but would welcome drivers from others.
>
> Upstream would also welcome consideration of maintenance burden
> reduction before piling on, at least include *some* consideration of the
> implications vs this response that comes off as "that's somebody else's
> problem".
>
> > Regarding RASF patrol scrub no one cared about it as it's useless and
> > any new implementation should be RAS2.
>
> The assertion that "RASF patrol scrub no one cared about it as it's
> useless and any new implementation should be RAS2" needs evidence.
>
> For example, what platforms are going to ship with RAS2 support, what
> are the implications of Linux not having RAS2 scrub support in a month,
> or in year? There are parts of the ACPI spec that have never been
> implemented what is the evidence that RAS2 is not going to suffer the
> same fate as RASF? There are parts of the CXL specification that have
> never been implemented in mass market products.
>
> > Previous discussions in the community about RASF and scrub could be find here.
> > https://lore.kernel.org/lkml/[email protected]/#r
> > and some old ones,
> > https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
> >
>
> Do not make people hunt for old discussions, if there are useful points
> in that discussion that make the case for the patch set include those in
> the next submission, don't make people hunt for the latest state of the
> story.
>
> > https://lore.kernel.org/all/[email protected]/
>
> Yes, now that is a useful changelog, thank you for highlighting it,
> please follow its example.
Just a comment that is not directed at the implementation details: at Micron we
see demand for the scrub control feature, so we do hope to see this support
go in sooner rather than later.
Regards,
John
Jonathan Cameron wrote:
Thanks for taking the time Jonathan, this really helps.
[..]
> We can do analysis of whether the interfaces are suitable etc but
> have no access to test hardware or emulation. I guess I can hack something
> together easily enough. Today ndctl has some support. Interestingly the model
> is different from typical volatile scrubbing as it's all on demand - that
> could be easily wrapped up in a software scrub scheduler though, but we'd need
> input from you and other Intel people on how this is actually used.
>
> The use model is a lot less obvious than autonomous scrubbers - I assume because
> the persistence means you need to do this rarely if at all (though ARS does
> support scrubbing volatile memory on nvdimms)
>
> So initial conclusion is it would need a few more controls or it needs
> some software handling of scan scheduling to map it to the interface type
> that is common to CXL and RAS2 scrub controls.
>
> Intent of the comment was to keep scope somewhat confined, and to
> invite others to get involved, not to rule out doing some light weight
> analysis of whether this feature would work for another potential user
> which we weren't even aware of until you mentioned it (thanks!).
Ok, Fair enough.
> > > Regarding RASF patrol scrub no one cared about it as it's useless and
> > > any new implementation should be RAS2.
> >
> > The assertion that "RASF patrol scrub no one cared about it as it's
> > useless and any new implementation should be RAS2" needs evidence.
> >
> > For example, what platforms are going to ship with RAS2 support, what
> > are the implications of Linux not having RAS2 scrub support in a month,
> > or in year? There are parts of the ACPI spec that have never been
> > implemented what is the evidence that RAS2 is not going to suffer the
> > same fate as RASF?
>
> From discussions with various firmware folk we have a chicken and egg
> situation on RAS2. They will stick to their custom solutions unless there is
> plausible support in Linux for it - so right now it's a question mark
> on roadmaps. Trying to get rid of that question mark is why Shiju and I
> started looking at this in the first place. To get rid of that question
> mark we don't necessarily need to have it upstream, but we do need
> to be able to make the argument that there will be a solution ready
> soon after they release the BIOS image. (Some distros will take years
> to catch up though).
>
> If anyone else an speak up on this point please do. Discussions and
> feedback communicated to Shiju and I off list aren't going to
> convince people :(
> Negatives perhaps easier to give than positives given this is seen as
> a potential feature for future platforms so may be confidential.
So one of the observations from efforts like RAS API [1] is that CXL is
definining mechanisms that others are using for non-CXL use cases. I.e.
a CXL-like mailbox that supports events is a generic transport that can
be used for many RAS scenarios not just CXL endpoints. It supplants
building new ACPI interfaces for these things because the expectation is
that an OS just repurposes its CXL Type-3 infrastructure to also drive
event collection for RAS API compliant devices in the topology.
[1]: https://www.opencompute.org/w/index.php?title=RAS_API_Workstream
So when considering whether Linux should build support for ACPI RASF,
ACPI RAS2, and / or Open Compute RAS API it is worthwile to ask if one
of those can supplant the others.
Speaking only for myself with my Linux kernel maintainer hat on, I am
much more attracted to proposals like RAS API where native drivers can
be deployed vs ACPI which brings ACPI static definition baggage and a
3rd component to manage. RAS API is kernel driver + device-firmware
while I assume ACPI RAS* is kernel ACPI driver + BIOS firmware +
device-firmware.
In other words, this patch proposal enables both CXL memscrub and ACPI
RAS2 memscrub. It asserts that nobody cares about ACPI RASF memscrub,
and your clarification asserts that RAS2 is basically dead until Linux
adopts it. So then the question becomes why should Linux breath air into
the ACPI RAS2 memscrub proposal when initiatives like RAS API exist?
The RAS API example seems to indicate that one way to get scrub support
for non-CXL memory controllers would be to reuse CXL memscrub
infrastructure. In a world where there is kernel mechanism to understand
CXL-like scrub mechanisms, why not nudge the industry in that direction
instead of continuing to build new and different ACPI mechanisms?
> > There are parts of the CXL specification that have
> > never been implemented in mass market products.
>
> Obviously can't talk about who was involved in this feature
> in it's definition, but I have strong confidence it will get implemented
> for reasons I can point at on a public list.
> a) There will be scrubbing on devices.
> b) It will need control (evidence for this is the BIOS controls mentioned below
> for equivalent main memory).
> c) Hotplug means that control must be done by OS driver (or via very fiddly
> pre hotplug hacks that I think we can all agree should not be necessary
> and aren't even an option on all platforms)
> d) No one likes custom solutions.
> This isn't a fancy feature with a high level of complexity which helps.
That does help, it would help even more if the maintenance burden of CXL
scrub precludes needing to carry the burden of other implementations.
[..]
> >
> > > Previous discussions in the community about RASF and scrub could be find here.
> > > https://lore.kernel.org/lkml/[email protected]/#r
> > > and some old ones,
> > > https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
> > >
> >
> > Do not make people hunt for old discussions, if there are useful points
> > in that discussion that make the case for the patch set include those in
> > the next submission, don't make people hunt for the latest state of the
> > story.
>
> Sure, more of an essay needed along with links given we are talking
> about the views of others.
>
> Quick summary from a reread of the linked threads.
> AMD not implemented RASF/RAS2 yet - looking at it last year, but worried
> about inflexibility of RAS2 spec today. They were looking at some spec
> changes to improve this + other functions to be added to RAS2.
> I agree with it being limited, but think extending with backwards
> compatibility isn't a problem (and ACPI spec rules in theory guarantee
> it won't break). I'm keen on working with current version
> so that we can ensure the ABI design for CXL encompasses it.
>
> Intel folk were cc'd but not said anything on that thread, but Tony Luck
> did comment in Jiaqi Yan's software scrubbing discussion linked below.
> He observed that a hardware implementation can be complex if doing range
> based scrubbing due to interleave etc. RAS2 and CXL both side step this
> somewhat by making it someone elses problem. In RAS2 the firmware gets
> to program multiple scrubbers to cover the range requested. In CXL
> for now this leaves the problem for userspace, but we can definitely
> consider a region interface if it makes sense.
>
> I'd also like to see inputs from a wider range of systems folk + other
> CPU companies. How easy this is to implement is heavily dependent on
> what entity in your system is responsible for this sort of runtime
> service and that varies a lot.
This answers my main question of whether RAS2 is a done deal with
shipping platforms making it awkward for Linux to *not* support RAS2, or
if this is the start of an industry conversation that wants some Linux
ecosystem feedback. It sounds more like the latter.
> > > https://lore.kernel.org/all/[email protected]/
> >
> > Yes, now that is a useful changelog, thank you for highlighting it,
> > please follow its example.
>
> It's not a changelog as such but a RFC in text only form.
> However indeed lots of good info in there.
>
> Jonathan
Thanks again for taking the time Jonathan.
> Obviously can't talk about who was involved in this feature
> in it's definition, but I have strong confidence it will get implemented
> for reasons I can point at on a public list.
> a) There will be scrubbing on devices.
> b) It will need control (evidence for this is the BIOS controls mentioned below
> for equivalent main memory).
> c) Hotplug means that control must be done by OS driver (or via very fiddly
> pre hotplug hacks that I think we can all agree should not be necessary
> and aren't even an option on all platforms)
> d) No one likes custom solutions.
> This isn't a fancy feature with a high level of complexity which helps.
But how will users know what are appropriate scrubbing
parameters for these devices?
Car analogy: Fuel injection systems on internal combustion engines
have tweakable controls. But no auto manufacturer wires them up to
a user accessible dashboad control.
Back to computers:
I'd expect the OEMs that produce memory devices to set appropriate
scrubbing rates based on their internal knowledge of the components
used in construction.
What is the use case where some user would need to override these
parameters and scrub and a faster/slower rate than that set by the
manufacturer?
-Tony
On Thu, 29 Feb 2024 12:41:53 -0800
Tony Luck <[email protected]> wrote:
> > Obviously can't talk about who was involved in this feature
> > in it's definition, but I have strong confidence it will get implemented
> > for reasons I can point at on a public list.
> > a) There will be scrubbing on devices.
> > b) It will need control (evidence for this is the BIOS controls mentioned below
> > for equivalent main memory).
> > c) Hotplug means that control must be done by OS driver (or via very fiddly
> > pre hotplug hacks that I think we can all agree should not be necessary
> > and aren't even an option on all platforms)
> > d) No one likes custom solutions.
> > This isn't a fancy feature with a high level of complexity which helps.
Hi Tony,
>
> But how will users know what are appropriate scrubbing
> parameters for these devices?
>
> Car analogy: Fuel injection systems on internal combustion engines
> have tweakable controls. But no auto manufacturer wires them up to
> a user accessible dashboad control.
Good analogy - I believe performance tuning 3rd parties will change
them for you. So the controls are used - be it not by every user.
>
> Back to computers:
>
> I'd expect the OEMs that produce memory devices to set appropriate
> scrubbing rates based on their internal knowledge of the components
> used in construction.
Absolutely agree that they will set a default / baseline value,
but reality is that 'everyone' (for the first few OEMs I googled)
exposes tuning controls in their shipping BIOS menus to configure
this because there are users who want to change it. I'd expect
them to clamp the minimum scrub frequency to something that avoids
them getting hardware returned on mass for reliability and the
maximum at whatever ensures the perf is good enough that they sell
hardware in the first place. I'd also expect a bios menu to
allow cloud hosts etc to turn off exposing RAS2 or similar.
>
> What is the use case where some user would need to override these
> parameters and scrub and a faster/slower rate than that set by the
> manufacturer?
Its a performance vs reliability trade off. If your larger scale
architecture (many servers) requires a few nodes to be super stable you
will pay pretty much any cost to keep them running. If a single node
failure makes little or no difference, you'll happily crank this down
(same with refresh) in order to save some power / get a small
performance lift. Or if you care about latency tails, more than
reliability you'll turn this off.
For comedy value, some BIOS guides point out that leaving scrub on may
affect performance benchmarking. Obviously not a good data point, but
a hint at the sort of market that cares. Same market that buy cheaper
RAM knowing they are going to have more system crashes.
There is probably a description gap. That might be a paperwork
question as part of system specification.
What is relationship between scrub rate and error rate under particular
styles of workload (because you get a free scrub whenever you access
the memory)? The RAM dimms themselves could in theory provide inputs
but the workload dependence makes this hard. Probably fallback on a
a test and tune loop over very long runs. Single bit error rates
used to detect when getting below a level people are happy with for
instance.
With the fancier units that can be supported, you can play more reliable
memory games by scanning subsets of the memory more frequently.
Though it was about a kernel daemon doing scrub, Jiaqi's RFC document here
https://lore.kernel.org/all/[email protected]/
provided justification for on demand scrub - some interesting stuff in the
bit on hardware patrol scrubbing. I see you commented on the thread
and complexity of hardware solutions.
- Cheap memory makes this all more important.
- Need for configuration of how fast and when depending on system state.
- Lack of flexibility of what is scanned (RAS2 provides some by association
with NUMA node + option to request particular ranges, CXL provides per
end point controls).
There are some gaps on hardware scrubbers, but offloading this problem
definitely attractive.
So my understanding is there is demand to tune this but it won't be exposed
on every system.
Jonathan
>
> -Tony
>
> > > > Regarding RASF patrol scrub no one cared about it as it's useless and
> > > > any new implementation should be RAS2.
> > >
> > > The assertion that "RASF patrol scrub no one cared about it as it's
> > > useless and any new implementation should be RAS2" needs evidence.
> > >
> > > For example, what platforms are going to ship with RAS2 support, what
> > > are the implications of Linux not having RAS2 scrub support in a month,
> > > or in year? There are parts of the ACPI spec that have never been
> > > implemented what is the evidence that RAS2 is not going to suffer the
> > > same fate as RASF?
> >
> > From discussions with various firmware folk we have a chicken and egg
> > situation on RAS2. They will stick to their custom solutions unless there is
> > plausible support in Linux for it - so right now it's a question mark
> > on roadmaps. Trying to get rid of that question mark is why Shiju and I
> > started looking at this in the first place. To get rid of that question
> > mark we don't necessarily need to have it upstream, but we do need
> > to be able to make the argument that there will be a solution ready
> > soon after they release the BIOS image. (Some distros will take years
> > to catch up though).
> >
> > If anyone else an speak up on this point please do. Discussions and
> > feedback communicated to Shiju and I off list aren't going to
> > convince people :(
> > Negatives perhaps easier to give than positives given this is seen as
> > a potential feature for future platforms so may be confidential.
>
> So one of the observations from efforts like RAS API [1] is that CXL is
> definining mechanisms that others are using for non-CXL use cases. I.e.
> a CXL-like mailbox that supports events is a generic transport that can
> be used for many RAS scenarios not just CXL endpoints. It supplants
> building new ACPI interfaces for these things because the expectation is
> that an OS just repurposes its CXL Type-3 infrastructure to also drive
> event collection for RAS API compliant devices in the topology.
>
> [1]: https://www.opencompute.org/w/index.php?title=RAS_API_Workstream
>
> So when considering whether Linux should build support for ACPI RASF,
> ACPI RAS2, and / or Open Compute RAS API it is worthwile to ask if one
> of those can supplant the others.
RAS API is certainly interesting but the bit of the discussion
that matters here will equally apply to CXL RAS controls as of today
(will ship before OCP) and Open Compute's RAS API (sometime in the future).
The subsystem presented here was to address the "show us your code" that
was inevitable feedback if we'd gone for a discussion Doc style RFC.
What really matters here is whether a common ABI is necessary and what
it looks like.
Not even the infrastructure, just whether it's sysfs and what the controls)
Sure there is less code if it all looks like that CXL get feature,
but not that much less. + I'm hoping we'll also end up sharing with
the various embedded device solutions out there today.
I notice a few familiar names in the meeting recordings. Anyone want
to provide a summary of overlap etc and likely end result?
I scan read the docs and caught up with some meetings at high speed,
but that's not the same as day to day involvement in the spec development.
Maybe the lesson to take away from this is a more general interface is
needed incorporating scrub control (which at this stage is probably
just a name change!)
I see that patrol scrub is on the RAS actions list which is great.
>
> Speaking only for myself with my Linux kernel maintainer hat on, I am
> much more attracted to proposals like RAS API where native drivers can
> be deployed vs ACPI which brings ACPI static definition baggage and a
> 3rd component to manage. RAS API is kernel driver + device-firmware
> while I assume ACPI RAS* is kernel ACPI driver + BIOS firmware +
> device-firmware.
Not really. The only thing needed from BIOS firmware is a static table
to OS to describe where to find the hardware (RAS2 is a header and (1+)
pointers to the PCCT table entry that tells you where the mailbox(s)
(PCC Channel) are and their interrupts etc. It's all of 48 bytes of
static data to parse.
Could have been done that in DSDT (where you will find other PCC channels
as many methods can use them under the hood to chat to firmware + there
are some other users where they are the only option) but my guess is
assumption is RAS might be needed pre AML interpreter so it's a static table.
A PCC channel is the ACPI spec standard mailbox design (well several
options for how to do it, but given the code is upstream and in use
for other purposes, no new maintenance burden for us :)
PCC channels can be shared resources handling multiple protocols.
They are used for various other things where the OS needs
to talk to firmware and have been upstream for a while.
ACPI driver --------<PCC Mailbox>---> Device Firmware
vs
RAS API Driver-----<CXL Mailbox>----> Device Firmware
or
CXL Driver --------<CXL Mailbox>----> Device Firmware
The new complexity (much like the CXL solution) lies in the
control protocol sent over the mailbox (which is pretty simple!)
Some of the complexity in the driver is left over from earlier
version doing RASF and RAS2 so we'll flatten that layering out
and it'll be even simpler in next RFC and perhaps not hint at
false complexity or maintenance burden.
The only significant burden I really see form incorporating RAS2
is the need for an interface that works for both (very similar)
configuration control sets. Once that is defined we need to support
the ABI for ever anyway so may be sysfs attribute of extra ABI to
support in current design?
>
> In other words, this patch proposal enables both CXL memscrub and ACPI
> RAS2 memscrub. It asserts that nobody cares about ACPI RASF memscrub,
> and your clarification asserts that RAS2 is basically dead until Linux
> adopts it. So then the question becomes why should Linux breath air into
> the ACPI RAS2 memscrub proposal when initiatives like RAS API exist?
A fair question and one where I'm looking for inputs from others.
However I think you may be assuming a lot more than is actually
involved in the RAS2 approach - see below.
>
> The RAS API example seems to indicate that one way to get scrub support
> for non-CXL memory controllers would be to reuse CXL memscrub
> infrastructure. In a world where there is kernel mechanism to understand
> CXL-like scrub mechanisms, why not nudge the industry in that direction
> instead of continuing to build new and different ACPI mechanisms?
There may be some shared elements of course (and it seems the RAS API
stuff has severak sets of proposals for interfacing approaches), but ultimately
a RAS API element still hangs off something that isn't a CXL device, so
still demands some common infrastructure (e.g. a class or similar) or
we are going to find the RAS tools buried under a bunch of different individual
drivers.
1) Maybe shared for system components (maybe not from some of the diagrams!)
But likely 1 interface per socket. Probably PCI, but maybe platform devices
(I'd not be surprised to see a PCC channel type added for this mailbox)
/sys/bus/pci/devices/pcixxx/rasstuff/etc
2) CXL devices say /sys/bus/cxl/devices/mem0/rasstuff/etc.
3) Other system components such as random PCI drivers.
Like other cases of common infrastructure, I'd argue for a nice class with
the devices parentage linking back to the underlying EP driver.
/sys/class/ras/ras0 parent -> /sys/bus/cxl/devices/mem0/
/sys/class/ras/ras1 parent -> /sys/bus/pci/device/pcixxx/ RAS API device.
etc
Same as if we had a bunch of devices that happened to have an LED on them
and wanted common userspace controls so registered with /sys/class/led
So to me RAS API looks like another user of this proposal that indeed
shares a bunch of common code with the CXL driver stack (hopefully they'll
move to PCI MMPT from current definition based on CXL 2.0 mailbox so the
discoverability isn't CXL spec based. (I may not have latest version of course!)
>
> > > There are parts of the CXL specification that have
> > > never been implemented in mass market products.
> >
> > Obviously can't talk about who was involved in this feature
> > in it's definition, but I have strong confidence it will get implemented
> > for reasons I can point at on a public list.
> > a) There will be scrubbing on devices.
> > b) It will need control (evidence for this is the BIOS controls mentioned below
> > for equivalent main memory).
> > c) Hotplug means that control must be done by OS driver (or via very fiddly
> > pre hotplug hacks that I think we can all agree should not be necessary
> > and aren't even an option on all platforms)
> > d) No one likes custom solutions.
> > This isn't a fancy feature with a high level of complexity which helps.
>
> That does help, it would help even more if the maintenance burden of CXL
> scrub precludes needing to carry the burden of other implementations.
I think we disagree on whether the burden is significant - sure
we can spin interfaces differently to make it easier for CXL and we can
just stick it on the individual endpoints for now.
Key here is ABI, I don't really care about whether we wrap it up in a subsystem
(mostly we do that to enforce compliance with the ABI design as easier than
reviewing against a document!)
I want to see userspace ABI that is general enough to extend to other
devices and doesn't require a horrible hydra of a userspace program on top
of incompatible controls because everyone wanted to do it slightly
differently. The exercise of including RAS2 (and earlier RASF which
we dropped) was about establishing commonality and I think that was very
useful.
I'm reluctant to say it will never be necessary to support RAS2 (because
I want to see solutions well before anyone will have built OCPs proposal
and RAS2 works on many of today's systems with a small amount of firmware
work, many have existing PCC channels to appropriate management controllers
and as I understand it non standard interfaces to control the scrubbing
engines).
So I think not considering an ABI that is designed to be general is just
storing up pain for us in the future.
I'm not sure the design we have here is the right one which is why it
was an RFC :)
>
> [..]
> > >
> > > > Previous discussions in the community about RASF and scrub could be find here.
> > > > https://lore.kernel.org/lkml/[email protected]/#r
> > > > and some old ones,
> > > > https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
> > > >
> > >
> > > Do not make people hunt for old discussions, if there are useful points
> > > in that discussion that make the case for the patch set include those in
> > > the next submission, don't make people hunt for the latest state of the
> > > story.
> >
> > Sure, more of an essay needed along with links given we are talking
> > about the views of others.
> >
> > Quick summary from a reread of the linked threads.
> > AMD not implemented RASF/RAS2 yet - looking at it last year, but worried
> > about inflexibility of RAS2 spec today. They were looking at some spec
> > changes to improve this + other functions to be added to RAS2.
> > I agree with it being limited, but think extending with backwards
> > compatibility isn't a problem (and ACPI spec rules in theory guarantee
> > it won't break). I'm keen on working with current version
> > so that we can ensure the ABI design for CXL encompasses it.
> >
> > Intel folk were cc'd but not said anything on that thread, but Tony Luck
> > did comment in Jiaqi Yan's software scrubbing discussion linked below.
> > He observed that a hardware implementation can be complex if doing range
> > based scrubbing due to interleave etc. RAS2 and CXL both side step this
> > somewhat by making it someone elses problem. In RAS2 the firmware gets
> > to program multiple scrubbers to cover the range requested. In CXL
> > for now this leaves the problem for userspace, but we can definitely
> > consider a region interface if it makes sense.
> >
> > I'd also like to see inputs from a wider range of systems folk + other
> > CPU companies. How easy this is to implement is heavily dependent on
> > what entity in your system is responsible for this sort of runtime
> > service and that varies a lot.
>
> This answers my main question of whether RAS2 is a done deal with
> shipping platforms making it awkward for Linux to *not* support RAS2, or
> if this is the start of an industry conversation that wants some Linux
> ecosystem feedback. It sounds more like the latter.
I'll let others speak up on this as I was presenting on my current outlook
and understand others are much further down the path.
>
> > > > https://lore.kernel.org/all/[email protected]/
> > >
> > > Yes, now that is a useful changelog, thank you for highlighting it,
> > > please follow its example.
> >
> > It's not a changelog as such but a RFC in text only form.
> > However indeed lots of good info in there.
> >
> > Jonathan
>
> Thanks again for taking the time Jonathan.
>
You are welcome and thanks for all the questions / pointers.
Jonathan
On Fri, Mar 1, 2024 at 6:42 AM Jonathan Cameron
<[email protected]> wrote:
>
>
> >
> > > > > Regarding RASF patrol scrub no one cared about it as it's useless and
> > > > > any new implementation should be RAS2.
> > > >
> > > > The assertion that "RASF patrol scrub no one cared about it as it's
> > > > useless and any new implementation should be RAS2" needs evidence.
> > > >
> > > > For example, what platforms are going to ship with RAS2 support, what
> > > > are the implications of Linux not having RAS2 scrub support in a month,
> > > > or in year? There are parts of the ACPI spec that have never been
> > > > implemented what is the evidence that RAS2 is not going to suffer the
> > > > same fate as RASF?
> > >
> > > From discussions with various firmware folk we have a chicken and egg
> > > situation on RAS2. They will stick to their custom solutions unless there is
> > > plausible support in Linux for it - so right now it's a question mark
> > > on roadmaps. Trying to get rid of that question mark is why Shiju and I
> > > started looking at this in the first place. To get rid of that question
> > > mark we don't necessarily need to have it upstream, but we do need
> > > to be able to make the argument that there will be a solution ready
> > > soon after they release the BIOS image. (Some distros will take years
> > > to catch up though).
> > >
> > > If anyone else an speak up on this point please do. Discussions and
> > > feedback communicated to Shiju and I off list aren't going to
> > > convince people :(
> > > Negatives perhaps easier to give than positives given this is seen as
> > > a potential feature for future platforms so may be confidential.
> >
> > So one of the observations from efforts like RAS API [1] is that CXL is
> > definining mechanisms that others are using for non-CXL use cases. I.e.
> > a CXL-like mailbox that supports events is a generic transport that can
> > be used for many RAS scenarios not just CXL endpoints. It supplants
> > building new ACPI interfaces for these things because the expectation is
> > that an OS just repurposes its CXL Type-3 infrastructure to also drive
> > event collection for RAS API compliant devices in the topology.
Thanks Dan for bringing up [1]. After sending out [2], the proposal of
an in-kernel "memory scrubber" implemented in **software**, I actually
paid more attention to the hardware patro scrubber given it is (at
least should be) programmable in runtime with RASF and RAS2 as
HORIGUCHI pointed out[3]. The idea is, if hardware patrol scrubber can
become as flexible as software, why let the software waste the CPU
cycles and membw? Then I attempted to
1. define the features required to scrub memory efficiently and flexibly
2. get that list of features **standardized** by engaging hw vendors
who make patrol scrubbers in their chips
3. design a Linux interface so that user space can drive **both
software and hardware scrubber** (Turns out software scrubber still
has an advantage over hardware and I will cover it later.)
[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/all/[email protected]/
The difference between my work and Shiju+Jonathan's RFC is: taking a
bet on RAS2, #1 and #2 is not a problem for them.
I am under the impression that RAS2 is probably going to suffer from
RASF, but at least I hope Shiju/Jonanthan's API can be compatible with
the sw scrubber that I planned to upstream (made some suggestions[4]
for this purpose).
[4] https://lore.kernel.org/linux-mm/CACw3F539gZc0FoJLo6VvYSyZmeWZ3Pbec7AzsH+MYUJJNzQbUQ@mail.gmail.com/
**However**, ...
> >
> > [1]: https://www.opencompute.org/w/index.php?title=RAS_API_Workstream
.. our hardware RAS team (better expert on fault management and hw
reliability) strongly pushed back some of my proposed feature list and
pointed me exactly to [1].
From my understanding after talking to our RAS people, Open Compute
RAS API is the future to move on. Meanwhile everyone should probably
start to forget about RAS2. The memory interleave issue [5] in RASF
seemed to be carried over to RAS2, making vendors reluctant to adopt.
[5] https://lore.kernel.org/all/SJ1PR11MB6083BF93E9A88E659CED5EC4FC3F9@SJ1PR11MB6083.namprd11.prod.outlook.com/
> >
> > So when considering whether Linux should build support for ACPI RASF,
> > ACPI RAS2, and / or Open Compute RAS API it is worthwile to ask if one
> > of those can supplant the others.
I think Open Compute RAS API and RAS2/RASF are probably *incompatible*
from their core. Open Compute RAS API "will likely block any OS access
to the patrol scrubber", in the context of Open Compute RAS API's out
of band solution. While userspace has the need to use patrol scrubber,
"the OS doesn't understand memory enough to drive it".
Taking STOP_PATROL_SCRUBBER as example, while it may make sense to
users, stopping patrol scrubber is unacceptable for platform where OEM
has enabled patrol scrubber, because the patrol scrubber is a key part
of logging and is repurposed for other RAS actions. So from Open
Compute RAS API's perspective, STOP_PATROL_SCRUBBER from RAS2 must be
blocked and, tbh must not be exposed to OS/userspace at all.
"Requested Address Range"/"Actual Address Range" (region to scrub) is
a similarly bad thing to expose in RAS2.
But I am still seeking a common ground between Open Compute RAS API
and my feature wishlist of flexible and efficient patrol scrubber, by
taking a step back and ...
>
> RAS API is certainly interesting but the bit of the discussion
> that matters here will equally apply to CXL RAS controls as of today
> (will ship before OCP) and Open Compute's RAS API (sometime in the future).
>
> The subsystem presented here was to address the "show us your code" that
> was inevitable feedback if we'd gone for a discussion Doc style RFC.
>
> What really matters here is whether a common ABI is necessary and what
> it looks like.
> Not even the infrastructure, just whether it's sysfs and what the controls)
> Sure there is less code if it all looks like that CXL get feature,
> but not that much less. + I'm hoping we'll also end up sharing with
> the various embedded device solutions out there today.
>
> I notice a few familiar names in the meeting recordings. Anyone want
> to provide a summary of overlap etc and likely end result?
> I scan read the docs and caught up with some meetings at high speed,
> but that's not the same as day to day involvement in the spec development.
> Maybe the lesson to take away from this is a more general interface is
> needed incorporating scrub control (which at this stage is probably
> just a name change!)
>
> I see that patrol scrub is on the RAS actions list which is great.
>
> >
> > Speaking only for myself with my Linux kernel maintainer hat on, I am
> > much more attracted to proposals like RAS API where native drivers can
> > be deployed vs ACPI which brings ACPI static definition baggage and a
> > 3rd component to manage. RAS API is kernel driver + device-firmware
> > while I assume ACPI RAS* is kernel ACPI driver + BIOS firmware +
> > device-firmware.
>
> Not really. The only thing needed from BIOS firmware is a static table
> to OS to describe where to find the hardware (RAS2 is a header and (1+)
> pointers to the PCCT table entry that tells you where the mailbox(s)
> (PCC Channel) are and their interrupts etc. It's all of 48 bytes of
> static data to parse.
>
> Could have been done that in DSDT (where you will find other PCC channels
> as many methods can use them under the hood to chat to firmware + there
> are some other users where they are the only option) but my guess is
> assumption is RAS might be needed pre AML interpreter so it's a static table.
>
> A PCC channel is the ACPI spec standard mailbox design (well several
> options for how to do it, but given the code is upstream and in use
> for other purposes, no new maintenance burden for us :)
> PCC channels can be shared resources handling multiple protocols.
> They are used for various other things where the OS needs
> to talk to firmware and have been upstream for a while.
>
>
> ACPI driver --------<PCC Mailbox>---> Device Firmware
> vs
> RAS API Driver-----<CXL Mailbox>----> Device Firmware
> or
> CXL Driver --------<CXL Mailbox>----> Device Firmware
>
> The new complexity (much like the CXL solution) lies in the
> control protocol sent over the mailbox (which is pretty simple!)
>
> Some of the complexity in the driver is left over from earlier
> version doing RASF and RAS2 so we'll flatten that layering out
> and it'll be even simpler in next RFC and perhaps not hint at
> false complexity or maintenance burden.
>
> The only significant burden I really see form incorporating RAS2
> is the need for an interface that works for both (very similar)
> configuration control sets. Once that is defined we need to support
> the ABI for ever anyway so may be sysfs attribute of extra ABI to
> support in current design?
>
>
> >
> > In other words, this patch proposal enables both CXL memscrub and ACPI
> > RAS2 memscrub. It asserts that nobody cares about ACPI RASF memscrub,
> > and your clarification asserts that RAS2 is basically dead until Linux
> > adopts it. So then the question becomes why should Linux breath air into
> > the ACPI RAS2 memscrub proposal when initiatives like RAS API exist?
>
> A fair question and one where I'm looking for inputs from others.
>
> However I think you may be assuming a lot more than is actually
> involved in the RAS2 approach - see below.
>
> >
> > The RAS API example seems to indicate that one way to get scrub support
> > for non-CXL memory controllers would be to reuse CXL memscrub
> > infrastructure. In a world where there is kernel mechanism to understand
> > CXL-like scrub mechanisms, why not nudge the industry in that direction
> > instead of continuing to build new and different ACPI mechanisms?
>
> There may be some shared elements of course (and it seems the RAS API
> stuff has severak sets of proposals for interfacing approaches), but ultimately
> a RAS API element still hangs off something that isn't a CXL device, so
> still demands some common infrastructure (e.g. a class or similar) or
> we are going to find the RAS tools buried under a bunch of different individual
> drivers.
> 1) Maybe shared for system components (maybe not from some of the diagrams!)
> But likely 1 interface per socket. Probably PCI, but maybe platform devices
> (I'd not be surprised to see a PCC channel type added for this mailbox)
> /sys/bus/pci/devices/pcixxx/rasstuff/etc
> 2) CXL devices say /sys/bus/cxl/devices/mem0/rasstuff/etc.
> 3) Other system components such as random PCI drivers.
>
> Like other cases of common infrastructure, I'd argue for a nice class with
> the devices parentage linking back to the underlying EP driver.
> /sys/class/ras/ras0 parent -> /sys/bus/cxl/devices/mem0/
> /sys/class/ras/ras1 parent -> /sys/bus/pci/device/pcixxx/ RAS API device.
> etc
>
> Same as if we had a bunch of devices that happened to have an LED on them
> and wanted common userspace controls so registered with /sys/class/led
>
> So to me RAS API looks like another user of this proposal that indeed
> shares a bunch of common code with the CXL driver stack (hopefully they'll
> move to PCI MMPT from current definition based on CXL 2.0 mailbox so the
> discoverability isn't CXL spec based. (I may not have latest version of course!)
>
> >
> > > > There are parts of the CXL specification that have
> > > > never been implemented in mass market products.
> > >
> > > Obviously can't talk about who was involved in this feature
> > > in it's definition, but I have strong confidence it will get implemented
> > > for reasons I can point at on a public list.
> > > a) There will be scrubbing on devices.
> > > b) It will need control (evidence for this is the BIOS controls mentioned below
> > > for equivalent main memory).
> > > c) Hotplug means that control must be done by OS driver (or via very fiddly
> > > pre hotplug hacks that I think we can all agree should not be necessary
> > > and aren't even an option on all platforms)
> > > d) No one likes custom solutions.
> > > This isn't a fancy feature with a high level of complexity which helps.
> >
> > That does help, it would help even more if the maintenance burden of CXL
> > scrub precludes needing to carry the burden of other implementations.
>
> I think we disagree on whether the burden is significant - sure
> we can spin interfaces differently to make it easier for CXL and we can
> just stick it on the individual endpoints for now.
>
> Key here is ABI, I don't really care about whether we wrap it up in a subsystem
> (mostly we do that to enforce compliance with the ABI design as easier than
> reviewing against a document!)
>
> I want to see userspace ABI that is general enough to extend to other
> devices and doesn't require a horrible hydra of a userspace program on top
> of incompatible controls because everyone wanted to do it slightly
> differently. The exercise of including RAS2 (and earlier RASF which
> we dropped) was about establishing commonality and I think that was very
> useful.
>
> I'm reluctant to say it will never be necessary to support RAS2 (because
> I want to see solutions well before anyone will have built OCPs proposal
> and RAS2 works on many of today's systems with a small amount of firmware
> work, many have existing PCC channels to appropriate management controllers
> and as I understand it non standard interfaces to control the scrubbing
> engines).
>
> So I think not considering an ABI that is designed to be general is just
> storing up pain for us in the future.
considering user demands. In the end hardware is implemented to
fulfill buyer's needs. I think I should just ask for both
valuable-to-customer and compatible-with-"Open Compute RAS API"
features and drop others (for example, stop is dropped). Given my role
in cloud provider, I wanted to start with two key features:
* Adjust the speed of the scrubbing within vendor defined range.
Adding to Jonathan's reply[6] to Tony, in cloud only the control brain
of the fleet running in userspace or remote knows when performance is
important (a customer VM is present on the host) and when reliability
is important (host is idle but should be well-prepared to serve
customers without memory errors).
* Granularity. Per memory controller control granularity is
unrealistic. How about per NUMA node? Starting with a whole host is
also acceptable right now.
Does these 2 feature requests make sense to people here, and should be
covered by Open Compute RAS API (if they are not)? If both are, then
#1 and #2 are solved by Open Compute RAS API and kernel developer can
start to design OS API.
[6] https://lore.kernel.org/linux-mm/[email protected]/T/#m8d0b0737e2e5704529cc13b55008710a928b62b8
>
> I'm not sure the design we have here is the right one which is why it
> was an RFC :)
>
> >
> > [..]
> > > >
> > > > > Previous discussions in the community about RASF and scrub could be find here.
> > > > > https://lore.kernel.org/lkml/[email protected]/#r
> > > > > and some old ones,
> > > > > https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
> > > > >
> > > >
> > > > Do not make people hunt for old discussions, if there are useful points
> > > > in that discussion that make the case for the patch set include those in
> > > > the next submission, don't make people hunt for the latest state of the
> > > > story.
> > >
> > > Sure, more of an essay needed along with links given we are talking
> > > about the views of others.
> > >
> > > Quick summary from a reread of the linked threads.
> > > AMD not implemented RASF/RAS2 yet - looking at it last year, but worried
> > > about inflexibility of RAS2 spec today. They were looking at some spec
> > > changes to improve this + other functions to be added to RAS2.
> > > I agree with it being limited, but think extending with backwards
> > > compatibility isn't a problem (and ACPI spec rules in theory guarantee
> > > it won't break). I'm keen on working with current version
> > > so that we can ensure the ABI design for CXL encompasses it.
> > >
> > > Intel folk were cc'd but not said anything on that thread, but Tony Luck
> > > did comment in Jiaqi Yan's software scrubbing discussion linked below.
> > > He observed that a hardware implementation can be complex if doing range
> > > based scrubbing due to interleave etc. RAS2 and CXL both side step this
> > > somewhat by making it someone elses problem. In RAS2 the firmware gets
> > > to program multiple scrubbers to cover the range requested. In CXL
> > > for now this leaves the problem for userspace, but we can definitely
> > > consider a region interface if it makes sense.
> > >
> > > I'd also like to see inputs from a wider range of systems folk + other
> > > CPU companies. How easy this is to implement is heavily dependent on
> > > what entity in your system is responsible for this sort of runtime
> > > service and that varies a lot.
> >
> > This answers my main question of whether RAS2 is a done deal with
> > shipping platforms making it awkward for Linux to *not* support RAS2, or
> > if this is the start of an industry conversation that wants some Linux
> > ecosystem feedback. It sounds more like the latter.
>
> I'll let others speak up on this as I was presenting on my current outlook
> and understand others are much further down the path.
>
> >
> > > > > https://lore.kernel.org/all/[email protected]/
Last thing about the advantage of the above in-kernel "software
scrubber" vs hardware patrol scrubber. A better way to prevent +
detect memory errors is do a write, followed by a read op. The write
op is very difficult, if not impossible, to be fulfilled by hardware
patrol scrubber at **OS runtime** (during boot time as some sort of
memory test is possible); there must be some negotiations with
userspace. The hw won't even know if a page is free to write (not used
by anything in OS). But I think this write-read-then-check idea is
feasible in the software scrubber. That's why I want a general memory
scrub kernel API for both software and patrol scrubber.
> > > >
> > > > Yes, now that is a useful changelog, thank you for highlighting it,
> > > > please follow its example.
> > >
> > > It's not a changelog as such but a RFC in text only form.
Hopefully the software solution can still be attractive to upstream
after people now pay more attention to the hardware solution (and I
hope to send out new RFC with code).
> > > However indeed lots of good info in there.
> > >
> > > Jonathan
> >
> > Thanks again for taking the time Jonathan.
> >
> You are welcome and thanks for all the questions / pointers.
>
> Jonathan
>
>
Thanks,
Jiaqi
On Thu, Feb 15, 2024 at 07:14:43PM +0800, [email protected] wrote:
> From: Shiju Jose <[email protected]>
>
> Add support for GET_SUPPORTED_FEATURES mailbox command.
>
> CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
> CXL devices supports features with changeable attributes.
> Get Supported Features retrieves the list of supported device specific
> features. The settings of a feature can be retrieved using Get Feature
> and optionally modified using Set Feature.
>
> Signed-off-by: Shiju Jose <[email protected]>
> ---
> drivers/cxl/core/mbox.c | 23 +++++++++++++++
> drivers/cxl/cxlmem.h | 62 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 85 insertions(+)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 27166a411705..191f51f3df0e 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1290,6 +1290,29 @@ int cxl_set_timestamp(struct cxl_memdev_state *mds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_set_timestamp, CXL);
>
> +int cxl_get_supported_features(struct cxl_memdev_state *mds,
> + struct cxl_mbox_get_supp_feats_in *pi,
> + void *feats_out)
> +{
> + struct cxl_mbox_cmd mbox_cmd;
> + int rc;
> +
> + mbox_cmd = (struct cxl_mbox_cmd) {
> + .opcode = CXL_MBOX_OP_GET_SUPPORTED_FEATURES,
> + .size_in = sizeof(*pi),
> + .payload_in = pi,
> + .size_out = le32_to_cpu(pi->count),
> + .payload_out = feats_out,
> + .min_out = sizeof(struct cxl_mbox_get_supp_feats_out),
> + };
> + rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> + if (rc < 0)
> + return rc;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_supported_features, CXL);
> +
> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> struct cxl_region *cxlr)
> {
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 5303d6942b88..23e4d98b9bae 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -529,6 +529,7 @@ enum cxl_opcode {
> CXL_MBOX_OP_SET_TIMESTAMP = 0x0301,
> CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
> CXL_MBOX_OP_GET_LOG = 0x0401,
> + CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
> CXL_MBOX_OP_IDENTIFY = 0x4000,
> CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
> CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
> @@ -698,6 +699,64 @@ struct cxl_mbox_set_timestamp_in {
>
> } __packed;
>
> +/* Get Supported Features CXL 3.1 Spec 8.2.9.6.1 */
In current code, block comments starts with /* and the real comments go
from the second line.
Fan
> +/*
> + * Get Supported Features input payload
> + * CXL rev 3.1 section 8.2.9.6.1 Table 8-95
> + */
> +struct cxl_mbox_get_supp_feats_in {
> + __le32 count;
> + __le16 start_index;
> + u16 reserved;
> +} __packed;
> +
> +/*
> + * Get Supported Features Supported Feature Entry
> + * CXL rev 3.1 section 8.2.9.6.1 Table 8-97
> + */
> +/* Supported Feature Entry : Payload out attribute flags */
> +#define CXL_FEAT_ENTRY_FLAG_CHANGABLE BIT(0)
> +#define CXL_FEAT_ENTRY_FLAG_DEEPEST_RESET_PERSISTENCE_MASK GENMASK(3, 1)
> +#define CXL_FEAT_ENTRY_FLAG_PERSIST_ACROSS_FIRMWARE_UPDATE BIT(4)
> +#define CXL_FEAT_ENTRY_FLAG_SUPPORT_DEFAULT_SELECTION BIT(5)
> +#define CXL_FEAT_ENTRY_FLAG_SUPPORT_SAVED_SELECTION BIT(6)
> +
> +enum cxl_feat_attr_value_persistence {
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_NONE,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_CXL_RESET,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_HOT_RESET,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_WARM_RESET,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_COLD_RESET,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_MAX
> +};
> +
> +#define CXL_FEAT_ENTRY_FLAG_PERSISTENCE_ACROSS_FW_UPDATE_MASK BIT(4)
> +#define CXL_FEAT_ENTRY_FLAG_PERSISTENCE_DEFAULT_SEL_SUPPORT_MASK BIT(5)
> +#define CXL_FEAT_ENTRY_FLAG_PERSISTENCE_SAVED_SEL_SUPPORT_MASK BIT(6)
> +
> +struct cxl_mbox_supp_feat_entry {
> + uuid_t uuid;
> + __le16 feat_index;
> + __le16 get_feat_size;
> + __le16 set_feat_size;
> + __le32 attr_flags;
> + u8 get_feat_version;
> + u8 set_feat_version;
> + __le16 set_feat_effects;
> + u8 rsvd[18];
> +} __packed;
> +
> +/*
> + * Get Supported Features output payload
> + * CXL rev 3.1 section 8.2.9.6.1 Table 8-96
> + */
> +struct cxl_mbox_get_supp_feats_out {
> + __le16 entries;
> + __le16 nsuppfeats_dev;
> + u32 reserved;
> + struct cxl_mbox_supp_feat_entry feat_entries[];
> +} __packed;
> +
> /* Get Poison List CXL 3.0 Spec 8.2.9.8.4.1 */
> struct cxl_mbox_poison_in {
> __le64 offset;
> @@ -829,6 +888,9 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> enum cxl_event_type event_type,
> const uuid_t *uuid, union cxl_event *evt);
> int cxl_set_timestamp(struct cxl_memdev_state *mds);
> +int cxl_get_supported_features(struct cxl_memdev_state *mds,
> + struct cxl_mbox_get_supp_feats_in *pi,
> + void *feats_out);
> int cxl_poison_state_init(struct cxl_memdev_state *mds);
> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> struct cxl_region *cxlr);
> --
> 2.34.1
>