2024-05-09 21:57:19

by Dan Williams

[permalink] [raw]
Subject: RE: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem

shiju.jose@ wrote:
> From: Shiju Jose <[email protected]>
>
> Add scrub subsystem supports configuring the memory scrubbers
> in the system. The scrub subsystem provides the interface for
> registering the scrub devices. The scrub control attributes
> are provided to the user in /sys/class/ras/rasX/scrub
>
> Co-developed-by: Jonathan Cameron <[email protected]>
> Signed-off-by: Jonathan Cameron <[email protected]>
> Signed-off-by: Shiju Jose <[email protected]>
> ---
> .../ABI/testing/sysfs-class-scrub-configure | 47 +++
> drivers/ras/Kconfig | 7 +
> drivers/ras/Makefile | 1 +
> drivers/ras/memory_scrub.c | 271 ++++++++++++++++++
> include/linux/memory_scrub.h | 37 +++
> 5 files changed, 363 insertions(+)
> create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
> create mode 100755 drivers/ras/memory_scrub.c
> create mode 100755 include/linux/memory_scrub.h
>
> diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure b/Documentation/ABI/testing/sysfs-class-scrub-configure
> new file mode 100644
> index 000000000000..3ed77dbb00ad
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
> @@ -0,0 +1,47 @@
> +What: /sys/class/ras/
> +Date: March 2024
> +KernelVersion: 6.9
> +Contact: [email protected]
> +Description:
> + The ras/ class subdirectory belongs to the
> + common ras features such as scrub subsystem.

Why create "ras" class versus just a "srcub" class? I am otherwise not
aware of a precedent for class device hierarchy. For example, on my
system there is:

/sys/class/
├── scsi_device
├── scsi_disk
├── scsi_generic
└── scsi_host

..not:

/sys/class/scsi/
├── device
├── disk
├── generic
└── host


> +
> +What: /sys/class/ras/rasX/scrub/
> +Date: March 2024
> +KernelVersion: 6.9
> +Contact: [email protected]
> +Description:
> + The /sys/class/ras/ras{0,1,2,3,...}/scrub directories
> + correspond to each scrub device registered with the
> + scrub subsystem.

I notice there are some visibility rules in the code, but those
expectations are not documented here.

This documentation would also help developers writing new users of
scrub_device_register().

> +
> +What: /sys/class/ras/rasX/scrub/name
> +Date: March 2024
> +KernelVersion: 6.9
> +Contact: [email protected]
> +Description:
> + (RO) name of the memory scrubber
> +
> +What: /sys/class/ras/rasX/scrub/enable_background
> +Date: March 2024
> +KernelVersion: 6.9
> +Contact: [email protected]
> +Description:
> + (RW) Enable/Disable background(patrol) scrubbing if supported.
> +
> +What: /sys/class/ras/rasX/scrub/rate_available
> +Date: March 2024
> +KernelVersion: 6.9
> +Contact: [email protected]
> +Description:
> + (RO) Supported range for the scrub rate by the scrubber.
> + The scrub rate represents in hours.
> +
> +What: /sys/class/ras/rasX/scrub/rate
> +Date: March 2024
> +KernelVersion: 6.9
> +Contact: [email protected]
> +Description:
> + (RW) The scrub rate specified and it must be with in the
> + supported range by the scrubber.
> + The scrub rate represents in hours.
> diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
> index fc4f4bb94a4c..181701479564 100644
> --- a/drivers/ras/Kconfig
> +++ b/drivers/ras/Kconfig
> @@ -46,4 +46,11 @@ config RAS_FMPM
> Memory will be retired during boot time and run time depending on
> platform-specific policies.
>
> +config SCRUB
> + tristate "Memory scrub driver"
> + help
> + This option selects the memory scrub subsystem, supports
> + configuring the parameters of underlying scrubbers in the
> + system for the DRAM memories.
> +
> endif
> diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
> index 11f95d59d397..89bcf0d84355 100644
> --- a/drivers/ras/Makefile
> +++ b/drivers/ras/Makefile
> @@ -2,6 +2,7 @@
> obj-$(CONFIG_RAS) += ras.o
> obj-$(CONFIG_DEBUG_FS) += debugfs.o
> obj-$(CONFIG_RAS_CEC) += cec.o
> +obj-$(CONFIG_SCRUB) += memory_scrub.o
>
> obj-$(CONFIG_RAS_FMPM) += amd/fmpm.o
> obj-y += amd/atl/
> diff --git a/drivers/ras/memory_scrub.c b/drivers/ras/memory_scrub.c
> new file mode 100755
> index 000000000000..7e995380ec3a
> --- /dev/null
> +++ b/drivers/ras/memory_scrub.c
> @@ -0,0 +1,271 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Memory scrub subsystem supports configuring the registered
> + * memory scrubbers.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + */
> +
> +#define pr_fmt(fmt) "MEM SCRUB: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/bitops.h>
> +#include <linux/delay.h>
> +#include <linux/kfifo.h>
> +#include <linux/memory_scrub.h>
> +#include <linux/platform_device.h>
> +#include <linux/spinlock.h>
> +
> +/* memory scrubber config definitions */
> +#define SCRUB_ID_PREFIX "ras"
> +#define SCRUB_ID_FORMAT SCRUB_ID_PREFIX "%d"
> +
> +static DEFINE_IDA(scrub_ida);
> +
> +struct scrub_device {
> + int id;
> + struct device dev;
> + const struct scrub_ops *ops;
> +};
> +
> +#define to_scrub_device(d) container_of(d, struct scrub_device, dev)
> +static ssize_t enable_background_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct scrub_device *scrub_dev = to_scrub_device(dev);
> + bool enable;
> + int ret;
> +
> + ret = kstrtobool(buf, &enable);
> + if (ret < 0)
> + return ret;
> +
> + ret = scrub_dev->ops->set_enabled_bg(dev, enable);
> + if (ret)
> + return ret;

It strikes me as somewhat pointless to have such a thin sysfs
implementation whose only job is to call down into a callback to do the
work. Unless there are other consumers of 'struct scrub_ops' outside of
these sysfs files why not just have the low-level drivers register their
corresponding attributes themselves?

Unless the functionality is truly generic just let the low-level driver
be responsible for conforming to the sysfs ABI expectations, and, for
example, each register their own "enable_background" attribute if they
support that semantic.

So scrub_device_register() would grow a 'const struct attribute_group
**groups' argument, or something along those lines.