2022-03-01 20:20:10

by Joseph, Jithu

[permalink] [raw]
Subject: [RFC 00/10] Introduce In Field Scan driver

Note to Maintainers:
Requesting x86 Maintainers to take a look at patch01 as it
touches arch/x86 portion of the kernel. Also would like to guide them
to patch07 which sets up hotplug notifiers and creates kthreads.

Patch 2/10 - Adds Documentation. Requesting Documentation maintainer to review it.

Requesting Greg KH to review the sysfs changes added by patch08.

Patch10 adds tracing support, requesting Steven Rostedt to review that.

Rest of the patches adds the IFS platform driver, requesting Platform driver maintainers
to review them.


In Field Scan (IFS) is a hardware feature to run circuit level tests on
a CPU core to detect problems that are not caught by parity or ECC checks.

Intel will provide a firmware file containing the scan tests. Similar to
microcode there is a separate file for each family-model-stepping. The
tests in the file are divided into some number of "chunks" that can be
run individually.

The driver loads the tests into memory reserved BIOS local to each CPU
socket in a two step process using writes to MSRs to first load the
SHA hashes for the test. Then the tests themselves. Status MSRs provide
feedback on the success/failure of these steps.

Tests are run by synchronizing execution of all threads on a core and
then writing to the ACTIVATE_SCAN MSR on all threads. Instruction
execution continues when:

1) all tests have completed
2) execution was interrupted
3) a test detected a problem

In all cases reading the SCAN_STATUS MSR provides details on what
happened. Interrupted tests may be restarted.

The IFS driver provides interfaces from /sys to reload tests and to
control execution:

/sys/devices/system/cpu/ifs/reload
Writing "1" to this file will reload the tests from
/lib/firmware/intel/ifs/{ff-mm-ss}.scan

/sys/devices/system/cpu/ifs/run_test
Writing "1" to this file will trigger a scan on each core
sequentially by logical CPU number (when HT is enabled this only
runs the tests once for each core)

/sys/devices/system/cpu/cpu#/ifs/run_test
Writing "1" to one of these files will trigger a scan on just
that core.

Results of the tests are also provided in /sys:

/sys/devices/system/cpu/ifs/status
Global status. Will show the most serious status across
all cores (fail > untested > pass)

/sys/devices/system/cpu/ifs/cpu_fail_list
/sys/devices/system/cpu/ifs/cpu_pass_list
/sys/devices/system/cpu/ifs/cpu_untested_list
CPU lists showing which CPUs have which test status

/sys/devices/system/cpu/cpu#/ifs/status
Status (pass/fail/untested) of each core

/sys/devices/system/cpu/cpu#/ifs/details
Hex value of the SCAN_STATUS MSR for the most recent test on
this core. Note that the error_code field may contain driver
defined software code not defined in the Intel SDM.

Current driver limitations:

1) The ACTIVATE_SCAN MSR allows for running any consecutive subrange or
available tests. But the driver always tries to run all tests and only
uses the subrange feature to restart an interrupted test.

2) Hardware allows for some number of cores to be tested in parallel.
The driver does not make use of this, it only tests one core at a time.


Jithu Joseph (8):
x86/microcode/intel: expose collect_cpu_info_early() for IFS
platform/x86/intel/ifs: Add driver for In-Field Scan
platform/x86/intel/ifs: Load IFS Image
platform/x86/intel/ifs: Check IFS Image sanity
platform/x86/intel/ifs: Authenticate and copy to secured memory
platform/x86/intel/ifs: Create kthreads for online cpus for scan test
platform/x86/intel/ifs: Add IFS sysfs interface
platform/x86/intel/ifs: add ABI documentation for IFS

Tony Luck (2):
Documentation: In-Field Scan
trace: platform/x86/intel/ifs: Add trace point to track Intel IFS
operations

Documentation/ABI/stable/sysfs-driver-ifs | 85 +++++
Documentation/x86/ifs.rst | 108 ++++++
Documentation/x86/index.rst | 1 +
MAINTAINERS | 7 +
arch/x86/include/asm/microcode_intel.h | 6 +
arch/x86/kernel/cpu/microcode/intel.c | 8 +-
drivers/platform/x86/intel/Kconfig | 1 +
drivers/platform/x86/intel/Makefile | 1 +
drivers/platform/x86/intel/ifs/Kconfig | 9 +
drivers/platform/x86/intel/ifs/Makefile | 7 +
drivers/platform/x86/intel/ifs/core.c | 387 +++++++++++++++++++++
drivers/platform/x86/intel/ifs/ifs.h | 155 +++++++++
drivers/platform/x86/intel/ifs/load.c | 299 ++++++++++++++++
drivers/platform/x86/intel/ifs/sysfs.c | 394 ++++++++++++++++++++++
include/trace/events/ifs.h | 38 +++
15 files changed, 1503 insertions(+), 3 deletions(-)
create mode 100644 Documentation/ABI/stable/sysfs-driver-ifs
create mode 100644 Documentation/x86/ifs.rst
create mode 100644 drivers/platform/x86/intel/ifs/Kconfig
create mode 100644 drivers/platform/x86/intel/ifs/Makefile
create mode 100644 drivers/platform/x86/intel/ifs/core.c
create mode 100644 drivers/platform/x86/intel/ifs/ifs.h
create mode 100644 drivers/platform/x86/intel/ifs/load.c
create mode 100644 drivers/platform/x86/intel/ifs/sysfs.c
create mode 100644 include/trace/events/ifs.h

--
2.17.1


2022-03-01 20:20:59

by Joseph, Jithu

[permalink] [raw]
Subject: [RFC 01/10] x86/microcode/intel: expose collect_cpu_info_early() for IFS

IFS uses a image provided by Intel that can be regarded as firmware.
IFS image carries the Processor Signature such as family/model/stepping
similar to what we find in the microcode header.

Expose collect_cpu_info_early() and cpu_signatures_match() for IFS image
sanity check.

No functional change.

Originally-by: Kyung Min Park <[email protected]>
Signed-off-by: Jithu Joseph <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
arch/x86/include/asm/microcode_intel.h | 6 ++++++
arch/x86/kernel/cpu/microcode/intel.c | 8 +++++---
2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/microcode_intel.h b/arch/x86/include/asm/microcode_intel.h
index d85a07d7154f..ec19eeac535b 100644
--- a/arch/x86/include/asm/microcode_intel.h
+++ b/arch/x86/include/asm/microcode_intel.h
@@ -74,12 +74,18 @@ extern void load_ucode_intel_ap(void);
extern void show_ucode_info_early(void);
extern int __init save_microcode_in_initrd_intel(void);
void reload_ucode_intel(void);
+int collect_cpu_info_early(struct ucode_cpu_info *uci);
+bool cpu_signatures_match(unsigned int s1, unsigned int p1,
+ unsigned int s2, unsigned int p2);
#else
static inline __init void load_ucode_intel_bsp(void) {}
static inline void load_ucode_intel_ap(void) {}
static inline void show_ucode_info_early(void) {}
static inline int __init save_microcode_in_initrd_intel(void) { return -EINVAL; }
static inline void reload_ucode_intel(void) {}
+static inline void collect_cpu_info_early(struct ucode_cpu_info *uci) {}
+static inline void cpu_signatures_match(unsigned int s1, unsigned int p1,
+ unsigned int s2, unsigned int p2) {}
#endif

#endif /* _ASM_X86_MICROCODE_INTEL_H */
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index d28a9f8f3fec..360ec06eec1e 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -45,8 +45,8 @@ static struct microcode_intel *intel_ucode_patch;
/* last level cache size per core */
static int llc_size_per_core;

-static inline bool cpu_signatures_match(unsigned int s1, unsigned int p1,
- unsigned int s2, unsigned int p2)
+bool cpu_signatures_match(unsigned int s1, unsigned int p1,
+ unsigned int s2, unsigned int p2)
{
if (s1 != s2)
return false;
@@ -58,6 +58,7 @@ static inline bool cpu_signatures_match(unsigned int s1, unsigned int p1,
/* ... or they intersect. */
return p1 & p2;
}
+EXPORT_SYMBOL_GPL(cpu_signatures_match);

/*
* Returns 1 if update has been found, 0 otherwise.
@@ -342,7 +343,7 @@ scan_microcode(void *data, size_t size, struct ucode_cpu_info *uci, bool save)
return patch;
}

-static int collect_cpu_info_early(struct ucode_cpu_info *uci)
+int collect_cpu_info_early(struct ucode_cpu_info *uci)
{
unsigned int val[2];
unsigned int family, model;
@@ -372,6 +373,7 @@ static int collect_cpu_info_early(struct ucode_cpu_info *uci)

return 0;
}
+EXPORT_SYMBOL_GPL(collect_cpu_info_early);

static void show_saved_mc(void)
{
--
2.17.1

2022-03-01 23:02:34

by Joseph, Jithu

[permalink] [raw]
Subject: [RFC 05/10] platform/x86/intel/ifs: Check IFS Image sanity

IFS image is designed specifically for a given family, model and
stepping of the processor. Like Intel microcode header, the IFS image
has the Processor Signature, Checksum and Processor Flags that must be
matched with the information returned by the CPUID.

Originally-by: Kyung Min Park <[email protected]>
Signed-off-by: Jithu Joseph <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
drivers/platform/x86/intel/ifs/load.c | 67 +++++++++++++++++++++++++++
1 file changed, 67 insertions(+)

diff --git a/drivers/platform/x86/intel/ifs/load.c b/drivers/platform/x86/intel/ifs/load.c
index 1a5e906c51af..b40f70258f8e 100644
--- a/drivers/platform/x86/intel/ifs/load.c
+++ b/drivers/platform/x86/intel/ifs/load.c
@@ -6,6 +6,7 @@

#include <linux/firmware.h>
#include <linux/platform_device.h>
+#include <asm/microcode_intel.h>

#include "ifs.h"
static const char *ifs_path = "intel/ifs/";
@@ -27,6 +28,67 @@ struct ifs_header {
#define IFS_HEADER_SIZE (sizeof(struct ifs_header))
static struct ifs_header *ifs_header_ptr; /* pointer to the ifs image header */
static u64 ifs_hash_ptr; /* Address of ifs metadata (hash) */
+static int ifs_sanity_check(void *mc)
+{
+ struct microcode_header_intel *mc_header = mc;
+ unsigned long total_size, data_size;
+ u32 sum, i;
+
+ total_size = get_totalsize(mc_header);
+ data_size = get_datasize(mc_header);
+
+ if ((data_size + MC_HEADER_SIZE > total_size) || (total_size % sizeof(u32))) {
+ pr_err("bad ifs data file size.\n");
+ return -EINVAL;
+ }
+
+ if (mc_header->ldrver != 1 || mc_header->hdrver != 1) {
+ pr_err("invalid/unknown ifs update format.\n");
+ return -EINVAL;
+ }
+
+ sum = 0;
+ i = total_size / sizeof(u32);
+ while (i--)
+ sum += ((u32 *)mc)[i];
+
+ if (sum) {
+ pr_err("bad ifs data checksum, aborting.\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static bool find_ifs_matching_signature(struct ucode_cpu_info *uci, void *mc)
+{
+ struct microcode_header_intel *shdr;
+ unsigned int mc_size;
+
+ shdr = (struct microcode_header_intel *)mc;
+ mc_size = get_totalsize(shdr);
+
+ if (!mc_size || ifs_sanity_check(shdr) < 0) {
+ pr_err("ifs sanity check failure\n");
+ return false;
+ }
+
+ if (!cpu_signatures_match(uci->cpu_sig.sig, uci->cpu_sig.pf, shdr->sig, shdr->pf)) {
+ pr_err("ifs signature, pf not matching\n");
+ return false;
+ }
+
+ return true;
+}
+
+static bool ifs_image_sanity_check(void *data)
+{
+ struct ucode_cpu_info uci;
+
+ collect_cpu_info_early(&uci);
+
+ return find_ifs_matching_signature(&uci, data);
+}

static const struct firmware *load_binary(const char *path)
{
@@ -45,6 +107,11 @@ static const struct firmware *load_binary(const char *path)
goto out;
}

+ if (!ifs_image_sanity_check((void *)fw->data)) {
+ pr_err("ifs header sanity check failed\n");
+ release_firmware(fw);
+ fw = NULL;
+ }
out:
platform_device_unregister(ifs_pdev);

--
2.17.1

2022-03-01 23:49:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver

On Tue, Mar 01, 2022 at 11:54:47AM -0800, Jithu Joseph wrote:
> Note to Maintainers:
> Requesting x86 Maintainers to take a look at patch01 as it
> touches arch/x86 portion of the kernel. Also would like to guide them
> to patch07 which sets up hotplug notifiers and creates kthreads.
>
> Patch 2/10 - Adds Documentation. Requesting Documentation maintainer to review it.
>
> Requesting Greg KH to review the sysfs changes added by patch08.

"RFC" means you are not comfortable submitting the changes yet, so you
don't need my review at this point in time. Become confident in your
changes before asking for others to review the code please.

thanks,

greg k-h

2022-03-02 01:34:35

by Joseph, Jithu

[permalink] [raw]
Subject: Re: [RFC 01/10] x86/microcode/intel: expose collect_cpu_info_early() for IFS



On 3/1/2022 12:08 PM, Greg KH wrote:
> On Tue, Mar 01, 2022 at 11:54:48AM -0800, Jithu Joseph wrote:
>> IFS uses a image provided by Intel that can be regarded as firmware.
>> IFS image carries the Processor Signature such as family/model/stepping
>> similar to what we find in the microcode header.
>>
>> Expose collect_cpu_info_early() and cpu_signatures_match() for IFS image
>
> If this is going to be a global symbol, how about
> cpu_collect_info_early() to keep the correct prefix?

Will make this change

2022-03-02 02:52:09

by Joseph, Jithu

[permalink] [raw]
Subject: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface

Implement sysfs interface to trigger ifs test for a targeted core or
all cores. For all core testing, the kernel will start testing from core 0
and proceed to the next core one after another. After the ifs test on the
last core, the test stops until the administrator starts another round of
tests. A "targeted core" test runs a single ifs on a single core. The
kernel will only test the target core.

The basic usage is as below.

1. For all cores testing:
echo 1 > /sys/devices/system/cpu/ifs/run_test
cat /sys/devices/system/cpu/ifs/status

2. For "targeted core" testing:
To start test, for example, cpu0:
echo 1 > /sys/devices/system/cpu/cpu#/ifs/run_test
cat /sys/devices/system/cpu/cpu#/ifs/status

3. For reloading IFS image: (e.g, when new IFS image is released)
- copy the new image to /lib/firmware/intel/ifs/
- rename it as {family/model/stepping}.{testname}
- echo 1 > /sys/devices/system/cpu/ifs/reload

This module accepts two tunable parameters. Defaults could be overridden
by passing appropriate values during load time. The parameters are as
described below.

1. noint: When set, system interrupts are not allowed to interrupt a scan.
2. retry: Maximum retry counter when the test is not executed due to an
event such as interrupt.

Originally-by: Kyung Min Park <[email protected]>
Signed-off-by: Jithu Joseph <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
drivers/platform/x86/intel/ifs/Makefile | 2 +-
drivers/platform/x86/intel/ifs/core.c | 8 +
drivers/platform/x86/intel/ifs/ifs.h | 4 +
drivers/platform/x86/intel/ifs/sysfs.c | 394 ++++++++++++++++++++++++
4 files changed, 407 insertions(+), 1 deletion(-)
create mode 100644 drivers/platform/x86/intel/ifs/sysfs.c

diff --git a/drivers/platform/x86/intel/ifs/Makefile b/drivers/platform/x86/intel/ifs/Makefile
index 105b377de410..a2e05bf78c3e 100644
--- a/drivers/platform/x86/intel/ifs/Makefile
+++ b/drivers/platform/x86/intel/ifs/Makefile
@@ -4,4 +4,4 @@

obj-$(CONFIG_INTEL_IFS) += intel_ifs.o

-intel_ifs-objs := core.o load.o
+intel_ifs-objs := core.o load.o sysfs.o
diff --git a/drivers/platform/x86/intel/ifs/core.c b/drivers/platform/x86/intel/ifs/core.c
index 6747b523587a..c9ca385082e9 100644
--- a/drivers/platform/x86/intel/ifs/core.c
+++ b/drivers/platform/x86/intel/ifs/core.c
@@ -283,11 +283,16 @@ static void ifs_first_time(unsigned int cpu)

static int ifs_online_cpu(unsigned int cpu)
{
+ int ret;
+
/* If the CPU is coming online for the first time*/
if (per_cpu(ifs_state, cpu).first_time == 0)
ifs_first_time(cpu);

cpumask_clear_cpu(cpu, &(per_cpu(ifs_state, cpu).mask));
+ ret = ifs_sysfs_create(cpu);
+ if (ret)
+ return ret;

per_cpu(ifs_state, cpu).scan_task = kthread_create_on_node(scan_test_worker, (void *)&cpu,
cpu_to_node(cpu), "ifsCpu/%u",
@@ -311,6 +316,7 @@ static int ifs_offline_cpu(unsigned int cpu)

if (thread)
kthread_stop(thread);
+ ifs_sysfs_remove(cpu);

return 0;
}
@@ -336,6 +342,7 @@ static int __init ifs_init(void)
return ret;
}

+ cpu_ifs_init();
init_completion(&test_thread_done);
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/ifs:online",
ifs_online_cpu, ifs_offline_cpu);
@@ -361,6 +368,7 @@ static void __exit ifs_exit(void)
if (thread)
kthread_stop(thread);
}
+ cpu_ifs_exit();
cpus_read_unlock();
cpuhp_remove_state(cpuhp_scan_state);

diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h
index fcbbb49faa19..4442ccd626c6 100644
--- a/drivers/platform/x86/intel/ifs/ifs.h
+++ b/drivers/platform/x86/intel/ifs/ifs.h
@@ -143,6 +143,10 @@ struct ifs_state {
DECLARE_PER_CPU(struct ifs_state, ifs_state);

int load_ifs_binary(void);
+void cpu_ifs_init(void);
+void cpu_ifs_exit(void);
+int ifs_sysfs_create(unsigned int cpu);
+void ifs_sysfs_remove(unsigned int cpu);
extern struct ifs_params ifs_params;
extern atomic_t siblings_in;
extern atomic_t siblings_out;
diff --git a/drivers/platform/x86/intel/ifs/sysfs.c b/drivers/platform/x86/intel/ifs/sysfs.c
new file mode 100644
index 000000000000..f441968de642
--- /dev/null
+++ b/drivers/platform/x86/intel/ifs/sysfs.c
@@ -0,0 +1,394 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2021 Intel Corporation.
+ *
+ * Author: Jithu Joseph <[email protected]>
+ */
+
+#include <linux/cpu.h>
+#include <linux/delay.h>
+#include <linux/fs.h>
+#include <linux/semaphore.h>
+
+#include "ifs.h"
+
+static DEFINE_SEMAPHORE(ifs_sem);
+static int core_delay = 1;
+static bool ifs_disabled;
+
+/*
+ * Initiate per core test. It wakes up all sibling threads that belongs to the
+ * target cpu. Once all sibling threads wake up, the scan test gets executed and
+ * wait for all sibling threads to finish the scan test.
+ */
+static void do_core_test(int cpu)
+{
+ int sibling;
+
+ reinit_completion(&test_thread_done);
+ atomic_set(&siblings_in, 0);
+ atomic_set(&siblings_out, 0);
+
+ cpu_sibl_ct = cpumask_weight(topology_sibling_cpumask(cpu));
+
+ for_each_cpu(sibling, topology_sibling_cpumask(cpu))
+ cpumask_set_cpu(sibling, &per_cpu(ifs_state, sibling).mask);
+
+ for_each_cpu(sibling, topology_sibling_cpumask(cpu))
+ wake_up_interruptible(&per_cpu(ifs_state, sibling).scan_wq);
+
+ if (wait_for_completion_timeout(&test_thread_done, HZ) == 0) {
+ pr_err("Core locked up during IFS test? IFS disabled\n");
+ ifs_disabled = true;
+ }
+}
+
+/*
+ * The sysfs interface to check the test status:
+ * To check the result, for example, cpu0
+ * cat /sys/devices/system/cpu/cpu0/ifs/details
+ */
+static ssize_t details_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ unsigned int cpu = dev->id;
+ int ret;
+
+ if (down_trylock(&ifs_sem))
+ return -EBUSY;
+
+ ret = sprintf(buf, "%llx\n", per_cpu(ifs_state, cpu).scan_details);
+ up(&ifs_sem);
+
+ return ret;
+}
+
+static DEVICE_ATTR_RO(details);
+
+/*
+ * The sysfs interface to check the test status:
+ * To check the status, for example, cpu0
+ * cat /sys/devices/system/cpu/cpu0/ifs/status
+ */
+static ssize_t status_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ unsigned int cpu = dev->id;
+ u32 scan_result;
+ int ret;
+
+ if (down_trylock(&ifs_sem))
+ return -EBUSY;
+
+ scan_result = per_cpu(ifs_state, cpu).status;
+
+ if (scan_result == SCAN_TEST_FAIL)
+ ret = sprintf(buf, "fail\n");
+ else if (scan_result == SCAN_NOT_TESTED)
+ ret = sprintf(buf, "untested\n");
+ else
+ ret = sprintf(buf, "pass\n");
+
+ up(&ifs_sem);
+
+ return ret;
+}
+
+static DEVICE_ATTR_RO(status);
+
+/*
+ * The sysfs interface for single core testing
+ * To start test, for example, cpu0
+ * echo 1 > /sys/devices/system/cpu/cpu0/ifs/run_test
+ * To check the result:
+ * cat /sys/devices/system/cpu/cpu0/ifs/result
+ * The sibling core gets tested at the same time.
+ */
+static ssize_t run_test_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ unsigned int cpu = dev->id;
+ bool var;
+ int rc;
+
+ if (ifs_disabled)
+ return -ENXIO;
+
+ rc = kstrtobool(buf, &var);
+ if (rc < 0 || var != 1)
+ return -EINVAL;
+
+ if (down_trylock(&ifs_sem)) {
+ pr_info("another instance in progress.\n");
+ return -EBUSY;
+ }
+ cpu_hotplug_disable();
+ do_core_test(cpu);
+ cpu_hotplug_enable();
+ up(&ifs_sem);
+
+ return count;
+}
+
+static DEVICE_ATTR_WO(run_test);
+
+/* per-cpu scan sysfs attributes */
+static struct attribute *ifs_attrs[] = {
+ &dev_attr_run_test.attr,
+ &dev_attr_status.attr,
+ &dev_attr_details.attr,
+ NULL
+};
+
+const struct attribute_group ifs_attr_group = {
+ .attrs = ifs_attrs,
+ .name = "ifs",
+};
+
+/* Creates the sysfs files under /sys/devices/system/cpu/cpuX/ifs */
+int ifs_sysfs_create(unsigned int cpu)
+{
+ struct device *dev;
+ int ret;
+
+ dev = get_cpu_device(cpu);
+ ret = sysfs_create_group(&dev->kobj, &ifs_attr_group);
+ if (ret) {
+ pr_err("failed to create sysfs group\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+/* Removes the sysfs files under /sys/devices/system/cpu/cpuX/ifs */
+void ifs_sysfs_remove(unsigned int cpu)
+{
+ struct device *dev;
+
+ dev = get_cpu_device(cpu);
+ sysfs_remove_group(&dev->kobj, &ifs_attr_group);
+}
+
+/*
+ * Reload the IFS image. When user wants to install new IFS image
+ * image, reloading must be done.
+ */
+static ssize_t reload_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ bool var;
+ int rc;
+
+ if (ifs_disabled)
+ return -ENXIO;
+
+ rc = kstrtobool(buf, &var);
+ if (rc < 0 || var != 1)
+ return -EINVAL;
+
+ down(&ifs_sem);
+ rc = load_ifs_binary();
+ up(&ifs_sem);
+ if (rc < 0) {
+ pr_info("failed to reload ifs hash and test\n");
+ return rc;
+ }
+
+ return count;
+}
+
+static DEVICE_ATTR_WO(reload);
+
+static int run_allcpu_scan_test(void)
+{
+ int cpu;
+
+ if (down_trylock(&ifs_sem)) {
+ pr_info("another instance in progress.\n");
+ return -EBUSY;
+ }
+
+ cpu_hotplug_disable();
+ for_each_cpu(cpu, cpu_online_mask) {
+ /* Only execute test on the first thread on each core */
+ if (cpumask_first(topology_sibling_cpumask(cpu)) != cpu)
+ continue;
+ do_core_test(cpu);
+ mdelay(core_delay);
+ }
+ cpu_hotplug_enable();
+
+ up(&ifs_sem);
+ return 0;
+}
+
+/*
+ * The sysfs interface to execute scan test for all online cpus.
+ * The test can be triggered as below:
+ * echo 1 > /sys/devices/system/cpu/ifs/run_test
+ */
+static ssize_t allcpu_run_test_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ bool var;
+ int rc;
+
+ if (ifs_disabled)
+ return -ENXIO;
+
+ rc = kstrtobool(buf, &var);
+ if (rc < 0 || var != 1)
+ return -EINVAL;
+
+ rc = run_allcpu_scan_test();
+ if (rc < 0)
+ return rc;
+
+ return count;
+}
+
+/*
+ * Percpu and allcpu ifs have attributes named "run_test".
+ * Since the former is defined in this same file using DEVICE_ATTR_WO()
+ * the latter is defined directly.
+ */
+static struct device_attribute dev_attr_allcpu_run_test = {
+ .attr = { .name = "run_test", .mode = 0200 },
+ .store = allcpu_run_test_store,
+};
+
+/*
+ * Currently loaded IFS image version.
+ */
+static ssize_t image_version_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%x\n", ifs_params.loaded_version);
+}
+
+static DEVICE_ATTR_RO(image_version);
+
+/*
+ * Currently loaded IFS image version.
+ */
+static ssize_t cpu_fail_list_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ int ret;
+
+ if (down_trylock(&ifs_sem))
+ return -EBUSY;
+
+ ret = sprintf(buf, "%*pbl\n", cpumask_pr_args(&ifs_params.fail_mask));
+ up(&ifs_sem);
+ return ret;
+}
+
+static DEVICE_ATTR_RO(cpu_fail_list);
+
+static ssize_t cpu_untested_list_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ int ret;
+
+ if (down_trylock(&ifs_sem))
+ return -EBUSY;
+
+ ret = sprintf(buf, "%*pbl\n", cpumask_pr_args(&ifs_params.not_tested_mask));
+ up(&ifs_sem);
+
+ return ret;
+}
+
+static DEVICE_ATTR_RO(cpu_untested_list);
+
+static ssize_t cpu_pass_list_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ int ret;
+
+ if (down_trylock(&ifs_sem))
+ return -EBUSY;
+
+ ret = sprintf(buf, "%*pbl\n", cpumask_pr_args(&ifs_params.pass_mask));
+ up(&ifs_sem);
+
+ return ret;
+}
+
+static DEVICE_ATTR_RO(cpu_pass_list);
+
+/*
+ * Status for global ifs test
+ */
+static ssize_t allcpu_status_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ int ret;
+
+ if (down_trylock(&ifs_sem))
+ return -EBUSY;
+
+ if (!cpumask_empty(&ifs_params.fail_mask))
+ ret = sprintf(buf, "fail\n");
+ else if (!cpumask_empty(&ifs_params.not_tested_mask))
+ ret = sprintf(buf, "untested\n");
+ else
+ ret = sprintf(buf, "pass\n");
+
+ up(&ifs_sem);
+
+ return ret;
+}
+
+/*
+ * Percpu and allcpu ifs have attributes named "status".
+ * Since the former is defined in this same file using DEVICE_ATTR_RO()
+ * the latter is defined directly.
+ */
+static struct device_attribute dev_attr_allcpu_status = {
+ .attr = { .name = "status", .mode = 0444 },
+ .show = allcpu_status_show,
+};
+
+/* global scan sysfs attributes */
+static struct attribute *cpu_ifs_attrs[] = {
+ &dev_attr_reload.attr,
+ &dev_attr_allcpu_run_test.attr,
+ &dev_attr_image_version.attr,
+ &dev_attr_cpu_fail_list.attr,
+ &dev_attr_cpu_untested_list.attr,
+ &dev_attr_cpu_pass_list.attr,
+ &dev_attr_allcpu_status.attr,
+ NULL
+};
+
+const struct attribute_group cpu_ifs_attr_group = {
+ .attrs = cpu_ifs_attrs,
+};
+
+const struct attribute_group *cpu_ifs_attr_groups[] = {
+ &cpu_ifs_attr_group,
+ NULL,
+};
+
+static struct device *cpu_scan_device;
+
+/* Creates the sysfs files under /sys/devices/system/cpu/ifs */
+void cpu_ifs_init(void)
+{
+ struct device *root;
+
+ root = cpu_subsys.dev_root;
+ cpu_scan_device = cpu_device_create(root, NULL, cpu_ifs_attr_groups, "ifs");
+}
+
+void cpu_ifs_exit(void)
+{
+ device_unregister(cpu_scan_device);
+}
--
2.17.1

2022-03-02 04:35:34

by Joseph, Jithu

[permalink] [raw]
Subject: [RFC 07/10] platform/x86/intel/ifs: Create kthreads for online cpus for scan test

Create scan kthreads for online logical cpus. Once scan test is triggered,
it wakes up the corresponding thread and its sibling threads to execute
the test. Once the scan test is done, the threads go back to thread wait
for next signal to start a new scan.

In a core, the scan engine is shared between siblings. When a scan test
is triggered on a core, all the siblings rendezvous before the test execution.
The scan results are same for all siblings.

Scan may be aborted by some reasons. Scan test will be aborted in certain
circumstances such as when interrupt occurred or cpu does not have enough
power budget for scan. In this case, the kernel restart scan from the chunk
where it stopped. Scan will also be aborted when the test is failed. In
this case, the test is immediately stopped without retry.

Originally-by: Kyung Min Park <[email protected]>
Signed-off-by: Jithu Joseph <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
drivers/platform/x86/intel/ifs/core.c | 317 ++++++++++++++++++++++++++
drivers/platform/x86/intel/ifs/ifs.h | 91 ++++++++
2 files changed, 408 insertions(+)

diff --git a/drivers/platform/x86/intel/ifs/core.c b/drivers/platform/x86/intel/ifs/core.c
index 765d9a2c4683..6747b523587a 100644
--- a/drivers/platform/x86/intel/ifs/core.c
+++ b/drivers/platform/x86/intel/ifs/core.c
@@ -4,11 +4,38 @@
* Author: Jithu Joseph <[email protected]>
*/

+#include <linux/cpu.h>
+#include <linux/delay.h>
+#include <linux/kthread.h>
#include <linux/module.h>
+#include <linux/nmi.h>
#include <asm/cpu_device_id.h>

#include "ifs.h"
+
+static enum cpuhp_state cpuhp_scan_state;
struct ifs_params ifs_params;
+int cpu_sibl_ct;
+atomic_t siblings_in; /* sibling count for joining rendezvous.*/
+atomic_t siblings_out; /* sibling count for exiting rendezvous.*/
+struct completion test_thread_done; /* set when scan are done for all siblings threads.*/
+
+DEFINE_PER_CPU(struct ifs_state, ifs_state);
+
+static int ifs_retry_set(const char *val, const struct kernel_param *kp);
+static const struct kernel_param_ops ifs_retry_ops = {
+ .set = ifs_retry_set,
+ .get = param_get_int,
+};
+
+static int retry = 5;
+module_param_cb(retry, &ifs_retry_ops, &retry, 0644);
+
+MODULE_PARM_DESC(retry, "Maximum retry count when the test is not executed");
+
+static bool noint = 1;
+module_param(noint, bool, 0644);
+MODULE_PARM_DESC(noint, "Option to enable/disable interrupt during test");

#define X86_MATCH(model) \
X86_MATCH_VENDOR_FAM_MODEL_FEATURE(INTEL, 6, \
@@ -21,6 +48,273 @@ static const struct x86_cpu_id ifs_cpu_ids[] __initconst = {

MODULE_DEVICE_TABLE(x86cpu, ifs_cpu_ids);

+static int ifs_retry_set(const char *val, const struct kernel_param *kp)
+{
+ int var = 0;
+
+ if (kstrtoint(val, 0, &var)) {
+ pr_err("unable to parse retry\n");
+ return -EINVAL;
+ }
+
+ /* validate retry value for sanity */
+ if (var < 1 || var > 20) {
+ pr_err("retry parameter should be between 1 and 20\n");
+ return -EINVAL;
+ }
+
+ return param_set_int(val, kp);
+}
+
+static unsigned long msec_to_tsc(unsigned long msec)
+{
+ return tsc_khz * 1000 * msec / MSEC_PER_SEC;
+}
+
+static const char * const scan_test_status[] = {
+ "SCAN no error",
+ "Other thread could not join.",
+ "Interrupt occurred prior to SCAN coordination.",
+ "Core Abort SCAN Response due to power management condition.",
+ "Non valid chunks in the range",
+ "Mismatch in arguments between threads T0/T1.",
+ "Core not capable of performing SCAN currently",
+ "Unassigned error code 0x7",
+ "Exceeded number of Logical Processors (LP) allowed to run Scan-At-Field concurrently",
+ "Interrupt occurred prior to SCAN start",
+};
+
+static void message_not_tested(int cpu, union ifs_status status)
+{
+ if (status.error_code < ARRAY_SIZE(scan_test_status))
+ pr_warn("CPU %d: SCAN operation did not start. %s\n", cpu,
+ scan_test_status[status.error_code]);
+ else if (status.error_code == IFS_SW_TIMEOUT)
+ pr_warn("CPU %d: software timeout during scan\n", cpu);
+ else if (status.error_code == IFS_SW_PARTIAL_COMPLETION)
+ pr_warn("CPU %d: %s\n", cpu,
+ "Not all scan chunks were executed. Maximum forward progress retries exceeded");
+ else
+ pr_warn("CPU %d: SCAN unknown status %llx\n", cpu, status.data);
+}
+
+static void message_fail(int cpu, union ifs_status status)
+{
+ if (status.control_error) {
+ pr_err("CPU %d: scan failed. %s\n", cpu,
+ "Suggest reload scan file: # echo 1 > /sys/devices/system/cpu/ifs/reload");
+ }
+ if (status.signature_error) {
+ pr_err("CPU %d: test signature incorrect. %s\n", cpu,
+ "Suggest retry scan to check if problem is transient");
+ }
+}
+
+static bool can_restart(union ifs_status status)
+{
+ /* Signature for chunk is bad, or scan test failed */
+ if (status.signature_error || status.control_error)
+ return false;
+
+ switch (status.error_code) {
+ case IFS_POWER_MGMT_INADEQUATE_FOR_SCAN:
+ mdelay(1);
+ fallthrough;
+ case IFS_NO_ERROR:
+ case IFS_OTHER_THREAD_DID_NOT_JOIN:
+ case IFS_INTERRUPTED_BEFORE_RENDEZVOUS:
+ case IFS_EXCEED_NUMBER_OF_THREADS_CONCURRENT:
+ case IFS_INTERRUPTED_DURING_EXECUTION:
+ return true;
+ }
+ return false;
+}
+
+static bool wait_for_siblings(atomic_t *t, long long timeout)
+{
+ atomic_inc(t);
+ while (atomic_read(t) < cpu_sibl_ct) {
+ if (timeout < SPINUNIT) {
+ pr_err("Timeout while waiting for CPUs rendezvous, remaining: %d\n",
+ cpu_sibl_ct - atomic_read(t));
+ return false;
+ }
+
+ ndelay(SPINUNIT);
+ timeout -= SPINUNIT;
+
+ touch_nmi_watchdog();
+ }
+
+ return true;
+}
+
+/*
+ * Scan test kthreads bound with each logical cpu.
+ * Wait for the sibling thread to join before the execution.
+ * Execute the scan test by running wrmsr(MSR_ACTIVATE_SCAN).
+ */
+static int scan_test_worker(void *info)
+{
+ int cpu = smp_processor_id();
+ union ifs_scan activate;
+ union ifs_status status;
+ unsigned long timeout;
+ int retries;
+ u32 first;
+
+ activate.rsvd = 0;
+ activate.delay = msec_to_tsc(THREAD_WAIT);
+ activate.sigmce = 0;
+
+ while (1) {
+ /* wait event until cpumask set from user */
+ wait_event_interruptible(per_cpu(ifs_state, cpu).scan_wq,
+ (cpumask_test_cpu(cpu, &per_cpu(ifs_state, cpu).mask) ||
+ kthread_should_stop()));
+
+ if (kthread_should_stop())
+ break;
+
+ /*
+ * Need to get (and keep) the threads on this core executing close together
+ * so that the writes to MSR_ACTIVATE_SCAN below will succeed in entering
+ * IFS test mode on this core. Interrupts on each thread are expected to be
+ * brief. But preemption would be a problem.
+ */
+ preempt_disable();
+
+ /* wait for the sibling threads to join */
+ first = cpumask_first(topology_sibling_cpumask(cpu));
+ if (!wait_for_siblings(&siblings_in, NSEC_PER_SEC)) {
+ preempt_enable();
+ return -1;
+ }
+
+ activate.start = 0;
+ activate.stop = ifs_params.valid_chunks - 1;
+ timeout = jiffies + HZ / 2;
+ retries = retry;
+
+ while (activate.start <= activate.stop) {
+ if (time_after(jiffies, timeout)) {
+ status.error_code = IFS_SW_TIMEOUT;
+ break;
+ }
+
+ if (noint)
+ local_irq_disable();
+ /* scan start */
+ wrmsrl(MSR_ACTIVATE_SCAN, activate.data);
+
+ if (noint)
+ local_irq_enable();
+
+ /*
+ * All logical CPUs on this core are now running IFS test. When it completes
+ * execution or is interrupted, the following RDMSR gets the scan status.
+ */
+
+ rdmsrl(MSR_SCAN_STATUS, status.data);
+
+ /* Some cases can be retried, give up for others */
+ if (!can_restart(status))
+ break;
+
+ if (status.chunk_num == activate.start) {
+ /* Check for forward progress */
+ if (retries-- == 0) {
+ if (status.error_code == IFS_NO_ERROR)
+ status.error_code = IFS_SW_PARTIAL_COMPLETION;
+ break;
+ }
+ } else {
+ retries = retry;
+ activate.start = status.chunk_num;
+ }
+ }
+
+ preempt_enable();
+
+ /* Update status for this core */
+ per_cpu(ifs_state, cpu).scan_details = status.data;
+
+ if (status.control_error || status.signature_error) {
+ per_cpu(ifs_state, cpu).status = SCAN_TEST_FAIL;
+ cpumask_set_cpu(cpu, &ifs_params.fail_mask);
+ cpumask_clear_cpu(cpu, &ifs_params.not_tested_mask);
+ cpumask_clear_cpu(cpu, &ifs_params.pass_mask);
+ message_fail(cpu, status);
+ } else if (status.error_code) {
+ per_cpu(ifs_state, cpu).status = SCAN_NOT_TESTED;
+ cpumask_set_cpu(cpu, &ifs_params.not_tested_mask);
+ cpumask_clear_cpu(cpu, &ifs_params.fail_mask);
+ cpumask_clear_cpu(cpu, &ifs_params.pass_mask);
+ message_not_tested(cpu, status);
+ } else {
+ per_cpu(ifs_state, cpu).status = SCAN_TEST_PASS;
+ cpumask_set_cpu(cpu, &ifs_params.pass_mask);
+ cpumask_clear_cpu(cpu, &ifs_params.not_tested_mask);
+ cpumask_clear_cpu(cpu, &ifs_params.fail_mask);
+ }
+
+ cpumask_clear_cpu(cpu, &per_cpu(ifs_state, cpu).mask);
+
+ if (!wait_for_siblings(&siblings_out, NSEC_PER_SEC))
+ return -1;
+
+ if (cpu == first)
+ complete(&test_thread_done);
+ }
+
+ return 0;
+}
+
+static void ifs_first_time(unsigned int cpu)
+{
+ init_waitqueue_head(&(per_cpu(ifs_state, cpu).scan_wq));
+
+ per_cpu(ifs_state, cpu).first_time = 1;
+ per_cpu(ifs_state, cpu).status = SCAN_NOT_TESTED;
+ cpumask_set_cpu(cpu, &ifs_params.not_tested_mask);
+ cpumask_clear_cpu(cpu, &ifs_params.fail_mask);
+ cpumask_clear_cpu(cpu, &ifs_params.pass_mask);
+}
+
+static int ifs_online_cpu(unsigned int cpu)
+{
+ /* If the CPU is coming online for the first time*/
+ if (per_cpu(ifs_state, cpu).first_time == 0)
+ ifs_first_time(cpu);
+
+ cpumask_clear_cpu(cpu, &(per_cpu(ifs_state, cpu).mask));
+
+ per_cpu(ifs_state, cpu).scan_task = kthread_create_on_node(scan_test_worker, (void *)&cpu,
+ cpu_to_node(cpu), "ifsCpu/%u",
+ cpu);
+ if (IS_ERR(per_cpu(ifs_state, cpu).scan_task)) {
+ pr_err("scan_test_worker task create failed\n");
+ return PTR_ERR(per_cpu(ifs_state, cpu).scan_task);
+ }
+ kthread_bind(per_cpu(ifs_state, cpu).scan_task, cpu);
+ wake_up_process(per_cpu(ifs_state, cpu).scan_task);
+
+ return 0;
+}
+
+static int ifs_offline_cpu(unsigned int cpu)
+{
+ struct task_struct *thread;
+
+ thread = per_cpu(ifs_state, cpu).scan_task;
+ per_cpu(ifs_state, cpu).scan_task = NULL;
+
+ if (thread)
+ kthread_stop(thread);
+
+ return 0;
+}
+
static int __init ifs_init(void)
{
const struct x86_cpu_id *m;
@@ -42,11 +336,34 @@ static int __init ifs_init(void)
return ret;
}

+ init_completion(&test_thread_done);
+ ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/ifs:online",
+ ifs_online_cpu, ifs_offline_cpu);
+
+ if (ret < 0) {
+ pr_err("cpuhp_setup_failed\n");
+ return ret;
+ }
+ cpuhp_scan_state = ret;
+
return 0;
}

static void __exit ifs_exit(void)
{
+ struct task_struct *thread;
+ int cpu;
+
+ cpus_read_lock();
+ for_each_online_cpu(cpu) {
+ thread = per_cpu(ifs_state, cpu).scan_task;
+ per_cpu(ifs_state, cpu).scan_task = NULL;
+ if (thread)
+ kthread_stop(thread);
+ }
+ cpus_read_unlock();
+ cpuhp_remove_state(cpuhp_scan_state);
+
pr_info("unloaded 'In-Field Scan' module\n");
}

diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h
index 8f9abdb304b0..fcbbb49faa19 100644
--- a/drivers/platform/x86/intel/ifs/ifs.h
+++ b/drivers/platform/x86/intel/ifs/ifs.h
@@ -18,6 +18,13 @@
#define MSR_SCAN_HASHES_STATUS 0x000002c3
#define MSR_AUTHENTICATE_AND_COPY_CHUNK 0x000002c4
#define MSR_CHUNKS_AUTHENTICATION_STATUS 0x000002c5
+#define MSR_ACTIVATE_SCAN 0x000002c6
+#define MSR_SCAN_STATUS 0x000002c7
+#define SCAN_TEST_PASS 0
+#define SCAN_TEST_FAIL 1
+#define SCAN_NOT_TESTED 2
+#define SPINUNIT 100
+#define THREAD_WAIT 5

/* MSR_SCAN_HASHES_STATUS bit fields */
union ifs_scan_hashes_status {
@@ -45,16 +52,100 @@ union ifs_chunks_auth_status {
};
};

+/* MSR_ACTIVATE_SCAN bit fields */
+union ifs_scan {
+ u64 data;
+ struct {
+ u64 start :8;
+ u64 stop :8;
+ u64 rsvd :16;
+ u64 delay :31;
+ u64 sigmce :1;
+ };
+};
+
+/* MSR_SCAN_STATUS bit fields */
+union ifs_status {
+ u64 data;
+ struct {
+ u64 chunk_num :8;
+ u64 chunk_stop_index :8;
+ u64 rsvd1 :16;
+ u64 error_code :8;
+ u64 rsvd2 :22;
+ u64 control_error :1;
+ u64 signature_error :1;
+ };
+};
+
+/*
+ * ifs_status.error_code values after rdmsr(SCAN_STATUS)
+ * 0x0: no error.
+ * 0x1: scan did not start because all sibling threads did not join.
+ * 0x2: scan did not start because interrupt occurred prior to threads rendezvous
+ * 0x3: scan did not start because power management conditions are inadequate.
+ * 0x4: scan did not start because non-valid chunks in range stop_index:start_index.
+ * 0x5: scan did not start because of mismatches in arguments between sibling threads.
+ * 0x6: scan did not start because core is not capable of performing scan currently.
+ * 0x7: not assigned.
+ * 0x8: scan did not start because of exceed number of concurrent CPUs attempt to run scan.
+ * 0x9: interrupt occurred. Scan operation aborted prematurely. Not all chunks executed.
+ * 0xFE: not all scan chunks were executed. Maximum forward progress retries exceeded.
+ * This is a driver populated error-code as hardware returns success in this scenario.
+ */
+#define IFS_NO_ERROR 0x0
+#define IFS_OTHER_THREAD_DID_NOT_JOIN 0x1
+#define IFS_INTERRUPTED_BEFORE_RENDEZVOUS 0x2
+#define IFS_POWER_MGMT_INADEQUATE_FOR_SCAN 0x3
+#define IFS_INVALID_CHUNK_RANGE 0x4
+#define IFS_MISMATCH_ARGUMENTS_BETWEEN_THREADS 0x5
+#define IFS_CORE_NOT_CAPABLE_CURRENTLY 0x6
+/* Code 0x7 not assigned */
+#define IFS_EXCEED_NUMBER_OF_THREADS_CONCURRENT 0x8
+#define IFS_INTERRUPTED_DURING_EXECUTION 0x9
+
+#define IFS_SW_TIMEOUT 0xFD
+#define IFS_SW_PARTIAL_COMPLETION 0xFE
+
/**
* struct ifs_params - global ifs parameter for all cpus.
* @loaded_version: stores the currently loaded ifs image version.
* @valid_chunks: number of chunks which could be validated.
+ * @fail_mask: stores the cpus which failed the scan.
+ * @not_tested_mask: stores the cpus which have not been tested.
*/
struct ifs_params {
int loaded_version;
int valid_chunks;
+ struct cpumask fail_mask;
+ struct cpumask pass_mask;
+ struct cpumask not_tested_mask;
};

+/**
+ * struct ifs_state - per-cpu ifs parameter.
+ * @scan_task: scan_task for kthread to run scan test on each cpu.
+ * @first_time: to track if cpu is coming online for the first time.
+ * @status: it holds simple status pass/fail/untested.
+ * @scan_details: opaque scan status code from h/w.
+ * @scan_wq: kthread task wait queue.
+ * @mask: triggering the test by setting the mask.
+ */
+struct ifs_state {
+ struct task_struct *scan_task;
+ int first_time;
+ int status;
+ u64 scan_details;
+ wait_queue_head_t scan_wq;
+ struct cpumask mask;
+};
+
+DECLARE_PER_CPU(struct ifs_state, ifs_state);
+
int load_ifs_binary(void);
extern struct ifs_params ifs_params;
+extern atomic_t siblings_in;
+extern atomic_t siblings_out;
+extern struct completion test_thread_done;
+extern int cpu_sibl_ct;
#endif
--
2.17.1

2022-03-02 05:32:43

by Joseph, Jithu

[permalink] [raw]
Subject: [RFC 04/10] platform/x86/intel/ifs: Load IFS Image

IFS uses a test image that can be regarded as firmware. The image is
specific to a processor family, model and stepping. IFS requires that a
test image be loaded before any ifs test is initiated. Load the image
that matches processor signature. The IFS image is signed by Intel.

The IFS image file follows a similar naming convention as used for
Intel CPU microcode files. The file must be located in the firmware
directory where the microcode files are placed and named as {family/model
/stepping}.scan as below:

/lib/firmware/intel/ifs/{ff-mm-ss}.scan

Originally-by: Kyung Min Park <[email protected]>
Signed-off-by: Jithu Joseph <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
drivers/platform/x86/intel/ifs/Makefile | 2 +-
drivers/platform/x86/intel/ifs/core.c | 8 +++
drivers/platform/x86/intel/ifs/ifs.h | 13 ++++
drivers/platform/x86/intel/ifs/load.c | 95 +++++++++++++++++++++++++
4 files changed, 117 insertions(+), 1 deletion(-)
create mode 100644 drivers/platform/x86/intel/ifs/load.c

diff --git a/drivers/platform/x86/intel/ifs/Makefile b/drivers/platform/x86/intel/ifs/Makefile
index 05b4925402b4..105b377de410 100644
--- a/drivers/platform/x86/intel/ifs/Makefile
+++ b/drivers/platform/x86/intel/ifs/Makefile
@@ -4,4 +4,4 @@

obj-$(CONFIG_INTEL_IFS) += intel_ifs.o

-intel_ifs-objs := core.o
+intel_ifs-objs := core.o load.o
diff --git a/drivers/platform/x86/intel/ifs/core.c b/drivers/platform/x86/intel/ifs/core.c
index fb3c864d3085..765d9a2c4683 100644
--- a/drivers/platform/x86/intel/ifs/core.c
+++ b/drivers/platform/x86/intel/ifs/core.c
@@ -8,6 +8,7 @@
#include <asm/cpu_device_id.h>

#include "ifs.h"
+struct ifs_params ifs_params;

#define X86_MATCH(model) \
X86_MATCH_VENDOR_FAM_MODEL_FEATURE(INTEL, 6, \
@@ -24,6 +25,7 @@ static int __init ifs_init(void)
{
const struct x86_cpu_id *m;
u64 ia32_core_caps;
+ int ret;

/* ifs capability check */
m = x86_match_cpu(ifs_cpu_ids);
@@ -34,6 +36,12 @@ static int __init ifs_init(void)
if (!(ia32_core_caps & MSR_IA32_CORE_CAPS_INTEGRITY))
return -ENODEV;

+ ret = load_ifs_binary();
+ if (ret) {
+ pr_err("loading ifs binaries failed\n");
+ return ret;
+ }
+
return 0;
}

diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h
index f3f924fced06..f2daf2cfd3e6 100644
--- a/drivers/platform/x86/intel/ifs/ifs.h
+++ b/drivers/platform/x86/intel/ifs/ifs.h
@@ -7,8 +7,21 @@
#ifndef _IFS_H_
#define _IFS_H_

+#undef pr_fmt
+#define pr_fmt(fmt) "ifs: " fmt
+
/* These bits are in the IA32_CORE_CAPABILITIES MSR */
#define MSR_IA32_CORE_CAPS_INTEGRITY_BIT 2
#define MSR_IA32_CORE_CAPS_INTEGRITY BIT(MSR_IA32_CORE_CAPS_INTEGRITY_BIT)

+/**
+ * struct ifs_params - global ifs parameter for all cpus.
+ * @loaded_version: stores the currently loaded ifs image version.
+ */
+struct ifs_params {
+ int loaded_version;
+};
+
+int load_ifs_binary(void);
+extern struct ifs_params ifs_params;
#endif
diff --git a/drivers/platform/x86/intel/ifs/load.c b/drivers/platform/x86/intel/ifs/load.c
new file mode 100644
index 000000000000..1a5e906c51af
--- /dev/null
+++ b/drivers/platform/x86/intel/ifs/load.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2021 Intel Corporation.
+ *
+ * Author: Jithu Joseph <[email protected]>
+ */
+
+#include <linux/firmware.h>
+#include <linux/platform_device.h>
+
+#include "ifs.h"
+static const char *ifs_path = "intel/ifs/";
+
+struct ifs_header {
+ u32 header_ver;
+ u32 blob_revision;
+ u32 date;
+ u32 processor_sig;
+ u32 check_sum;
+ u32 loader_rev;
+ u32 processor_flags;
+ u32 metadata_size;
+ u32 total_size;
+ u32 fusa_info;
+ u64 reserved;
+};
+
+#define IFS_HEADER_SIZE (sizeof(struct ifs_header))
+static struct ifs_header *ifs_header_ptr; /* pointer to the ifs image header */
+static u64 ifs_hash_ptr; /* Address of ifs metadata (hash) */
+
+static const struct firmware *load_binary(const char *path)
+{
+ struct platform_device *ifs_pdev;
+ const struct firmware *fw;
+ int err;
+
+ ifs_pdev = platform_device_register_simple("ifs", -1, NULL, 0);
+ if (IS_ERR(ifs_pdev)) {
+ pr_err("platform device register failed\n");
+ return NULL;
+ }
+ err = request_firmware_direct(&fw, path, &ifs_pdev->dev);
+ if (err) {
+ pr_err("ifs file %s load failed\n", path);
+ goto out;
+ }
+
+out:
+ platform_device_unregister(ifs_pdev);
+
+ return fw;
+}
+
+/*
+ * Compare the image version whenever loading a new image.
+ * Load the new image only if it is later or equal than the current version.
+ */
+static bool is_newer_binary(int current_loaded_version, struct ifs_header *new_image_ptr)
+{
+ return current_loaded_version <= new_image_ptr->blob_revision;
+}
+
+/*
+ * Load ifs image. Before loading ifs module, the ifs image must be located
+ * in /lib/firmware/intel/ifs and named as {family/model/stepping}.{testname}.
+ */
+int load_ifs_binary(void)
+{
+ const struct firmware *scan_fw;
+ char scan_path[256];
+ int ret;
+
+ snprintf(scan_path, sizeof(scan_path), "%s%02x-%02x-%02x.scan", ifs_path,
+ boot_cpu_data.x86, boot_cpu_data.x86_model, boot_cpu_data.x86_stepping);
+
+ scan_fw = load_binary(scan_path);
+ if (!scan_fw)
+ return -ENOENT;
+
+ /* only reload new scan image for later version than currently loaded */
+ if (!is_newer_binary(ifs_params.loaded_version, (struct ifs_header *)scan_fw->data)) {
+ pr_warn("Refusing to load older binary");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ifs_header_ptr = (struct ifs_header *)scan_fw->data;
+ ifs_hash_ptr = (u64)(ifs_header_ptr + 1);
+
+ ret = 0;
+out:
+ release_firmware(scan_fw);
+
+ return ret;
+}
--
2.17.1

2022-03-02 07:47:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC 01/10] x86/microcode/intel: expose collect_cpu_info_early() for IFS

On Tue, Mar 01, 2022 at 11:54:48AM -0800, Jithu Joseph wrote:
> IFS uses a image provided by Intel that can be regarded as firmware.
> IFS image carries the Processor Signature such as family/model/stepping
> similar to what we find in the microcode header.
>
> Expose collect_cpu_info_early() and cpu_signatures_match() for IFS image

If this is going to be a global symbol, how about
cpu_collect_info_early() to keep the correct prefix?

thanks,

greg k-h

2022-03-02 08:30:13

by Joseph, Jithu

[permalink] [raw]
Subject: [RFC 10/10] trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations

From: Tony Luck <[email protected]>

Add tracing support which may be useful for debugging systems that fail to complete
In Field Scan tests.

Signed-off-by: Tony Luck <[email protected]>
---
drivers/platform/x86/intel/ifs/core.c | 5 ++++
include/trace/events/ifs.h | 38 +++++++++++++++++++++++++++
2 files changed, 43 insertions(+)
create mode 100644 include/trace/events/ifs.h

diff --git a/drivers/platform/x86/intel/ifs/core.c b/drivers/platform/x86/intel/ifs/core.c
index c9ca385082e9..dec2a72eb95a 100644
--- a/drivers/platform/x86/intel/ifs/core.c
+++ b/drivers/platform/x86/intel/ifs/core.c
@@ -13,6 +13,9 @@

#include "ifs.h"

+#define CREATE_TRACE_POINTS
+#include <trace/events/ifs.h>
+
static enum cpuhp_state cpuhp_scan_state;
struct ifs_params ifs_params;
int cpu_sibl_ct;
@@ -217,6 +220,8 @@ static int scan_test_worker(void *info)

rdmsrl(MSR_SCAN_STATUS, status.data);

+ trace_ifs_status(activate, status);
+
/* Some cases can be retried, give up for others */
if (!can_restart(status))
break;
diff --git a/include/trace/events/ifs.h b/include/trace/events/ifs.h
new file mode 100644
index 000000000000..3c6ef33c7b3b
--- /dev/null
+++ b/include/trace/events/ifs.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ifs
+
+#if !defined(_TRACE_IFS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_IFS_H
+
+#include <linux/ktime.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(ifs_status,
+
+ TP_PROTO(union ifs_scan activate, union ifs_status status),
+
+ TP_ARGS(activate, status),
+
+ TP_STRUCT__entry(
+ __field( u8, start )
+ __field( u8, stop )
+ __field( u64, status )
+ ),
+
+ TP_fast_assign(
+ __entry->start = activate.start;
+ __entry->stop = activate.stop;
+ __entry->status = status.data;
+ ),
+
+ TP_printk("start: %.2x, stop: %.2x, status: %llx",
+ __entry->start,
+ __entry->stop,
+ __entry->status)
+);
+
+#endif /* _TRACE_IFS_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
2.17.1

2022-03-02 13:53:06

by Joseph, Jithu

[permalink] [raw]
Subject: [RFC 03/10] platform/x86/intel/ifs: Add driver for In-Field Scan

In-Field Scan (IFS) provides hardware hooks to perform core tests and
report failures for portions of silicon that lack error detection
capabilities, which will be available in some server SKUs starting with
Sapphire Rapids. It offers infrastructure to specific users such as cloud
providers or OEMs to schedule tests and find in-field failures due to aging
in silicon that might not necessarily be reported with normal machine
checks.

Add basic parts of the IFS module (initialization and check IFS capability
support in a processor).

MSR IA32_CORE_CAPABILITY is a feature-enumerating MSR, bit 2 of which
reports MSR_INTEGRITY_CAPABILITIES. Processor that supports IFS
should reports the MSR_INTEGRITY_CAPABILITIES enabled.

Please check the latest Intel 64 and IA-32 Architectures Software
Developer's Manual for more detailed information on the MSR and the
MSR_INTEGRITY_CAPABILITIES.

Originally-by: Kyung Min Park <[email protected]>
Signed-off-by: Jithu Joseph <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
MAINTAINERS | 7 ++++
drivers/platform/x86/intel/Kconfig | 1 +
drivers/platform/x86/intel/Makefile | 1 +
drivers/platform/x86/intel/ifs/Kconfig | 9 +++++
drivers/platform/x86/intel/ifs/Makefile | 7 ++++
drivers/platform/x86/intel/ifs/core.c | 49 +++++++++++++++++++++++++
drivers/platform/x86/intel/ifs/ifs.h | 14 +++++++
7 files changed, 88 insertions(+)
create mode 100644 drivers/platform/x86/intel/ifs/Kconfig
create mode 100644 drivers/platform/x86/intel/ifs/Makefile
create mode 100644 drivers/platform/x86/intel/ifs/core.c
create mode 100644 drivers/platform/x86/intel/ifs/ifs.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 777cd6fa2b3d..4c9912c0d725 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9685,6 +9685,13 @@ B: https://bugzilla.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git
F: drivers/idle/intel_idle.c

+INTEL IN FIELD SCAN (IFS) DRIVER
+M: Jithu Joseph <[email protected]>
+R: Ashok Raj <[email protected]>
+R: Tony Luck <[email protected]>
+S: Maintained
+F: drivers/platform/x86/intel/ifs
+
INTEL INTEGRATED SENSOR HUB DRIVER
M: Srinivas Pandruvada <[email protected]>
M: Jiri Kosina <[email protected]>
diff --git a/drivers/platform/x86/intel/Kconfig b/drivers/platform/x86/intel/Kconfig
index 8e65086bb6c8..7339e7daf0a1 100644
--- a/drivers/platform/x86/intel/Kconfig
+++ b/drivers/platform/x86/intel/Kconfig
@@ -4,6 +4,7 @@
#

source "drivers/platform/x86/intel/atomisp2/Kconfig"
+source "drivers/platform/x86/intel/ifs/Kconfig"
source "drivers/platform/x86/intel/int1092/Kconfig"
source "drivers/platform/x86/intel/int33fe/Kconfig"
source "drivers/platform/x86/intel/int3472/Kconfig"
diff --git a/drivers/platform/x86/intel/Makefile b/drivers/platform/x86/intel/Makefile
index 35f2066578b2..bd7f2ef5e767 100644
--- a/drivers/platform/x86/intel/Makefile
+++ b/drivers/platform/x86/intel/Makefile
@@ -5,6 +5,7 @@
#

obj-$(CONFIG_INTEL_ATOMISP2_PDX86) += atomisp2/
+obj-$(CONFIG_INTEL_IFS) += ifs/
obj-$(CONFIG_INTEL_SAR_INT1092) += int1092/
obj-$(CONFIG_INTEL_CHT_INT33FE) += int33fe/
obj-$(CONFIG_INTEL_SKL_INT3472) += int3472/
diff --git a/drivers/platform/x86/intel/ifs/Kconfig b/drivers/platform/x86/intel/ifs/Kconfig
new file mode 100644
index 000000000000..88e3d4fa1759
--- /dev/null
+++ b/drivers/platform/x86/intel/ifs/Kconfig
@@ -0,0 +1,9 @@
+config INTEL_IFS
+ tristate "Intel In Field Scan"
+ depends on X86 && 64BIT && SMP
+ help
+ Enable support for In Field Scan in Intel CPU to perform core
+ logic test in the field. To compile this driver as a module, choose
+ M here. The module will be called intel_ifs.
+
+ If unsure, say N.
diff --git a/drivers/platform/x86/intel/ifs/Makefile b/drivers/platform/x86/intel/ifs/Makefile
new file mode 100644
index 000000000000..05b4925402b4
--- /dev/null
+++ b/drivers/platform/x86/intel/ifs/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the In-Field Scan driver
+#
+
+obj-$(CONFIG_INTEL_IFS) += intel_ifs.o
+
+intel_ifs-objs := core.o
diff --git a/drivers/platform/x86/intel/ifs/core.c b/drivers/platform/x86/intel/ifs/core.c
new file mode 100644
index 000000000000..fb3c864d3085
--- /dev/null
+++ b/drivers/platform/x86/intel/ifs/core.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2021 Intel Corporation.
+ *
+ * Author: Jithu Joseph <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <asm/cpu_device_id.h>
+
+#include "ifs.h"
+
+#define X86_MATCH(model) \
+ X86_MATCH_VENDOR_FAM_MODEL_FEATURE(INTEL, 6, \
+ INTEL_FAM6_##model, X86_FEATURE_CORE_CAPABILITIES, NULL)
+
+static const struct x86_cpu_id ifs_cpu_ids[] __initconst = {
+ X86_MATCH(SAPPHIRERAPIDS_X),
+ {}
+};
+
+MODULE_DEVICE_TABLE(x86cpu, ifs_cpu_ids);
+
+static int __init ifs_init(void)
+{
+ const struct x86_cpu_id *m;
+ u64 ia32_core_caps;
+
+ /* ifs capability check */
+ m = x86_match_cpu(ifs_cpu_ids);
+ if (!m)
+ return -ENODEV;
+ if (rdmsrl_safe(MSR_IA32_CORE_CAPS, &ia32_core_caps))
+ return -ENODEV;
+ if (!(ia32_core_caps & MSR_IA32_CORE_CAPS_INTEGRITY))
+ return -ENODEV;
+
+ return 0;
+}
+
+static void __exit ifs_exit(void)
+{
+ pr_info("unloaded 'In-Field Scan' module\n");
+}
+
+MODULE_LICENSE("GPL");
+MODULE_INFO(name, "ifs");
+MODULE_DESCRIPTION("ifs");
+module_init(ifs_init);
+module_exit(ifs_exit);
diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h
new file mode 100644
index 000000000000..f3f924fced06
--- /dev/null
+++ b/drivers/platform/x86/intel/ifs/ifs.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2021 Intel Corporation.
+ *
+ * Author: Jithu Joseph <[email protected]>
+ */
+
+#ifndef _IFS_H_
+#define _IFS_H_
+
+/* These bits are in the IA32_CORE_CAPABILITIES MSR */
+#define MSR_IA32_CORE_CAPS_INTEGRITY_BIT 2
+#define MSR_IA32_CORE_CAPS_INTEGRITY BIT(MSR_IA32_CORE_CAPS_INTEGRITY_BIT)
+
+#endif
--
2.17.1

2022-03-02 14:06:23

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC 01/10] x86/microcode/intel: expose collect_cpu_info_early() for IFS

On Tue, Mar 01, 2022 at 11:54:48AM -0800, Jithu Joseph wrote:
> @@ -58,6 +58,7 @@ static inline bool cpu_signatures_match(unsigned int s1, unsigned int p1,
> /* ... or they intersect. */
> return p1 & p2;
> }
> +EXPORT_SYMBOL_GPL(cpu_signatures_match);
>
> /*
> * Returns 1 if update has been found, 0 otherwise.
> @@ -342,7 +343,7 @@ scan_microcode(void *data, size_t size, struct ucode_cpu_info *uci, bool save)
> return patch;
> }
>
> -static int collect_cpu_info_early(struct ucode_cpu_info *uci)
> +int collect_cpu_info_early(struct ucode_cpu_info *uci)
> {
> unsigned int val[2];
> unsigned int family, model;
> @@ -372,6 +373,7 @@ static int collect_cpu_info_early(struct ucode_cpu_info *uci)
>
> return 0;
> }
> +EXPORT_SYMBOL_GPL(collect_cpu_info_early);

This is not how this is done - simply exporting a couple of random
functions just because you need them:

drivers/platform/x86/intel/ifs/load.c: In function ‘find_ifs_matching_signature’:
drivers/platform/x86/intel/ifs/load.c:213:6: error: invalid use of void expression
213 | if (!cpu_signatures_match(uci->cpu_sig.sig, uci->cpu_sig.pf, shdr->sig, shdr->pf)) {
| ^
make[5]: *** [scripts/Makefile.build:288: drivers/platform/x86/intel/ifs/load.o] Error 1
make[4]: *** [scripts/Makefile.build:550: drivers/platform/x86/intel/ifs] Error 2
make[3]: *** [scripts/Makefile.build:550: drivers/platform/x86/intel] Error 2
make[2]: *** [scripts/Makefile.build:550: drivers/platform/x86] Error 2
make[1]: *** [scripts/Makefile.build:550: drivers/platform] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:1831: drivers] Error 2

Hell, even your function signatures don't match.

And then, with that fixed it would build but then one would wonder a
long time why that ifs thing doesn't work. Well:

# CONFIG_MICROCODE is not set

So the proper thing to do is to extract the generic functionality for
comparing microcode patch signatures and for gathering the information
collect_cpu_info_early() collects, into intel-generic functions, like
I've done with intel_get_microcode_revision(), move that functionality
to arch/x86/kernel/cpu/intel.c and then use it *both* in the microcode
loader *and* your driver.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-03-02 14:09:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver

On Tue, Mar 01, 2022 at 09:10:20PM +0100, Greg KH wrote:
> On Tue, Mar 01, 2022 at 11:54:47AM -0800, Jithu Joseph wrote:
> > Note to Maintainers:
> > Requesting x86 Maintainers to take a look at patch01 as it
> > touches arch/x86 portion of the kernel. Also would like to guide them
> > to patch07 which sets up hotplug notifiers and creates kthreads.
> >
> > Patch 2/10 - Adds Documentation. Requesting Documentation maintainer to review it.
> >
> > Requesting Greg KH to review the sysfs changes added by patch08.
>
> "RFC" means you are not comfortable submitting the changes yet, so you
> don't need my review at this point in time. Become confident in your
> changes before asking for others to review the code please.

Hint, it needs work, sysfs_emit() for one thing, lack of reference
counting on your cpu objects is another...

2022-03-02 14:26:15

by Joseph, Jithu

[permalink] [raw]
Subject: [RFC 06/10] platform/x86/intel/ifs: Authenticate and copy to secured memory

The IFS image contains hashes that will be used to authenticate the ifs
test chunks. First, use WRMSR to copy the hashes and enumerate the number
of test chunks, chunk size and the maximum number of cores that can run
scan test simultaneously.

Next, use WRMSR to authenticate each and every scan test chunk which is
also stored in the IFS image. The CPU will check if the test chunks match
the hashes, otherwise failure is indicated to system software. If the test
chunk is authenticated, it is automatically copied to secured memory.

The ifs hash copy and authentication only needs to be done on the first
logical cpu of each socket.

Originally-by: Kyung Min Park <[email protected]>
Signed-off-by: Jithu Joseph <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
drivers/platform/x86/intel/ifs/ifs.h | 33 ++++++
drivers/platform/x86/intel/ifs/load.c | 139 +++++++++++++++++++++++++-
2 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h
index f2daf2cfd3e6..8f9abdb304b0 100644
--- a/drivers/platform/x86/intel/ifs/ifs.h
+++ b/drivers/platform/x86/intel/ifs/ifs.h
@@ -14,12 +14,45 @@
#define MSR_IA32_CORE_CAPS_INTEGRITY_BIT 2
#define MSR_IA32_CORE_CAPS_INTEGRITY BIT(MSR_IA32_CORE_CAPS_INTEGRITY_BIT)

+#define MSR_COPY_SCAN_HASHES 0x000002c2
+#define MSR_SCAN_HASHES_STATUS 0x000002c3
+#define MSR_AUTHENTICATE_AND_COPY_CHUNK 0x000002c4
+#define MSR_CHUNKS_AUTHENTICATION_STATUS 0x000002c5
+
+/* MSR_SCAN_HASHES_STATUS bit fields */
+union ifs_scan_hashes_status {
+ u64 data;
+ struct {
+ u64 chunk_size :16;
+ u64 num_chunks :8;
+ u64 rsvd1 :8;
+ u64 error_code :8;
+ u64 rsvd2 :11;
+ u64 max_core_limit :12;
+ u64 valid :1;
+ };
+};
+
+/* MSR_CHUNKS_AUTH_STATUS bit fields */
+union ifs_chunks_auth_status {
+ u64 data;
+ struct {
+ u64 valid_chunks :8;
+ u64 total_chunks :8;
+ u64 rsvd1 :16;
+ u64 error_code :8;
+ u64 rsvd2 :24;
+ };
+};
+
/**
* struct ifs_params - global ifs parameter for all cpus.
* @loaded_version: stores the currently loaded ifs image version.
+ * @valid_chunks: number of chunks which could be validated.
*/
struct ifs_params {
int loaded_version;
+ int valid_chunks;
};

int load_ifs_binary(void);
diff --git a/drivers/platform/x86/intel/ifs/load.c b/drivers/platform/x86/intel/ifs/load.c
index b40f70258f8e..a0cb0e1718bc 100644
--- a/drivers/platform/x86/intel/ifs/load.c
+++ b/drivers/platform/x86/intel/ifs/load.c
@@ -6,10 +6,13 @@

#include <linux/firmware.h>
#include <linux/platform_device.h>
+#include <linux/slab.h>
#include <asm/microcode_intel.h>

#include "ifs.h"
+
static const char *ifs_path = "intel/ifs/";
+static bool ifs_loading_error; /* error occurred during ifs hashes/chunk authentication.*/

struct ifs_header {
u32 header_ver;
@@ -28,6 +31,140 @@ struct ifs_header {
#define IFS_HEADER_SIZE (sizeof(struct ifs_header))
static struct ifs_header *ifs_header_ptr; /* pointer to the ifs image header */
static u64 ifs_hash_ptr; /* Address of ifs metadata (hash) */
+static u64 ifs_test_image_ptr; /* 256B aligned address of test pattern */
+
+static const char * const scan_hash_status[] = {
+ "Reserved",
+ "Attempt to copy scan hashes when copy already in progress",
+ "Secure Memory not set up correctly",
+ "FuSaInfo.ProgramID does not match or ff-mm-ss does not match",
+ "Reserved",
+ "Integrity check failed",
+ "Scan test is in progress"
+};
+
+static const char * const scan_authentication_status[] = {
+ "No error reported",
+ "Attempt to authenticate a chunk which is already marked as authentic",
+ "Chunk authentication error. The hash of chunk did not match expected value"
+};
+
+/*
+ * To copy scan hashes and authenticate test chunks, the initiating cpu must point
+ * to the EDX:EAX to the test image in linear address.
+ * Run wrmsr(MSR_COPY_SCAN_HASHES) for scan hash copy and run wrmsr(MSR_AUTHENTICATE_AND_COPY_CHUNK)
+ * for scan hash copy and test chunk authentication.
+ */
+static int copy_hashes_authenticate_chunks(void *arg)
+{
+ union ifs_scan_hashes_status hashes_status;
+ union ifs_chunks_auth_status chunk_status;
+ int i, num_chunks, chunk_size;
+ u64 linear_addr, base;
+ u32 err_code;
+
+ /* run scan hash copy */
+ wrmsrl(MSR_COPY_SCAN_HASHES, ifs_hash_ptr);
+ rdmsrl(MSR_SCAN_HASHES_STATUS, hashes_status.data);
+
+ /* enumerate the scan image information */
+ num_chunks = hashes_status.num_chunks;
+ chunk_size = hashes_status.chunk_size * 1024;
+ err_code = hashes_status.error_code;
+
+ if (!hashes_status.valid) {
+ ifs_loading_error = true;
+ if (err_code >= ARRAY_SIZE(scan_hash_status)) {
+ pr_err("invalid error code 0x%x for hash copy\n", err_code);
+ return -EINVAL;
+ }
+ pr_err("ifs: %s", scan_hash_status[err_code]);
+ return -ENODEV;
+ }
+ pr_info("the total chunk number: %d\n", num_chunks);
+
+ /* base linear address to the scan data */
+ base = ifs_test_image_ptr;
+
+ /* scan data authentication and copy chunks to secured memory */
+ for (i = 0; i < num_chunks; i++) {
+ linear_addr = base + i * chunk_size;
+ linear_addr |= i;
+
+ wrmsrl(MSR_AUTHENTICATE_AND_COPY_CHUNK, linear_addr);
+ rdmsrl(MSR_CHUNKS_AUTHENTICATION_STATUS, chunk_status.data);
+
+ ifs_params.valid_chunks = chunk_status.valid_chunks;
+ err_code = chunk_status.error_code;
+
+ if (err_code) {
+ ifs_loading_error = true;
+ if (err_code >= ARRAY_SIZE(scan_authentication_status)) {
+ pr_err("invalid error code 0x%x for authentication\n", err_code);
+ return -EINVAL;
+ }
+ pr_err("%s\n", scan_authentication_status[err_code]);
+ return -ENODEV;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * IFS requires scan chunks authenticated per each socket in the platform.
+ * Once the test chunk is authenticated, it is automatically copied to secured memory
+ * and proceed the authentication for the next chunk.
+ */
+static int scan_chunks_sanity_check(void)
+{
+ int metadata_size, curr_pkg, cpu, ret = -ENOMEM;
+ bool *package_authenticated;
+ char *test_ptr;
+
+ package_authenticated = kcalloc(topology_max_packages(), sizeof(bool), GFP_KERNEL);
+ if (!package_authenticated)
+ return ret;
+
+ metadata_size = ifs_header_ptr->metadata_size;
+
+ /* Spec says that if the Meta Data Size = 0 then it should be treated as 2000 */
+ if (metadata_size == 0)
+ metadata_size = 2000;
+
+ /* Scan chunk start must be 256 byte aligned */
+ if ((metadata_size + IFS_HEADER_SIZE) % 256) {
+ pr_err("Scan pattern offset within the binary is not 256 byte aligned\n");
+ return -EINVAL;
+ }
+
+ test_ptr = (char *)ifs_header_ptr + IFS_HEADER_SIZE + metadata_size;
+
+ ifs_test_image_ptr = (u64)test_ptr;
+ ifs_params.loaded_version = ifs_header_ptr->blob_revision;
+
+ /* copy the scan hash and authenticate per package */
+ cpus_read_lock();
+ for_each_online_cpu(cpu) {
+ curr_pkg = topology_physical_package_id(cpu);
+ if (package_authenticated[curr_pkg])
+ continue;
+ package_authenticated[curr_pkg] = 1;
+ ret = smp_call_function_single(cpu, (void *)copy_hashes_authenticate_chunks,
+ NULL, 1);
+ if (ret || ifs_loading_error) {
+ ret = ifs_loading_error ? -ENOMEM : ret;
+ goto out;
+ }
+ }
+
+out:
+ cpus_read_unlock();
+ kfree(package_authenticated);
+
+ return ret;
+}
+
static int ifs_sanity_check(void *mc)
{
struct microcode_header_intel *mc_header = mc;
@@ -154,7 +291,7 @@ int load_ifs_binary(void)
ifs_header_ptr = (struct ifs_header *)scan_fw->data;
ifs_hash_ptr = (u64)(ifs_header_ptr + 1);

- ret = 0;
+ ret = scan_chunks_sanity_check();
out:
release_firmware(scan_fw);

--
2.17.1

2022-03-02 14:29:42

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver



On Tue, Mar 1, 2022, at 11:54 AM, Jithu Joseph wrote:
> Note to Maintainers:
> Requesting x86 Maintainers to take a look at patch01 as it
> touches arch/x86 portion of the kernel. Also would like to guide them
> to patch07 which sets up hotplug notifiers and creates kthreads.
>
> Patch 2/10 - Adds Documentation. Requesting Documentation maintainer to
> review it.
>
> Requesting Greg KH to review the sysfs changes added by patch08.
>
> Patch10 adds tracing support, requesting Steven Rostedt to review that.
>
> Rest of the patches adds the IFS platform driver, requesting Platform
> driver maintainers
> to review them.
>
>
> In Field Scan (IFS) is a hardware feature to run circuit level tests on
> a CPU core to detect problems that are not caught by parity or ECC checks.
>
> Intel will provide a firmware file containing the scan tests. Similar to
> microcode there is a separate file for each family-model-stepping. The
> tests in the file are divided into some number of "chunks" that can be
> run individually.
>
> The driver loads the tests into memory reserved BIOS local to each CPU
> socket in a two step process using writes to MSRs to first load the
> SHA hashes for the test. Then the tests themselves. Status MSRs provide
> feedback on the success/failure of these steps.
>
> Tests are run by synchronizing execution of all threads on a core and
> then writing to the ACTIVATE_SCAN MSR on all threads. Instruction
> execution continues when:
>
> 1) all tests have completed
> 2) execution was interrupted
> 3) a test detected a problem
>
> In all cases reading the SCAN_STATUS MSR provides details on what
> happened. Interrupted tests may be restarted.
>
> The IFS driver provides interfaces from /sys to reload tests and to
> control execution:
>
> /sys/devices/system/cpu/ifs/reload
> Writing "1" to this file will reload the tests from
> /lib/firmware/intel/ifs/{ff-mm-ss}.scan

IMO this interface is wrong. /lib/firmware is for firmware (or ucode, etc) files that should be provided by a distribution and loaded, as needed, by a driver so the hardware can function. This is not at all what IFS does. For IFS, an administrator wants to run a specific test, and the test blob is part of the instruction to run the test. The distribution should not be involved, and this should work even on systems where /lib/firmware is immutable.

So either the blob should be written to a file in sysfs or it should be supplied by write or ioctl to a device node.

>
> /sys/devices/system/cpu/ifs/run_test
> Writing "1" to this file will trigger a scan on each core
> sequentially by logical CPU number (when HT is enabled this only
> runs the tests once for each core)
>
> /sys/devices/system/cpu/cpu#/ifs/run_test
> Writing "1" to one of these files will trigger a scan on just
> that core.
>
> Results of the tests are also provided in /sys:
>
> /sys/devices/system/cpu/ifs/status
> Global status. Will show the most serious status across
> all cores (fail > untested > pass)
>
> /sys/devices/system/cpu/ifs/cpu_fail_list
> /sys/devices/system/cpu/ifs/cpu_pass_list
> /sys/devices/system/cpu/ifs/cpu_untested_list
> CPU lists showing which CPUs have which test status
>
> /sys/devices/system/cpu/cpu#/ifs/status
> Status (pass/fail/untested) of each core
>
> /sys/devices/system/cpu/cpu#/ifs/details
> Hex value of the SCAN_STATUS MSR for the most recent test on
> this core. Note that the error_code field may contain driver
> defined software code not defined in the Intel SDM.
>
> Current driver limitations:
>
> 1) The ACTIVATE_SCAN MSR allows for running any consecutive subrange or
> available tests. But the driver always tries to run all tests and only
> uses the subrange feature to restart an interrupted test.
>
> 2) Hardware allows for some number of cores to be tested in parallel.
> The driver does not make use of this, it only tests one core at a time.
>
>
> Jithu Joseph (8):
> x86/microcode/intel: expose collect_cpu_info_early() for IFS
> platform/x86/intel/ifs: Add driver for In-Field Scan
> platform/x86/intel/ifs: Load IFS Image
> platform/x86/intel/ifs: Check IFS Image sanity
> platform/x86/intel/ifs: Authenticate and copy to secured memory
> platform/x86/intel/ifs: Create kthreads for online cpus for scan test
> platform/x86/intel/ifs: Add IFS sysfs interface
> platform/x86/intel/ifs: add ABI documentation for IFS
>
> Tony Luck (2):
> Documentation: In-Field Scan
> trace: platform/x86/intel/ifs: Add trace point to track Intel IFS
> operations
>
> Documentation/ABI/stable/sysfs-driver-ifs | 85 +++++
> Documentation/x86/ifs.rst | 108 ++++++
> Documentation/x86/index.rst | 1 +
> MAINTAINERS | 7 +
> arch/x86/include/asm/microcode_intel.h | 6 +
> arch/x86/kernel/cpu/microcode/intel.c | 8 +-
> drivers/platform/x86/intel/Kconfig | 1 +
> drivers/platform/x86/intel/Makefile | 1 +
> drivers/platform/x86/intel/ifs/Kconfig | 9 +
> drivers/platform/x86/intel/ifs/Makefile | 7 +
> drivers/platform/x86/intel/ifs/core.c | 387 +++++++++++++++++++++
> drivers/platform/x86/intel/ifs/ifs.h | 155 +++++++++
> drivers/platform/x86/intel/ifs/load.c | 299 ++++++++++++++++
> drivers/platform/x86/intel/ifs/sysfs.c | 394 ++++++++++++++++++++++
> include/trace/events/ifs.h | 38 +++
> 15 files changed, 1503 insertions(+), 3 deletions(-)
> create mode 100644 Documentation/ABI/stable/sysfs-driver-ifs
> create mode 100644 Documentation/x86/ifs.rst
> create mode 100644 drivers/platform/x86/intel/ifs/Kconfig
> create mode 100644 drivers/platform/x86/intel/ifs/Makefile
> create mode 100644 drivers/platform/x86/intel/ifs/core.c
> create mode 100644 drivers/platform/x86/intel/ifs/ifs.h
> create mode 100644 drivers/platform/x86/intel/ifs/load.c
> create mode 100644 drivers/platform/x86/intel/ifs/sysfs.c
> create mode 100644 include/trace/events/ifs.h
>
> --
> 2.17.1

2022-03-02 17:44:37

by Joseph, Jithu

[permalink] [raw]
Subject: [RFC 09/10] platform/x86/intel/ifs: add ABI documentation for IFS

Add the sysfs attributes in ABI/stable for In-Field Scan.

Originally-by: Kyung Min Park <[email protected]>
Signed-off-by: Jithu Joseph <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
Documentation/ABI/stable/sysfs-driver-ifs | 85 +++++++++++++++++++++++
1 file changed, 85 insertions(+)
create mode 100644 Documentation/ABI/stable/sysfs-driver-ifs

diff --git a/Documentation/ABI/stable/sysfs-driver-ifs b/Documentation/ABI/stable/sysfs-driver-ifs
new file mode 100644
index 000000000000..8b6b9472f57e
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-driver-ifs
@@ -0,0 +1,85 @@
+What: /sys/devices/system/cpu/ifs/run_test
+Date: Feb 28, 2022
+KernelVersion: 5.18.0
+Contact: [email protected]
+Description: echo 1 to trigger ifs test for all online cores.
+
+What: /sys/devices/system/cpu/ifs/status
+Date: Feb 28, 2022
+KernelVersion: 5.18.0
+Contact: [email protected]
+Description: Global status. Shows the most serious status across
+ all cores (fail > untested > pass)
+
+What: /sys/devices/system/cpu/ifs/image_version
+Date: Feb 28, 2022
+KernelVersion: 5.18.0
+Contact: [email protected]
+Description: Version of loaded IFS binary image.
+
+What: /sys/devices/system/cpu/ifs/reload
+Date: Feb 28, 2022
+KernelVersion: 5.18.0
+Contact: [email protected]
+Description: echo 1 to reload IFS image.
+
+What: /sys/devices/system/cpu/ifs/cpu_pass_list
+Date: Feb 28, 2022
+KernelVersion: 5.18.0
+Contact: [email protected]
+Description: List of cpus which passed the IFS test.
+
+What: /sys/devices/system/cpu/ifs/cpu_fail_list
+Date: Feb 28, 2022
+KernelVersion: 5.18.0
+Contact: [email protected]
+Description: List of cpus which failed the IFS test.
+
+What: /sys/devices/system/cpu/ifs/cpu_untested_list
+Date: Feb 28, 2022
+KernelVersion: 5.18.0
+Contact: [email protected]
+Description: List of cpus which could not be tested.
+
+What: /sys/module/intel_ifs/parameters/noint
+Date: Feb 28, 2022
+KernelVersion: 5.18.0
+Contact: [email protected]
+Description: SAF tunable parameter that user can modify before
+ the scan run if they wish to override default value.
+
+ When set, system interrupts are not allowed to interrupt an IFS. The
+ default state for this parameter is set.
+
+What: /sys/module/intel_ifs/parameters/retry
+Date: Feb 28, 2022
+KernelVersion: 5.18.0
+Contact: [email protected]
+Description: SAF tunable parameter that user can modify at
+ load time if they wish to override default value.
+
+ Maximum retry counter when the test is not executed due to an
+ event such as interrupt. The default value is 5, it can be set to any
+ value from 1 to 20.
+
+What: /sys/devices/system/cpu/cpu#/ifs/run_test
+Date: Feb 28, 2022
+KernelVersion: 5.18.0
+Contact: [email protected]
+Description: IFS target core testing. echo 1 to trigger scan test on cpu#.
+
+What: /sys/devices/system/cpu/cpu#/ifs/status
+Date: Feb 28, 2022
+KernelVersion: 5.18.0
+Contact: [email protected]
+Description: The status of IFS test on a specific cpu#. It can be one of "pass", "fail"
+ or "untested".
+
+What: /sys/devices/system/cpu/cpu#/ifs/details
+Date: Feb 28, 2022
+KernelVersion: 5.18.0
+Contact: [email protected]
+Description: The details file reports the hex value of the SCAN_STATUS MSR. Note that
+ the error_code field may contain driver defined software code not defined
+ in the Intel SDM.
+
--
2.17.1

2022-03-02 20:32:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver

On Wed, Mar 02, 2022 at 10:33:13AM -0500, Steven Rostedt wrote:
> On Tue, 1 Mar 2022 21:10:20 +0100
> Greg KH <[email protected]> wrote:
>
> > "RFC" means you are not comfortable submitting the changes yet, so you
> > don't need my review at this point in time. Become confident in your
> > changes before asking for others to review the code please.
>
> I guess you and I have a different understanding of RFC (Request for
> Comments). As to me, comments are a form of review.
>
> In other words, RFC to me means the review is "does this design look like
> it will work", and we should be reviewing the design and overview of the
> patches. Not the nitty gritty details (like missed error handling, unless
> the design will prevent it). Although, you could add those comments in a
> review.
>
> When I post RFCs, it's not that I'm not comfortable submitting the change,
> it's because I want to know if what I'm doing makes sense, and I might be
> missing something that will make this effort in vain.
>
> What ever happen to the "Submit early, submit often" mantra?

For patches from "experienced" submitters like this, with reviews from
other experienced reviewers already (look at the s-o-b chain here),
there's no excuse for it to be a RFC unless something is really odd as
the experienced reviewers should have already handled the "comments"
portion already. Otherwise their review was kind of pointless, right?

I'm all for submitting early, but be confident!

Also, I have way too many non-RFC patches to review in my queue, so that
means any RFC patches fall to the bottom so I like to give people a
reason why I'm not reviewing them, like I did here.

thanks,

greg k-h

2022-03-02 22:34:53

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver



On Wed, Mar 2, 2022, at 12:29 PM, Luck, Tony wrote:
> On Wed, Mar 02, 2022 at 05:59:59AM -0800, Andy Lutomirski wrote:
>> > /sys/devices/system/cpu/ifs/reload
>> > Writing "1" to this file will reload the tests from
>> > /lib/firmware/intel/ifs/{ff-mm-ss}.scan
>>
>> IMO this interface is wrong. /lib/firmware is for firmware (or
>> ucode, etc) files that should be provided by a distribution and loaded,
>> as needed, by a driver so the hardware can function. This is not at
>> all what IFS does. For IFS, an administrator wants to run a specific
>> test, and the test blob is part of the instruction to run the test.
>> The distribution should not be involved, and this should work even on
>> systems where /lib/firmware is immutable.
>
> "so the hardware can function"
>
> Data center customers want to know which aging systems in their
> data centers are not functioning correctly. So this is not just
> a random test that people might run when they suspect they have
> a problem. It is expected that every core will run this test
> periodically (period dependent on paranoia level of the system
> owner ... maybe daily ... perhaps even more often).
>
> This is so that the data centre can function.
>

How does this work? Is there an Intel IFS blob v1.17 that is expected to be *the* blob for a given CPU until an update happens? Or is the expectation that several different blobs might all useful on the same system and operators might want to run different blobs under different circumstances?

>>
>> So either the blob should be written to a file in sysfs or it should
>> be supplied by write or ioctl to a device node.
>
> I don't see the drive to create a new mechanism for the kernel
> to load from a file when the firmware loader already exists.
>
> If the problem is just immuatbility of /lib ... then make
> an immutable symlink from /lib/firmware/intel/ifs to some
> other place in the file system (which is what some OS
> vendors already do for microcode).
>
> -Tony

2022-03-02 23:25:10

by Luck, Tony

[permalink] [raw]
Subject: RE: [RFC 00/10] Introduce In Field Scan driver

> How does this work? Is there an Intel IFS blob v1.17 that is expected
> to be *the* blob for a given CPU until an update happens?

This is the model. Although internally the blob is divided into chunks
that can be run separately, folks outside Intel have no visibility into
which chunk tests which circuits (even *inside* ... I don't know what
each chunk does :-) )

How often will updates occur? No idea. Since this is new, I'd expect
that there might be some improvements when there is feedback from
large CSPs running on many more systems than we have.

> Or is the
> expectation that several different blobs might all useful on the same
> system and operators might want to run different blobs under different
> circumstances?

One of our early implementations included extra sysfs hooks to only
test specific chunks ... but we dropped that complexity as there's no
way for end users to decide which chunks to run.

So the posted series just iterates all chunks for a core.

-Tony

2022-03-02 23:44:40

by Luck, Tony

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver

On Wed, Mar 02, 2022 at 05:59:59AM -0800, Andy Lutomirski wrote:
> > /sys/devices/system/cpu/ifs/reload
> > Writing "1" to this file will reload the tests from
> > /lib/firmware/intel/ifs/{ff-mm-ss}.scan
>
> IMO this interface is wrong. /lib/firmware is for firmware (or
> ucode, etc) files that should be provided by a distribution and loaded,
> as needed, by a driver so the hardware can function. This is not at
> all what IFS does. For IFS, an administrator wants to run a specific
> test, and the test blob is part of the instruction to run the test.
> The distribution should not be involved, and this should work even on
> systems where /lib/firmware is immutable.

"so the hardware can function"

Data center customers want to know which aging systems in their
data centers are not functioning correctly. So this is not just
a random test that people might run when they suspect they have
a problem. It is expected that every core will run this test
periodically (period dependent on paranoia level of the system
owner ... maybe daily ... perhaps even more often).

This is so that the data centre can function.

>
> So either the blob should be written to a file in sysfs or it should
> be supplied by write or ioctl to a device node.

I don't see the drive to create a new mechanism for the kernel
to load from a file when the firmware loader already exists.

If the problem is just immuatbility of /lib ... then make
an immutable symlink from /lib/firmware/intel/ifs to some
other place in the file system (which is what some OS
vendors already do for microcode).

-Tony

2022-03-02 23:44:46

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver

On Tue, 1 Mar 2022 21:10:20 +0100
Greg KH <[email protected]> wrote:

> "RFC" means you are not comfortable submitting the changes yet, so you
> don't need my review at this point in time. Become confident in your
> changes before asking for others to review the code please.

I guess you and I have a different understanding of RFC (Request for
Comments). As to me, comments are a form of review.

In other words, RFC to me means the review is "does this design look like
it will work", and we should be reviewing the design and overview of the
patches. Not the nitty gritty details (like missed error handling, unless
the design will prevent it). Although, you could add those comments in a
review.

When I post RFCs, it's not that I'm not comfortable submitting the change,
it's because I want to know if what I'm doing makes sense, and I might be
missing something that will make this effort in vain.

What ever happen to the "Submit early, submit often" mantra?

-- Steve

2022-03-03 00:35:52

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver

On Tue, 2022-03-01 at 11:54 -0800, Jithu Joseph wrote:
> Note to Maintainers:
> Requesting x86 Maintainers to take a look at patch01 as it
> touches arch/x86 portion of the kernel. Also would like to guide them
> to patch07 which sets up hotplug notifiers and creates kthreads.
>
> Patch 2/10 - Adds Documentation. Requesting Documentation maintainer to review it.
>
> Requesting Greg KH to review the sysfs changes added by patch08.
>
> Patch10 adds tracing support, requesting Steven Rostedt to review that.
>
> Rest of the patches adds the IFS platform driver, requesting Platform driver maintainers
> to review them.
>
>
> In Field Scan (IFS) is a hardware feature to run circuit level tests on
> a CPU core to detect problems that are not caught by parity or ECC checks.
>
> Intel will provide a firmware file containing the scan tests.  Similar to
> microcode there is a separate file for each family-model-stepping. The
> tests in the file are divided into some number of "chunks" that can be
> run individually.
>
> The driver loads the tests into memory reserved BIOS local to each CPU
> socket in a two step process using writes to MSRs to first load the
> SHA hashes for the test. Then the tests themselves. Status MSRs provide
> feedback on the success/failure of these steps.
>
> Tests are run by synchronizing execution of all threads on a core and
> then writing to the ACTIVATE_SCAN MSR on all threads. Instruction
> execution continues when:
>
> 1) all tests have completed
> 2) execution was interrupted
> 3) a test detected a problem
>
> In all cases reading the SCAN_STATUS MSR provides details on what
> happened. Interrupted tests may be restarted.

Can you say a bit about what motivates upstream to want to carry this
support? For example, if the test content comes from out of tree (i.e.
there is no source for tests other then a location under a URL on
github.com/intel), and nothing in the kernel consumes the results, then
what breaks if that blob from the test URL also has a a few source
files and a Kbuild file to produce a .ko alongside the .scan file? This
is more a comment on the cover letter basics than an actual proposal to
change the distribution model. Just, in general, the cover letter
clarifies why upstream should care about the patches.



2022-03-03 00:44:42

by Ashok Raj

[permalink] [raw]
Subject: Re: [RFC 03/10] platform/x86/intel/ifs: Add driver for In-Field Scan

On Wed, Mar 02, 2022 at 03:24:47PM -0800, Dan Williams wrote:
> On Tue, 2022-03-01 at 11:54 -0800, Jithu Joseph wrote:
> > In-Field Scan (IFS) provides hardware hooks to perform core tests and
> > report failures for portions of silicon that lack error detection
> > capabilities, which will be available in some server SKUs starting with
> > Sapphire Rapids. It offers infrastructure to specific users such as cloud
> > providers or OEMs to schedule tests and find in-field failures due to aging
> > in silicon that might not necessarily be reported with normal machine
> > checks.
> >
> > Add basic parts of the IFS module (initialization and check IFS capability
> > support in a processor).
> >
> > MSR IA32_CORE_CAPABILITY is a feature-enumerating MSR, bit 2 of which
> > reports MSR_INTEGRITY_CAPABILITIES. Processor that supports IFS
> > should reports the MSR_INTEGRITY_CAPABILITIES enabled.
> >
> > Please check the latest Intel 64 and IA-32 Architectures Software
> > Developer's Manual for more detailed information on the MSR and the
> > MSR_INTEGRITY_CAPABILITIES.
> >
> > Originally-by: Kyung Min Park <[email protected]>
> > Signed-off-by: Jithu Joseph <[email protected]>
> > Reviewed-by: Ashok Raj <[email protected]>
> > Reviewed-by: Tony Luck <[email protected]>
> > ---
> > MAINTAINERS | 7 ++++
> > drivers/platform/x86/intel/Kconfig | 1 +
> > drivers/platform/x86/intel/Makefile | 1 +
> > drivers/platform/x86/intel/ifs/Kconfig | 9 +++++
> > drivers/platform/x86/intel/ifs/Makefile | 7 ++++
> > drivers/platform/x86/intel/ifs/core.c | 49 +++++++++++++++++++++++++
> > drivers/platform/x86/intel/ifs/ifs.h | 14 +++++++
> > 7 files changed, 88 insertions(+)
> > create mode 100644 drivers/platform/x86/intel/ifs/Kconfig
> > create mode 100644 drivers/platform/x86/intel/ifs/Makefile
> > create mode 100644 drivers/platform/x86/intel/ifs/core.c
> > create mode 100644 drivers/platform/x86/intel/ifs/ifs.h
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 777cd6fa2b3d..4c9912c0d725 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -9685,6 +9685,13 @@ B: https://bugzilla.kernel.org
> > T: git git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git
> > F: drivers/idle/intel_idle.c
> >
> > +INTEL IN FIELD SCAN (IFS) DRIVER
> > +M: Jithu Joseph <[email protected]>
> > +R: Ashok Raj <[email protected]>
> > +R: Tony Luck <[email protected]>
> > +S: Maintained
> > +F: drivers/platform/x86/intel/ifs
> > +
> > INTEL INTEGRATED SENSOR HUB DRIVER
> > M: Srinivas Pandruvada <[email protected]>
> > M: Jiri Kosina <[email protected]>
> > diff --git a/drivers/platform/x86/intel/Kconfig b/drivers/platform/x86/intel/Kconfig
> > index 8e65086bb6c8..7339e7daf0a1 100644
> > --- a/drivers/platform/x86/intel/Kconfig
> > +++ b/drivers/platform/x86/intel/Kconfig
> > @@ -4,6 +4,7 @@
> > #
> >
> > source "drivers/platform/x86/intel/atomisp2/Kconfig"
> > +source "drivers/platform/x86/intel/ifs/Kconfig"
> > source "drivers/platform/x86/intel/int1092/Kconfig"
> > source "drivers/platform/x86/intel/int33fe/Kconfig"
> > source "drivers/platform/x86/intel/int3472/Kconfig"
> > diff --git a/drivers/platform/x86/intel/Makefile b/drivers/platform/x86/intel/Makefile
> > index 35f2066578b2..bd7f2ef5e767 100644
> > --- a/drivers/platform/x86/intel/Makefile
> > +++ b/drivers/platform/x86/intel/Makefile
> > @@ -5,6 +5,7 @@
> > #
> >
> > obj-$(CONFIG_INTEL_ATOMISP2_PDX86) += atomisp2/
> > +obj-$(CONFIG_INTEL_IFS) += ifs/
> > obj-$(CONFIG_INTEL_SAR_INT1092) += int1092/
> > obj-$(CONFIG_INTEL_CHT_INT33FE) += int33fe/
> > obj-$(CONFIG_INTEL_SKL_INT3472) += int3472/
> > diff --git a/drivers/platform/x86/intel/ifs/Kconfig b/drivers/platform/x86/intel/ifs/Kconfig
> > new file mode 100644
> > index 000000000000..88e3d4fa1759
> > --- /dev/null
> > +++ b/drivers/platform/x86/intel/ifs/Kconfig
> > @@ -0,0 +1,9 @@
> > +config INTEL_IFS
> > + tristate "Intel In Field Scan"
> > + depends on X86 && 64BIT && SMP
>
> Are there actual CONFIG_SMP and CONFIG_64BIT compilation dependencies
> in this driver? It looks like this could compile without those config
> dependencies.

Don't think this is anything specific to compile dependencies.

This is a server feature and only targetted and validated in those configs,
untested and unsupported in 32bit configs.

2022-03-03 00:46:47

by Luck, Tony

[permalink] [raw]
Subject: RE: [RFC 03/10] platform/x86/intel/ifs: Add driver for In-Field Scan

>> +config INTEL_IFS
>> + tristate "Intel In Field Scan"
>> + depends on X86 && 64BIT && SMP
>
> Are there actual CONFIG_SMP and CONFIG_64BIT compilation dependencies
> in this driver? It looks like this could compile without those config
> dependencies.

Maybe CONFIG_SMP is there as a kindness to kernel building robots :-)

Yes, you can build this for a UP kernel (I just did, and there were no build
errors). But why? Nobody is going to take their dual socket server with over
a hundred logical processors and run a UP kernel on it.

Probably ditto for 64BIT. I didn't try the build without that. But minimum
memory on these systems is larger than supported by 32-bit kernels.

-Tony

2022-03-03 00:49:49

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC 03/10] platform/x86/intel/ifs: Add driver for In-Field Scan

On Tue, 2022-03-01 at 11:54 -0800, Jithu Joseph wrote:
> In-Field Scan (IFS) provides hardware hooks to perform core tests and
> report failures for portions of silicon that lack error detection
> capabilities, which will be available in some server SKUs starting with
> Sapphire Rapids. It offers infrastructure to specific users such as cloud
> providers or OEMs to schedule tests and find in-field failures due to aging
> in silicon that might not necessarily be reported with normal machine
> checks.
>
> Add basic parts of the IFS module (initialization and check IFS capability
> support in a processor).
>
> MSR IA32_CORE_CAPABILITY is a feature-enumerating MSR, bit 2 of which
> reports MSR_INTEGRITY_CAPABILITIES. Processor that supports IFS
> should reports the MSR_INTEGRITY_CAPABILITIES enabled.
>
> Please check the latest Intel 64 and IA-32 Architectures Software
> Developer's Manual for more detailed information on the MSR and the
> MSR_INTEGRITY_CAPABILITIES.
>
> Originally-by: Kyung Min Park <[email protected]>
> Signed-off-by: Jithu Joseph <[email protected]>
> Reviewed-by: Ashok Raj <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---
>  MAINTAINERS                             |  7 ++++
>  drivers/platform/x86/intel/Kconfig      |  1 +
>  drivers/platform/x86/intel/Makefile     |  1 +
>  drivers/platform/x86/intel/ifs/Kconfig  |  9 +++++
>  drivers/platform/x86/intel/ifs/Makefile |  7 ++++
>  drivers/platform/x86/intel/ifs/core.c   | 49 +++++++++++++++++++++++++
>  drivers/platform/x86/intel/ifs/ifs.h    | 14 +++++++
>  7 files changed, 88 insertions(+)
>  create mode 100644 drivers/platform/x86/intel/ifs/Kconfig
>  create mode 100644 drivers/platform/x86/intel/ifs/Makefile
>  create mode 100644 drivers/platform/x86/intel/ifs/core.c
>  create mode 100644 drivers/platform/x86/intel/ifs/ifs.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 777cd6fa2b3d..4c9912c0d725 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9685,6 +9685,13 @@ B:       https://bugzilla.kernel.org
>  T:     git git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git
>  F:     drivers/idle/intel_idle.c
>  
> +INTEL IN FIELD SCAN (IFS) DRIVER
> +M:     Jithu Joseph <[email protected]>
> +R:     Ashok Raj <[email protected]>
> +R:     Tony Luck <[email protected]>
> +S:     Maintained
> +F:     drivers/platform/x86/intel/ifs
> +
>  INTEL INTEGRATED SENSOR HUB DRIVER
>  M:     Srinivas Pandruvada <[email protected]>
>  M:     Jiri Kosina <[email protected]>
> diff --git a/drivers/platform/x86/intel/Kconfig b/drivers/platform/x86/intel/Kconfig
> index 8e65086bb6c8..7339e7daf0a1 100644
> --- a/drivers/platform/x86/intel/Kconfig
> +++ b/drivers/platform/x86/intel/Kconfig
> @@ -4,6 +4,7 @@
>  #
>  
>  source "drivers/platform/x86/intel/atomisp2/Kconfig"
> +source "drivers/platform/x86/intel/ifs/Kconfig"
>  source "drivers/platform/x86/intel/int1092/Kconfig"
>  source "drivers/platform/x86/intel/int33fe/Kconfig"
>  source "drivers/platform/x86/intel/int3472/Kconfig"
> diff --git a/drivers/platform/x86/intel/Makefile b/drivers/platform/x86/intel/Makefile
> index 35f2066578b2..bd7f2ef5e767 100644
> --- a/drivers/platform/x86/intel/Makefile
> +++ b/drivers/platform/x86/intel/Makefile
> @@ -5,6 +5,7 @@
>  #
>  
>  obj-$(CONFIG_INTEL_ATOMISP2_PDX86)     += atomisp2/
> +obj-$(CONFIG_INTEL_IFS)                        += ifs/
>  obj-$(CONFIG_INTEL_SAR_INT1092)                += int1092/
>  obj-$(CONFIG_INTEL_CHT_INT33FE)                += int33fe/
>  obj-$(CONFIG_INTEL_SKL_INT3472)                += int3472/
> diff --git a/drivers/platform/x86/intel/ifs/Kconfig b/drivers/platform/x86/intel/ifs/Kconfig
> new file mode 100644
> index 000000000000..88e3d4fa1759
> --- /dev/null
> +++ b/drivers/platform/x86/intel/ifs/Kconfig
> @@ -0,0 +1,9 @@
> +config INTEL_IFS
> +       tristate "Intel In Field Scan"
> +       depends on X86 && 64BIT && SMP

Are there actual CONFIG_SMP and CONFIG_64BIT compilation dependencies
in this driver? It looks like this could compile without those config
dependencies.

> +       help
> +         Enable support for In Field Scan in Intel CPU to perform core
> +         logic test in the field. To compile this driver as a module, choose
> +         M here. The module will be called intel_ifs.
> +
> +         If unsure, say N.
> diff --git a/drivers/platform/x86/intel/ifs/Makefile b/drivers/platform/x86/intel/ifs/Makefile
> new file mode 100644
> index 000000000000..05b4925402b4
> --- /dev/null
> +++ b/drivers/platform/x86/intel/ifs/Makefile
> @@ -0,0 +1,7 @@
> +#
> +# Makefile for the In-Field Scan driver
> +#
> +
> +obj-$(CONFIG_INTEL_IFS)                        += intel_ifs.o
> +
> +intel_ifs-objs                         := core.o
> diff --git a/drivers/platform/x86/intel/ifs/core.c b/drivers/platform/x86/intel/ifs/core.c
> new file mode 100644
> index 000000000000..fb3c864d3085
> --- /dev/null
> +++ b/drivers/platform/x86/intel/ifs/core.c
> @@ -0,0 +1,49 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2021 Intel Corporation.
> + *
> + * Author: Jithu Joseph <[email protected]>

I don't see any need to include author info in source files, that's
what 'git blame' is for.

> + */
> +
> +#include <linux/module.h>
> +#include <asm/cpu_device_id.h>
> +
> +#include "ifs.h"
> +
> +#define X86_MATCH(model)                                       \
> +       X86_MATCH_VENDOR_FAM_MODEL_FEATURE(INTEL, 6,            \
> +               INTEL_FAM6_##model, X86_FEATURE_CORE_CAPABILITIES, NULL)
> +
> +static const struct x86_cpu_id ifs_cpu_ids[] __initconst = {
> +       X86_MATCH(SAPPHIRERAPIDS_X),
> +       {}
> +};
> +
> +MODULE_DEVICE_TABLE(x86cpu, ifs_cpu_ids);
> +
> +static int __init ifs_init(void)
> +{
> +       const struct x86_cpu_id *m;
> +       u64 ia32_core_caps;
> +
> +       /* ifs capability check */
> +       m = x86_match_cpu(ifs_cpu_ids);
> +       if (!m)
> +               return -ENODEV;
> +       if (rdmsrl_safe(MSR_IA32_CORE_CAPS, &ia32_core_caps))
> +               return -ENODEV;
> +       if (!(ia32_core_caps & MSR_IA32_CORE_CAPS_INTEGRITY))
> +               return -ENODEV;
> +
> +       return 0;
> +}
> +
> +static void __exit ifs_exit(void)
> +{
> +       pr_info("unloaded 'In-Field Scan' module\n");
> +}
> +
> +MODULE_LICENSE("GPL");
> +MODULE_INFO(name, "ifs");
> +MODULE_DESCRIPTION("ifs");

Just omit MODULE_INFO and MODULE_DESCRIPTION if nothing of importance
needs to be added.

> +module_init(ifs_init);
> +module_exit(ifs_exit);
> diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h
> new file mode 100644
> index 000000000000..f3f924fced06
> --- /dev/null
> +++ b/drivers/platform/x86/intel/ifs/ifs.h
> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright(c) 2021 Intel Corporation.
> + *
> + * Author: Jithu Joseph <[email protected]>
> + */
> +
> +#ifndef _IFS_H_
> +#define _IFS_H_
> +
> +/* These bits are in the IA32_CORE_CAPABILITIES MSR */
> +#define MSR_IA32_CORE_CAPS_INTEGRITY_BIT       2
> +#define MSR_IA32_CORE_CAPS_INTEGRITY           BIT(MSR_IA32_CORE_CAPS_INTEGRITY_BIT)

Is this header going to grow any more definitions? Otherwise these 2
lines can just move into the source file.

2022-03-03 01:36:21

by Joseph, Jithu

[permalink] [raw]
Subject: Re: [RFC 01/10] x86/microcode/intel: expose collect_cpu_info_early() for IFS



On 3/2/2022 2:30 AM, Borislav Petkov wrote:
> On Tue, Mar 01, 2022 at 11:54:48AM -0800, Jithu Joseph wrote:
>> @@ -58,6 +58,7 @@ static inline bool cpu_signatures_match(unsigned int s1, unsigned int p1,
>> /* ... or they intersect. */
>> return p1 & p2;
>> }
>> +EXPORT_SYMBOL_GPL(cpu_signatures_match);
...
>
> Hell, even your function signatures don't match.
>
> And then, with that fixed it would build but then one would wonder a
> long time why that ifs thing doesn't work. Well:
>
> # CONFIG_MICROCODE is not set

Thanks for reviewing the patch and pointing out the mistakes of not setting the module build dependency(on CONFIG_MICROCODE) and that of signature mismatch

>
> So the proper thing to do is to extract the generic functionality for
> comparing microcode patch signatures and for gathering the information
> collect_cpu_info_early() collects, into intel-generic functions, like
> I've done with intel_get_microcode_revision(), move that functionality
> to arch/x86/kernel/cpu/intel.c and then use it *both* in the microcode
> loader *and* your driver.
>

Will move these two function definitions from arch/x86/kernel/cpu/microcode/intel.c to arch/x86/kernel/cpu/intel.c in the next revision as you suggest above.
Need to find an appropriate header for these moved functions too.


Jithu

2022-03-03 02:07:03

by Joseph, Jithu

[permalink] [raw]
Subject: Re: [RFC 03/10] platform/x86/intel/ifs: Add driver for In-Field Scan



On 3/2/2022 3:24 PM, Williams, Dan J wrote:

>
> I don't see any need to include author info in source files, that's
> what 'git blame' is for.

Noted. Will remove it then


>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_INFO(name, "ifs");
>> +MODULE_DESCRIPTION("ifs");
>
> Just omit MODULE_INFO and MODULE_DESCRIPTION if nothing of importance
> needs to be added.

Will try to be more informative and descriptive

>
>> +module_init(ifs_init);
>> +module_exit(ifs_exit);
>> diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h
>> new file mode 100644
>> index 000000000000..f3f924fced06
>> --- /dev/null
>> +++ b/drivers/platform/x86/intel/ifs/ifs.h
>> @@ -0,0 +1,14 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/* Copyright(c) 2021 Intel Corporation.
>> + *
>> + * Author: Jithu Joseph <[email protected]>
>> + */
>> +
>> +#ifndef _IFS_H_
>> +#define _IFS_H_
>> +
>> +/* These bits are in the IA32_CORE_CAPABILITIES MSR */
>> +#define MSR_IA32_CORE_CAPS_INTEGRITY_BIT       2
>> +#define MSR_IA32_CORE_CAPS_INTEGRITY           BIT(MSR_IA32_CORE_CAPS_INTEGRITY_BIT)
>
> Is this header going to grow any more definitions? Otherwise these 2
> lines can just move into the source file.

Subsequent patches adds more definitions to this header.


Jithu

2022-03-03 03:00:40

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC 04/10] platform/x86/intel/ifs: Load IFS Image

On Tue, 2022-03-01 at 11:54 -0800, Jithu Joseph wrote:
> IFS uses a test image that can be regarded as firmware. The image is
> specific to a processor family, model and stepping. IFS requires that a
> test image be loaded before any ifs test is initiated. Load the image
> that matches processor signature. The IFS image is signed by Intel.
>
> The IFS image file follows a similar naming convention as used for
> Intel CPU microcode files. The file must be located in the firmware
> directory where the microcode files are placed and named as {family/model
> /stepping}.scan as below:
>
> /lib/firmware/intel/ifs/{ff-mm-ss}.scan
>
> Originally-by: Kyung Min Park <[email protected]>
> Signed-off-by: Jithu Joseph <[email protected]>

Sender signoff last, please.

> Reviewed-by: Ashok Raj <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---
>  drivers/platform/x86/intel/ifs/Makefile |  2 +-
>  drivers/platform/x86/intel/ifs/core.c   |  8 +++
>  drivers/platform/x86/intel/ifs/ifs.h    | 13 ++++
>  drivers/platform/x86/intel/ifs/load.c   | 95 +++++++++++++++++++++++++
>  4 files changed, 117 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/platform/x86/intel/ifs/load.c
>
> diff --git a/drivers/platform/x86/intel/ifs/Makefile b/drivers/platform/x86/intel/ifs/Makefile
> index 05b4925402b4..105b377de410 100644
> --- a/drivers/platform/x86/intel/ifs/Makefile
> +++ b/drivers/platform/x86/intel/ifs/Makefile
> @@ -4,4 +4,4 @@
>  
>  obj-$(CONFIG_INTEL_IFS)                        += intel_ifs.o
>  
> -intel_ifs-objs                         := core.o
> +intel_ifs-objs                         := core.o load.o
> diff --git a/drivers/platform/x86/intel/ifs/core.c b/drivers/platform/x86/intel/ifs/core.c
> index fb3c864d3085..765d9a2c4683 100644
> --- a/drivers/platform/x86/intel/ifs/core.c
> +++ b/drivers/platform/x86/intel/ifs/core.c
> @@ -8,6 +8,7 @@
>  #include <asm/cpu_device_id.h>
>  
>  #include "ifs.h"
> +struct ifs_params ifs_params;
>  
>  #define X86_MATCH(model)                                       \
>         X86_MATCH_VENDOR_FAM_MODEL_FEATURE(INTEL, 6,            \
> @@ -24,6 +25,7 @@ static int __init ifs_init(void)
>  {
>         const struct x86_cpu_id *m;
>         u64 ia32_core_caps;
> +       int ret;
>  
>         /* ifs capability check */
>         m = x86_match_cpu(ifs_cpu_ids);
> @@ -34,6 +36,12 @@ static int __init ifs_init(void)
>         if (!(ia32_core_caps & MSR_IA32_CORE_CAPS_INTEGRITY))
>                 return -ENODEV;
>  
> +       ret = load_ifs_binary();
> +       if (ret) {
> +               pr_err("loading ifs binaries failed\n");

What's wrong the error code returned to echo, why spam the log?

Similar comment I forgot to add on the pr_info() upon unloading the
module in the previous patch. What's wrong with the error code returned
to "modprobe -r"?

> +               return ret;
> +       }
> +
>         return 0;
>  }
>  
> diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h
> index f3f924fced06..f2daf2cfd3e6 100644
> --- a/drivers/platform/x86/intel/ifs/ifs.h
> +++ b/drivers/platform/x86/intel/ifs/ifs.h
> @@ -7,8 +7,21 @@
>  #ifndef _IFS_H_
>  #define _IFS_H_
>  
> +#undef pr_fmt
> +#define pr_fmt(fmt) "ifs: " fmt

It's unfortunate that this is needed when dev_{err,info,dbg} family of
functions would scope the messages appropriately automatically. If only
there was a 'struct device' this driver could reference. More on this
below...

> +
>  /* These bits are in the IA32_CORE_CAPABILITIES MSR */
>  #define MSR_IA32_CORE_CAPS_INTEGRITY_BIT       2
>  #define MSR_IA32_CORE_CAPS_INTEGRITY           BIT(MSR_IA32_CORE_CAPS_INTEGRITY_BIT)
>  
> +/**
> + * struct ifs_params - global ifs parameter for all cpus.
> + * @loaded_version: stores the currently loaded ifs image version.
> + */
> +struct ifs_params {
> +       int loaded_version;
> +};
> +
> +int load_ifs_binary(void);
> +extern struct ifs_params ifs_params;
>  #endif
> diff --git a/drivers/platform/x86/intel/ifs/load.c b/drivers/platform/x86/intel/ifs/load.c
> new file mode 100644
> index 000000000000..1a5e906c51af
> --- /dev/null
> +++ b/drivers/platform/x86/intel/ifs/load.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2021 Intel Corporation.
> + *
> + * Author: Jithu Joseph <[email protected]>
> + */
> +
> +#include <linux/firmware.h>
> +#include <linux/platform_device.h>
> +
> +#include "ifs.h"
> +static const char *ifs_path = "intel/ifs/";
> +
> +struct ifs_header {
> +       u32 header_ver;
> +       u32 blob_revision;
> +       u32 date;
> +       u32 processor_sig;
> +       u32 check_sum;
> +       u32 loader_rev;
> +       u32 processor_flags;
> +       u32 metadata_size;
> +       u32 total_size;
> +       u32 fusa_info;
> +       u64 reserved;
> +};
> +
> +#define IFS_HEADER_SIZE        (sizeof(struct ifs_header))
> +static struct ifs_header *ifs_header_ptr;      /* pointer to the ifs image header */
> +static u64 ifs_hash_ptr;                       /* Address of ifs metadata (hash) */
> +
> +static const struct firmware *load_binary(const char *path)
> +{
> +       struct platform_device *ifs_pdev;
> +       const struct firmware *fw;
> +       int err;
> +
> +       ifs_pdev = platform_device_register_simple("ifs", -1, NULL, 0);

This looks like an abuse of the platform_device_register_simple() API
to me, i.e. to register and tear down a device every time someone
echoes a value to a sysfs file. This registration process fires off a
KOBJ_ADD event to tell udev a new device has appeared, and before udev
scripts have a chance to do anything useful this device is gone again.
If I were someone that wanted to automate testing my CPUs as uevent
notifying me when the "ifs" device has arrived would be useful.

If ifs_init() registered a platform device to represent the ifs
interface it would also provide a private place to hang the ifs sysfs
interface rather than glomming onto the /sys/devices/system/cpu core
ABI. I personally don't think ifs functionality belongs under
/sys/devices/system/cpu. Also, with statically defined sysfs attributes
automation scripts would be able to safely assume that the sysfs
interface is ready coincident with the KOBJ_ADD event.



2022-03-03 04:59:27

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC 07/10] platform/x86/intel/ifs: Create kthreads for online cpus for scan test

On Tue, 2022-03-01 at 11:54 -0800, Jithu Joseph wrote:
> Create scan kthreads for online logical cpus. Once scan test is triggered,
> it wakes up the corresponding thread and its sibling threads to execute
> the test. Once the scan test is done, the threads go back to thread wait
> for next signal to start a new scan.
>
> In a core, the scan engine is shared between siblings. When a scan test
> is triggered on a core, all the siblings rendezvous before the test execution.
> The scan results are same for all siblings.
>
> Scan may be aborted by some reasons. Scan test will be aborted in certain
> circumstances such as when interrupt occurred or cpu does not have enough
> power budget for scan. In this case, the kernel restart scan from the chunk
> where it stopped. Scan will also be aborted when the test is failed. In
> this case, the test is immediately stopped without retry.

This needs to explain why existing kernel facilities for running code
on a given target CPU are insufficient and a new open-coded thread-pool
needed to be created. For example, what's wrong with queue_work_on()?

>
> Originally-by: Kyung Min Park <[email protected]>
> Signed-off-by: Jithu Joseph <[email protected]>
> Reviewed-by: Ashok Raj <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---
>  drivers/platform/x86/intel/ifs/core.c | 317 ++++++++++++++++++++++++++
>  drivers/platform/x86/intel/ifs/ifs.h  |  91 ++++++++
>  2 files changed, 408 insertions(+)
>
> diff --git a/drivers/platform/x86/intel/ifs/core.c b/drivers/platform/x86/intel/ifs/core.c
> index 765d9a2c4683..6747b523587a 100644
> --- a/drivers/platform/x86/intel/ifs/core.c
> +++ b/drivers/platform/x86/intel/ifs/core.c
> @@ -4,11 +4,38 @@
>   * Author: Jithu Joseph <[email protected]>
>   */
>  
> +#include <linux/cpu.h>
> +#include <linux/delay.h>
> +#include <linux/kthread.h>
>  #include <linux/module.h>
> +#include <linux/nmi.h>
>  #include <asm/cpu_device_id.h>
>  
>  #include "ifs.h"
> +
> +static enum cpuhp_state cpuhp_scan_state;
>  struct ifs_params ifs_params;
> +int cpu_sibl_ct;
> +atomic_t siblings_in;  /* sibling count for joining rendezvous.*/
> +atomic_t siblings_out; /* sibling count for exiting rendezvous.*/
> +struct completion test_thread_done; /* set when scan are done for all siblings threads.*/
> +
> +DEFINE_PER_CPU(struct ifs_state, ifs_state);
> +
> +static int ifs_retry_set(const char *val, const struct kernel_param *kp);
> +static const struct kernel_param_ops ifs_retry_ops = {
> +       .set = ifs_retry_set,
> +       .get = param_get_int,
> +};
> +
> +static int retry = 5;
> +module_param_cb(retry, &ifs_retry_ops, &retry, 0644);
> +
> +MODULE_PARM_DESC(retry, "Maximum retry count when the test is not executed");

Why does this retry need to be in the kernel? Can't the test runner
retry the test if they want? If it stays in the kernel, why a module
parameter and not a sysfs attribute?

> +
> +static bool noint = 1;
> +module_param(noint, bool, 0644);
> +MODULE_PARM_DESC(noint, "Option to enable/disable interrupt during test");

Same sysfs vs module parameter question...

>  
>  #define X86_MATCH(model)                                       \
>         X86_MATCH_VENDOR_FAM_MODEL_FEATURE(INTEL, 6,            \
> @@ -21,6 +48,273 @@ static const struct x86_cpu_id ifs_cpu_ids[] __initconst = {
>  
>  MODULE_DEVICE_TABLE(x86cpu, ifs_cpu_ids);
>  
> +static int ifs_retry_set(const char *val, const struct kernel_param *kp)
> +{
> +       int var = 0;
> +
> +       if (kstrtoint(val, 0, &var)) {
> +               pr_err("unable to parse retry\n");
> +               return -EINVAL;
> +       }
> +
> +       /* validate retry value for sanity */
> +       if (var < 1 || var > 20) {
> +               pr_err("retry parameter should be between 1 and 20\n");
> +               return -EINVAL;
> +       }
> +
> +       return param_set_int(val, kp);
> +}
> +
> +static unsigned long msec_to_tsc(unsigned long msec)
> +{
> +       return tsc_khz * 1000 * msec / MSEC_PER_SEC;
> +}
> +
> +static const char * const scan_test_status[] = {
> +       "SCAN no error",
> +       "Other thread could not join.",
> +       "Interrupt occurred prior to SCAN coordination.",
> +       "Core Abort SCAN Response due to power management condition.",
> +       "Non valid chunks in the range",
> +       "Mismatch in arguments between threads T0/T1.",
> +       "Core not capable of performing SCAN currently",
> +       "Unassigned error code 0x7",
> +       "Exceeded number of Logical Processors (LP) allowed to run Scan-At-Field concurrently",
> +       "Interrupt occurred prior to SCAN start",

This looks unmaintainable...

/me finds large comment block around IFS_* error codes and suggests
killing 2 birds with one stone, i.e. delete that comment and make this
self documenting:

static const char * const scan_test_status[] = {
[IFS_NO_ERROR] = "SCAN no error",
[IFS_OTHER_THREAD_DID_NOT_JOIN] = "Other thread could not join.",
...etc...
};

Btw, which is it "did not join" and "could not join"? If the symbol
name is going to be that long might as well make it match the log
message verbatim.

That way you can add / delete error codes without wondering if you
managed to match the code number to the right position in the array.

Even better would be to kick this out of the kernel and let the user
tool translate the error codes to test result log messages.

> +};
> +
> +static void message_not_tested(int cpu, union ifs_status status)
> +{
> +       if (status.error_code < ARRAY_SIZE(scan_test_status))
> +               pr_warn("CPU %d: SCAN operation did not start. %s\n", cpu,
> +                       scan_test_status[status.error_code]);
> +       else if (status.error_code == IFS_SW_TIMEOUT)
> +               pr_warn("CPU %d: software timeout during scan\n", cpu);
> +       else if (status.error_code == IFS_SW_PARTIAL_COMPLETION)
> +               pr_warn("CPU %d: %s\n", cpu,
> +                       "Not all scan chunks were executed. Maximum forward progress retries exceeded");

Why are these codes not in the scan_test_status set? I see that
IFS_SW_PARTIAL_COMPLETION and IFS_SW_TIMEOUT are defined with larger
values, but why?

> +       else
> +               pr_warn("CPU %d: SCAN unknown status %llx\n", cpu, status.data);
> +}
> +
> +static void message_fail(int cpu, union ifs_status status)
> +{
> +       if (status.control_error) {
> +               pr_err("CPU %d: scan failed. %s\n", cpu,
> +                      "Suggest reload scan file: # echo 1 > /sys/devices/system/cpu/ifs/reload");
> +       }
> +       if (status.signature_error) {
> +               pr_err("CPU %d: test signature incorrect. %s\n", cpu,
> +                      "Suggest retry scan to check if problem is transient");
> +       }

This looks and feels more like tools/testing/selftests/ style code
printing information for a kernel developer to read. For a production
capability I would expect these debug messages to be elided and have an
ifs user tool that knows when to "Suggest reload scan file". Otherwise,
it's not scalable to use the kernel log buffer like a utility's stdout.

> +}
> +
> +static bool can_restart(union ifs_status status)
> +{
> +       /* Signature for chunk is bad, or scan test failed */
> +       if (status.signature_error || status.control_error)
> +               return false;
> +
> +       switch (status.error_code) {
> +       case IFS_POWER_MGMT_INADEQUATE_FOR_SCAN:
> +               mdelay(1);
> +               fallthrough;
> +       case IFS_NO_ERROR:
> +       case IFS_OTHER_THREAD_DID_NOT_JOIN:
> +       case IFS_INTERRUPTED_BEFORE_RENDEZVOUS:
> +       case IFS_EXCEED_NUMBER_OF_THREADS_CONCURRENT:
> +       case IFS_INTERRUPTED_DURING_EXECUTION:
> +               return true;

If the error codes were an enum compiler would be able to check here
that all possible codes were handled, as is this needs manual review
which is a maintenance tax.

> +       }
> +       return false;
> +}
> +
> +static bool wait_for_siblings(atomic_t *t, long long timeout)
> +{
> +       atomic_inc(t);
> +       while (atomic_read(t) < cpu_sibl_ct) {
> +               if (timeout < SPINUNIT) {
> +                       pr_err("Timeout while waiting for CPUs rendezvous, remaining: %d\n",
> +                              cpu_sibl_ct - atomic_read(t));
> +                       return false;
> +               }
> +
> +               ndelay(SPINUNIT);
> +               timeout -= SPINUNIT;
> +
> +               touch_nmi_watchdog();

Live spin-waiting for threads to rendevous with active defeat of the
lockup detector? Is this attempting to replicate an open coded SMM
entry?

> +       }
> +
> +       return true;
> +}
> +
> +/*
> + * Scan test kthreads bound with each logical cpu.
> + * Wait for the sibling thread to join before the execution.
> + * Execute the scan test by running wrmsr(MSR_ACTIVATE_SCAN).
> + */
> +static int scan_test_worker(void *info)
> +{
> +       int cpu = smp_processor_id();
> +       union ifs_scan activate;
> +       union ifs_status status;
> +       unsigned long timeout;
> +       int retries;
> +       u32 first;
> +
> +       activate.rsvd = 0;
> +       activate.delay = msec_to_tsc(THREAD_WAIT);
> +       activate.sigmce = 0;
> +
> +       while (1) {
> +               /* wait event until cpumask set from user */
> +               wait_event_interruptible(per_cpu(ifs_state, cpu).scan_wq,
> +                                        (cpumask_test_cpu(cpu, &per_cpu(ifs_state, cpu).mask) ||
> +                                        kthread_should_stop()));
> +
> +               if (kthread_should_stop())
> +                       break;
> +
> +               /*
> +                * Need to get (and keep) the threads on this core executing close together
> +                * so that the writes to MSR_ACTIVATE_SCAN below will succeed in entering
> +                * IFS test mode on this core. Interrupts on each thread are expected to be
> +                * brief. But preemption would be a problem.

What is this requirement to try to synchronize CPU execution? Comments
should explain the "why", the code usually explains the "what".

> +                */
> +               preempt_disable();
> +
> +               /* wait for the sibling threads to join */
> +               first = cpumask_first(topology_sibling_cpumask(cpu));
> +               if (!wait_for_siblings(&siblings_in, NSEC_PER_SEC)) {
> +                       preempt_enable();
> +                       return -1;
> +               }
> +
> +               activate.start = 0;
> +               activate.stop = ifs_params.valid_chunks - 1;
> +               timeout = jiffies + HZ / 2;
> +               retries = retry;
> +
> +               while (activate.start <= activate.stop) {
> +                       if (time_after(jiffies, timeout)) {
> +                               status.error_code = IFS_SW_TIMEOUT;
> +                               break;
> +                       }
> +
> +                       if (noint)
> +                               local_irq_disable();
> +                       /* scan start */
> +                       wrmsrl(MSR_ACTIVATE_SCAN, activate.data);
> +
> +                       if (noint)
> +                               local_irq_enable();
> +
> +                       /*
> +                        * All logical CPUs on this core are now running IFS test. When it completes
> +                        * execution or is interrupted, the following RDMSR gets the scan status.
> +                        */
> +
> +                       rdmsrl(MSR_SCAN_STATUS, status.data);
> +
> +                       /* Some cases can be retried, give up for others */
> +                       if (!can_restart(status))
> +                               break;
> +
> +                       if (status.chunk_num == activate.start) {
> +                               /* Check for forward progress */
> +                               if (retries-- == 0) {
> +                                       if (status.error_code == IFS_NO_ERROR)
> +                                               status.error_code = IFS_SW_PARTIAL_COMPLETION;
> +                                       break;
> +                               }
> +                       } else {
> +                               retries = retry;
> +                               activate.start = status.chunk_num;
> +                       }
> +               }
> +
> +               preempt_enable();
> +
> +               /* Update status for this core */
> +               per_cpu(ifs_state, cpu).scan_details = status.data;
> +
> +               if (status.control_error || status.signature_error) {
> +                       per_cpu(ifs_state, cpu).status = SCAN_TEST_FAIL;
> +                       cpumask_set_cpu(cpu, &ifs_params.fail_mask);
> +                       cpumask_clear_cpu(cpu, &ifs_params.not_tested_mask);
> +                       cpumask_clear_cpu(cpu, &ifs_params.pass_mask);
> +                       message_fail(cpu, status);
> +               } else if (status.error_code) {
> +                       per_cpu(ifs_state, cpu).status = SCAN_NOT_TESTED;
> +                       cpumask_set_cpu(cpu, &ifs_params.not_tested_mask);
> +                       cpumask_clear_cpu(cpu, &ifs_params.fail_mask);
> +                       cpumask_clear_cpu(cpu, &ifs_params.pass_mask);
> +                       message_not_tested(cpu, status);
> +               } else {
> +                       per_cpu(ifs_state, cpu).status = SCAN_TEST_PASS;
> +                       cpumask_set_cpu(cpu, &ifs_params.pass_mask);
> +                       cpumask_clear_cpu(cpu, &ifs_params.not_tested_mask);
> +                       cpumask_clear_cpu(cpu, &ifs_params.fail_mask);
> +               }
> +
> +               cpumask_clear_cpu(cpu, &per_cpu(ifs_state, cpu).mask);
> +
> +               if (!wait_for_siblings(&siblings_out, NSEC_PER_SEC))
> +                       return -1;
> +
> +               if (cpu == first)
> +                       complete(&test_thread_done);
> +       }
> +
> +       return 0;
> +}
> +
> +static void ifs_first_time(unsigned int cpu)
> +{
> +       init_waitqueue_head(&(per_cpu(ifs_state, cpu).scan_wq));
> +
> +       per_cpu(ifs_state, cpu).first_time = 1;
> +       per_cpu(ifs_state, cpu).status = SCAN_NOT_TESTED;
> +       cpumask_set_cpu(cpu, &ifs_params.not_tested_mask);
> +       cpumask_clear_cpu(cpu, &ifs_params.fail_mask);
> +       cpumask_clear_cpu(cpu, &ifs_params.pass_mask);
> +}
> +
> +static int ifs_online_cpu(unsigned int cpu)
> +{
> +       /* If the CPU is coming online for the first time*/
> +       if (per_cpu(ifs_state, cpu).first_time == 0)
> +               ifs_first_time(cpu);
> +
> +       cpumask_clear_cpu(cpu, &(per_cpu(ifs_state, cpu).mask));
> +
> +       per_cpu(ifs_state, cpu).scan_task = kthread_create_on_node(scan_test_worker, (void *)&cpu,
> +                                                                  cpu_to_node(cpu), "ifsCpu/%u",
> +                                                                  cpu);
> +       if (IS_ERR(per_cpu(ifs_state, cpu).scan_task)) {
> +               pr_err("scan_test_worker task create failed\n");
> +               return PTR_ERR(per_cpu(ifs_state, cpu).scan_task);
> +       }
> +       kthread_bind(per_cpu(ifs_state, cpu).scan_task, cpu);
> +       wake_up_process(per_cpu(ifs_state, cpu).scan_task);
> +
> +       return 0;
> +}
> +
> +static int ifs_offline_cpu(unsigned int cpu)
> +{
> +       struct task_struct *thread;
> +
> +       thread = per_cpu(ifs_state, cpu).scan_task;
> +       per_cpu(ifs_state, cpu).scan_task = NULL;
> +
> +       if (thread)
> +               kthread_stop(thread);

Per-cpu workqueues already handle cpu-online / offline and as far as I
can see a dedicated workqueue for ifs could allow you jettison some of
this infrastructure code.

> +
> +       return 0;
> +}
> +
>  static int __init ifs_init(void)
>  {
>         const struct x86_cpu_id *m;
> @@ -42,11 +336,34 @@ static int __init ifs_init(void)
>                 return ret;
>         }
>  
> +       init_completion(&test_thread_done);
> +       ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/ifs:online",
> +                               ifs_online_cpu, ifs_offline_cpu);
> +
> +       if (ret < 0) {
> +               pr_err("cpuhp_setup_failed\n");
> +               return ret;
> +       }
> +       cpuhp_scan_state = ret;
> +
>         return 0;
>  }
>  
>  static void __exit ifs_exit(void)
>  {
> +       struct task_struct *thread;
> +       int cpu;
> +
> +       cpus_read_lock();
> +       for_each_online_cpu(cpu) {
> +               thread = per_cpu(ifs_state, cpu).scan_task;
> +               per_cpu(ifs_state, cpu).scan_task = NULL;
> +               if (thread)
> +                       kthread_stop(thread);
> +       }
> +       cpus_read_unlock();
> +       cpuhp_remove_state(cpuhp_scan_state);
> +
>         pr_info("unloaded 'In-Field Scan' module\n");
>  }
>  
> diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h
> index 8f9abdb304b0..fcbbb49faa19 100644
> --- a/drivers/platform/x86/intel/ifs/ifs.h
> +++ b/drivers/platform/x86/intel/ifs/ifs.h
> @@ -18,6 +18,13 @@
>  #define MSR_SCAN_HASHES_STATUS                 0x000002c3
>  #define MSR_AUTHENTICATE_AND_COPY_CHUNK                0x000002c4
>  #define MSR_CHUNKS_AUTHENTICATION_STATUS       0x000002c5
> +#define MSR_ACTIVATE_SCAN                      0x000002c6
> +#define MSR_SCAN_STATUS                                0x000002c7
> +#define SCAN_TEST_PASS                         0
> +#define SCAN_TEST_FAIL                         1
> +#define SCAN_NOT_TESTED                                2
> +#define SPINUNIT                               100
> +#define THREAD_WAIT                            5
>  
>  /* MSR_SCAN_HASHES_STATUS bit fields */
>  union ifs_scan_hashes_status {
> @@ -45,16 +52,100 @@ union ifs_chunks_auth_status {
>         };
>  };
>  
> +/* MSR_ACTIVATE_SCAN bit fields */
> +union ifs_scan {
> +       u64     data;
> +       struct {
> +               u64     start   :8;
> +               u64     stop    :8;
> +               u64     rsvd    :16;
> +               u64     delay   :31;
> +               u64     sigmce  :1;
> +       };
> +};
> +
> +/* MSR_SCAN_STATUS bit fields */
> +union ifs_status {
> +       u64     data;
> +       struct {
> +               u64     chunk_num               :8;
> +               u64     chunk_stop_index        :8;
> +               u64     rsvd1                   :16;
> +               u64     error_code              :8;
> +               u64     rsvd2                   :22;
> +               u64     control_error           :1;
> +               u64     signature_error         :1;
> +       };
> +};
> +
> +/*
> + * ifs_status.error_code values after rdmsr(SCAN_STATUS)
> + * 0x0: no error.
> + * 0x1: scan did not start because all sibling threads did not join.
> + * 0x2: scan did not start because interrupt occurred prior to threads rendezvous
> + * 0x3: scan did not start because power management conditions are inadequate.
> + * 0x4: scan did not start because non-valid chunks in range stop_index:start_index.
> + * 0x5: scan did not start because of mismatches in arguments between sibling threads.
> + * 0x6: scan did not start because core is not capable of performing scan currently.
> + * 0x7: not assigned.
> + * 0x8: scan did not start because of exceed number of concurrent CPUs attempt to run scan.
> + * 0x9: interrupt occurred. Scan operation aborted prematurely. Not all chunks executed.
> + * 0xFE: not all scan chunks were executed. Maximum forward progress retries exceeded.
> + *      This is a driver populated error-code as hardware returns success in this scenario.
> + */
> +#define IFS_NO_ERROR                           0x0
> +#define IFS_OTHER_THREAD_DID_NOT_JOIN          0x1
> +#define IFS_INTERRUPTED_BEFORE_RENDEZVOUS      0x2
> +#define IFS_POWER_MGMT_INADEQUATE_FOR_SCAN     0x3
> +#define IFS_INVALID_CHUNK_RANGE                        0x4
> +#define IFS_MISMATCH_ARGUMENTS_BETWEEN_THREADS 0x5
> +#define IFS_CORE_NOT_CAPABLE_CURRENTLY         0x6
> +/* Code 0x7 not assigned */
> +#define IFS_EXCEED_NUMBER_OF_THREADS_CONCURRENT        0x8
> +#define IFS_INTERRUPTED_DURING_EXECUTION       0x9
> +
> +#define IFS_SW_TIMEOUT                         0xFD
> +#define IFS_SW_PARTIAL_COMPLETION              0xFE
> +
>  /**
>   * struct ifs_params - global ifs parameter for all cpus.
>   * @loaded_version: stores the currently loaded ifs image version.
>   * @valid_chunks: number of chunks which could be validated.
> + * @fail_mask: stores the cpus which failed the scan.
> + * @not_tested_mask: stores the cpus which have not been tested.
>   */
>  struct ifs_params {
>         int loaded_version;
>         int valid_chunks;
> +       struct cpumask fail_mask;
> +       struct cpumask pass_mask;
> +       struct cpumask not_tested_mask;
>  };
>  
> +/**
> + * struct ifs_state - per-cpu ifs parameter.
> + * @scan_task: scan_task for kthread to run scan test on each cpu.
> + * @first_time: to track if cpu is coming online for the first time.
> + * @status: it holds simple status pass/fail/untested.
> + * @scan_details: opaque scan status code from h/w.
> + * @scan_wq: kthread task wait queue.
> + * @mask: triggering the test by setting the mask.
> + */
> +struct ifs_state {
> +       struct task_struct *scan_task;
> +       int first_time;
> +       int status;
> +       u64 scan_details;
> +       wait_queue_head_t scan_wq;
> +       struct cpumask mask;
> +};
> +
> +DECLARE_PER_CPU(struct ifs_state, ifs_state);
> +
>  int load_ifs_binary(void);
>  extern struct ifs_params ifs_params;
> +extern atomic_t siblings_in;
> +extern atomic_t siblings_out;
> +extern struct completion test_thread_done;
> +extern int cpu_sibl_ct;
>  #endif

2022-03-03 23:06:12

by Luck, Tony

[permalink] [raw]
Subject: Re: [RFC 07/10] platform/x86/intel/ifs: Create kthreads for online cpus for scan test

On Wed, Mar 02, 2022 at 08:17:32PM -0800, Williams, Dan J wrote:

> What is this requirement to try to synchronize CPU execution? Comments
> should explain the "why", the code usually explains the "what".

I need to put some more bits into the Documentation/x86/ifs.rst
because if I had explained the IFS feature better there, you wouldn't
have had to ask this (and some other questions).

IFS works on one CORE at a time. But with HT enabled there are two
logical CPUs that are on that core.

Entering IFS test mode is done by all HT threads writing to the
ACTIVATE_SCAN MSR "together". The microcode for that MSR will
make the first logical CPU to write wait for a while. User can
choose how many cycles to wait with some of the bits in the value
written to the MSR ... in this driver we hard coded 5 milli-seconds.
That seemed plenty to allow for the bottom half of interrupts, or
a perf NMI to knock the HT threads a little bit out of sync with
each other.

So the code flow when running a test is to wake the kthreads for
the logical CPUs that share the core. The threads may wake at
different times, so there is the software sync to get them close
enough.

Then comes the loop to execute the test ... it is a loop because
the core may not complete all "chunks" in one ACTIVATE_SCAN
MSR write ... if it doesn't, the loop restarts execution from
the chunk where IFS execution stopped.

Interrupts are only blocked during the ACTIVATE_SCAN to
increase the chances of completion. But preemption is
disabled for the whole loop so that the threads won't
get too far out of sync.

-Tony

2022-03-04 06:35:26

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface

On Tue, 2022-03-01 at 11:54 -0800, Jithu Joseph wrote:
> Implement sysfs interface to trigger ifs test for a targeted core or
> all cores. For all core testing, the kernel will start testing from core 0
> and proceed to the next core one after another. After the ifs test on the
> last core, the test stops until the administrator starts another round of
> tests. A "targeted core" test runs a single ifs on a single core. The
> kernel will only test the target core.
>
> The basic usage is as below.
>
> 1. For all cores testing:
>    echo 1 > /sys/devices/system/cpu/ifs/run_test
>    cat /sys/devices/system/cpu/ifs/status
>
> 2. For "targeted core" testing:
>    To start test, for example, cpu0:
>    echo 1 > /sys/devices/system/cpu/cpu#/ifs/run_test
>    cat /sys/devices/system/cpu/cpu#/ifs/status
>
> 3. For reloading IFS image: (e.g, when new IFS image is released)
>    - copy the new image to /lib/firmware/intel/ifs/
>    - rename it as {family/model/stepping}.{testname}
>    - echo 1 > /sys/devices/system/cpu/ifs/reload
>
> This module accepts two tunable parameters. Defaults could be overridden
> by passing appropriate values during load time. The parameters are as
> described below.
>
> 1. noint: When set, system interrupts are not allowed to interrupt a scan.
> 2. retry: Maximum retry counter when the test is not executed due to an
>           event such as interrupt.
>
> Originally-by: Kyung Min Park <[email protected]>
> Signed-off-by: Jithu Joseph <[email protected]>
> Reviewed-by: Ashok Raj <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---
>  drivers/platform/x86/intel/ifs/Makefile |   2 +-
>  drivers/platform/x86/intel/ifs/core.c   |   8 +
>  drivers/platform/x86/intel/ifs/ifs.h    |   4 +
>  drivers/platform/x86/intel/ifs/sysfs.c  | 394 ++++++++++++++++++++++++
>  4 files changed, 407 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/platform/x86/intel/ifs/sysfs.c
>
> diff --git a/drivers/platform/x86/intel/ifs/Makefile b/drivers/platform/x86/intel/ifs/Makefile
> index 105b377de410..a2e05bf78c3e 100644
> --- a/drivers/platform/x86/intel/ifs/Makefile
> +++ b/drivers/platform/x86/intel/ifs/Makefile
> @@ -4,4 +4,4 @@
>  
>  obj-$(CONFIG_INTEL_IFS)                        += intel_ifs.o
>  
> -intel_ifs-objs                         := core.o load.o
> +intel_ifs-objs                         := core.o load.o sysfs.o
> diff --git a/drivers/platform/x86/intel/ifs/core.c b/drivers/platform/x86/intel/ifs/core.c
> index 6747b523587a..c9ca385082e9 100644
> --- a/drivers/platform/x86/intel/ifs/core.c
> +++ b/drivers/platform/x86/intel/ifs/core.c
> @@ -283,11 +283,16 @@ static void ifs_first_time(unsigned int cpu)
>  
>  static int ifs_online_cpu(unsigned int cpu)
>  {
> +       int ret;
> +
>         /* If the CPU is coming online for the first time*/
>         if (per_cpu(ifs_state, cpu).first_time == 0)
>                 ifs_first_time(cpu);
>  
>         cpumask_clear_cpu(cpu, &(per_cpu(ifs_state, cpu).mask));
> +       ret = ifs_sysfs_create(cpu);
> +       if (ret)
> +               return ret;
>  
>         per_cpu(ifs_state, cpu).scan_task = kthread_create_on_node(scan_test_worker, (void *)&cpu,
>                                                                    cpu_to_node(cpu), "ifsCpu/%u",
> @@ -311,6 +316,7 @@ static int ifs_offline_cpu(unsigned int cpu)
>  
>         if (thread)
>                 kthread_stop(thread);
> +       ifs_sysfs_remove(cpu);
>  
>         return 0;
>  }
> @@ -336,6 +342,7 @@ static int __init ifs_init(void)
>                 return ret;
>         }
>  
> +       cpu_ifs_init();
>         init_completion(&test_thread_done);
>         ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/ifs:online",
>                                 ifs_online_cpu, ifs_offline_cpu);
> @@ -361,6 +368,7 @@ static void __exit ifs_exit(void)
>                 if (thread)
>                         kthread_stop(thread);
>         }
> +       cpu_ifs_exit();
>         cpus_read_unlock();
>         cpuhp_remove_state(cpuhp_scan_state);
>  
> diff --git a/drivers/platform/x86/intel/ifs/ifs.h b/drivers/platform/x86/intel/ifs/ifs.h
> index fcbbb49faa19..4442ccd626c6 100644
> --- a/drivers/platform/x86/intel/ifs/ifs.h
> +++ b/drivers/platform/x86/intel/ifs/ifs.h
> @@ -143,6 +143,10 @@ struct ifs_state {
>  DECLARE_PER_CPU(struct ifs_state, ifs_state);
>  
>  int load_ifs_binary(void);
> +void cpu_ifs_init(void);
> +void cpu_ifs_exit(void);
> +int ifs_sysfs_create(unsigned int cpu);
> +void ifs_sysfs_remove(unsigned int cpu);
>  extern struct ifs_params ifs_params;
>  extern atomic_t siblings_in;
>  extern atomic_t siblings_out;
> diff --git a/drivers/platform/x86/intel/ifs/sysfs.c b/drivers/platform/x86/intel/ifs/sysfs.c
> new file mode 100644
> index 000000000000..f441968de642
> --- /dev/null
> +++ b/drivers/platform/x86/intel/ifs/sysfs.c
> @@ -0,0 +1,394 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2021 Intel Corporation.
> + *
> + * Author: Jithu Joseph <[email protected]>
> + */
> +
> +#include <linux/cpu.h>
> +#include <linux/delay.h>
> +#include <linux/fs.h>
> +#include <linux/semaphore.h>
> +
> +#include "ifs.h"
> +
> +static DEFINE_SEMAPHORE(ifs_sem);
> +static int core_delay = 1;
> +static bool ifs_disabled;
> +
> +/*
> + * Initiate per core test. It wakes up all sibling threads that belongs to the
> + * target cpu. Once all sibling threads wake up, the scan test gets executed and
> + * wait for all sibling threads to finish the scan test.
> + */
> +static void do_core_test(int cpu)
> +{
> +       int sibling;
> +
> +       reinit_completion(&test_thread_done);
> +       atomic_set(&siblings_in, 0);
> +       atomic_set(&siblings_out, 0);
> +
> +       cpu_sibl_ct = cpumask_weight(topology_sibling_cpumask(cpu));
> +
> +       for_each_cpu(sibling, topology_sibling_cpumask(cpu))
> +               cpumask_set_cpu(sibling, &per_cpu(ifs_state, sibling).mask);
> +
> +       for_each_cpu(sibling, topology_sibling_cpumask(cpu))
> +               wake_up_interruptible(&per_cpu(ifs_state, sibling).scan_wq);
> +
> +       if (wait_for_completion_timeout(&test_thread_done, HZ) == 0) {
> +               pr_err("Core locked up during IFS test? IFS disabled\n");
> +               ifs_disabled = true;
> +       }
> +}
> +
> +/*
> + * The sysfs interface to check the test status:
> + * To check the result, for example, cpu0
> + * cat /sys/devices/system/cpu/cpu0/ifs/details
> + */
> +static ssize_t details_show(struct device *dev,
> +                           struct device_attribute *attr,
> +                           char *buf)
> +{
> +       unsigned int cpu = dev->id;
> +       int ret;
> +
> +       if (down_trylock(&ifs_sem))
> +               return -EBUSY;

What is the ifs_sem protecting? This result is immediately invalid
after the lock is dropped anyway, so why hold it over reading the
value? You can't prevent 2 threads racing each other here.

> +
> +       ret = sprintf(buf, "%llx\n", per_cpu(ifs_state, cpu).scan_details);

Should be sysfs_emit() which includes the page buffer safety.

Also, you likely want that format string to be %#llx so that userspace
knows explicitly that this is a hexadecimal value.


> +       up(&ifs_sem);
> +
> +       return ret;
> +}
> +
> +static DEVICE_ATTR_RO(details);
> +
> +/*
> + * The sysfs interface to check the test status:
> + * To check the status, for example, cpu0
> + * cat /sys/devices/system/cpu/cpu0/ifs/status
> + */
> +static ssize_t status_show(struct device *dev,
> +                          struct device_attribute *attr,
> +                          char *buf)
> +{
> +       unsigned int cpu = dev->id;
> +       u32 scan_result;
> +       int ret;
> +
> +       if (down_trylock(&ifs_sem))
> +               return -EBUSY;
> +
> +       scan_result = per_cpu(ifs_state, cpu).status;
> +
> +       if (scan_result == SCAN_TEST_FAIL)
> +               ret = sprintf(buf, "fail\n");
> +       else if (scan_result == SCAN_NOT_TESTED)
> +               ret = sprintf(buf, "untested\n");
> +       else
> +               ret = sprintf(buf, "pass\n");

sysfs_emit() for all of the above.

> +
> +       up(&ifs_sem);
> +
> +       return ret;
> +}
> +
> +static DEVICE_ATTR_RO(status);
> +
> +/*
> + * The sysfs interface for single core testing
> + * To start test, for example, cpu0
> + * echo 1 > /sys/devices/system/cpu/cpu0/ifs/run_test
> + * To check the result:
> + * cat /sys/devices/system/cpu/cpu0/ifs/result

Just have a CPU mask as an input parameter and avoid needing to hang
ifs sysfs attributes underneath /sys/device/system/cpu/ifs.

> + * The sibling core gets tested at the same time.
> + */
> +static ssize_t run_test_store(struct device *dev,
> +                             struct device_attribute *attr,
> +                             const char *buf, size_t count)
> +{
> +       unsigned int cpu = dev->id;
> +       bool var;
> +       int rc;
> +
> +       if (ifs_disabled)
> +               return -ENXIO;
> +
> +       rc = kstrtobool(buf, &var);
> +       if (rc < 0 || var != 1)
> +               return -EINVAL;
> +
> +       if (down_trylock(&ifs_sem)) {
> +               pr_info("another instance in progress.\n");
> +               return -EBUSY;
> +       }
> +       cpu_hotplug_disable();
> +       do_core_test(cpu);
> +       cpu_hotplug_enable();
> +       up(&ifs_sem);
> +
> +       return count;
> +}
> +
> +static DEVICE_ATTR_WO(run_test);
> +
> +/* per-cpu scan sysfs attributes */
> +static struct attribute *ifs_attrs[] = {
> +       &dev_attr_run_test.attr,
> +       &dev_attr_status.attr,
> +       &dev_attr_details.attr,
> +       NULL
> +};
> +
> +const struct attribute_group ifs_attr_group = {
> +       .attrs  = ifs_attrs,
> +       .name = "ifs",
> +};
> +
> +/* Creates the sysfs files under /sys/devices/system/cpu/cpuX/ifs */
> +int ifs_sysfs_create(unsigned int cpu)
> +{
> +       struct device *dev;
> +       int ret;
> +
> +       dev = get_cpu_device(cpu);

get_cpu_device() neither takes a reference nor does it guarantee that
the cpu device stays registered. So this looks broken.

> +       ret = sysfs_create_group(&dev->kobj, &ifs_attr_group);

Dynamic creation of sysfs attributes sometime after the driver loads is
not friendly for something that likely only needs the module loaded for
a test run and then unloaded again. I think this effort would be better
served by building a sysfs topology registered underneath an ifs
platform device.

The ifs platform device attributes can be defined statically such that
when the KOBJ_ADD event fires for the ifs platform device the sysfs
interface will already be up and ready.

With an attribute to configure the CPU mask for a test it also
eliminates the need to have per-CPU ifs/{run_test,result} files.

> +       if (ret) {
> +               pr_err("failed to create sysfs group\n");
> +               return ret;
> +       }
> +
> +       return 0;
> +}
> +
> +/* Removes the sysfs files under /sys/devices/system/cpu/cpuX/ifs */
> +void ifs_sysfs_remove(unsigned int cpu)
> +{
> +       struct device *dev;
> +
> +       dev = get_cpu_device(cpu);
> +       sysfs_remove_group(&dev->kobj, &ifs_attr_group);
> +}
> +
> +/*
> + * Reload the IFS image. When user wants to install new IFS image
> + * image, reloading must be done.
> + */
> +static ssize_t reload_store(struct device *dev,
> +                           struct device_attribute *attr,
> +                           const char *buf, size_t count)
> +{
> +       bool var;
> +       int rc;
> +
> +       if (ifs_disabled)
> +               return -ENXIO;
> +
> +       rc = kstrtobool(buf, &var);
> +       if (rc < 0 || var != 1)
> +               return -EINVAL;

See below about sysfs_eq()

> +
> +       down(&ifs_sem);
> +       rc = load_ifs_binary();
> +       up(&ifs_sem);
> +       if (rc < 0) {
> +               pr_info("failed to reload ifs hash and test\n");
> +               return rc;
> +       }
> +
> +       return count;
> +}
> +
> +static DEVICE_ATTR_WO(reload);
> +
> +static int run_allcpu_scan_test(void)
> +{
> +       int cpu;
> +
> +       if (down_trylock(&ifs_sem)) {
> +               pr_info("another instance in progress.\n");
> +               return -EBUSY;
> +       }
> +
> +       cpu_hotplug_disable();
> +       for_each_cpu(cpu, cpu_online_mask) {
> +               /* Only execute test on the first thread on each core */
> +               if (cpumask_first(topology_sibling_cpumask(cpu)) != cpu)
> +                       continue;
> +               do_core_test(cpu);
> +               mdelay(core_delay);
> +       }
> +       cpu_hotplug_enable();
> +
> +       up(&ifs_sem);
> +       return 0;
> +}
> +
> +/*
> + * The sysfs interface to execute scan test for all online cpus.
> + * The test can be triggered as below:
> + * echo 1 > /sys/devices/system/cpu/ifs/run_test
> + */
> +static ssize_t allcpu_run_test_store(struct device *dev,
> +                                    struct device_attribute *attr,
> +                                    const char *buf, size_t count)
> +{
> +       bool var;
> +       int rc;
> +
> +       if (ifs_disabled)
> +               return -ENXIO;
> +
> +       rc = kstrtobool(buf, &var);
> +       if (rc < 0 || var != 1)
> +               return -EINVAL;

You could just cut to the chase and do: sysfs_eq(buf, "1")

> +
> +       rc = run_allcpu_scan_test();
> +       if (rc < 0)
> +               return rc;
> +
> +       return count;
> +}
> +
> +/*
> + * Percpu and allcpu ifs have attributes named "run_test".
> + * Since the former is defined in this same file using DEVICE_ATTR_WO()
> + * the latter is defined directly.
> + */
> +static struct device_attribute dev_attr_allcpu_run_test = {

Same feedback to just use DEVICE_ATTR() for this.

> +       .attr   = { .name = "run_test", .mode = 0200 },
> +       .store  = allcpu_run_test_store,
> +};
> +
> +/*
> + * Currently loaded IFS image version.
> + */
> +static ssize_t image_version_show(struct device *dev,
> +                                 struct device_attribute *attr, char *buf)
> +{
> +       return sprintf(buf, "%x\n", ifs_params.loaded_version);

more %#x and sysfs_emit() feedback...

> +}
> +
> +static DEVICE_ATTR_RO(image_version);
> +
> +/*
> + * Currently loaded IFS image version.
> + */
> +static ssize_t cpu_fail_list_show(struct device *dev,
> +                                 struct device_attribute *attr, char *buf)
> +{
> +       int ret;
> +
> +       if (down_trylock(&ifs_sem))
> +               return -EBUSY;
> +
> +       ret = sprintf(buf, "%*pbl\n", cpumask_pr_args(&ifs_params.fail_mask));
> +       up(&ifs_sem);
> +       return ret;
> +}
> +
> +static DEVICE_ATTR_RO(cpu_fail_list);
> +
> +static ssize_t cpu_untested_list_show(struct device *dev,
> +                                     struct device_attribute *attr, char *buf)
> +{
> +       int ret;
> +
> +       if (down_trylock(&ifs_sem))
> +               return -EBUSY;
> +
> +       ret = sprintf(buf, "%*pbl\n", cpumask_pr_args(&ifs_params.not_tested_mask));
> +       up(&ifs_sem);
> +
> +       return ret;
> +}
> +
> +static DEVICE_ATTR_RO(cpu_untested_list);
> +
> +static ssize_t cpu_pass_list_show(struct device *dev,
> +                                 struct device_attribute *attr, char *buf)
> +{
> +       int ret;
> +
> +       if (down_trylock(&ifs_sem))
> +               return -EBUSY;
> +
> +       ret = sprintf(buf, "%*pbl\n", cpumask_pr_args(&ifs_params.pass_mask));
> +       up(&ifs_sem);
> +
> +       return ret;
> +}
> +
> +static DEVICE_ATTR_RO(cpu_pass_list);
> +
> +/*
> + * Status for global ifs test
> + */
> +static ssize_t allcpu_status_show(struct device *dev,
> +                                 struct device_attribute *attr, char *buf)
> +{
> +       int ret;
> +
> +       if (down_trylock(&ifs_sem))
> +               return -EBUSY;
> +
> +       if (!cpumask_empty(&ifs_params.fail_mask))
> +               ret = sprintf(buf, "fail\n");
> +       else if (!cpumask_empty(&ifs_params.not_tested_mask))
> +               ret = sprintf(buf, "untested\n");
> +       else
> +               ret = sprintf(buf, "pass\n");
> +
> +       up(&ifs_sem);
> +
> +       return ret;
> +}
> +
> +/*
> + * Percpu and allcpu ifs have attributes named "status".
> + * Since the former is defined in this same file using DEVICE_ATTR_RO()
> + * the latter is defined directly.
> + */
> +static struct device_attribute dev_attr_allcpu_status = {
> +       .attr   = { .name = "status", .mode = 0444 },
> +       .show   = allcpu_status_show,
> +};

Can still do the one line declartion like this and skip the comment.

DEVICE_ATTR(status, 0444, allcpu_status_show, NULL);

> +
> +/* global scan sysfs attributes */
> +static struct attribute *cpu_ifs_attrs[] = {
> +       &dev_attr_reload.attr,
> +       &dev_attr_allcpu_run_test.attr,
> +       &dev_attr_image_version.attr,
> +       &dev_attr_cpu_fail_list.attr,
> +       &dev_attr_cpu_untested_list.attr,
> +       &dev_attr_cpu_pass_list.attr,
> +       &dev_attr_allcpu_status.attr,
> +       NULL
> +};
> +
> +const struct attribute_group cpu_ifs_attr_group = {

static?

> +       .attrs = cpu_ifs_attrs,
> +};
> +
> +const struct attribute_group *cpu_ifs_attr_groups[] = {

static?

> +       &cpu_ifs_attr_group,
> +       NULL,
> +};
> +
> +static struct device *cpu_scan_device;
> +
> +/* Creates the sysfs files under /sys/devices/system/cpu/ifs */
> +void cpu_ifs_init(void)
> +{
> +       struct device *root;
> +
> +       root = cpu_subsys.dev_root;
> +       cpu_scan_device = cpu_device_create(root, NULL, cpu_ifs_attr_groups, "ifs");

I just don't think ifs rises to the level of something that belongs in
the /sys/device/system/cpu/ifs namespace vs /sys/devices/platform/ifs/
especially since the module need not remain resident while a test is
not running. IIUC it's not a core capability that needs to be available
at all times it's something that will be kicked off during service
downtime to take inventory of CPU health, right?

2022-03-04 08:39:58

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC 09/10] platform/x86/intel/ifs: add ABI documentation for IFS

On Tue, 2022-03-01 at 11:54 -0800, Jithu Joseph wrote:
> Add the sysfs attributes in ABI/stable for In-Field Scan.
>
> Originally-by: Kyung Min Park <[email protected]>
> Signed-off-by: Jithu Joseph <[email protected]>
> Reviewed-by: Ashok Raj <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---
>  Documentation/ABI/stable/sysfs-driver-ifs | 85 +++++++++++++++++++++++

If you end up keeping this functionality under /sys/device/system/cpu
then I think this documentation belongs in:

Documentation/ABI/testing/sysfs-devices-system-cpu

...otherwise, I think it is better off in:

Documentation/ABI/testing/sysfs-devices-platform-ifs


>  1 file changed, 85 insertions(+)
>  create mode 100644 Documentation/ABI/stable/sysfs-driver-ifs
>
> diff --git a/Documentation/ABI/stable/sysfs-driver-ifs b/Documentation/ABI/stable/sysfs-driver-ifs
> new file mode 100644
> index 000000000000..8b6b9472f57e
> --- /dev/null
> +++ b/Documentation/ABI/stable/sysfs-driver-ifs
> @@ -0,0 +1,85 @@
> +What:          /sys/devices/system/cpu/ifs/run_test
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       [email protected]
> +Description:   echo 1 to trigger ifs test for all online cores.

Somewhere in this file is would be good to reference back to the core
documentation, because if this is the first place somebody lands, this
description is not that useful.

> +
> +What:          /sys/devices/system/cpu/ifs/status
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       [email protected]
> +Description:   Global status. Shows the most serious status across
> +               all cores (fail > untested > pass)

Rather than this lossy interface you might want to emit uevents on test
completion as a way to notify the results. That can add parameters to
an environment when calling a helper to process the event. See how NVME
takes advantage of this in nvme_aen_uevent() and nvme_class_uevent().

> +
> +What:          /sys/devices/system/cpu/ifs/image_version
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       [email protected]
> +Description:   Version of loaded IFS binary image.
> +
> +What:          /sys/devices/system/cpu/ifs/reload
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       [email protected]
> +Description:   echo 1 to reload IFS image.
> +
> +What:          /sys/devices/system/cpu/ifs/cpu_pass_list
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       [email protected]
> +Description:   List of cpus which passed the IFS test.

Format of this field? Is it even necessary if the user tooling can just
capture the per-core uevents associated with a test run?

> +
> +What:          /sys/devices/system/cpu/ifs/cpu_fail_list
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       [email protected]
> +Description:   List of cpus which failed the IFS test.
> +
> +What:          /sys/devices/system/cpu/ifs/cpu_untested_list
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       [email protected]
> +Description:   List of cpus which could not be tested.
> +
> +What:          /sys/module/intel_ifs/parameters/noint
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       [email protected]
> +Description:   SAF tunable parameter that user can modify before

"SAF" is never defined.

> +               the scan run if they wish to override default value.
> +
> +               When set, system interrupts are not allowed to interrupt an IFS. The
> +               default state for this parameter is set.

User implications of this setting? Like:

"Note: this setting may causes applications to miss latency / quality
of service deadlines, use with care."



> +
> +What:          /sys/module/intel_ifs/parameters/retry
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       [email protected]
> +Description:   SAF tunable parameter that user can modify at
> +               load time if they wish to override default value.
> +
> +               Maximum retry counter when the test is not executed due to an
> +               event such as interrupt. The default value is 5, it can be set to any
> +               value from 1 to 20.

Just seems like this is something the test tool can trivially handle
itself to just retry the test if it wants upon a failure.

> +
> +What:          /sys/devices/system/cpu/cpu#/ifs/run_test
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       [email protected]
> +Description:   IFS target core testing. echo 1 to trigger scan test on cpu#.

As mentioned on the last patch, if a CPU mask was an input parameter
then this would not need to be a per-CPU file.

> +
> +What:          /sys/devices/system/cpu/cpu#/ifs/status
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       [email protected]
> +Description:   The status of IFS test on a specific cpu#. It can be one of "pass", "fail"
> +               or "untested".
> +
> +What:          /sys/devices/system/cpu/cpu#/ifs/details
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       [email protected]
> +Description:   The details file reports the hex value of the SCAN_STATUS MSR. Note that
> +               the error_code field may contain driver defined software code not defined
> +               in the Intel SDM.

'status' and 'details' could be uevent output variables per cpu.

2022-03-04 20:32:06

by Luck, Tony

[permalink] [raw]
Subject: RE: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface

> Dynamic creation of sysfs attributes sometime after the driver loads is
> not friendly for something that likely only needs the module loaded for
> a test run and then unloaded again. I think this effort would be better
> served by building a sysfs topology registered underneath an ifs
> platform device.

Ah. Another gap in the Documentation/x86/ifs.rst that I need to fix.

Expected use case isn't just "run this test once to check your system
for signs of silicon aging". It is load this and run the tests at regular
intervals to catch developing problems as soon as possible.

I'll see if I can get the architects of the feature to give some input
on what they expect to be a "regular interval" to be included in
the documentation.

Note that even if a user only plans to run infrequently, there isn't
much benefit from unloading the module. The kernel memory used
by this driver is dwarfed by the memory reserved by BIOS to hold
the validated copies of the scan tests. Reloading the tests from the
file takes longer than a single run of the tests (though perhaps
that can be avoided if reading the MSRs can confirm that the
tests already loaded in reserved memory).

-Tony

2022-03-04 21:04:01

by Luck, Tony

[permalink] [raw]
Subject: RE: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface

>> You could just cut to the chase and do: sysfs_eq(buf, "1")
>
> Thanks will use this

$ git grep sysfs_eq
$

I don't see this defined (or used) anywhere in the kernel. Is
it spelled differently from how typed here?

-Tony

2022-03-04 21:18:24

by Joseph, Jithu

[permalink] [raw]
Subject: Re: [RFC 07/10] platform/x86/intel/ifs: Create kthreads for online cpus for scan test



On 3/2/2022 8:17 PM, Williams, Dan J wrote:
> On Tue, 2022-03-01 at 11:54 -0800, Jithu Joseph wrote:
>> Create scan kthreads for online logical cpus. Once scan test is triggered,
>> it wakes up the corresponding thread and its sibling threads to execute
>> the test. Once the scan test is done, the threads go back to thread wait
>> for next signal to start a new scan.
>>
...
>> +
>> +static int retry = 5;
>> +module_param_cb(retry, &ifs_retry_ops, &retry, 0644);
>> +
>> +MODULE_PARM_DESC(retry, "Maximum retry count when the test is not executed");
>
> Why does this retry need to be in the kernel? Can't the test runner
> retry the test if they want? If it stays in the kernel, why a module
> parameter and not a sysfs attribute?

A core is tested by writing 1 to "runtest" file. When user writes a 1 to run_test file it tests all the subtests (chunks) on a core.
Distinct from this, retry parameter describes the autoretry driver would do at "chunk" granularity (bit more on why, is available in the doc)

Why not a sysfs attribute: good qn, Our earlier prototype had this as a percpu sysfs attribute, however this was removed to keep the sysfs entries simple/minimal and less confusing.
(and tunable options which are not strictly needed in the normal course of use were moved to module parameters)
In the percpu sysfs we now only have the essential stuff i.e run_test , status , and details making it simpler for user who wants to test the core.


>
>> +
>> +static bool noint = 1;
>> +module_param(noint, bool, 0644);
>> +MODULE_PARM_DESC(noint, "Option to enable/disable interrupt during test");
>
> Same sysfs vs module parameter question...

Same as above

>

>> +
>> +static const char * const scan_test_status[] = {
>> +       "SCAN no error",
>> +       "Other thread could not join.",
>> +       "Interrupt occurred prior to SCAN coordination.",
>> +       "Core Abort SCAN Response due to power management condition.",
>> +       "Non valid chunks in the range",
>> +       "Mismatch in arguments between threads T0/T1.",
>> +       "Core not capable of performing SCAN currently",
>> +       "Unassigned error code 0x7",
>> +       "Exceeded number of Logical Processors (LP) allowed to run Scan-At-Field concurrently",
>> +       "Interrupt occurred prior to SCAN start",
>
> This looks unmaintainable...
>
> /me finds large comment block around IFS_* error codes and suggests
> killing 2 birds with one stone, i.e. delete that comment and make this
> self documenting:
>
> static const char * const scan_test_status[] = {
> [IFS_NO_ERROR] = "SCAN no error",
> [IFS_OTHER_THREAD_DID_NOT_JOIN] = "Other thread could not join.",
> ...etc...
> };

Will use this format.

>
> Btw, which is it "did not join" and "could not join"? If the symbol
> name is going to be that long might as well make it match the log
> message verbatim.

Will make them identical. Thanks for pointing this out.

>
> That way you can add / delete error codes without wondering if you
> managed to match the code number to the right position in the array.
>
> Even better would be to kick this out of the kernel and let the user
> tool translate the error codes to test result log messages.
>
>> +};
>> +
>> +static void message_not_tested(int cpu, union ifs_status status)
>> +{
>> +       if (status.error_code < ARRAY_SIZE(scan_test_status))
>> +               pr_warn("CPU %d: SCAN operation did not start. %s\n", cpu,
>> +                       scan_test_status[status.error_code]);
>> +       else if (status.error_code == IFS_SW_TIMEOUT)
>> +               pr_warn("CPU %d: software timeout during scan\n", cpu);
>> +       else if (status.error_code == IFS_SW_PARTIAL_COMPLETION)
>> +               pr_warn("CPU %d: %s\n", cpu,
>> +                       "Not all scan chunks were executed. Maximum forward progress retries exceeded");
>
> Why are these codes not in the scan_test_status set? I see that
> IFS_SW_PARTIAL_COMPLETION and IFS_SW_TIMEOUT are defined with larger
> values, but why?

These are software(driver) defined error codes. Rest of the error codes are supplied by
the hardware. Software defined error codes were kept at the other end to provide ample space
in case (future) hardware decides to provide extend error codes.

>
>> +       else
>> +               pr_warn("CPU %d: SCAN unknown status %llx\n", cpu, status.data);
>> +}
>> +
>> +static void message_fail(int cpu, union ifs_status status)
>> +{
>> +       if (status.control_error) {
>> +               pr_err("CPU %d: scan failed. %s\n", cpu,
>> +                      "Suggest reload scan file: # echo 1 > /sys/devices/system/cpu/ifs/reload");
>> +       }
>> +       if (status.signature_error) {
>> +               pr_err("CPU %d: test signature incorrect. %s\n", cpu,
>> +                      "Suggest retry scan to check if problem is transient");
>> +       }
>
> This looks and feels more like tools/testing/selftests/ style code
> printing information for a kernel developer to read. For a production
> capability I would expect these debug messages to be elided and have an
> ifs user tool that knows when to "Suggest reload scan file". Otherwise,
> it's not scalable to use the kernel log buffer like a utility's stdout.

The two pr_err here are for really really grave errors and warrants being displayed to console,
possibly indicating some fault with the particular core. They are never expected to come in normal
course of testing on a working core.

But I see your larger point. We will convert all the pr_warn preceeding this (in message_not_tested())
to pr_dbg so that they dont normally take up the kernel log buffer. (they are not as grave a scenario as the earlier one).

The same information is also available from percpu sysfs/cpu#/ifs/status for user spaces tools to operate on

Jithu



2022-03-04 21:22:58

by Joseph, Jithu

[permalink] [raw]
Subject: Re: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface



On 3/3/2022 4:31 PM, Williams, Dan J wrote:
> On Tue, 2022-03-01 at 11:54 -0800, Jithu Joseph wrote:
>> Implement sysfs interface to trigger ifs test for a targeted core or
>> all cores. For all core testing, the kernel will start testing from core 0
>> and proceed to the next core one after another. After the ifs test on the
>> last core, the test stops until the administrator starts another round of

>> +
>> +/*
>> + * The sysfs interface to check the test status:
>> + * To check the result, for example, cpu0
>> + * cat /sys/devices/system/cpu/cpu0/ifs/details
>> + */
>> +static ssize_t details_show(struct device *dev,
>> +                           struct device_attribute *attr,
>> +                           char *buf)
>> +{
>> +       unsigned int cpu = dev->id;
>> +       int ret;
>> +
>> +       if (down_trylock(&ifs_sem))
>> +               return -EBUSY;
>
> What is the ifs_sem protecting? This result is immediately invalid
> after the lock is dropped anyway, so why hold it over reading the
> value? You can't prevent 2 threads racing each other here.

percpu thread running scan_test_worker() will update per_cpu(ifs_state, cpu).scan_details. (before signalling this thread to run, this lock would be acquired)
This is to protect against the scenario where if the percpu thread is running a test and if at the same time a user is querying its status, they would see busy.

>
>> +
>> +       ret = sprintf(buf, "%llx\n", per_cpu(ifs_state, cpu).scan_details);
>
> Should be sysfs_emit() which includes the page buffer safety.

grep KH also pointed this out ... will replace this throughout

>
> Also, you likely want that format string to be %#llx so that userspace
> knows explicitly that this is a hexadecimal value.

Agreed will do this


>> +
>> +/*
>> + * The sysfs interface for single core testing
>> + * To start test, for example, cpu0
>> + * echo 1 > /sys/devices/system/cpu/cpu0/ifs/run_test
>> + * To check the result:
>> + * cat /sys/devices/system/cpu/cpu0/ifs/result
(there is a typo in the comment result -> status)
>
> Just have a CPU mask as an input parameter and avoid needing to hang
> ifs sysfs attributes underneath /sys/device/system/cpu/ifs.

The percpu sysfs has the additional function of providing percpu status and details.
The global interface is unable to provide the status and details for all the cores in the system. It does give a summary, which
guides the user to the appropriate percpu status/details


>> + */
>> +static ssize_t allcpu_run_test_store(struct device *dev,
>> +                                    struct device_attribute *attr,
>> +                                    const char *buf, size_t count)
>> +{
>> +       bool var;
>> +       int rc;
>> +
>> +       if (ifs_disabled)
>> +               return -ENXIO;
>> +
>> +       rc = kstrtobool(buf, &var);
>> +       if (rc < 0 || var != 1)
>> +               return -EINVAL;
>
> You could just cut to the chase and do: sysfs_eq(buf, "1")

Thanks will use this



>> +
>> +/*
>> + * Percpu and allcpu ifs have attributes named "status".
>> + * Since the former is defined in this same file using DEVICE_ATTR_RO()
>> + * the latter is defined directly.
>> + */
>> +static struct device_attribute dev_attr_allcpu_status = {
>> +       .attr   = { .name = "status", .mode = 0444 },
>> +       .show   = allcpu_status_show,
>> +};
>
> Can still do the one line declartion like this and skip the comment.
>
> DEVICE_ATTR(status, 0444, allcpu_status_show, NULL);

Will change as per your suggestion above

>
>> +
>> +/* global scan sysfs attributes */
>> +static struct attribute *cpu_ifs_attrs[] = {
>> +       &dev_attr_reload.attr,
>> +       &dev_attr_allcpu_run_test.attr,
>> +       &dev_attr_image_version.attr,
>> +       &dev_attr_cpu_fail_list.attr,
>> +       &dev_attr_cpu_untested_list.attr,
>> +       &dev_attr_cpu_pass_list.attr,
>> +       &dev_attr_allcpu_status.attr,
>> +       NULL
>> +};
>> +
>> +const struct attribute_group cpu_ifs_attr_group = {
>
> static?

will do

Jithu

2022-03-07 21:35:42

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface

On Mon, Mar 7, 2022 at 11:09 AM Joseph, Jithu <[email protected]> wrote:
>
>
>
> On 3/7/2022 9:38 AM, Dan Williams wrote:
> > On Fri, Mar 4, 2022 at 12:42 PM Joseph, Jithu <[email protected]> wrote:
> >>
> >>
> >>
>
> >>>> + */
> >>>> +static ssize_t details_show(struct device *dev,
> >>>> + struct device_attribute *attr,
> >>>> + char *buf)
> >>>> +{
> >>>> + unsigned int cpu = dev->id;
> >>>> + int ret;
> >>>> +
> >>>> + if (down_trylock(&ifs_sem))
> >>>> + return -EBUSY;
> >>>
> >>> What is the ifs_sem protecting? This result is immediately invalid
> >>> after the lock is dropped anyway, so why hold it over reading the
> >>> value? You can't prevent 2 threads racing each other here.
> >>
> >> percpu thread running scan_test_worker() will update per_cpu(ifs_state, cpu).scan_details. (before signalling this thread to run, this lock would be acquired)
> >> This is to protect against the scenario where if the percpu thread is running a test and if at the same time a user is querying its status, they would see busy.
> >
> > That begs the question why would userspace be polling this file? Is it
> > because it does not know when a test completes otherwise? How does it
> > know that the result it is seeing is from the test it ran and not some
> > other invocation to start a new test?
>
> I think I should have explained the need for locking at a higher level .
> It is to make sure that only one of the below action happens at any time
>
> 1. single percpu test
> 2. all-cpu test (tests all cores sequentially)
> 3. scan binary reload
> 4. querying the status
>
> (There are h/w reasons for why we restrict to a single IFS test at any time)
> If 4 happens while 1 or 2 is in progress , we return busy. My earlier explanation was trying to explain why we are preventing 4 when 1 or 2 is in progress.
>
> As to the question of why would a user do 4 while 1 or 2 is in progress, there is no reason for him to do that, both the run_test (percpu and global) are blocking.
> But if he issues a test from one shell and uses another shell to query the status (while it is still in progress) he will see busy.

...and what about the race that the semaphore cannot solve of test run
launch racing collection of previous run results?

>
>
> >>>
> >>> Just have a CPU mask as an input parameter and avoid needing to hang
> >>> ifs sysfs attributes underneath /sys/device/system/cpu/ifs.
> >>
> >> The percpu sysfs has the additional function of providing percpu status and details.
> >
> > That still does not answer the question about the input parameter for
> > selecting cores.
>
> see below
>
> >
> >> The global interface is unable to provide the status and details for all the cores in the system. It does give a summary, which
> >> guides the user to the appropriate percpu status/details
> >
> > It does not sound like the global sysfs interface is all that useful,
> > especially as it just expands the window for the global results to
> > become out of sync with the per-cpu results. With a uevent the tool
> > can get called to handle results on per-cpu / per-test,chunk basis
> > atomically. I.e. a udev rule like:
>
> The global/percpu design was chosen after some thought on how a user might want to test his system. And the sysfs files are grouped accordingly.
> To start with he might want to see if everything is fine with his system (all cores). The global interface is for this. He can do that with a single line command
>
> echo 1 > /sys/devices/system/cpu/ifs/run_test
>
> if "/sys/devices/system/cpu/ifs/status" says anything other than pass, he can identify the failed/untested cores using "/sys/devices/system/cpu/ifs/cpu_fail_list" and "cpu_untested_list".
>
> Now to understand why a particular core failed or retest that particular core, the percpu interface is present.
> (So global and percpu views are a simple and convenient way to expose the available functionality and we went with that … I agree that an alternative was to provide an input parameter for selecting cores)
>
> The whole testing can be done with simple interaction with sysfs file … without the need for any elaborate user space tool.

I have not asked for an "elaborate" user tool, sysfs events and
uevents allow for basic shell script tooling. Now, a more polished
tool is certainly welcome, but a sample script is better than nothing.

> Perhaps we can add recommended testing flow in the Documentation

Documentation is good for describing architecture internals and
motivation, a sample user tool / script is good for validating ABI fit
for purpose.

2022-03-07 21:39:29

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface

On Mon, Mar 7, 2022 at 11:55 AM Joseph, Jithu <[email protected]> wrote:
>
>
>
> On 3/7/2022 11:15 AM, Dan Williams wrote:
> > On Mon, Mar 7, 2022 at 11:09 AM Joseph, Jithu <[email protected]> wrote:
> >>
> >>
> >>
> >> On 3/7/2022 9:38 AM, Dan Williams wrote:
> >>> On Fri, Mar 4, 2022 at 12:42 PM Joseph, Jithu <[email protected]> wrote:
> >>>>
> >>>>
> >>>>
> >>
> >>>>>> + */
> >>>>>> +static ssize_t details_show(struct device *dev,
> >>>>>> + struct device_attribute *attr,
> >>>>>> + char *buf)
> >>>>>> +{
> >>>>>> + unsigned int cpu = dev->id;
> >>>>>> + int ret;
> >>>>>> +
> >>>>>> + if (down_trylock(&ifs_sem))
> >>>>>> + return -EBUSY;
> >>>>>
> >>>>> What is the ifs_sem protecting? This result is immediately invalid
> >>>>> after the lock is dropped anyway, so why hold it over reading the
> >>>>> value? You can't prevent 2 threads racing each other here.
> >>>>
> >>>> percpu thread running scan_test_worker() will update per_cpu(ifs_state, cpu).scan_details. (before signalling this thread to run, this lock would be acquired)
> >>>> This is to protect against the scenario where if the percpu thread is running a test and if at the same time a user is querying its status, they would see busy.
> >>>
> >>> That begs the question why would userspace be polling this file? Is it
> >>> because it does not know when a test completes otherwise? How does it
> >>> know that the result it is seeing is from the test it ran and not some
> >>> other invocation to start a new test?
> >>
> >> I think I should have explained the need for locking at a higher level .
> >> It is to make sure that only one of the below action happens at any time
> >>
> >> 1. single percpu test
> >> 2. all-cpu test (tests all cores sequentially)
> >> 3. scan binary reload
> >> 4. querying the status
> >>
> >> (There are h/w reasons for why we restrict to a single IFS test at any time)
> >> If 4 happens while 1 or 2 is in progress , we return busy. My earlier explanation was trying to explain why we are preventing 4 when 1 or 2 is in progress.
> >>
> >> As to the question of why would a user do 4 while 1 or 2 is in progress, there is no reason for him to do that, both the run_test (percpu and global) are blocking.
> >> But if he issues a test from one shell and uses another shell to query the status (while it is still in progress) he will see busy.
> >
> > ...and what about the race that the semaphore cannot solve of test run
> > launch racing collection of previous run results?
>
>
> pardon me if I am missing something obvious here … but everybody (the 4 scenarios listed above) tries to acquire the same semaphore, or returns -EBUSY if try_lock() fails.
> So in case of triggering "run_test" and viewing "status" racing scenario you mention, the below are the 2 interleaving I see
>
> if "echo 1 > /sys/devices/sysem/cpu/cpu10/ifs/run_test" gets the lock , the parallel "cat /sys/devices/sysem/cpu/cpu10/ifs/status" will return busy (and not the previous status)
> if "cat /sys/devices/sysem/cpu/cpu10/ifs/status", gets the lock it will display the status of the last test result and the parallel "echo 1 > /sys/devices/sysem/cpu/cpu10/ifs/run_test" will fail with busy
>
> This I believe is consistent behavior.

I am speaking of the state of the case where 2 threads are doing
run_test and polling for results. Unless you can guarantee that run2
does not start before the results of run1 have been collected then
they are lost in that scenario. No amount of kernel locking can
resolve that race to collect previous result which would not be a
problem in the first place if there was an atomic way to log test
results.

2022-03-07 22:40:16

by Joseph, Jithu

[permalink] [raw]
Subject: Re: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface



On 3/7/2022 9:38 AM, Dan Williams wrote:
> On Fri, Mar 4, 2022 at 12:42 PM Joseph, Jithu <[email protected]> wrote:
>>
>>
>>

>>>> + */
>>>> +static ssize_t details_show(struct device *dev,
>>>> + struct device_attribute *attr,
>>>> + char *buf)
>>>> +{
>>>> + unsigned int cpu = dev->id;
>>>> + int ret;
>>>> +
>>>> + if (down_trylock(&ifs_sem))
>>>> + return -EBUSY;
>>>
>>> What is the ifs_sem protecting? This result is immediately invalid
>>> after the lock is dropped anyway, so why hold it over reading the
>>> value? You can't prevent 2 threads racing each other here.
>>
>> percpu thread running scan_test_worker() will update per_cpu(ifs_state, cpu).scan_details. (before signalling this thread to run, this lock would be acquired)
>> This is to protect against the scenario where if the percpu thread is running a test and if at the same time a user is querying its status, they would see busy.
>
> That begs the question why would userspace be polling this file? Is it
> because it does not know when a test completes otherwise? How does it
> know that the result it is seeing is from the test it ran and not some
> other invocation to start a new test?

I think I should have explained the need for locking at a higher level .
It is to make sure that only one of the below action happens at any time

1. single percpu test
2. all-cpu test (tests all cores sequentially)
3. scan binary reload
4. querying the status

(There are h/w reasons for why we restrict to a single IFS test at any time)
If 4 happens while 1 or 2 is in progress , we return busy. My earlier explanation was trying to explain why we are preventing 4 when 1 or 2 is in progress.

As to the question of why would a user do 4 while 1 or 2 is in progress, there is no reason for him to do that, both the run_test (percpu and global) are blocking.
But if he issues a test from one shell and uses another shell to query the status (while it is still in progress) he will see busy.


>>>
>>> Just have a CPU mask as an input parameter and avoid needing to hang
>>> ifs sysfs attributes underneath /sys/device/system/cpu/ifs.
>>
>> The percpu sysfs has the additional function of providing percpu status and details.
>
> That still does not answer the question about the input parameter for
> selecting cores.

see below

>
>> The global interface is unable to provide the status and details for all the cores in the system. It does give a summary, which
>> guides the user to the appropriate percpu status/details
>
> It does not sound like the global sysfs interface is all that useful,
> especially as it just expands the window for the global results to
> become out of sync with the per-cpu results. With a uevent the tool
> can get called to handle results on per-cpu / per-test,chunk basis
> atomically. I.e. a udev rule like:

The global/percpu design was chosen after some thought on how a user might want to test his system. And the sysfs files are grouped accordingly.
To start with he might want to see if everything is fine with his system (all cores). The global interface is for this. He can do that with a single line command

echo 1 > /sys/devices/system/cpu/ifs/run_test

if "/sys/devices/system/cpu/ifs/status" says anything other than pass, he can identify the failed/untested cores using "/sys/devices/system/cpu/ifs/cpu_fail_list" and "cpu_untested_list".

Now to understand why a particular core failed or retest that particular core, the percpu interface is present.
(So global and percpu views are a simple and convenient way to expose the available functionality and we went with that … I agree that an alternative was to provide an input parameter for selecting cores)

The whole testing can be done with simple interaction with sysfs file … without the need for any elaborate user space tool.
Perhaps we can add recommended testing flow in the Documentation

Jithu

2022-03-08 02:45:27

by Joseph, Jithu

[permalink] [raw]
Subject: Re: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface



On 3/7/2022 12:25 PM, Dan Williams wrote:

>
> I am speaking of the state of the case where 2 threads are doing
> run_test and polling for results. Unless you can guarantee that run2
> does not start before the results of run1 have been collected then
> they are lost in that scenario. No amount of kernel locking can
> resolve that race to collect previous result which would not be a
> problem in the first place if there was an atomic way to log test
> results.


Yes "status" shows the status of the latest run. You cannot get the status of the previous run.

Also some context on test frequency: Hardware has strict rate limiting of tests.
Every core can be tested only once in every 30 minutes. So it is pointless to test at high frequency.

2022-03-08 07:35:48

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC 07/10] platform/x86/intel/ifs: Create kthreads for online cpus for scan test

On Fri, Mar 4, 2022 at 11:20 AM Joseph, Jithu <[email protected]> wrote:
>
>
>
> On 3/2/2022 8:17 PM, Williams, Dan J wrote:
> > On Tue, 2022-03-01 at 11:54 -0800, Jithu Joseph wrote:
> >> Create scan kthreads for online logical cpus. Once scan test is triggered,
> >> it wakes up the corresponding thread and its sibling threads to execute
> >> the test. Once the scan test is done, the threads go back to thread wait
> >> for next signal to start a new scan.
> >>
> ...
> >> +
> >> +static int retry = 5;
> >> +module_param_cb(retry, &ifs_retry_ops, &retry, 0644);
> >> +
> >> +MODULE_PARM_DESC(retry, "Maximum retry count when the test is not executed");
> >
> > Why does this retry need to be in the kernel? Can't the test runner
> > retry the test if they want? If it stays in the kernel, why a module
> > parameter and not a sysfs attribute?
>
> A core is tested by writing 1 to "runtest" file. When user writes a 1 to run_test file it tests all the subtests (chunks) on a core.
> Distinct from this, retry parameter describes the autoretry driver would do at "chunk" granularity (bit more on why, is available in the doc)
>
> Why not a sysfs attribute: good qn, Our earlier prototype had this as a percpu sysfs attribute, however this was removed to keep the sysfs entries simple/minimal and less confusing.
> (and tunable options which are not strictly needed in the normal course of use were moved to module parameters)
> In the percpu sysfs we now only have the essential stuff i.e run_test , status , and details making it simpler for user who wants to test the core.

...but you are putting it in sysfs, just in a different directory:

/sys/module/ifs/parameters

vs

/sys/devices/{system/cpu/,platform}/ifs

Just unify it all in one place, otherwise, I fail to see the
simplification for the user over spreading settings over multiple
locations.

>
>
> >
> >> +
> >> +static bool noint = 1;
> >> +module_param(noint, bool, 0644);
> >> +MODULE_PARM_DESC(noint, "Option to enable/disable interrupt during test");
> >
> > Same sysfs vs module parameter question...
>
> Same as above

Same multiple sysfs ABI location concern.

>
> >
>
> >> +
> >> +static const char * const scan_test_status[] = {
> >> + "SCAN no error",
> >> + "Other thread could not join.",
> >> + "Interrupt occurred prior to SCAN coordination.",
> >> + "Core Abort SCAN Response due to power management condition.",
> >> + "Non valid chunks in the range",
> >> + "Mismatch in arguments between threads T0/T1.",
> >> + "Core not capable of performing SCAN currently",
> >> + "Unassigned error code 0x7",
> >> + "Exceeded number of Logical Processors (LP) allowed to run Scan-At-Field concurrently",
> >> + "Interrupt occurred prior to SCAN start",
> >
> > This looks unmaintainable...
> >
> > /me finds large comment block around IFS_* error codes and suggests
> > killing 2 birds with one stone, i.e. delete that comment and make this
> > self documenting:
> >
> > static const char * const scan_test_status[] = {
> > [IFS_NO_ERROR] = "SCAN no error",
> > [IFS_OTHER_THREAD_DID_NOT_JOIN] = "Other thread could not join.",
> > ...etc...
> > };
>
> Will use this format.
>
> >
> > Btw, which is it "did not join" and "could not join"? If the symbol
> > name is going to be that long might as well make it match the log
> > message verbatim.
>
> Will make them identical. Thanks for pointing this out.
>
> >
> > That way you can add / delete error codes without wondering if you
> > managed to match the code number to the right position in the array.
> >
> > Even better would be to kick this out of the kernel and let the user
> > tool translate the error codes to test result log messages.
> >
> >> +};
> >> +
> >> +static void message_not_tested(int cpu, union ifs_status status)
> >> +{
> >> + if (status.error_code < ARRAY_SIZE(scan_test_status))
> >> + pr_warn("CPU %d: SCAN operation did not start. %s\n", cpu,
> >> + scan_test_status[status.error_code]);
> >> + else if (status.error_code == IFS_SW_TIMEOUT)
> >> + pr_warn("CPU %d: software timeout during scan\n", cpu);
> >> + else if (status.error_code == IFS_SW_PARTIAL_COMPLETION)
> >> + pr_warn("CPU %d: %s\n", cpu,
> >> + "Not all scan chunks were executed. Maximum forward progress retries exceeded");
> >
> > Why are these codes not in the scan_test_status set? I see that
> > IFS_SW_PARTIAL_COMPLETION and IFS_SW_TIMEOUT are defined with larger
> > values, but why?
>
> These are software(driver) defined error codes. Rest of the error codes are supplied by
> the hardware. Software defined error codes were kept at the other end to provide ample space
> in case (future) hardware decides to provide extend error codes.

Why put them in the same number space? Separate software results from
the raw hardware results and have a separate mechanism to convey each.

>
> >
> >> + else
> >> + pr_warn("CPU %d: SCAN unknown status %llx\n", cpu, status.data);
> >> +}
> >> +
> >> +static void message_fail(int cpu, union ifs_status status)
> >> +{
> >> + if (status.control_error) {
> >> + pr_err("CPU %d: scan failed. %s\n", cpu,
> >> + "Suggest reload scan file: # echo 1 > /sys/devices/system/cpu/ifs/reload");
> >> + }
> >> + if (status.signature_error) {
> >> + pr_err("CPU %d: test signature incorrect. %s\n", cpu,
> >> + "Suggest retry scan to check if problem is transient");
> >> + }
> >
> > This looks and feels more like tools/testing/selftests/ style code
> > printing information for a kernel developer to read. For a production
> > capability I would expect these debug messages to be elided and have an
> > ifs user tool that knows when to "Suggest reload scan file". Otherwise,
> > it's not scalable to use the kernel log buffer like a utility's stdout.
>
> The two pr_err here are for really really grave errors and warrants being displayed to console,
> possibly indicating some fault with the particular core. They are never expected to come in normal
> course of testing on a working core.

Kernel log messages with user action recommendations belong in a user tool.

> But I see your larger point. We will convert all the pr_warn preceeding this (in message_not_tested())
> to pr_dbg so that they dont normally take up the kernel log buffer. (they are not as grave a scenario as the earlier one).
>
> The same information is also available from percpu sysfs/cpu#/ifs/status for user spaces tools to operate on

"Suggest retry" does not seem like "grave error" to me. User feedback
belongs in a user tool.

2022-03-08 08:05:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface

On Mon, Mar 07, 2022 at 12:56:08PM -0800, Joseph, Jithu wrote:
>
>
> On 3/7/2022 12:25 PM, Dan Williams wrote:
>
> >
> > I am speaking of the state of the case where 2 threads are doing
> > run_test and polling for results. Unless you can guarantee that run2
> > does not start before the results of run1 have been collected then
> > they are lost in that scenario. No amount of kernel locking can
> > resolve that race to collect previous result which would not be a
> > problem in the first place if there was an atomic way to log test
> > results.
>
>
> Yes "status" shows the status of the latest run. You cannot get the status of the previous run.
>
> Also some context on test frequency: Hardware has strict rate limiting of tests.
> Every core can be tested only once in every 30 minutes. So it is pointless to test at high frequency.

What limits this, the kernel?

2022-03-08 08:22:36

by Luck, Tony

[permalink] [raw]
Subject: RE: [RFC 07/10] platform/x86/intel/ifs: Create kthreads for online cpus for scan test

>> These are software(driver) defined error codes. Rest of the error codes are supplied by
>> the hardware. Software defined error codes were kept at the other end to provide ample space
>> in case (future) hardware decides to provide extend error codes.
>
> Why put them in the same number space? Separate software results from
> the raw hardware results and have a separate mechanism to convey each.

We wanted to include in the "details" file, which is otherwise a direct copy of
the SCAN_STATUS MSR. Making sure the software error codes didn't overlap
with any h/w generated codes seemed like a good idea.

But maybe we should have done this with additional string values in the status
file:

Current:

pass
untested
fail

Add a couple of new options for the s/w cases:

sw_timeout
sw_retries_exceeded

-Tony

2022-03-08 10:09:11

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface

On Mon, Mar 7, 2022 at 12:56 PM Joseph, Jithu <[email protected]> wrote:
>
>
>
> On 3/7/2022 12:25 PM, Dan Williams wrote:
>
> >
> > I am speaking of the state of the case where 2 threads are doing
> > run_test and polling for results. Unless you can guarantee that run2
> > does not start before the results of run1 have been collected then
> > they are lost in that scenario. No amount of kernel locking can
> > resolve that race to collect previous result which would not be a
> > problem in the first place if there was an atomic way to log test
> > results.
>
>
> Yes "status" shows the status of the latest run. You cannot get the status of the previous run.
>
> Also some context on test frequency: Hardware has strict rate limiting of tests.
> Every core can be tested only once in every 30 minutes. So it is pointless to test at high frequency.

Is that enforced in the ABI? Did I miss that detail in the
documentation? Does that per-core time limitation also mitigate the
race to read the global status? I.e. back-to-back tests on the same
core may need to be spaced by 30 minutes, but between different cores
as well. Again, these are questions that can not even be posed with an
ABI that eliminates the possibility of result races like uevent.
Please think through that suggestion, that's my main motivation for
continuing to poke at this ABI as currently presented.

2022-03-08 13:08:29

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface

On Fri, Mar 4, 2022 at 12:42 PM Joseph, Jithu <[email protected]> wrote:
>
>
>
> On 3/3/2022 4:31 PM, Williams, Dan J wrote:
> > On Tue, 2022-03-01 at 11:54 -0800, Jithu Joseph wrote:
> >> Implement sysfs interface to trigger ifs test for a targeted core or
> >> all cores. For all core testing, the kernel will start testing from core 0
> >> and proceed to the next core one after another. After the ifs test on the
> >> last core, the test stops until the administrator starts another round of
>
> >> +
> >> +/*
> >> + * The sysfs interface to check the test status:
> >> + * To check the result, for example, cpu0
> >> + * cat /sys/devices/system/cpu/cpu0/ifs/details
> >> + */
> >> +static ssize_t details_show(struct device *dev,
> >> + struct device_attribute *attr,
> >> + char *buf)
> >> +{
> >> + unsigned int cpu = dev->id;
> >> + int ret;
> >> +
> >> + if (down_trylock(&ifs_sem))
> >> + return -EBUSY;
> >
> > What is the ifs_sem protecting? This result is immediately invalid
> > after the lock is dropped anyway, so why hold it over reading the
> > value? You can't prevent 2 threads racing each other here.
>
> percpu thread running scan_test_worker() will update per_cpu(ifs_state, cpu).scan_details. (before signalling this thread to run, this lock would be acquired)
> This is to protect against the scenario where if the percpu thread is running a test and if at the same time a user is querying its status, they would see busy.

That begs the question why would userspace be polling this file? Is it
because it does not know when a test completes otherwise? How does it
know that the result it is seeing is from the test it ran and not some
other invocation to start a new test?

These questions would be easier to answer with a sample tool
implementation to look at even if that's only test code that lives in
tools/testing/. At first glance it seems possible to fake a scan to
test the ABI.

> >> +
> >> + ret = sprintf(buf, "%llx\n", per_cpu(ifs_state, cpu).scan_details);
> >
> > Should be sysfs_emit() which includes the page buffer safety.
>
> grep KH also pointed this out ... will replace this throughout
>
> >
> > Also, you likely want that format string to be %#llx so that userspace
> > knows explicitly that this is a hexadecimal value.
>
> Agreed will do this
>
>
> >> +
> >> +/*
> >> + * The sysfs interface for single core testing
> >> + * To start test, for example, cpu0
> >> + * echo 1 > /sys/devices/system/cpu/cpu0/ifs/run_test
> >> + * To check the result:
> >> + * cat /sys/devices/system/cpu/cpu0/ifs/result
> (there is a typo in the comment result -> status)
> >
> > Just have a CPU mask as an input parameter and avoid needing to hang
> > ifs sysfs attributes underneath /sys/device/system/cpu/ifs.
>
> The percpu sysfs has the additional function of providing percpu status and details.

That still does not answer the question about the input parameter for
selecting cores.

> The global interface is unable to provide the status and details for all the cores in the system. It does give a summary, which
> guides the user to the appropriate percpu status/details

It does not sound like the global sysfs interface is all that useful,
especially as it just expands the window for the global results to
become out of sync with the per-cpu results. With a uevent the tool
can get called to handle results on per-cpu / per-test,chunk basis
atomically. I.e. a udev rule like:

ACTION=="change", DRIVER=="ifs", SUBSYSTEM=="platform" \
RUN+="/bin/ifs-log
--testid=$env{IFS_TESTID}\t--cpu=$end{IFS_CPU}\t--result=$env{IFS_RESULT}\t--detail=$env{IFS_DETAIL}\t--hardware-status=$env{IFS_HWSTATUS}\t--software-status=$env{IFS_SWSTATUS}

...that way this retrieves all the relevant details without a need to
poll sysfs, it does so atomically so there is no worry about running
back to back tests as that will just increment IFS_TESTID to keep all
the runs distinct in the log, it allows for separation of software and
hardware error codes, and it's extensible for new fields if the need
arises.

>
>
> >> + */
> >> +static ssize_t allcpu_run_test_store(struct device *dev,
> >> + struct device_attribute *attr,
> >> + const char *buf, size_t count)
> >> +{
> >> + bool var;
> >> + int rc;
> >> +
> >> + if (ifs_disabled)
> >> + return -ENXIO;

I missed this earlier, but you could remove the sysfs ABI visibility
entirely when this happens with sysfs_update_group() rather than leave
dead ABI files.

2022-03-08 23:08:24

by Luck, Tony

[permalink] [raw]
Subject: RE: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface

> > Also some context on test frequency: Hardware has strict rate limiting of tests.
> > Every core can be tested only once in every 30 minutes. So it is pointless to test at high frequency.
>
> What limits this, the kernel?

Microcode enforces the per-core repeat rate on the first implementation.

But this limit isn't architectural. Future implementations may not have
the same, or perhaps any, limit.

-Tony

2022-03-09 00:44:35

by Joseph, Jithu

[permalink] [raw]
Subject: Re: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface



On 3/7/2022 11:15 AM, Dan Williams wrote:
> On Mon, Mar 7, 2022 at 11:09 AM Joseph, Jithu <[email protected]> wrote:
>>
>>
>>
>> On 3/7/2022 9:38 AM, Dan Williams wrote:
>>> On Fri, Mar 4, 2022 at 12:42 PM Joseph, Jithu <[email protected]> wrote:
>>>>
>>>>
>>>>
>>
>>>>>> + */
>>>>>> +static ssize_t details_show(struct device *dev,
>>>>>> + struct device_attribute *attr,
>>>>>> + char *buf)
>>>>>> +{
>>>>>> + unsigned int cpu = dev->id;
>>>>>> + int ret;
>>>>>> +
>>>>>> + if (down_trylock(&ifs_sem))
>>>>>> + return -EBUSY;
>>>>>
>>>>> What is the ifs_sem protecting? This result is immediately invalid
>>>>> after the lock is dropped anyway, so why hold it over reading the
>>>>> value? You can't prevent 2 threads racing each other here.
>>>>
>>>> percpu thread running scan_test_worker() will update per_cpu(ifs_state, cpu).scan_details. (before signalling this thread to run, this lock would be acquired)
>>>> This is to protect against the scenario where if the percpu thread is running a test and if at the same time a user is querying its status, they would see busy.
>>>
>>> That begs the question why would userspace be polling this file? Is it
>>> because it does not know when a test completes otherwise? How does it
>>> know that the result it is seeing is from the test it ran and not some
>>> other invocation to start a new test?
>>
>> I think I should have explained the need for locking at a higher level .
>> It is to make sure that only one of the below action happens at any time
>>
>> 1. single percpu test
>> 2. all-cpu test (tests all cores sequentially)
>> 3. scan binary reload
>> 4. querying the status
>>
>> (There are h/w reasons for why we restrict to a single IFS test at any time)
>> If 4 happens while 1 or 2 is in progress , we return busy. My earlier explanation was trying to explain why we are preventing 4 when 1 or 2 is in progress.
>>
>> As to the question of why would a user do 4 while 1 or 2 is in progress, there is no reason for him to do that, both the run_test (percpu and global) are blocking.
>> But if he issues a test from one shell and uses another shell to query the status (while it is still in progress) he will see busy.
>
> ...and what about the race that the semaphore cannot solve of test run
> launch racing collection of previous run results?


pardon me if I am missing something obvious here … but everybody (the 4 scenarios listed above) tries to acquire the same semaphore, or returns -EBUSY if try_lock() fails.
So in case of triggering "run_test" and viewing "status" racing scenario you mention, the below are the 2 interleaving I see

if "echo 1 > /sys/devices/sysem/cpu/cpu10/ifs/run_test" gets the lock , the parallel "cat /sys/devices/sysem/cpu/cpu10/ifs/status" will return busy (and not the previous status)
if "cat /sys/devices/sysem/cpu/cpu10/ifs/status", gets the lock it will display the status of the last test result and the parallel "echo 1 > /sys/devices/sysem/cpu/cpu10/ifs/run_test" will fail with busy

This I believe is consistent behavior.

2022-03-11 06:14:32

by Kok, Auke

[permalink] [raw]
Subject: Re: [RFC 07/10] platform/x86/intel/ifs: Create kthreads for online cpus for scan test


On 3/7/22 09:46, Luck, Tony wrote:
>>> These are software(driver) defined error codes. Rest of the error codes are supplied by
>>> the hardware. Software defined error codes were kept at the other end to provide ample space
>>> in case (future) hardware decides to provide extend error codes.
>> Why put them in the same number space? Separate software results from
>> the raw hardware results and have a separate mechanism to convey each.
> We wanted to include in the "details" file, which is otherwise a direct copy of
> the SCAN_STATUS MSR. Making sure the software error codes didn't overlap
> with any h/w generated codes seemed like a good idea.
>
> But maybe we should have done this with additional string values in the status
> file:
>
> Current:
>
> pass
> untested
> fail
>
> Add a couple of new options for the s/w cases:
>
> sw_timeout
> sw_retries_exceeded


We've made a userspace implementation for this API already as part of
opendcdiag that uses it:

https://github.com/opendcdiag/opendcdiag/commit/0cbfcee30e0666b0f79a2e452d7f8167d2a0cb90

What I really like is that with this proposed API, we can unambiguously
determine whether "the core failed" or "everything is fine, for now" by
reading a single file. I hate to see this file become unusable because
its content changes from "pass" to "sw_timeout" or, even worse, it
changes from "fail" to "sw_timeout". That would render it useless for
the purpose that I think our users will be looking at it.

So, my preference would be to keep this file functioning as-is in this
patch series.

I would think that some sort of expandable "statistics" file would be a
better way to output various metrics:

```

sw_timeout: 0

sw_retries_exceeded: 2

runs: 42

first_run: 1405529347

last_run: 1646948140

<etc..>

```

just as a suggested alternative for more/incompatble output values or a
complex, dynamic format.

I don't have any use in opendcdiag for these values and data. If someone
does, they should want to chime in perhaps.


Auke

2022-03-15 15:59:17

by Luck, Tony

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver

On Tue, Mar 01, 2022 at 09:14:26PM +0100, Greg KH wrote:
> On Tue, Mar 01, 2022 at 09:10:20PM +0100, Greg KH wrote:
> > On Tue, Mar 01, 2022 at 11:54:47AM -0800, Jithu Joseph wrote:
> > > Note to Maintainers:
> > > Requesting x86 Maintainers to take a look at patch01 as it
> > > touches arch/x86 portion of the kernel. Also would like to guide them
> > > to patch07 which sets up hotplug notifiers and creates kthreads.
> > >
> > > Patch 2/10 - Adds Documentation. Requesting Documentation maintainer to review it.
> > >
> > > Requesting Greg KH to review the sysfs changes added by patch08.
> >
> > "RFC" means you are not comfortable submitting the changes yet, so you
> > don't need my review at this point in time. Become confident in your
> > changes before asking for others to review the code please.
>
> Hint, it needs work, sysfs_emit() for one thing, lack of reference
> counting on your cpu objects is another...

Greg,

Thanks for the comments. They triggered a bunch of internal
re-thinking of the interface. One idea that has some traction
(Credit/Blame: Dan Williams) is to:

1) Don't put anything in /sys/devices/system/cpu/*
2) Driver creates some info/control files in its own
corner of /sys/devices/.../ifs
3) No per-cpu files ... run a test with:
# echo ${cpu} > /sys/devices/.../ifs/run_test
4) No test result files.
When tests complete they report using uevents

Using uevent to report means that we can easily have
mutiple parts to the result (pass/fail/incomplete status, as well
as diagnostic details about the reason for the failure,
or why the test was not completed).

This seems a novel use of uevent ... is it OK, or is is abuse?

Thanks

-Tony

2022-03-15 16:45:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver

On Mon, Mar 14, 2022 at 04:10:39PM -0700, Luck, Tony wrote:
> On Tue, Mar 01, 2022 at 09:14:26PM +0100, Greg KH wrote:
> > On Tue, Mar 01, 2022 at 09:10:20PM +0100, Greg KH wrote:
> > > On Tue, Mar 01, 2022 at 11:54:47AM -0800, Jithu Joseph wrote:
> > > > Note to Maintainers:
> > > > Requesting x86 Maintainers to take a look at patch01 as it
> > > > touches arch/x86 portion of the kernel. Also would like to guide them
> > > > to patch07 which sets up hotplug notifiers and creates kthreads.
> > > >
> > > > Patch 2/10 - Adds Documentation. Requesting Documentation maintainer to review it.
> > > >
> > > > Requesting Greg KH to review the sysfs changes added by patch08.
> > >
> > > "RFC" means you are not comfortable submitting the changes yet, so you
> > > don't need my review at this point in time. Become confident in your
> > > changes before asking for others to review the code please.
> >
> > Hint, it needs work, sysfs_emit() for one thing, lack of reference
> > counting on your cpu objects is another...
>
> Greg,
>
> Thanks for the comments. They triggered a bunch of internal
> re-thinking of the interface. One idea that has some traction
> (Credit/Blame: Dan Williams) is to:

First off, I did not pay attention to this thread at all, given that the
very basics of this patch series had such obvious problems. I only saw
the contents, not the context in which you wanted to make these changes.

So I have no real thoughts as to what your design should be, as I have
no idea what it is you even want to accomplish at all.

That being said, I do have one comment:

> 1) Don't put anything in /sys/devices/system/cpu/*
> 2) Driver creates some info/control files in its own
> corner of /sys/devices/.../ifs
> 3) No per-cpu files ... run a test with:
> # echo ${cpu} > /sys/devices/.../ifs/run_test
> 4) No test result files.
> When tests complete they report using uevents
>
> Using uevent to report means that we can easily have
> mutiple parts to the result (pass/fail/incomplete status, as well
> as diagnostic details about the reason for the failure,
> or why the test was not completed).
>
> This seems a novel use of uevent ... is it OK, or is is abuse?

Don't create "novel" uses of uevents. They are there to express a
change in state of a device so that userspace can then go and do
something with that information. If that pattern fits here, wonderful.

I doubt you can report "test results" via a uevent in a way that the
current uevent states and messages would properly convey, but hey, maybe
I'm wrong.

good luck!

greg k-h

2022-03-15 22:43:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver

On Tue, Mar 15, 2022 at 02:59:03PM +0000, Luck, Tony wrote:
> >> This seems a novel use of uevent ... is it OK, or is is abuse?
> >
> > Don't create "novel" uses of uevents. They are there to express a
> > change in state of a device so that userspace can then go and do
> > something with that information. If that pattern fits here, wonderful.
>
> Maybe Dan will chime in here to better explain his idea. I think for
> the case where the core test fails, there is a good match with uevent.
> The device (one CPU core) has changed state from "working" to
> "untrustworthy". Userspace can do things like: take the logical CPUs
> on that core offline, initiate a service call, or in a VMM cluster environment
> migrate work to a different node.

Again, I have no idea what you are doing at all with this driver, nor
what you want to do with it.

Start over please.

What is the hardware you have to support?

What is the expectation from userspace with regards to using the
hardware?

> > I doubt you can report "test results" via a uevent in a way that the
> > current uevent states and messages would properly convey, but hey, maybe
> > I'm wrong.
>
> But here things get a bit sketchy. Reporting "pass", or "didn't complete the test"
> isn't a state change. But it seems like a poor interface if there is no feedback
> that the test was run. Using different methods to report pass/fail/incomplete
> also seems user hostile.

We have an in-kernel "test" framework. Yes, it's for kernel code, but
why not extend that to also include hardware tests?

thanks,

greg k-h

2022-03-16 13:29:17

by Luck, Tony

[permalink] [raw]
Subject: RE: [RFC 00/10] Introduce In Field Scan driver

>> This seems a novel use of uevent ... is it OK, or is is abuse?
>
> Don't create "novel" uses of uevents. They are there to express a
> change in state of a device so that userspace can then go and do
> something with that information. If that pattern fits here, wonderful.

Maybe Dan will chime in here to better explain his idea. I think for
the case where the core test fails, there is a good match with uevent.
The device (one CPU core) has changed state from "working" to
"untrustworthy". Userspace can do things like: take the logical CPUs
on that core offline, initiate a service call, or in a VMM cluster environment
migrate work to a different node.

> I doubt you can report "test results" via a uevent in a way that the
> current uevent states and messages would properly convey, but hey, maybe
> I'm wrong.

But here things get a bit sketchy. Reporting "pass", or "didn't complete the test"
isn't a state change. But it seems like a poor interface if there is no feedback
that the test was run. Using different methods to report pass/fail/incomplete
also seems user hostile.

> good luck!
Thanks ... we may need it :-)

-Tony

2022-03-16 15:07:20

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver

On Tue, Mar 15, 2022 at 8:27 AM Greg KH <[email protected]> wrote:
>
> On Tue, Mar 15, 2022 at 02:59:03PM +0000, Luck, Tony wrote:
> > >> This seems a novel use of uevent ... is it OK, or is is abuse?
> > >
> > > Don't create "novel" uses of uevents. They are there to express a
> > > change in state of a device so that userspace can then go and do
> > > something with that information. If that pattern fits here, wonderful.
> >
> > Maybe Dan will chime in here to better explain his idea. I think for
> > the case where the core test fails, there is a good match with uevent.
> > The device (one CPU core) has changed state from "working" to
> > "untrustworthy". Userspace can do things like: take the logical CPUs
> > on that core offline, initiate a service call, or in a VMM cluster environment
> > migrate work to a different node.
>
> Again, I have no idea what you are doing at all with this driver, nor
> what you want to do with it.
>
> Start over please.
>
> What is the hardware you have to support?
>
> What is the expectation from userspace with regards to using the
> hardware?

Here is what I have learned about this driver since engaging on this
patch set. Cores go bad at run time. Datacenters can detect them at
scale. When I worked at Facebook there was an epic story of debugging
random user login failures that resulted in the discovery of a
marginal lot-number of CPUs in a certain cluster. In that case the
crypto instructions on a few cores of those CPUs gave wrong answers.
Whether that was an electromigration effect, or just a marginal bin of
CPUs, the only detection method was A-B testing different clusters of
CPUs to isolate the differences.

This driver takes advantage of a CPU feature to inject a diagnostic
test similar to what can be done via JTAG to validate the
functionality of a given core on a CPU at a low level. The diagnostic
is run periodically since some failures may be sensitive to thermals
while other failures may be be related to the lifetime of the CPU. The
result of the diagnostic is "here are 1 or more cores that may
miscalculate, stop using them and replace the CPU".

At a base level the ABI need only be something that conveys "core X
failed its last diagnostic". All the other details are just extra, and
in my opinion can be dropped save for maybe "core X was unable to run
the diagnostic".

The thought process that got me from the proposal on the table "extend
/sys/devices/system/cpu with per-cpu result state and other details"
to "emit uevents on each test completion" were the following:
-The complexity and maintenance burden of dynamically extending
/sys/devices/system/cpu: Given that you identified a reference
counting issue, I wondered why this was trying to use
/sys/devices/system/cpu in the first instance.

- The result of the test is an event that kicks off remediation
actions: When this fails a tech is paged to replace the CPU and in the
meantime the system can either be taken offline, or if some of the
cores are still good the workloads can be moved off of the bad cores
to keep some capacity online until the replacement can be made.

- KOBJ_CHANGE uevents are already deployed in NVME for AEN
(Asynchronous Event Notifications): If the results of the test were
conveyed only in sysfs then there would be a program that would scrape
sysfs and turn around and fire an event for the downstream remediation
actions. Uevent cuts to the chase and lets udev rule policy log,
notify, and/or take pre-emptive CPU offline action. The CPU state has
changed after a test run. It has either changed to a failed CPU, or it
has changed to one that has recently asserted its health.

> > > I doubt you can report "test results" via a uevent in a way that the
> > > current uevent states and messages would properly convey, but hey, maybe
> > > I'm wrong.
> >
> > But here things get a bit sketchy. Reporting "pass", or "didn't complete the test"
> > isn't a state change. But it seems like a poor interface if there is no feedback
> > that the test was run. Using different methods to report pass/fail/incomplete
> > also seems user hostile.
>
> We have an in-kernel "test" framework. Yes, it's for kernel code, but
> why not extend that to also include hardware tests?

This is where my head was at when starting out with this, but this is
more of an asynchronous error reporting mechanism like machine check,
or PCIe AER, than a test. The only difference being that the error in
this case is only reported by first requesting an error check. So it
is more similar to something like a background patrol scrub that seeks
out latent ECC errors in memory.

2022-03-16 15:10:47

by Luck, Tony

[permalink] [raw]
Subject: RE: [RFC 00/10] Introduce In Field Scan driver

> Again, I have no idea what you are doing at all with this driver, nor
> what you want to do with it.
>
> Start over please.

TL;DR is that silicon ages and some things break that don't have parity/ECC checks.
So systems start behaving erratically. If you are lucky they crash. If you are less lucky
they give incorrect results.

There's a paper (and even a movie 11 minutes) that describe the research by
Google on this.
https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s01-hochschild.pdf
(https://www.youtube.com/watch?v=QMF3rqhjYuM)

> What is the hardware you have to support?

Feature first available in Sapphire Rapids (Xeon: coming later this year)

> What is the expectation from userspace with regards to using the
> hardware?

Expectation from users is that they can run these tests frequently (many times
per day) to catch silicon that has developed faults quickly and take action to
isolate the cores that have issues.

On HT enabled systems both threads that share a core need to be put into
test mode together. The current version of tests takes around 50 milli-seconds
(so for many workloads doesn't need much prep ... those with high sensitivity
to latency would need to do some additional userspace task binding to make
sure those workloads were moved to another core while the h/w test runs).

There are three outcomes from running a test:

1) The test passes all stages.
2) The test did not complete (for a variety of reasons, e.g. power states)
3) The test indicates failure. Recommendation is to run one more time in case
the failure was transient .. e.g. cause by a neutron/alpha strike.

-Tony

2022-03-17 03:28:23

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver

On Tue, Mar 15, 2022 at 9:04 AM Dan Williams <[email protected]> wrote:
>
> On Tue, Mar 15, 2022 at 8:27 AM Greg KH <[email protected]> wrote:
> >
> > On Tue, Mar 15, 2022 at 02:59:03PM +0000, Luck, Tony wrote:
> > > >> This seems a novel use of uevent ... is it OK, or is is abuse?
> > > >
> > > > Don't create "novel" uses of uevents. They are there to express a
> > > > change in state of a device so that userspace can then go and do
> > > > something with that information. If that pattern fits here, wonderful.
> > >
> > > Maybe Dan will chime in here to better explain his idea. I think for
> > > the case where the core test fails, there is a good match with uevent.
> > > The device (one CPU core) has changed state from "working" to
> > > "untrustworthy". Userspace can do things like: take the logical CPUs
> > > on that core offline, initiate a service call, or in a VMM cluster environment
> > > migrate work to a different node.
> >
> > Again, I have no idea what you are doing at all with this driver, nor
> > what you want to do with it.
> >
> > Start over please.
> >
> > What is the hardware you have to support?
> >
> > What is the expectation from userspace with regards to using the
> > hardware?
>
> Here is what I have learned about this driver since engaging on this
> patch set. Cores go bad at run time. Datacenters can detect them at
> scale.

Tony pointed me to this video if you have not seen it:

https://www.youtube.com/watch?v=QMF3rqhjYuM

2022-03-17 04:04:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC 00/10] Introduce In Field Scan driver

On Tue, Mar 15, 2022 at 04:10:59PM +0000, Luck, Tony wrote:
> > Again, I have no idea what you are doing at all with this driver, nor
> > what you want to do with it.
> >
> > Start over please.
>
> TL;DR is that silicon ages and some things break that don't have parity/ECC checks.
> So systems start behaving erratically. If you are lucky they crash. If you are less lucky
> they give incorrect results.
>
> There's a paper (and even a movie 11 minutes) that describe the research by
> Google on this.
> https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s01-hochschild.pdf
> (https://www.youtube.com/watch?v=QMF3rqhjYuM)

Both you and Dan are assuming that I actually care about this hardware
and driver enough to read a presentation or watch a video about it.
Sorry, but that's not happening :)

I'm saying these questions as you all need to be asking yourself that,
and figuring out what the proper api is. That's not my job here. I was
just pointing out the problems in your original submission that you all
should have caught before sending it out...

good luck!

greg k-h

2022-03-21 23:34:21

by Luck, Tony

[permalink] [raw]
Subject: Re: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface

On Fri, Mar 04, 2022 at 09:01:03PM +0000, Luck, Tony wrote:
> >> You could just cut to the chase and do: sysfs_eq(buf, "1")
> >
> > Thanks will use this
>
> $ git grep sysfs_eq
> $
>
> I don't see this defined (or used) anywhere in the kernel. Is
> it spelled differently from how typed here?

Just to close out this thread ... the function name
that Dan was looking for was sysfs_streq().

-Tony