2024-05-21 21:11:42

by Zaid Alali

Subject: [RFC PATCH v2 0/8] Enable EINJv2 support

This patch set enables EINJv2 support. The goal is to allow the
driver to support EINJ and EINJv2 simultaneously. The implementation
follows the proposed ACPI spec(1)(2), which lets the driver discover
system capabilities through GET_ERROR_TYPE.

Note: This revision incorporates feedback from the last review:
it removes redundant code and converts decimal values to hex for
consistency. It also includes CXL error injection updates.

Note: The first two ACPICA patches are to be dropped once they are
merged in the ACPICA project(3).

(1) https://bugzilla.tianocore.org/show_bug.cgi?id=4615
(2) https://bugzilla.tianocore.org/attachment.cgi?id=1446
(3) https://lore.kernel.org/acpica-devel/[email protected]/

Zaid Alali (8):
ACPICA: Update values to hex to follow ACPI specs
ACPICA: Add EINJv2 get error type action
ACPI: APEI: EINJ: Remove redundant calls to
einj_get_available_error_type
ACPI: APEI: EINJ: Enable the discovery of EINJv2 capabilities
ACPI: APEI: EINJ: Add einjv2 extension struct
ACPI: APEI: EINJ: Add debugfs files for EINJv2 support
ACPI: APEI: EINJ: Enable EINJv2 error injections
ACPI: APEI: EINJ: Update the documentation for EINJv2 support

.../firmware-guide/acpi/apei/einj.rst | 51 ++++-
drivers/acpi/apei/apei-internal.h | 2 +-
drivers/acpi/apei/einj-core.c | 177 +++++++++++++++---
drivers/acpi/apei/einj-cxl.c | 2 +-
include/acpi/actbl1.h | 25 +--
5 files changed, 214 insertions(+), 43 deletions(-)

--
2.34.1



2024-05-21 21:12:06

by Zaid Alali

Subject: [RFC PATCH v2 1/8] ACPICA: Update values to hex to follow ACPI specs

The ACPI spec(1) defines Error Injection Actions as hex values.
This commit updates the values from decimal to hex to be
consistent with the ACPI spec. This commit and the following one are
not to be merged directly; they will come from the ACPICA project(2).

(1) https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html
(2) https://lore.kernel.org/acpica-devel/[email protected]/

Signed-off-by: Zaid Alali <[email protected]>
---
include/acpi/actbl1.h | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index 841ef9f22795..b321d481b09a 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -1017,18 +1017,18 @@ struct acpi_einj_entry {
/* Values for Action field above */

enum acpi_einj_actions {
- ACPI_EINJ_BEGIN_OPERATION = 0,
- ACPI_EINJ_GET_TRIGGER_TABLE = 1,
- ACPI_EINJ_SET_ERROR_TYPE = 2,
- ACPI_EINJ_GET_ERROR_TYPE = 3,
- ACPI_EINJ_END_OPERATION = 4,
- ACPI_EINJ_EXECUTE_OPERATION = 5,
- ACPI_EINJ_CHECK_BUSY_STATUS = 6,
- ACPI_EINJ_GET_COMMAND_STATUS = 7,
- ACPI_EINJ_SET_ERROR_TYPE_WITH_ADDRESS = 8,
- ACPI_EINJ_GET_EXECUTE_TIMINGS = 9,
- ACPI_EINJ_ACTION_RESERVED = 10, /* 10 and greater are reserved */
- ACPI_EINJ_TRIGGER_ERROR = 0xFF /* Except for this value */
+ ACPI_EINJ_BEGIN_OPERATION = 0x0,
+ ACPI_EINJ_GET_TRIGGER_TABLE = 0x1,
+ ACPI_EINJ_SET_ERROR_TYPE = 0x2,
+ ACPI_EINJ_GET_ERROR_TYPE = 0x3,
+ ACPI_EINJ_END_OPERATION = 0x4,
+ ACPI_EINJ_EXECUTE_OPERATION = 0x5,
+ ACPI_EINJ_CHECK_BUSY_STATUS = 0x6,
+ ACPI_EINJ_GET_COMMAND_STATUS = 0x7,
+ ACPI_EINJ_SET_ERROR_TYPE_WITH_ADDRESS = 0x8,
+ ACPI_EINJ_GET_EXECUTE_TIMINGS = 0x9,
+ ACPI_EINJ_ACTION_RESERVED = 0xA, /* 0xA and greater are reserved */
+ ACPI_EINJ_TRIGGER_ERROR = 0xFF /* Except for this value */
};

/* Values for Instruction field above */
--
2.34.1


2024-05-21 21:12:13

by Zaid Alali

Subject: [RFC PATCH v2 2/8] ACPICA: Add EINJv2 get error type action

This commit adds EINJV2_GET_ERROR_TYPE as defined in the proposed
specs(1)(2).

Proposed ACPI specs for EINJv2:
(1) https://bugzilla.tianocore.org/show_bug.cgi?id=4615
(2) https://bugzilla.tianocore.org/attachment.cgi?id=1446

This commit is not a direct merge; it will come from the ACPICA
project(3).

(3) https://lore.kernel.org/acpica-devel/[email protected]/

Signed-off-by: Zaid Alali <[email protected]>
---
include/acpi/actbl1.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index b321d481b09a..35a90a92276e 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -1027,7 +1027,8 @@ enum acpi_einj_actions {
ACPI_EINJ_GET_COMMAND_STATUS = 0x7,
ACPI_EINJ_SET_ERROR_TYPE_WITH_ADDRESS = 0x8,
ACPI_EINJ_GET_EXECUTE_TIMINGS = 0x9,
- ACPI_EINJ_ACTION_RESERVED = 0xA, /* 0xA and greater are reserved */
+ ACPI_EINJV2_GET_ERROR_TYPE = 0x11,
+ ACPI_EINJ_ACTION_RESERVED = 0x12, /* 0x12 and greater are reserved */
ACPI_EINJ_TRIGGER_ERROR = 0xFF /* Except for this value */
};

--
2.34.1


2024-05-21 21:12:46

by Zaid Alali

Subject: [RFC PATCH v2 4/8] ACPI: APEI: EINJ: Enable the discovery of EINJv2 capabilities

Enable the driver to show all supported error injections for EINJ
and EINJv2 at the same time. EINJv2 capabilities can be discovered
by checking the return value of get_error_type, where bit 30 set
indicates EINJv2 support.

Signed-off-by: Zaid Alali <[email protected]>
---
drivers/acpi/apei/apei-internal.h | 2 +-
drivers/acpi/apei/einj-core.c | 35 ++++++++++++++++++++++++-------
drivers/acpi/apei/einj-cxl.c | 2 +-
3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
index cd2766c69d78..9a3dbaeed39a 100644
--- a/drivers/acpi/apei/apei-internal.h
+++ b/drivers/acpi/apei/apei-internal.h
@@ -131,7 +131,7 @@ static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)

int apei_osc_setup(void);

-int einj_get_available_error_type(u32 *type);
+int einj_get_available_error_type(u32 *type, int version);
int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, u64 param3,
u64 param4);
int einj_cxl_rch_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
index b1bbbee9c664..cc5ad1f45ea4 100644
--- a/drivers/acpi/apei/einj-core.c
+++ b/drivers/acpi/apei/einj-core.c
@@ -33,6 +33,7 @@
#define SLEEP_UNIT_MAX 5000 /* 5ms */
/* Firmware should respond within 1 seconds */
#define FIRMWARE_TIMEOUT (1 * USEC_PER_SEC)
+#define ACPI65_EINJV2_SUPP BIT(30)
#define ACPI5_VENDOR_BIT BIT(31)
#define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
ACPI_EINJ_MEMORY_UNCORRECTABLE | \
@@ -84,6 +85,7 @@ static struct debugfs_blob_wrapper vendor_errors;
static char vendor_dev[64];

static u32 available_error_type;
+static u32 available_error_type_v2;

/*
* Some BIOSes allow parameters to the SET_ERROR_TYPE entries in the
@@ -159,13 +161,13 @@ static void einj_exec_ctx_init(struct apei_exec_context *ctx)
EINJ_TAB_ENTRY(einj_tab), einj_tab->entries);
}

-static int __einj_get_available_error_type(u32 *type)
+static int __einj_get_available_error_type(u32 *type, int version)
{
struct apei_exec_context ctx;
int rc;

einj_exec_ctx_init(&ctx);
- rc = apei_exec_run(&ctx, ACPI_EINJ_GET_ERROR_TYPE);
+ rc = apei_exec_run(&ctx, version);
if (rc)
return rc;
*type = apei_exec_ctx_get_output(&ctx);
@@ -174,12 +176,12 @@ static int __einj_get_available_error_type(u32 *type)
}

/* Get error injection capabilities of the platform */
-int einj_get_available_error_type(u32 *type)
+int einj_get_available_error_type(u32 *type, int version)
{
int rc;

mutex_lock(&einj_mutex);
- rc = __einj_get_available_error_type(type);
+ rc = __einj_get_available_error_type(type, version);
mutex_unlock(&einj_mutex);

return rc;
@@ -647,15 +649,27 @@ static struct { u32 mask; const char *str; } const einj_error_type_string[] = {
{ BIT(11), "Platform Uncorrectable fatal"},
{ BIT(31), "Vendor Defined Error Types" },
};
+static struct { u32 mask; const char *str; } const einjv2_error_type_string[] = {
+ { BIT(0), "EINJV2 Processor Error" },
+ { BIT(1), "EINJV2 Memory Error" },
+ { BIT(2), "EINJV2 PCI Express Error" },
+};

static int available_error_type_show(struct seq_file *m, void *v)
{

+ seq_printf(m, "EINJ error types:\n");
for (int pos = 0; pos < ARRAY_SIZE(einj_error_type_string); pos++)
if (available_error_type & einj_error_type_string[pos].mask)
seq_printf(m, "0x%08x\t%s\n", einj_error_type_string[pos].mask,
- einj_error_type_string[pos].str);
-
+ einj_error_type_string[pos].str);
+ if (available_error_type & ACPI65_EINJV2_SUPP) {
+ seq_printf(m, "EINJv2 error types:\n");
+ for (int pos = 0; pos < ARRAY_SIZE(einjv2_error_type_string); pos++)
+ if (available_error_type_v2 & einjv2_error_type_string[pos].mask)
+ seq_printf(m, "0x%08x\t%s\n", einjv2_error_type_string[pos].mask,
+ einjv2_error_type_string[pos].str);
+ }
return 0;
}

@@ -692,7 +706,7 @@ int einj_validate_error_type(u64 type)
if (tval & (tval - 1))
return -EINVAL;
if (!vendor)
- if (!(type & available_error_type))
+ if (!(type & (available_error_type | available_error_type_v2)))
return -EINVAL;

return 0;
@@ -769,9 +783,14 @@ static int __init einj_probe(struct platform_device *pdev)
goto err_put_table;
}

- rc = einj_get_available_error_type(&available_error_type);
+ rc = einj_get_available_error_type(&available_error_type, ACPI_EINJ_GET_ERROR_TYPE);
if (rc)
return rc;
+ if (available_error_type & ACPI65_EINJV2_SUPP) {
+ rc = einj_get_available_error_type(&available_error_type_v2, ACPI_EINJV2_GET_ERROR_TYPE);
+ if (rc)
+ return rc;
+ }

rc = -ENOMEM;
einj_debug_dir = debugfs_create_dir("einj", apei_get_debugfs_dir());
diff --git a/drivers/acpi/apei/einj-cxl.c b/drivers/acpi/apei/einj-cxl.c
index 8b8be0c90709..25adc9b03d18 100644
--- a/drivers/acpi/apei/einj-cxl.c
+++ b/drivers/acpi/apei/einj-cxl.c
@@ -30,7 +30,7 @@ int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
int cxl_err, rc;
u32 available_error_type = 0;

- rc = einj_get_available_error_type(&available_error_type);
+ rc = einj_get_available_error_type(&available_error_type, ACPI_EINJ_GET_ERROR_TYPE);
if (rc)
return rc;

--
2.34.1
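
The discovery logic described in this patch can be sketched outside the kernel as follows. This is a minimal user-space illustration, not the driver code: EINJV2_SUPP_BIT, einjv2_supported() and show_einjv2_types() are hypothetical names chosen for the example, and the two masks are assumed to be the values returned by GET_ERROR_TYPE and the EINJv2 get-error-type action respectively.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors ACPI65_EINJV2_SUPP from the patch: bit 30 advertises EINJv2. */
#define EINJV2_SUPP_BIT (UINT32_C(1) << 30)

struct type_name {
	uint32_t mask;
	const char *str;
};

/* Same three entries as einjv2_error_type_string[] in the patch. */
static const struct type_name einjv2_names[] = {
	{ UINT32_C(1) << 0, "EINJV2 Processor Error" },
	{ UINT32_C(1) << 1, "EINJV2 Memory Error" },
	{ UINT32_C(1) << 2, "EINJV2 PCI Express Error" },
};

/* Nonzero when the EINJ GET_ERROR_TYPE value advertises EINJv2. */
static int einjv2_supported(uint32_t einj_types)
{
	return (einj_types & EINJV2_SUPP_BIT) != 0;
}

/*
 * Print the EINJv2 types set in v2_types, but only when einj_types
 * advertises EINJv2. Returns the number of lines printed.
 */
static int show_einjv2_types(uint32_t einj_types, uint32_t v2_types)
{
	int printed = 0;

	if (!einjv2_supported(einj_types))
		return 0;

	for (size_t i = 0; i < sizeof(einjv2_names) / sizeof(einjv2_names[0]); i++) {
		if (v2_types & einjv2_names[i].mask) {
			printf("0x%08x\t%s\n", (unsigned int)einjv2_names[i].mask,
			       einjv2_names[i].str);
			printed++;
		}
	}
	return printed;
}
```

The v2 mask is consulted only after the bit 30 check, which matches the ordering used in einj_probe() and available_error_type_show() above.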


2024-05-21 21:12:46

by Zaid Alali

Subject: [RFC PATCH v2 3/8] ACPI: APEI: EINJ: Remove redundant calls to einj_get_available_error_type

A single call to einj_get_available_error_type in the init function is
sufficient: the return value is saved in a global variable and reused
in various places in the code. This commit does not introduce
any functional changes; it only removes redundant function calls.

Signed-off-by: Zaid Alali <[email protected]>
---
drivers/acpi/apei/einj-core.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
index 9515bcfe5e97..b1bbbee9c664 100644
--- a/drivers/acpi/apei/einj-core.c
+++ b/drivers/acpi/apei/einj-core.c
@@ -83,6 +83,8 @@ static struct debugfs_blob_wrapper vendor_blob;
static struct debugfs_blob_wrapper vendor_errors;
static char vendor_dev[64];

+static u32 available_error_type;
+
/*
* Some BIOSes allow parameters to the SET_ERROR_TYPE entries in the
* EINJ table through an unpublished extension. Use with caution as
@@ -648,14 +650,9 @@ static struct { u32 mask; const char *str; } const einj_error_type_string[] = {

static int available_error_type_show(struct seq_file *m, void *v)
{
- int rc;
- u32 error_type = 0;

- rc = einj_get_available_error_type(&error_type);
- if (rc)
- return rc;
for (int pos = 0; pos < ARRAY_SIZE(einj_error_type_string); pos++)
- if (error_type & einj_error_type_string[pos].mask)
+ if (available_error_type & einj_error_type_string[pos].mask)
seq_printf(m, "0x%08x\t%s\n", einj_error_type_string[pos].mask,
einj_error_type_string[pos].str);

@@ -678,8 +675,7 @@ bool einj_is_cxl_error_type(u64 type)

int einj_validate_error_type(u64 type)
{
- u32 tval, vendor, available_error_type = 0;
- int rc;
+ u32 tval, vendor;

/* Only low 32 bits for error type are valid */
if (type & GENMASK_ULL(63, 32))
@@ -695,13 +691,9 @@ int einj_validate_error_type(u64 type)
/* Only one error type can be specified */
if (tval & (tval - 1))
return -EINVAL;
- if (!vendor) {
- rc = einj_get_available_error_type(&available_error_type);
- if (rc)
- return rc;
+ if (!vendor)
if (!(type & available_error_type))
return -EINVAL;
- }

return 0;
}
@@ -777,6 +769,10 @@ static int __init einj_probe(struct platform_device *pdev)
goto err_put_table;
}

+ rc = einj_get_available_error_type(&available_error_type);
+ if (rc)
+ return rc;
+
rc = -ENOMEM;
einj_debug_dir = debugfs_create_dir("einj", apei_get_debugfs_dir());

--
2.34.1


2024-05-21 21:12:58

by Zaid Alali

Subject: [RFC PATCH v2 5/8] ACPI: APEI: EINJ: Add einjv2 extension struct

Add the einjv2 extension struct and EINJv2 error types to prepare
the driver for EINJv2 support. The ACPI specification(1) enables
EINJv2 by extending the set_error_type_with_address struct.

Signed-off-by: Zaid Alali <[email protected]>
---
drivers/acpi/apei/einj-core.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)

diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
index cc5ad1f45ea4..2021bea02996 100644
--- a/drivers/acpi/apei/einj-core.c
+++ b/drivers/acpi/apei/einj-core.c
@@ -50,6 +50,28 @@
*/
static int acpi5;

+struct syndrome_array {
+ union {
+ u32 acpi_id;
+ u32 device_id;
+ u32 pcie_sbdf;
+ u8 fru_id[16];
+ } comp_id;
+ union {
+ u32 proc_synd;
+ u32 mem_synd;
+ u32 pcie_synd;
+ u8 vendor_synd[16];
+ } comp_synd;
+};
+
+struct einjv2_extension_struct {
+ u32 length;
+ u16 revision;
+ u16 component_arr_count;
+ struct syndrome_array component_arr[];
+};
+
struct set_error_type_with_address {
u32 type;
u32 vendor_extension;
@@ -58,6 +80,7 @@ struct set_error_type_with_address {
u64 memory_address;
u64 memory_address_range;
u32 pcie_sbdf;
+ struct einjv2_extension_struct einjv2_struct;
};
enum {
SETWA_FLAGS_APICID = 1,
--
2.34.1
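
The layout introduced here can be checked with a small user-space sketch. It restates the two structs using <stdint.h> types in place of the kernel's u8/u16/u32, and adds a hypothetical helper, einjv2_max_components(), that derives the entry count from the length field. The helper name and its underflow guard are assumptions for illustration; a later patch in this series performs the same computation with a literal 32 as the divisor.

```c
#include <stddef.h>
#include <stdint.h>

/* One component entry: a 16-byte id union followed by a 16-byte syndrome union. */
struct syndrome_array {
	union {
		uint32_t acpi_id;
		uint32_t device_id;
		uint32_t pcie_sbdf;
		uint8_t fru_id[16];
	} comp_id;
	union {
		uint32_t proc_synd;
		uint32_t mem_synd;
		uint32_t pcie_synd;
		uint8_t vendor_synd[16];
	} comp_synd;
};

/* 8-byte header followed by a flexible array of component entries. */
struct einjv2_extension_struct {
	uint32_t length;
	uint16_t revision;
	uint16_t component_arr_count;
	struct syndrome_array component_arr[];
};

/*
 * Number of component entries that fit in an extension structure whose
 * length field is `length`. Returns 0 if length is smaller than the header.
 */
static size_t einjv2_max_components(uint32_t length)
{
	size_t header = offsetof(struct einjv2_extension_struct, component_arr);

	if (length < header)
		return 0;
	return (length - header) / sizeof(struct syndrome_array);
}
```

Dividing by sizeof(struct syndrome_array) rather than a literal keeps the computation tied to the struct definition if the entry layout ever changes.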


2024-05-21 21:13:34

by Zaid Alali

Subject: [RFC PATCH v2 6/8] ACPI: APEI: EINJ: Add debugfs files for EINJv2 support

Create a debugfs blob file that receives the user input for the
component array. EINJv2 enables users to inject errors into
multiple components/devices at the same time using the component
array.

Signed-off-by: Zaid Alali <[email protected]>
---
drivers/acpi/apei/einj-core.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
index 2021bea02996..2e30ebed079b 100644
--- a/drivers/acpi/apei/einj-core.c
+++ b/drivers/acpi/apei/einj-core.c
@@ -33,6 +33,7 @@
#define SLEEP_UNIT_MAX 5000 /* 5ms */
/* Firmware should respond within 1 seconds */
#define FIRMWARE_TIMEOUT (1 * USEC_PER_SEC)
+#define COMP_ARR_SIZE 1024
#define ACPI65_EINJV2_SUPP BIT(30)
#define ACPI5_VENDOR_BIT BIT(31)
#define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
@@ -107,6 +108,9 @@ static struct debugfs_blob_wrapper vendor_blob;
static struct debugfs_blob_wrapper vendor_errors;
static char vendor_dev[64];

+static struct debugfs_blob_wrapper einjv2_component_arr;
+static u64 component_count;
+static void *user_input;
static u32 available_error_type;
static u32 available_error_type_v2;

@@ -859,6 +863,19 @@ static int __init einj_probe(struct platform_device *pdev)
&error_param4);
debugfs_create_x32("notrigger", S_IRUSR | S_IWUSR,
einj_debug_dir, &notrigger);
+ if (available_error_type & ACPI65_EINJV2_SUPP) {
+ debugfs_create_x64("einjv2_component_count", S_IRUSR | S_IWUSR,
+ einj_debug_dir, &component_count);
+ user_input = kzalloc(COMP_ARR_SIZE, GFP_KERNEL);
+ if (!user_input) {
+ rc = -ENOMEM;
+ goto err_release;
+ }
+ einjv2_component_arr.data = user_input;
+ einjv2_component_arr.size = COMP_ARR_SIZE;
+ debugfs_create_blob("einjv2_component_array", S_IRUSR | S_IWUSR,
+ einj_debug_dir, &einjv2_component_arr);
+ }
}

if (vendor_dev[0]) {
@@ -908,6 +925,7 @@ static void __exit einj_remove(struct platform_device *pdev)
apei_resources_fini(&einj_resources);
debugfs_remove_recursive(einj_debug_dir);
acpi_put_table((struct acpi_table_header *)einj_tab);
+ kfree(user_input);
}

static struct platform_device *einj_dev;
--
2.34.1


2024-05-21 21:14:03

by Zaid Alali

Subject: [RFC PATCH v2 7/8] ACPI: APEI: EINJ: Enable EINJv2 error injections

Enable the driver to inject EINJv2 type errors. The component
array values are parsed from user_input and are expected to contain
hex values for the component id and syndrome, separated by a space,
with one component per line, as follows:

component_id1 component_syndrome1
component_id2 component_syndrome2
:
component_id(n) component_syndrome(n)

for example:

$comp_arr="0x1 0x2
>0x1 0x4
>0x2 0x4"
$cd /sys/kernel/debug/apei/einj/
$echo "$comp_arr" > einjv2_component_array

Signed-off-by: Zaid Alali <[email protected]>
---
drivers/acpi/apei/einj-core.c | 81 ++++++++++++++++++++++++++++++++---
1 file changed, 75 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
index 2e30ebed079b..2e5c00b34a4b 100644
--- a/drivers/acpi/apei/einj-core.c
+++ b/drivers/acpi/apei/einj-core.c
@@ -87,6 +87,13 @@ enum {
SETWA_FLAGS_APICID = 1,
SETWA_FLAGS_MEM = 2,
SETWA_FLAGS_PCIE_SBDF = 4,
+ SETWA_FLAGS_EINJV2 = 8,
+};
+
+enum {
+ EINJV2_PROCESSOR_ERROR = 0x1,
+ EINJV2_MEMORY_ERROR = 0x2,
+ EINJV2_PCIE_ERROR = 0x4,
};

/*
@@ -111,6 +118,7 @@ static char vendor_dev[64];
static struct debugfs_blob_wrapper einjv2_component_arr;
static u64 component_count;
static void *user_input;
+static int nr_components;
static u32 available_error_type;
static u32 available_error_type_v2;

@@ -287,8 +295,18 @@ static void *einj_get_parameter_address(void)

v5param = acpi_os_map_iomem(pa_v5, sizeof(*v5param));
if (v5param) {
+ int offset, len;
+
acpi5 = 1;
check_vendor_extension(pa_v5, v5param);
+ if (available_error_type & ACPI65_EINJV2_SUPP) {
+ len = v5param->einjv2_struct.length;
+ offset = offsetof(struct einjv2_extension_struct, component_arr);
+ nr_components = (len - offset) / 32;
+ acpi_os_unmap_iomem(v5param, sizeof(*v5param));
+ v5param = acpi_os_map_iomem(pa_v5, sizeof(*v5param) + (
+ (nr_components) * sizeof(struct syndrome_array)));
+ }
return v5param;
}
}
@@ -494,10 +512,52 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
v5param->flags = vendor_flags;
} else if (flags) {
v5param->flags = flags;
- v5param->memory_address = param1;
- v5param->memory_address_range = param2;
- v5param->apicid = param3;
- v5param->pcie_sbdf = param4;
+ if (flags & SETWA_FLAGS_MEM) {
+ v5param->memory_address = param1;
+ v5param->memory_address_range = param2;
+ }
+ if (flags & SETWA_FLAGS_EINJV2) {
+ int count = 0, bytes_read, pos = 0;
+ unsigned int comp, synd;
+ struct syndrome_array *component_arr;
+
+ if (component_count > nr_components)
+ goto err_out;
+
+ v5param->einjv2_struct.component_arr_count = component_count;
+ component_arr = v5param->einjv2_struct.component_arr;
+
+ while (sscanf(user_input+pos, "%x %x\n%n", &comp, &synd,
+ &bytes_read) == 2) {
+ count++;
+ pos += bytes_read;
+ if (count > component_count)
+ goto err_out;
+
+ switch (type) {
+ case EINJV2_PROCESSOR_ERROR:
+ component_arr[count-1].comp_id.acpi_id = comp;
+ component_arr[count-1].comp_synd.proc_synd = synd;
+ break;
+ case EINJV2_MEMORY_ERROR:
+ component_arr[count-1].comp_id.device_id = comp;
+ component_arr[count-1].comp_synd.mem_synd = synd;
+ break;
+ case EINJV2_PCIE_ERROR:
+ component_arr[count-1].comp_id.pcie_sbdf = comp;
+ component_arr[count-1].comp_synd.pcie_synd = synd;
+ break;
+ }
+ }
+ if (count != component_count)
+ goto err_out;
+
+ /* clear buffer after user input for next injection */
+ memset(user_input, 0, COMP_ARR_SIZE);
+ } else {
+ v5param->apicid = param3;
+ v5param->pcie_sbdf = param4;
+ }
} else {
switch (type) {
case ACPI_EINJ_PROCESSOR_CORRECTABLE:
@@ -570,6 +630,9 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
rc = apei_exec_run_optional(&ctx, ACPI_EINJ_END_OPERATION);

return rc;
+err_out:
+ memset(user_input, 0, COMP_ARR_SIZE);
+ return -EINVAL;
}

/* Inject the specified hardware error */
@@ -581,9 +644,14 @@ int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, u64 param3,

/* If user manually set "flags", make sure it is legal */
if (flags && (flags &
- ~(SETWA_FLAGS_APICID|SETWA_FLAGS_MEM|SETWA_FLAGS_PCIE_SBDF)))
+ ~(SETWA_FLAGS_APICID|SETWA_FLAGS_MEM|SETWA_FLAGS_PCIE_SBDF|SETWA_FLAGS_EINJV2)))
return -EINVAL;

+ /*check if type is a valid EINJv2 error type*/
+ if (flags & SETWA_FLAGS_EINJV2) {
+ if (!(type & available_error_type_v2))
+ return -EINVAL;
+ }
/*
* We need extra sanity checks for memory errors.
* Other types leap directly to injection.
@@ -915,7 +983,8 @@ static void __exit einj_remove(struct platform_device *pdev)
sizeof(struct set_error_type_with_address) :
sizeof(struct einj_parameter);

- acpi_os_unmap_iomem(einj_param, size);
+ acpi_os_unmap_iomem(einj_param,
+ size + (nr_components * sizeof(struct syndrome_array)));
if (vendor_errors.size)
acpi_os_unmap_memory(vendor_errors.data, vendor_errors.size);
}
--
2.34.1


2024-05-21 21:15:24

by Zaid Alali

Subject: [RFC PATCH v2 8/8] ACPI: APEI: EINJ: Update the documentation for EINJv2 support

Add documentation for the updated ACPI specs for EINJv2(1)(2).

(1) https://bugzilla.tianocore.org/show_bug.cgi?id=4615
(2) https://bugzilla.tianocore.org/attachment.cgi?id=1446

Signed-off-by: Zaid Alali <[email protected]>
---
.../firmware-guide/acpi/apei/einj.rst | 51 +++++++++++++++++--
1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c52b9da08fa9..f2751cee9698 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -61,8 +61,18 @@ The following files belong to it:
0x00000800 Platform Uncorrectable fatal
================ ===================================

+ ================ ===================================
+ Error Type Value Error Description
+ ================ ===================================
+ 0x00000001 EINJV2 Processor Error
+ 0x00000002 EINJV2 Memory Error
+ 0x00000004 EINJV2 PCI Express Error
+ ================ ===================================
+
The format of the file contents are as above, except present are only
- the available error types.
+ the available error types. The available Error types are discovered by
+ calling GET_ERROR_TYPE command, and if bit 30 is set in the returned
+ value, then EINJv2 is supported by the system.

- error_type

@@ -85,9 +95,11 @@ The following files belong to it:
Bit 0
Processor APIC field valid (see param3 below).
Bit 1
- Memory address and mask valid (param1 and param2).
+ Memory address and range valid (param1 and param2).
Bit 2
PCIe (seg,bus,dev,fn) valid (see param4 below).
+ Bit 3
+ EINJv2 extension structure is valid

If set to zero, legacy behavior is mimicked where the type of
injection specifies just one bit set, and param1 is multiplexed.
@@ -110,6 +122,7 @@ The following files belong to it:
Used when the 0x1 bit is set in "flags" to specify the APIC id

- param4
+
Used when the 0x4 bit is set in "flags" to specify target PCIe device

- notrigger
@@ -122,6 +135,18 @@ The following files belong to it:
this actually works depends on what operations the BIOS actually
includes in the trigger phase.

+- einjv2_component_count
+
+ The value from this file is used to set the "Component Array Count"
+ field of EINJv2 Extension Structure.
+
+- einjv2_component_array
+
+ The contents of this file are used to set the "Component Array" field
+ of the EINJv2 Extension Structure. The expected format is hex values
+ for component id and syndrome separated by space, and multiple
+ components are separated by new line.
+
CXL error types are supported from ACPI 6.5 onwards (given a CXL port
is present). The EINJ user interface for CXL error types is at
<debugfs mount point>/cxl. The following files belong to it:
@@ -139,7 +164,6 @@ is present). The EINJ user interface for CXL error types is at
under <debugfs mount point>/apei/einj, while CXL 1.1/1.0 port injections
must use this file.

-
BIOS versions based on the ACPI 4.0 specification have limited options
in controlling where the errors are injected. Your BIOS may support an
extension (enabled with the param_extension=1 module parameter, or boot
@@ -194,6 +218,27 @@ An error injection example::
# echo 0x8 > error_type # Choose correctable memory error
# echo 1 > error_inject # Inject now

+An EINJv2 error injection example::
+
+ # cd /sys/kernel/debug/apei/einj
+ # cat available_error_type # See which errors can be injected
+ 0x00000002 Processor Uncorrectable non-fatal
+ 0x00000008 Memory Correctable
+ 0x00000010 Memory Uncorrectable non-fatal
+ ==================
+ 0x00000001 EINJV2 Processor Error
+ 0x00000002 EINJV2 Memory Error
+
+ # echo 0x12345000 > param1 # Set memory address for injection
+ # echo 0xfffffffffffff000 > param2 # Range - anywhere in this page
+ # comp_arr="0x1 0x2 # Fill in the component array
+ >0x1 0x4
+ >0x2 0x4"
+ # echo "$comp_arr" > einjv2_component_array
+ # echo 0x2 > error_type # Choose EINJv2 memory error
+ # echo 0xa > flags # set flags to indicate EINJv2
+ # echo 1 > error_inject # Inject now
+
You should see something like this in dmesg::

[22715.830801] EDAC sbridge MC3: HANDLING MCE MEMORY ERROR
--
2.34.1
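
The flags value 0xa used in the documentation example combines bit 1 (memory address and range valid) with bit 3 (EINJv2 extension structure valid). A minimal user-space sketch of the flag check follows. The enum values are taken from the series; SETWA_FLAGS_ALL, setwa_flags_valid() and setwa_flags_is_einjv2() are hypothetical names introduced only for this example.

```c
#include <stdint.h>

enum {
	SETWA_FLAGS_APICID    = 1,
	SETWA_FLAGS_MEM       = 2,
	SETWA_FLAGS_PCIE_SBDF = 4,
	SETWA_FLAGS_EINJV2    = 8,
};

/* Every flag bit the interface accepts once EINJv2 is enabled. */
#define SETWA_FLAGS_ALL (SETWA_FLAGS_APICID | SETWA_FLAGS_MEM | \
			 SETWA_FLAGS_PCIE_SBDF | SETWA_FLAGS_EINJV2)

/* Nonzero when flags contains only known bits. */
static int setwa_flags_valid(uint32_t flags)
{
	return (flags & ~(uint32_t)SETWA_FLAGS_ALL) == 0;
}

/* Nonzero when flags requests an EINJv2 injection. */
static int setwa_flags_is_einjv2(uint32_t flags)
{
	return (flags & SETWA_FLAGS_EINJV2) != 0;
}
```

With these definitions 0xa passes validation and is recognised as an EINJv2 request, while a value such as 0x10 is rejected because it sets an unknown bit.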


2024-05-22 10:10:29

by Rafael J. Wysocki

Subject: Re: [RFC PATCH v2 1/8] ACPICA: Update values to hex to follow ACPI specs

On Tue, May 21, 2024 at 11:11 PM Zaid Alali
<[email protected]> wrote:
>
> ACPI specs(1) define Error Injection Actions in hex values.
> This commit intends to update values from decimal to hex to be
> consistent with ACPI specs. This commit and the following one are
> not to be merged and will come from ACPICA project(2).
>
> (1) https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html
> (2) https://lore.kernel.org/acpica-devel/[email protected]/
>
> Signed-off-by: Zaid Alali <[email protected]>

In order to modify the ACPICA code in the Linux kernel, you need to
submit a corresponding pull request to the upstream ACPICA project on
GitHub. Once that pull request has been merged, please send the Linux
patch with a Link: tag pointing to the upstream ACPICA pull request
corresponding to it.

Thanks!

> ---
> include/acpi/actbl1.h | 24 ++++++++++++------------
> 1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
> index 841ef9f22795..b321d481b09a 100644
> --- a/include/acpi/actbl1.h
> +++ b/include/acpi/actbl1.h
> @@ -1017,18 +1017,18 @@ struct acpi_einj_entry {
> /* Values for Action field above */
>
> enum acpi_einj_actions {
> - ACPI_EINJ_BEGIN_OPERATION = 0,
> - ACPI_EINJ_GET_TRIGGER_TABLE = 1,
> - ACPI_EINJ_SET_ERROR_TYPE = 2,
> - ACPI_EINJ_GET_ERROR_TYPE = 3,
> - ACPI_EINJ_END_OPERATION = 4,
> - ACPI_EINJ_EXECUTE_OPERATION = 5,
> - ACPI_EINJ_CHECK_BUSY_STATUS = 6,
> - ACPI_EINJ_GET_COMMAND_STATUS = 7,
> - ACPI_EINJ_SET_ERROR_TYPE_WITH_ADDRESS = 8,
> - ACPI_EINJ_GET_EXECUTE_TIMINGS = 9,
> - ACPI_EINJ_ACTION_RESERVED = 10, /* 10 and greater are reserved */
> - ACPI_EINJ_TRIGGER_ERROR = 0xFF /* Except for this value */
> + ACPI_EINJ_BEGIN_OPERATION = 0x0,
> + ACPI_EINJ_GET_TRIGGER_TABLE = 0x1,
> + ACPI_EINJ_SET_ERROR_TYPE = 0x2,
> + ACPI_EINJ_GET_ERROR_TYPE = 0x3,
> + ACPI_EINJ_END_OPERATION = 0x4,
> + ACPI_EINJ_EXECUTE_OPERATION = 0x5,
> + ACPI_EINJ_CHECK_BUSY_STATUS = 0x6,
> + ACPI_EINJ_GET_COMMAND_STATUS = 0x7,
> + ACPI_EINJ_SET_ERROR_TYPE_WITH_ADDRESS = 0x8,
> + ACPI_EINJ_GET_EXECUTE_TIMINGS = 0x9,
> + ACPI_EINJ_ACTION_RESERVED = 0xA, /* 0xA and greater are reserved */
> + ACPI_EINJ_TRIGGER_ERROR = 0xFF /* Except for this value */
> };
>
> /* Values for Instruction field above */
> --
> 2.34.1
>

2024-05-22 16:51:22

by Ben Cheatham

Subject: Re: [RFC PATCH v2 7/8] ACPI: APEI: EINJ: Enable EINJv2 error injections

On 5/21/24 4:10 PM, Zaid Alali wrote:
> Enable the driver to inject EINJv2 type errors. The component
> array values are parsed from user_input and expected to contain
> hex values for component id and syndrome separated by space,
> and multiple components are separated by new line as follows:
>
> component_id1 component_syndrome1
> component_id2 component_syndrome2
> :
> component_id(n) component_syndrome(n)
>
> for example:
>
> $comp_arr="0x1 0x2
>> 0x1 0x4
>> 0x2 0x4"
> $cd /sys/kernel/debug/apei/einj/
> $echo "$comp_arr" > einjv2_component_array
>

I think it would be good to change this from being newline-delimited to comma-delimited instead.
So instead of your first example above it would be:

component_id1 component_syndrome1, component_id2 component_syndrome2, ...

My reasoning here is that it's less error-prone. For example, if I run your example but forget to
quote the comp_arr variable in the last line, i.e.:

$ echo $comp_arr > einjv2_component_array

I would effectively be running (at least in my stock Ubuntu 22.04 terminal):

$ echo "0x1 0x2 0x1 0x4 0x2 0x4" > einjv2_component_array

Which would result in an error for something that isn't necessarily readily apparent.
I also think keeping the input on a single line is nicer on the eyes, but that's a subjective thing
and I'd understand if you think differently.
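
A comma-delimited format like the one suggested above could be parsed along these lines. This is a user-space sketch under stated assumptions, not proposed driver code: struct comp_entry and parse_components() are hypothetical names, the input is assumed to be a NUL-terminated string, and standard C sscanf() is used rather than the kernel's implementation.

```c
#include <stddef.h>
#include <stdio.h>

struct comp_entry {
	unsigned int id;
	unsigned int synd;
};

/*
 * Parse "id synd, id synd, ..." (hex values) into out[].
 * Returns the number of entries parsed, or -1 on malformed input
 * or when more than max entries are present.
 */
static int parse_components(const char *s, struct comp_entry *out, size_t max)
{
	size_t n = 0;

	for (;;) {
		unsigned int id, synd;
		int used = 0;

		/* One "id synd" pair; surrounding whitespace is skipped. */
		if (sscanf(s, " %x %x %n", &id, &synd, &used) != 2)
			return -1;
		if (n == max)
			return -1;
		out[n].id = id;
		out[n].synd = synd;
		n++;

		s += used;
		if (*s == '\0')		/* end of input after a complete pair */
			return (int)n;
		if (*s != ',')		/* pairs must be separated by a comma */
			return -1;
		s++;
	}
}
```

Because the comma is a required separator, input in which two pairs run together without one is rejected instead of being silently accepted.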

> Signed-off-by: Zaid Alali <[email protected]>
> ---
> drivers/acpi/apei/einj-core.c | 81 ++++++++++++++++++++++++++++++++---
> 1 file changed, 75 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
> index 2e30ebed079b..2e5c00b34a4b 100644
> --- a/drivers/acpi/apei/einj-core.c
> +++ b/drivers/acpi/apei/einj-core.c
> @@ -87,6 +87,13 @@ enum {
> SETWA_FLAGS_APICID = 1,
> SETWA_FLAGS_MEM = 2,
> SETWA_FLAGS_PCIE_SBDF = 4,
> + SETWA_FLAGS_EINJV2 = 8,
> +};
> +
> +enum {
> + EINJV2_PROCESSOR_ERROR = 0x1,
> + EINJV2_MEMORY_ERROR = 0x2,
> + EINJV2_PCIE_ERROR = 0x4,
> };
>
> /*
> @@ -111,6 +118,7 @@ static char vendor_dev[64];
> static struct debugfs_blob_wrapper einjv2_component_arr;
> static u64 component_count;
> static void *user_input;
> +static int nr_components;
> static u32 available_error_type;
> static u32 available_error_type_v2;
>
> @@ -287,8 +295,18 @@ static void *einj_get_parameter_address(void)
>
> v5param = acpi_os_map_iomem(pa_v5, sizeof(*v5param));
> if (v5param) {
> + int offset, len;
> +
> acpi5 = 1;
> check_vendor_extension(pa_v5, v5param);
> + if (available_error_type & ACPI65_EINJV2_SUPP) {
> + len = v5param->einjv2_struct.length;
> + offset = offsetof(struct einjv2_extension_struct, component_arr);
> + nr_components = (len - offset) / 32;
> + acpi_os_unmap_iomem(v5param, sizeof(*v5param));
> + v5param = acpi_os_map_iomem(pa_v5, sizeof(*v5param) + (
> + (nr_components) * sizeof(struct syndrome_array)));
> + }
> return v5param;
> }
> }
> @@ -494,10 +512,52 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
> v5param->flags = vendor_flags;
> } else if (flags) {
> v5param->flags = flags;
> - v5param->memory_address = param1;
> - v5param->memory_address_range = param2;
> - v5param->apicid = param3;
> - v5param->pcie_sbdf = param4;
> + if (flags & SETWA_FLAGS_MEM) {
> + v5param->memory_address = param1;
> + v5param->memory_address_range = param2;
> + }

I don't think you need this if statement since the values will be ignored
when the SETWA_FLAGS_MEM bit isn't set anyway as per the spec.

> + if (flags & SETWA_FLAGS_EINJV2) {
> + int count = 0, bytes_read, pos = 0;
> + unsigned int comp, synd;
> + struct syndrome_array *component_arr;
> +
> + if (component_count > nr_components)
> + goto err_out;
> +
> + v5param->einjv2_struct.component_arr_count = component_count;
> + component_arr = v5param->einjv2_struct.component_arr;
> +
> + while (sscanf(user_input+pos, "%x %x\n%n", &comp, &synd,
> + &bytes_read) == 2) {
> + count++;
> + pos += bytes_read;
> + if (count > component_count)
> + goto err_out;
> +
> + switch (type) {
> + case EINJV2_PROCESSOR_ERROR:
> + component_arr[count-1].comp_id.acpi_id = comp;
> + component_arr[count-1].comp_synd.proc_synd = synd;
> + break;
> + case EINJV2_MEMORY_ERROR:
> + component_arr[count-1].comp_id.device_id = comp;
> + component_arr[count-1].comp_synd.mem_synd = synd;
> + break;
> + case EINJV2_PCIE_ERROR:
> + component_arr[count-1].comp_id.pcie_sbdf = comp;
> + component_arr[count-1].comp_synd.pcie_synd = synd;
> + break;
> + }
> + }
> + if (count != component_count)
> + goto err_out;

Nitpick here, but you could use count directly instead of count-1 when indexing component_arr[]
if you move the count++ to the bottom of the loop and change the bounds check inside the loop to:

if (count >= component_count)
goto err_out;

The check after the loop can then stay as it is, since count still ends up as the
number of parsed entries.
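
For reference, here is a rough userspace sketch of the parsing loop with the increment moved to the bottom, so that count doubles as the array index. It is only an illustration: struct comp_entry and parse_components() are made-up stand-ins for the patch's struct syndrome_array and the inline loop, and the per-type switch is left out. In this arrangement the in-loop bounds check is written as count >= expected, and the comparison after the loop is against the full expected count.

```c
#include <stdio.h>

/* Hypothetical stand-in for the patch's struct syndrome_array. */
struct comp_entry {
	unsigned int id;
	unsigned int synd;
};

/*
 * Parse "<id> <syndrome>" hex pairs, one pair per line, into arr[].
 * Returns 0 when exactly 'expected' pairs were parsed, -1 otherwise.
 */
static int parse_components(const char *input, struct comp_entry *arr,
			    unsigned int expected)
{
	unsigned int count = 0, comp, synd;
	int bytes_read, pos = 0;

	while (sscanf(input + pos, "%x %x\n%n", &comp, &synd,
		      &bytes_read) == 2) {
		/* Bounds check before the store: count is the next index. */
		if (count >= expected)
			return -1;

		arr[count].id = comp;
		arr[count].synd = synd;

		pos += bytes_read;
		count++;	/* incremented last, so it can index arr[] */
	}

	/* After the loop, count is the number of pairs actually stored. */
	return count == expected ? 0 : -1;
}
```

Calling parse_components() with the three-line component array from the documentation example and expected = 3 should fill arr[0] through arr[2]; any other expected value makes it return -1.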

> +
> + /* clear buffer after user input for next injection */
> + memset(user_input, 0, COMP_ARR_SIZE);
> + } else {
> + v5param->apicid = param3;
> + v5param->pcie_sbdf = param4;
> + }
> } else {
> switch (type) {
> case ACPI_EINJ_PROCESSOR_CORRECTABLE:
> @@ -570,6 +630,9 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
> rc = apei_exec_run_optional(&ctx, ACPI_EINJ_END_OPERATION);
>
> return rc;
> +err_out:
> + memset(user_input, 0, COMP_ARR_SIZE);
> + return -EINVAL;
> }
>
> /* Inject the specified hardware error */
> @@ -581,9 +644,14 @@ int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, u64 param3,
>
> /* If user manually set "flags", make sure it is legal */
> if (flags && (flags &
> - ~(SETWA_FLAGS_APICID|SETWA_FLAGS_MEM|SETWA_FLAGS_PCIE_SBDF)))
> + ~(SETWA_FLAGS_APICID|SETWA_FLAGS_MEM|SETWA_FLAGS_PCIE_SBDF|SETWA_FLAGS_EINJV2)))
> return -EINVAL;
>
> + /*check if type is a valid EINJv2 error type*/
> + if (flags & SETWA_FLAGS_EINJV2) {
> + if (!(type & available_error_type_v2))
> + return -EINVAL;
> + }
> /*
> * We need extra sanity checks for memory errors.
> * Other types leap directly to injection.
> @@ -915,7 +983,8 @@ static void __exit einj_remove(struct platform_device *pdev)
> sizeof(struct set_error_type_with_address) :
> sizeof(struct einj_parameter);
>
> - acpi_os_unmap_iomem(einj_param, size);
> + acpi_os_unmap_iomem(einj_param,
> + size + (nr_components * sizeof(struct syndrome_array)));
> if (vendor_errors.size)
> acpi_os_unmap_memory(vendor_errors.data, vendor_errors.size);
> }

2024-05-22 16:51:36

by Ben Cheatham

Subject: Re: [RFC PATCH v2 8/8] ACPI: APEI: EINJ: Update the documentation for EINJv2 support

On 5/21/24 4:10 PM, Zaid Alali wrote:
> Add documentation for the updated ACPI specs for EINJv2(1)(2)
>
> (1)https://bugzilla.tianocore.org/show_bug.cgi?id=4615
> (2)https://bugzilla.tianocore.org/attachment.cgi?id=1446
>
> Signed-off-by: Zaid Alali <[email protected]>
> ---
> .../firmware-guide/acpi/apei/einj.rst | 51 +++++++++++++++++--
> 1 file changed, 48 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
> index c52b9da08fa9..f2751cee9698 100644
> --- a/Documentation/firmware-guide/acpi/apei/einj.rst
> +++ b/Documentation/firmware-guide/acpi/apei/einj.rst
> @@ -61,8 +61,18 @@ The following files belong to it:
> 0x00000800 Platform Uncorrectable fatal
> ================ ===================================
>
> + ================ ===================================
> + Error Type Value Error Description
> + ================ ===================================
> + 0x00000001 EINJV2 Processor Error
> + 0x00000002 EINJV2 Memory Error
> + 0x00000004 EINJV2 PCI Express Error
> + ================ ===================================
> +
> The format of the file contents are as above, except present are only
> - the available error types.
> + the available error types. The available Error types are discovered by
> + calling GET_ERROR_TYPE command, and if bit 30 is set in the returned
> + value, then EINJv2 is supported by the system.
>

I think this is a little too much information. How the available error types are
determined isn't really relevant here, since the raw value returned from
GET_ERROR_TYPE isn't user visible.

Thanks,
Ben

> - error_type
>
> @@ -85,9 +95,11 @@ The following files belong to it:
> Bit 0
> Processor APIC field valid (see param3 below).
> Bit 1
> - Memory address and mask valid (param1 and param2).
> + Memory address and range valid (param1 and param2).
> Bit 2
> PCIe (seg,bus,dev,fn) valid (see param4 below).
> + Bit 3
> + EINJv2 extension structure is valid
>
> If set to zero, legacy behavior is mimicked where the type of
> injection specifies just one bit set, and param1 is multiplexed.
> @@ -110,6 +122,7 @@ The following files belong to it:
> Used when the 0x1 bit is set in "flags" to specify the APIC id
>
> - param4
> +
> Used when the 0x4 bit is set in "flags" to specify target PCIe device
>
> - notrigger
> @@ -122,6 +135,18 @@ The following files belong to it:
> this actually works depends on what operations the BIOS actually
> includes in the trigger phase.
>
> +- einjv2_component_count
> +
> + The value from this file is used to set the "Component Array Count"
> + field of EINJv2 Extension Structure.
> +
> +- einjv2_component_array
> +
> + The contents of this file are used to set the "Component Array" field
> + of the EINJv2 Extension Structure. The expected format is hex values
> + for component id and syndrome separated by space, and multiple
> + components are separated by new line.
> +
> CXL error types are supported from ACPI 6.5 onwards (given a CXL port
> is present). The EINJ user interface for CXL error types is at
> <debugfs mount point>/cxl. The following files belong to it:
> @@ -139,7 +164,6 @@ is present). The EINJ user interface for CXL error types is at
> under <debugfs mount point>/apei/einj, while CXL 1.1/1.0 port injections
> must use this file.
>
> -
> BIOS versions based on the ACPI 4.0 specification have limited options
> in controlling where the errors are injected. Your BIOS may support an
> extension (enabled with the param_extension=1 module parameter, or boot
> @@ -194,6 +218,27 @@ An error injection example::
> # echo 0x8 > error_type # Choose correctable memory error
> # echo 1 > error_inject # Inject now
>
> +An EINJv2 error injection example::
> +
> + # cd /sys/kernel/debug/apei/einj
> + # cat available_error_type # See which errors can be injected
> + 0x00000002 Processor Uncorrectable non-fatal
> + 0x00000008 Memory Correctable
> + 0x00000010 Memory Uncorrectable non-fatal
> + ==================
> + 0x00000001 EINJV2 Processor Error
> + 0x00000002 EINJV2 Memory Error
> +
> + # echo 0x12345000 > param1 # Set memory address for injection
> + # echo 0xfffffffffffff000 > param2 # Range - anywhere in this page
> + # comp_arr="0x1 0x2 # Fill in the component array
> + >0x1 0x4
> + >0x2 0x4"
> + # echo "$comp_arr" > einjv2_component_array
> + # echo 0x2 > error_type # Choose EINJv2 memory error
> + # echo 0xa > flags # set flags to indicate EINJv2
> + # echo 1 > error_inject # Inject now
> +
> You should see something like this in dmesg::
>
> [22715.830801] EDAC sbridge MC3: HANDLING MCE MEMORY ERROR

2024-05-22 16:51:47

by Ben Cheatham

Subject: Re: [RFC PATCH v2 4/8] ACPI: APEI: EINJ: Enable the discovery of EINJv2 capabilities

Hi Zaid,

I've got comments inline with a couple (mostly little) concerns, but this looks really good so far!

On 5/21/24 4:10 PM, Zaid Alali wrote:
> Enable the driver to show all supported error injections for EINJ
> and EINJv2 at the same time. EINJv2 capabilities can be discovered
> by checking the return value of get_error_type, where bit 30 set
> indicates EINJv2 support.
>
> Signed-off-by: Zaid Alali <[email protected]>
> ---
> drivers/acpi/apei/apei-internal.h | 2 +-
> drivers/acpi/apei/einj-core.c | 35 ++++++++++++++++++++++++-------
> drivers/acpi/apei/einj-cxl.c | 2 +-
> 3 files changed, 29 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
> index cd2766c69d78..9a3dbaeed39a 100644
> --- a/drivers/acpi/apei/apei-internal.h
> +++ b/drivers/acpi/apei/apei-internal.h
> @@ -131,7 +131,7 @@ static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
>
> int apei_osc_setup(void);
>
> -int einj_get_available_error_type(u32 *type);
> +int einj_get_available_error_type(u32 *type, int version);
> int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, u64 param3,
> u64 param4);
> int einj_cxl_rch_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
> diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
> index b1bbbee9c664..cc5ad1f45ea4 100644
> --- a/drivers/acpi/apei/einj-core.c
> +++ b/drivers/acpi/apei/einj-core.c
> @@ -33,6 +33,7 @@
> #define SLEEP_UNIT_MAX 5000 /* 5ms */
> /* Firmware should respond within 1 seconds */
> #define FIRMWARE_TIMEOUT (1 * USEC_PER_SEC)
> +#define ACPI65_EINJV2_SUPP BIT(30)
> #define ACPI5_VENDOR_BIT BIT(31)
> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
> ACPI_EINJ_MEMORY_UNCORRECTABLE | \
> @@ -84,6 +85,7 @@ static struct debugfs_blob_wrapper vendor_errors;
> static char vendor_dev[64];
>
> static u32 available_error_type;
> +static u32 available_error_type_v2;
>
> /*
> * Some BIOSes allow parameters to the SET_ERROR_TYPE entries in the
> @@ -159,13 +161,13 @@ static void einj_exec_ctx_init(struct apei_exec_context *ctx)
> EINJ_TAB_ENTRY(einj_tab), einj_tab->entries);
> }
>
> -static int __einj_get_available_error_type(u32 *type)
> +static int __einj_get_available_error_type(u32 *type, int version)
> {
> struct apei_exec_context ctx;
> int rc;
>
> einj_exec_ctx_init(&ctx);
> - rc = apei_exec_run(&ctx, ACPI_EINJ_GET_ERROR_TYPE);
> + rc = apei_exec_run(&ctx, version);
> if (rc)
> return rc;
> *type = apei_exec_ctx_get_output(&ctx);
> @@ -174,12 +176,12 @@ static int __einj_get_available_error_type(u32 *type)
> }
>
> /* Get error injection capabilities of the platform */
> -int einj_get_available_error_type(u32 *type)
> +int einj_get_available_error_type(u32 *type, int version)
> {
> int rc;
>
> mutex_lock(&einj_mutex);
> - rc = __einj_get_available_error_type(type);
> + rc = __einj_get_available_error_type(type, version);
> mutex_unlock(&einj_mutex);
>
> return rc;
> @@ -647,15 +649,27 @@ static struct { u32 mask; const char *str; } const einj_error_type_string[] = {
> { BIT(11), "Platform Uncorrectable fatal"},
> { BIT(31), "Vendor Defined Error Types" },
> };
> +static struct { u32 mask; const char *str; } const einjv2_error_type_string[] = {
> + { BIT(0), "EINJV2 Processor Error" },
> + { BIT(1), "EINJV2 Memory Error" },
> + { BIT(2), "EINJV2 PCI Express Error" },
> +};
>
> static int available_error_type_show(struct seq_file *m, void *v)
> {
>
> + seq_printf(m, "EINJ error types:\n");
> for (int pos = 0; pos < ARRAY_SIZE(einj_error_type_string); pos++)
> if (available_error_type & einj_error_type_string[pos].mask)
> seq_printf(m, "0x%08x\t%s\n", einj_error_type_string[pos].mask,
> - einj_error_type_string[pos].str);
> -
> + einj_error_type_string[pos].str);
> + if (available_error_type & ACPI65_EINJV2_SUPP) {
> + seq_printf(m, "EINJv2 error types:\n");

I think this print and the added one above are not needed since the EINJv2 error type
strings have EINJV2 in them already.
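
To illustrate, a small userspace sketch of the output without the two header lines. This is not the driver code: format_available() and append_types() are hypothetical helpers, the EINJ table is only a three-entry subset, and the ACPI65_EINJV2_SUPP guard is omitted for brevity. The point is just that every EINJv2 line is already self-describing.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct type_str {
	uint32_t mask;
	const char *str;
};

/* Subset of the EINJ table, enough for the illustration. */
static const struct type_str v1_types[] = {
	{ 1u << 1, "Processor Uncorrectable non-fatal" },
	{ 1u << 3, "Memory Correctable" },
	{ 1u << 4, "Memory Uncorrectable non-fatal" },
};

/* EINJv2 strings as added by this patch. */
static const struct type_str v2_types[] = {
	{ 1u << 0, "EINJV2 Processor Error" },
	{ 1u << 1, "EINJV2 Memory Error" },
	{ 1u << 2, "EINJV2 PCI Express Error" },
};

/* Append one line per available type; returns the new write offset. */
static size_t append_types(char *buf, size_t len, size_t off,
			   const struct type_str *tab, size_t n,
			   uint32_t avail)
{
	for (size_t i = 0; i < n; i++) {
		int w;

		if (!(avail & tab[i].mask) || off >= len)
			continue;
		w = snprintf(buf + off, len - off, "0x%08x\t%s\n",
			     (unsigned int)tab[i].mask, tab[i].str);
		if (w > 0)
			off += (size_t)w;
	}
	return off;
}

/* Hypothetical userspace analogue of available_error_type_show(). */
static void format_available(char *buf, size_t len, uint32_t v1, uint32_t v2)
{
	size_t off;

	if (len)
		buf[0] = '\0';
	off = append_types(buf, len, 0, v1_types,
			   sizeof(v1_types) / sizeof(v1_types[0]), v1);
	append_types(buf, len, off, v2_types,
		     sizeof(v2_types) / sizeof(v2_types[0]), v2);
}
```

With v1 = 0x8 and v2 = 0x2 the buffer holds one "Memory Correctable" line followed by one "EINJV2 Memory Error" line, and the two are distinguishable without any section header.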

> + for (int pos = 0; pos < ARRAY_SIZE(einjv2_error_type_string); pos++)
> + if (available_error_type_v2 & einjv2_error_type_string[pos].mask)
> + seq_printf(m, "0x%08x\t%s\n", einjv2_error_type_string[pos].mask,
> + einjv2_error_type_string[pos].str);
> + }
> return 0;
> }
>
> @@ -692,7 +706,7 @@ int einj_validate_error_type(u64 type)
> if (tval & (tval - 1))
> return -EINVAL;
> if (!vendor)
> - if (!(type & available_error_type))
> + if (!(type & (available_error_type | available_error_type_v2)))
> return -EINVAL;

I don't think this will work. Take the following scenario:

available_error_type = 0x2
available_error_type_v2 = 0x1

If I specify an error type of 0x1 and then inject an EINJv1 error, I will have
injected an invalid error type but still passed the validation check.
I think you can just get rid of the check for the EINJv2 type here since you also
check it before the actual injection in patch 7/8.
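
To make the scenario concrete, here is a minimal userspace sketch using the two mask values from above. The helper names are invented for illustration and are not part of the driver; valid_per_version() is just one possible way to keep the two masks apart, not necessarily what the series should do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability masks taken from the scenario above. */
#define AVAIL_V1 0x2u	/* EINJ: only type 0x2 is available */
#define AVAIL_V2 0x1u	/* EINJv2: only type 0x1 is available */

/* Combined check, as in the hunk above: one mask for both versions. */
static bool valid_combined(uint32_t type)
{
	return (type & (AVAIL_V1 | AVAIL_V2)) != 0;
}

/* Hypothetical alternative: test against the mask the request is for. */
static bool valid_per_version(uint32_t type, bool is_v2)
{
	return (type & (is_v2 ? AVAIL_V2 : AVAIL_V1)) != 0;
}
```

Here valid_combined(0x1) accepts the type even though no EINJv1 error of type 0x1 is available, while valid_per_version(0x1, false) rejects it.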

>
> return 0;
> @@ -769,9 +783,14 @@ static int __init einj_probe(struct platform_device *pdev)
> goto err_put_table;
> }
>
> - rc = einj_get_available_error_type(&available_error_type);
> + rc = einj_get_available_error_type(&available_error_type, ACPI_EINJ_GET_ERROR_TYPE);
> if (rc)
> return rc;
> + if (available_error_type & ACPI65_EINJV2_SUPP) {
> + rc = einj_get_available_error_type(&available_error_type_v2, ACPI_EINJV2_GET_ERROR_TYPE);
> + if (rc)
> + return rc;
> + }
>
> rc = -ENOMEM;
> einj_debug_dir = debugfs_create_dir("einj", apei_get_debugfs_dir());
> diff --git a/drivers/acpi/apei/einj-cxl.c b/drivers/acpi/apei/einj-cxl.c
> index 8b8be0c90709..25adc9b03d18 100644
> --- a/drivers/acpi/apei/einj-cxl.c
> +++ b/drivers/acpi/apei/einj-cxl.c
> @@ -30,7 +30,7 @@ int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
> int cxl_err, rc;
> u32 available_error_type = 0;
>
> - rc = einj_get_available_error_type(&available_error_type);
> + rc = einj_get_available_error_type(&available_error_type, ACPI_EINJ_GET_ERROR_TYPE);
> if (rc)
> return rc;
>

2024-05-22 16:51:53

by Ben Cheatham

Subject: Re: [RFC PATCH v2 5/8] ACPI: APEI: EINJ: Add einjv2 extension struct

On 5/21/24 4:10 PM, Zaid Alali wrote:
> Add einjv2 extension struct and EINJv2 error types to prepare
> the driver for EINJv2 support. ACPI specifications(1) enables
> EINJv2 by extending set_error_type_with_address strcut.
>
> Signed-off-by: Zaid Alali <[email protected]>
> ---
> drivers/acpi/apei/einj-core.c | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
> index cc5ad1f45ea4..2021bea02996 100644
> --- a/drivers/acpi/apei/einj-core.c
> +++ b/drivers/acpi/apei/einj-core.c
> @@ -50,6 +50,28 @@
> */
> static int acpi5;
>
> +struct syndrome_array {
> + union {
> + u32 acpi_id;
> + u32 device_id;
> + u32 pcie_sbdf;
> + u8 fru_id[16];

I would rename fru_id to vendor_id since this isn't necessarily a FRU id. It also has the
added benefit of matching the naming of the vendor field in comp_synd.
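
Roughly what I have in mind, as a standalone sketch. It uses the userspace uint32_t/uint8_t types in place of the kernel's u32/u8 so it compiles outside the tree, and the static assertion is my own addition to show that each entry is 32 bytes, which is the value the einj_get_parameter_address() hunk in this series divides by.

```c
#include <stdint.h>

struct syndrome_array {
	union {
		uint32_t acpi_id;
		uint32_t device_id;
		uint32_t pcie_sbdf;
		uint8_t vendor_id[16];	/* named fru_id in the patch */
	} comp_id;
	union {
		uint32_t proc_synd;
		uint32_t mem_synd;
		uint32_t pcie_synd;
		uint8_t vendor_synd[16];
	} comp_synd;
};

/* Two 16-byte unions, so one component entry occupies 32 bytes. */
_Static_assert(sizeof(struct syndrome_array) == 32,
	       "unexpected component entry size");
```

With the rename, the two 16-byte members read as a pair (vendor_id and vendor_synd), and sizeof(struct syndrome_array) could replace the literal 32.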

> + } comp_id;
> + union {
> + u32 proc_synd;
> + u32 mem_synd;
> + u32 pcie_synd;
> + u8 vendor_synd[16];
> + } comp_synd;
> +};
> +
> +struct einjv2_extension_struct {
> + u32 length;
> + u16 revision;
> + u16 component_arr_count;
> + struct syndrome_array component_arr[];
> +};
> +
> struct set_error_type_with_address {
> u32 type;
> u32 vendor_extension;
> @@ -58,6 +80,7 @@ struct set_error_type_with_address {
> u64 memory_address;
> u64 memory_address_range;
> u32 pcie_sbdf;
> + struct einjv2_extension_struct einjv2_struct;
> };
> enum {
> SETWA_FLAGS_APICID = 1,

2024-05-23 15:14:13

by Jonathan Cameron

Subject: Re: [RFC PATCH v2 3/8] ACPI: APEI: EINJ: Remove redundant calls to einj_get_available_error_type

On Tue, 21 May 2024 14:10:31 -0700
Zaid Alali <[email protected]> wrote:

> A single call to einj_get_available_error_type in init function is
> sufficient to save the return value in a global variable to be used
> later in various places in the code. This commit does not introduce
> any functional changes, but only removing unnecessary redundant
> function calls.
>
> Signed-off-by: Zaid Alali <[email protected]>

Seems reasonable to me.
Reviewed-by: Jonathan Cameron <[email protected]>

2024-05-23 16:03:03

by Luck, Tony

Subject: RE: [RFC PATCH v2 3/8] ACPI: APEI: EINJ: Remove redundant calls to einj_get_available_error_type

>> A single call to einj_get_available_error_type in init function is
>> sufficient to save the return value in a global variable to be used
>> later in various places in the code. This commit does not introduce
>> any functional changes, but only removing unnecessary redundant
>> function calls.
>>
>> Signed-off-by: Zaid Alali <[email protected]>
>
> Seems reasonable to me.
> Reviewed-by: Jonathan Cameron <[email protected]>

Agreed. I've thought about making this change many times, but was always
distracted by other issues. Thanks for doing this.

Acked-by: Tony Luck <[email protected]>

-Tony