2023-11-29 07:51:22

by M K, Muralidhara

Subject: [PATCH 0/4] Persist FRU memory poisons

From: Muralidhara M K <[email protected]>

This patch set is based on the patches submitted at:
https://lore.kernel.org/linux-edac/[email protected]/T/#t

MI300A has on-die HBMv3 memory embedded in the socket. Once the number of
memory errors reaches a threshold, the socket has to be replaced. Define the
criteria for identifying the Field Replaceable Unit (FRU) based on the number
of poisoned pages in the socket, and persist those pages in non-volatile storage.

A notifier is registered to handle FRU memory poison events. The poison count
is incremented for each injected MCE error until it reaches the maximum number
of FRU poison entries.
A sysfs entry per FRU makes it easy for the user to inspect the poison details.
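
The counting behaviour described above can be modelled in a few lines of
userspace C. This is only a sketch under stated assumptions, not the code
from this series: struct poison_entry, struct fru_poison_list and both
helper functions are hypothetical names, and treating a full list as the
replacement signal is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one persisted poison entry. */
struct poison_entry {
	uint64_t hw_id;	/* MCA_IPID of the reporting bank */
	uint64_t addr;	/* MCA_ADDR of the poisoned location */
};

/* Hypothetical per-FRU poison list with a fixed capacity. */
struct fru_poison_list {
	uint64_t fru_id;		/* e.g. the PPIN of the socket */
	size_t count;			/* entries recorded so far */
	size_t max_entries;		/* capacity of entries[] */
	struct poison_entry *entries;	/* caller-provided storage */
};

/*
 * Record one poison event. Returns true if the entry was stored and false
 * if the list is already full (a caller would print a warning instead).
 */
bool fru_poison_append(struct fru_poison_list *list,
		       uint64_t hw_id, uint64_t addr)
{
	assert(list && list->entries);

	if (list->count >= list->max_entries)
		return false;

	list->entries[list->count].hw_id = hw_id;
	list->entries[list->count].addr = addr;
	list->count++;
	return true;
}

/* True once the FRU has used up all of its poison entries. */
bool fru_poison_list_full(const struct fru_poison_list *list)
{
	assert(list);
	return list->count >= list->max_entries;
}
```

A caller would invoke fru_poison_append() from its error handler and check
fru_poison_list_full() to decide whether the FRU has exhausted its budget.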

During boot, read the ERST records to identify the poisoned addresses and
retire all system physical addresses in the affected HBM row.

Patch 1:
Add an API to get the maximum CPER record size to be stored in NV storage

Patch 2:
Add FRU memory poison module

Patch 3:
Add sysfs entry to print the required error information from poison records

Patch 4:
Add documentation on FRU memory poisons.

Muralidhara M K (4):
ACPI/APEI: Add erst_get_size() API
RAS/fmp: Add FRU memory poison CPER support for Error persistence
EDAC/amd64: Add sysfs entry to read FRU poison data
RAS/fmp: Add Documentation on Persistence of FRU memory poisons

Documentation/RAS/ras.rst | 122 +++++++
MAINTAINERS | 8 +
drivers/acpi/apei/erst.c | 9 +
drivers/edac/amd64_edac.c | 25 ++
drivers/ras/Kconfig | 1 +
drivers/ras/Makefile | 1 +
drivers/ras/fmp/Kconfig | 18 +
drivers/ras/fmp/Makefile | 10 +
drivers/ras/fmp/fru_mem_poison.c | 595 +++++++++++++++++++++++++++++++
include/acpi/apei.h | 1 +
include/linux/cper.h | 24 ++
include/linux/fru_mem_poison.h | 17 +
12 files changed, 831 insertions(+)
create mode 100644 drivers/ras/fmp/Kconfig
create mode 100644 drivers/ras/fmp/Makefile
create mode 100644 drivers/ras/fmp/fru_mem_poison.c
create mode 100644 include/linux/fru_mem_poison.h

--
2.25.1


2023-11-29 07:51:31

by M K, Muralidhara

Subject: [PATCH 3/4] EDAC/amd64: Add sysfs entry to read FRU poison data

From: Muralidhara M K <[email protected]>

Create a sysfs file for each FRU ID that lists the DRAM MCE addresses and
MCA IPID values stored in ERST non-volatile storage.

The CPER record information can be read at any time while the system is up
by reading the sysfs entry of a particular node or FRU index.
Example: cat /sys/devices/system/edac/mc/mc<node index>/fmpl

The sysfs data identifies the list of poisoned addresses and the FRU index,
so the replacement criteria can be evaluated without iterating over the
kernel logs.
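
As an illustration of how a userspace tool might consume this file, the
sketch below counts the poison records in text that has already been read
from an fmpl entry. It assumes the layout produced by this patch (one header
line followed by one line per poison record); fmpl_count_records() is a
hypothetical helper name and is not part of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count poison records in the text read from an fmpl file. The first line
 * is treated as the column header and every later non-empty line as one
 * record. This layout is an assumption based on the patch description.
 */
size_t fmpl_count_records(const char *text)
{
	size_t records = 0;
	const char *line, *end;

	assert(text);

	/* Skip the header line; without one there are no records. */
	line = strchr(text, '\n');
	if (!line)
		return 0;
	line++;

	while (*line) {
		end = strchr(line, '\n');
		if (!end)
			end = line + strlen(line);
		if (end > line)
			records++;
		/* Step past the newline, or stop at the terminating NUL. */
		line = *end ? end + 1 : end;
	}
	return records;
}
```

A monitoring script could read the file with fopen() and fread(), pass the
NUL-terminated buffer to this helper, and compare the result against its own
replacement threshold.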

Co-developed-by: Naveen Krishna Chatradhi <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Co-developed-by: Sathya Priya Kumar <[email protected]>
Signed-off-by: Sathya Priya Kumar <[email protected]>
Signed-off-by: Muralidhara M K <[email protected]>
---
drivers/edac/amd64_edac.c | 25 +++++++++++
drivers/ras/fmp/fru_mem_poison.c | 77 +++++++++++++++++++++++++++++++-
include/linux/fru_mem_poison.h | 2 +
3 files changed, 102 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 9872ede7eca9..3790adfa78b5 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2,6 +2,7 @@
#include "amd64_edac.h"
#include <asm/amd_nb.h>
#include <linux/amd-atl.h>
+#include <linux/fru_mem_poison.h>

static struct edac_pci_ctl_info *pci_ctl;

@@ -574,6 +575,28 @@ static ssize_t dram_hole_show(struct device *dev, struct device_attribute *mattr
hole_size);
}

+/* sysfs entry to read FRU (Field Replaceable Unit) memory poisons */
+static ssize_t fmpl_show(struct device *dev, struct device_attribute *mattr,
+ char *data)
+{
+ struct mem_ctl_info *mci = to_mci(dev);
+ struct amd64_pvt *pvt = mci->pvt_info;
+ ssize_t ret_len = 0, buf_size = PAGE_SIZE;
+ char *buf;
+
+ buf = kmalloc(buf_size, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ ret_len = copy_fmp_data_from_cache(pvt->mc_node_id, buf, buf_size);
+ if (ret_len > 0)
+ memcpy(data, buf, ret_len);
+ else if (!ret_len)
+ ret_len = -EINVAL;
+ kfree(buf);
+ return ret_len;
+}
+
/*
* update NUM_DBG_ATTRS in case you add new members
*/
@@ -581,6 +604,7 @@ static DEVICE_ATTR(dhar, S_IRUGO, dhar_show, NULL);
static DEVICE_ATTR(dbam, S_IRUGO, dbam0_show, NULL);
static DEVICE_ATTR(topmem, S_IRUGO, top_mem_show, NULL);
static DEVICE_ATTR(topmem2, S_IRUGO, top_mem2_show, NULL);
+static DEVICE_ATTR(fmpl, S_IRUGO, fmpl_show, NULL);
static DEVICE_ATTR_RO(dram_hole);

static struct attribute *dbg_attrs[] = {
@@ -589,6 +613,7 @@ static struct attribute *dbg_attrs[] = {
&dev_attr_topmem.attr,
&dev_attr_topmem2.attr,
&dev_attr_dram_hole.attr,
+ &dev_attr_fmpl.attr,
NULL
};

diff --git a/drivers/ras/fmp/fru_mem_poison.c b/drivers/ras/fmp/fru_mem_poison.c
index c21e736c3ed1..bd85ae527c7f 100644
--- a/drivers/ras/fmp/fru_mem_poison.c
+++ b/drivers/ras/fmp/fru_mem_poison.c
@@ -39,6 +39,11 @@ struct system_fru_poison_info {
struct cper_fru_poison_record *fru_record;
};

+#define REC_HDR() \
+ " FRU_IDX| FRU_ID\t | P_NUM | TIMESTAMP\t\t | MCA_IPID\t | MCA_ADDR\t| SPA\t\t |\n"
+#define REC_DATA() \
+ " %d\t| 0x%llx| %d\t | %s| 0x%017llx| 0x%013llx | 0x%013llx|\n"
+
#define CPER_CREATOR_FMP \
GUID_INIT(0xcd5c2993, 0xf4b2, 0x41b2, 0xb5, 0xd4, 0xf9, 0xc3, \
0xa0, 0x33, 0x08, 0x75)
@@ -122,15 +127,83 @@ static u64 calc_checksum(struct cper_sec_fru_mem_poisons *fmp)
return checksum;
}

-struct tm get_timestamp(u64 timestamp)
+ssize_t get_timestamp(u64 timestamp, char *tbuf, ssize_t t_size)
{
struct timespec64 ts;
struct tm tm;
+ ssize_t tlen = 0;

ts.tv_sec = timestamp;
time64_to_tm(ts.tv_sec, 0, &tm);
- return tm;
+ tlen = scnprintf(tbuf, t_size, "%ld-%02d-%02d %02d:%02d:%02d",
+ tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday, tm.tm_hour, tm.tm_min,
+ tm.tm_sec);
+
+ return tlen;
+}
+
+/*
+ * buffer is filled with poison records information and exports to
+ * amd64_edac module to provide the info via sysfs entries
+ * /sys/devices/system/edac/mc/mc<sock_indx>/fmpl
+ */
+ssize_t copy_fmp_data_from_cache(int fru_idx, char *buf, ssize_t buf_size)
+{
+ struct cper_fru_poison_data *temp, *base;
+ int j, p_count;
+ struct mce *m;
+ ssize_t len = 0;
+ ssize_t t_len = 0;
+ u64 sys_addr;
+ /* Room for "YYYY-MM-DD HH:MM:SS" plus the terminating NUL */
+ char t_buf[32];
+
+ pr_info("FRU_Idx[%d] Record information:\n", fru_idx);
+ pr_info("Record_ID : 0x%llx\n", sys_fmp_info[fru_idx]->recordid);
+ pr_info("FRU_ID : 0x%llx\n", sys_fmp_info[fru_idx]->sys_fru_id);
+
+ p_count = sys_fmp_info[fru_idx]->fru_record->fmp.poison_count;
+ pr_info("FRU Memory poison details under FRU_idx[%d]: %d\n", fru_idx, p_count);
+
+ base = (struct cper_fru_poison_data *)&sys_fmp_info[fru_idx]->fru_record->fmp.p_list_off;
+ len = scnprintf(buf, buf_size, REC_HDR());
+ buf += len;
+ buf_size -= len;
+
+ for (j = 1; j <= p_count; j++) {
+ temp = base + j * sizeof(struct cper_fru_poison_data);
+ m = (struct mce *)temp;
+ fill_mce_poison_data(m, temp, fru_idx);
+
+ if (amd_umc_mca_addr_to_sys_addr(m, &sys_addr)) {
+ pr_warn("normalized address failed for mce addr:0x%llx\n", m->addr);
+ sys_addr = 0;
+ }
+ /*
+ * Format the timestamp into the on-stack buffer; fall back to
+ * an empty string if nothing could be formatted.
+ */
+ t_len = get_timestamp(temp->timestamp, t_buf, sizeof(t_buf));
+ if (!t_len)
+ t_buf[0] = '\0';
+
+ pr_info("poison_number[%d] hw_id:0x%llx addr:0x%llx\n", j, temp->hw_id, temp->addr);
+ len = scnprintf(buf, buf_size, REC_DATA(),
+ fru_idx, sys_fmp_info[fru_idx]->sys_fru_id, j, t_buf, temp->hw_id,
+ temp->addr, sys_addr);
+
+ buf_size -= len;
+ if ((buf_size - len) <= 0) {
+ pr_warn("%s FMP cache Buffer full!\n", __func__);
+ goto out;
+ }
+ buf += len;
+ }
+out:
+ /* Number of bytes written into the caller's buffer */
+ return (PAGE_SIZE - buf_size);
}
+EXPORT_SYMBOL(copy_fmp_data_from_cache);

/* Fill initial fmp structure variable during empty record creation */
static int init_fru_poison_fmp_cache(struct system_fru_poison_info *p)
diff --git a/include/linux/fru_mem_poison.h b/include/linux/fru_mem_poison.h
index d3e567c990aa..d2642e1224de 100644
--- a/include/linux/fru_mem_poison.h
+++ b/include/linux/fru_mem_poison.h
@@ -12,4 +12,6 @@

struct system_fru_poison_info **sys_fmp_info;

+ssize_t copy_fmp_data_from_cache(int fru_idx, char *buf, ssize_t buf_size);
+
#endif /* _X86_FMP_H */
--
2.25.1

2023-11-29 07:51:38

by M K, Muralidhara

Subject: [PATCH 2/4] RAS/fmp: Add FRU memory poison CPER support for Error persistence

From: Muralidhara M K <[email protected]>

Large-scale data center servers such as the MI300A have on-chip stacked HBM3
(High Bandwidth Memory v3) embedded in the CPU socket.

Many memory errors tend to be consistent or intermittent and may recur.
Upon reaching a certain threshold of these errors, the specific memory area
is deemed faulty and should be replaced. In the case of on-die HBM, any
returns due to these issues will likely be directed to the socket vendor.

Define criteria for identifying the Field Replaceable Unit (FRU) by evaluating
the count of "poisoned" pages within the socket, and log these poisoned
pages persistently in non-volatile storage. This retains information about
defective memory addresses within the socket for potential replacement.

To achieve this, a CPER structure for FRU memory poisoning is defined.
One CPER is allocated for each FRU, which is identified by its Protected
Processor Inventory Number (PPIN). Each CPER includes a poison list offset for
the corresponding PPIN and can hold n poison entries.

During boot, the OS creates CPER FMP records based on the number of sockets
available in the system. The size of each record is calculated from the
ERST non-volatile storage made available by the BIOS.
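
The sizing rule can be sketched as plain arithmetic. The helper below is a
userspace model, not the function from this patch: the name
poison_entries_per_record() and its parameters are hypothetical, and the
underflow guard and the clamp to what physically fits are safeguards added
here for illustration. The cap of 512 mirrors CPER_MAX_FRU_POISON_ENTRIES.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors CPER_MAX_FRU_POISON_ENTRIES from the patch. */
#define MAX_FRU_POISON_ENTRIES 512

/*
 * storage_size: bytes available for one record in NV storage
 * fixed_size:   bytes used by the record headers and fixed section fields
 * entry_size:   bytes used by one poison entry
 * requested:    user override, 0 meaning "not set"
 */
size_t poison_entries_per_record(size_t storage_size, size_t fixed_size,
				 size_t entry_size, size_t requested)
{
	size_t fit, num;

	assert(entry_size > 0);

	/* Guard against unsigned underflow when storage is too small. */
	if (storage_size <= fixed_size)
		return 0;

	fit = (storage_size - fixed_size) / entry_size;
	num = requested ? requested : fit;

	/* Never promise more entries than the storage can hold. */
	if (num > fit)
		num = fit;
	if (num > MAX_FRU_POISON_ENTRIES)
		num = MAX_FRU_POISON_ENTRIES;

	return num;
}
```

For example, with 4096 bytes of storage, 256 bytes of fixed fields and
32-byte entries, 120 entries fit, so a request for 16 yields 16 and a
request for 1000 is clamped to 120.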

In mission mode, a notifier is registered to handle FRU memory poison
events. The poison count is incremented on each MCE error injection, and the
MCE address, MCA_IPID and poison count are stored in the poison list entries
until the maximum number of FRU poison entries is reached.

Once the maximum number of FRU poison entries is reached, further poison
events can no longer be stored. Instead, a warning message is printed.
The user can configure the number of FRU poison entries on the kernel
command line by passing "fru_mem_poison.fru_poison_entries=N".

During the next boot, the OS reads the ERST records. The OS matches the system
PPIN with the MCE PPIN and retires all the poisoned addresses in a column of
the particular row in the HBM, which prevents the compromised memory
addresses from being used in subsequent boots.
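
The matching step can be pictured as a lookup of each stored record's FRU ID
in the list of FRU IDs present in the current boot. The sketch below is a
simplified userspace model; match_record_to_fru() is a hypothetical name and
the flat array of IDs is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the FRU whose ID equals record_fru_id, or -1 if no
 * FRU present in this boot matches the stored record. A record without a
 * match would be a candidate for reuse or removal.
 */
int match_record_to_fru(const uint64_t *fru_ids, size_t nr_fru,
			uint64_t record_fru_id)
{
	size_t i;

	assert(fru_ids || nr_fru == 0);

	for (i = 0; i < nr_fru; i++) {
		if (fru_ids[i] == record_fru_id)
			return (int)i;
	}
	return -1;
}
```

Only records that resolve to a valid index would have their poison entries
replayed for page retirement.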

Co-developed-by: Naveen Krishna Chatradhi <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Co-developed-by: Sathya Priya Kumar <[email protected]>
Signed-off-by: Sathya Priya Kumar <[email protected]>
Signed-off-by: Muralidhara M K <[email protected]>
---
MAINTAINERS | 8 +
drivers/ras/Kconfig | 1 +
drivers/ras/Makefile | 1 +
drivers/ras/fmp/Kconfig | 18 ++
drivers/ras/fmp/Makefile | 10 +
drivers/ras/fmp/fru_mem_poison.c | 522 +++++++++++++++++++++++++++++++
include/linux/cper.h | 24 ++
include/linux/fru_mem_poison.h | 15 +
8 files changed, 599 insertions(+)
create mode 100644 drivers/ras/fmp/Kconfig
create mode 100644 drivers/ras/fmp/Makefile
create mode 100644 drivers/ras/fmp/fru_mem_poison.c
create mode 100644 include/linux/fru_mem_poison.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 764e400f060a..70817bd5203a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -900,6 +900,14 @@ S: Supported
F: drivers/ras/amd/atl
F: include/linux/amd-atl.h

+RAS PERSIST FRU MEMORY POISON (FMP)
+M: Naveen Krishna Chatradhi <[email protected]>
+L: [email protected]
+S: Maintained
+F: Documentation/driver-api/persist_fru_memory_poison.rst
+F: drivers/ras/fmp/fru_mem_poison.c
+F: include/linux/fru_mem_poison.h
+
AMD CDX BUS DRIVER
M: Nipun Gupta <[email protected]>
M: Nikhil Agarwal <[email protected]>
diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
index 2e969f59c0ca..5cec40e3ccb3 100644
--- a/drivers/ras/Kconfig
+++ b/drivers/ras/Kconfig
@@ -33,5 +33,6 @@ if RAS

source "arch/x86/ras/Kconfig"
source "drivers/ras/amd/atl/Kconfig"
+source "drivers/ras/fmp/Kconfig"

endif
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index 3595b9547c25..11d082680f35 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_RAS) += ras.o
obj-$(CONFIG_DEBUG_FS) += debugfs.o
obj-$(CONFIG_RAS_CEC) += cec.o
obj-$(CONFIG_AMD_ATL) += amd/atl/
+obj-$(CONFIG_RAS_FMP) += fmp/
diff --git a/drivers/ras/fmp/Kconfig b/drivers/ras/fmp/Kconfig
new file mode 100644
index 000000000000..7c5d80c6b593
--- /dev/null
+++ b/drivers/ras/fmp/Kconfig
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# FRU(Field Replaceable Unit) Memory Poison Kconfig
+#
+# Copyright (c) 2023, Advanced Micro Devices, Inc.
+# All Rights Reserved.
+#
+# Author: Naveen Krishna Chatradhi <[email protected]>
+
+config RAS_FMP
+ tristate "Support persisting FRU memory poisons"
+ default m
+ depends on X86_MCE && X86_LOCAL_APIC
+ help
+ Additional support for persisting FRU(Field Replaceable Unit)
+ memory poisons.
+ Includes support for retiring the poisoned pages during boot, so
+ that the broken memory is not used again.
diff --git a/drivers/ras/fmp/Makefile b/drivers/ras/fmp/Makefile
new file mode 100644
index 000000000000..b476d219fd20
--- /dev/null
+++ b/drivers/ras/fmp/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# FRU(Field Replaceable Unit) Memory Poison Makefile
+#
+# Copyright (c) 2023, Advanced Micro Devices, Inc.
+# All Rights Reserved.
+#
+# Author: Naveen Krishna Chatradhi <[email protected]>
+
+obj-$(CONFIG_RAS_FMP) += fru_mem_poison.o
diff --git a/drivers/ras/fmp/fru_mem_poison.c b/drivers/ras/fmp/fru_mem_poison.c
new file mode 100644
index 000000000000..c21e736c3ed1
--- /dev/null
+++ b/drivers/ras/fmp/fru_mem_poison.c
@@ -0,0 +1,522 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Persisting FRU(Field Replaceable Unit) Memory Poison Error information in
+ * non-volatile storage
+ *
+ * fru_mem_poison.c : Persist FRU memory poison linux module
+ *
+ * Copyright (c) 2023, Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author:
+ * Naveen Krishna Chatradhi <[email protected]>
+ * Muralidhara M K <[email protected]>
+ */
+#include <linux/kernel.h>
+#include <acpi/apei.h>
+#include <asm/amd_nb.h>
+#include <asm/mce.h>
+
+#include <linux/amd-atl.h>
+#include <linux/cper.h>
+#include <linux/fru_mem_poison.h>
+#include <linux/ktime.h>
+#include <linux/timekeeping.h>
+
+/* Persist FRU Poison Data */
+struct cper_fru_poison_record {
+ struct cper_record_header hdr;
+ struct cper_section_descriptor sec_hdr;
+ struct cper_sec_fru_mem_poisons fmp;
+} __packed;
+
+/* Poison data cache of the System with FRU components */
+struct system_fru_poison_info {
+ int nr_poisons_per_fru;
+ u64 sys_fru_id; //system fru_id is PPIN in case of AMD
+ u64 recordid;
+ u64 rec_len;
+ struct cper_fru_poison_record *fru_record;
+};
+
+#define CPER_CREATOR_FMP \
+ GUID_INIT(0xcd5c2993, 0xf4b2, 0x41b2, 0xb5, 0xd4, 0xf9, 0xc3, \
+ 0xa0, 0x33, 0x08, 0x75)
+#define CPER_SECTION_TYPE_FMP \
+ GUID_INIT(0x5e4706c1, 0x5356, 0x48c6, 0x93, 0x0b, 0x52, 0xf2, \
+ 0x12, 0x0a, 0x44, 0x58)
+
+/* Kernel module parameter to specify maximum number of fru poison entries per ppin */
+#define CPER_MAX_FRU_POISON_ENTRIES 512
+/**
+ * DOC: fru_poison_entries (int)
+ * Maximum number of records per socket. Valid setting is between 1 and
+ * default CPER_MAX_FRU_POISON_ENTRIES.
+ */
+static int fru_poison_entries;
+module_param(fru_poison_entries, int, 0644);
+MODULE_PARM_DESC(fru_poison_entries,
+ "Maximum number of memory poison entries per fru");
+
+/* Maximum number of sockets in a system */
+u16 max_nr_fru;
+
+static void fill_mce_poison_data(struct mce *m, struct cper_fru_poison_data *temp, int fru_idx)
+{
+ m->addr = temp->addr;
+ m->ipid = temp->hw_id;
+ m->socketid = fru_idx;
+ m->extcpu = (nr_cpus_node(fru_idx) / 2 * fru_idx);
+}
+
+/*
+ * During boot, collect the poison addresses stored in ERST by the previous
+ * boot, identify all system physical pages in the row containing each poison
+ * address, and retire all the pages in that row.
+ */
+static void retire_bad_pages_from_erst(void)
+{
+ struct cper_fru_poison_data *base, *temp;
+ struct mce *m;
+ int id, j;
+
+ for (id = 0; id < max_nr_fru; id++) {
+ base = (struct cper_fru_poison_data *)&sys_fmp_info[id]->fru_record->fmp.p_list_off;
+
+ for (j = 1; j <= sys_fmp_info[id]->fru_record->fmp.poison_count; j++) {
+ temp = base + j * sizeof(struct cper_fru_poison_data);
+ m = (struct mce *)temp;
+ fill_mce_poison_data(m, temp, id);
+
+ /* Don't try for mce address 0 */
+ if (m->addr)
+ amd_umc_retire_column_spa_from_row(m);
+ }
+ }
+}
+
+static u64 calc_checksum(struct cper_sec_fru_mem_poisons *fmp)
+{
+ struct cper_fru_poison_data *temp, *base;
+ u64 checksum = 0;
+ int j;
+
+ checksum = (u64)fmp->signature[0] | (u64)fmp->signature[1] << 8 |
+ (u64)fmp->signature[2] << 16;
+ checksum += fmp->checksum;
+ checksum += fmp->model_id_type;
+ checksum += fmp->model_id;
+ checksum += fmp->fru_id_type;
+ checksum += fmp->fru_id;
+ checksum += fmp->poison_count;
+ checksum += fmp->p_list_off;
+
+ base = (struct cper_fru_poison_data *)&fmp->p_list_off;
+ for (j = 0; j <= fmp->poison_count; j++) {
+ temp = base + j * sizeof(struct cper_fru_poison_data);
+ checksum += temp->hw_id_type;
+ checksum += temp->addr_type;
+ checksum += temp->hw_id;
+ checksum += temp->addr;
+ }
+ return checksum;
+}
+
+struct tm get_timestamp(u64 timestamp)
+{
+ struct timespec64 ts;
+ struct tm tm;
+
+ ts.tv_sec = timestamp;
+ time64_to_tm(ts.tv_sec, 0, &tm);
+ return tm;
+}
+
+/* Fill initial fmp structure variable during empty record creation */
+static int init_fru_poison_fmp_cache(struct system_fru_poison_info *p)
+{
+ struct cper_fru_poison_record *fru_r = p->fru_record;
+
+ memcpy(fru_r->fmp.signature, "FMP", 3);
+ fru_r->fmp.model_id_type = 0;
+ fru_r->fmp.model_id = 0;
+ fru_r->fmp.fru_id_type = 0;
+ fru_r->fmp.fru_id = p->sys_fru_id;
+ fru_r->fmp.poison_count = 0;
+ fru_r->hdr.record_id = p->recordid;
+ /* Update checksum */
+ fru_r->fmp.checksum -= calc_checksum(&fru_r->fmp);
+
+ return 0;
+}
+
+/* fill cper_fru_poison_record structure variables and write record per node */
+static int apei_write_fmp(struct system_fru_poison_info *p)
+{
+ struct cper_fru_poison_record *fru_r = p->fru_record;
+
+ /* Fill structure variables */
+ memcpy(fru_r->hdr.signature, CPER_SIG_RECORD, CPER_SIG_SIZE);
+ fru_r->hdr.revision = CPER_RECORD_REV;
+ fru_r->hdr.signature_end = CPER_SIG_END;
+ fru_r->hdr.section_count = 1;
+ fru_r->hdr.error_severity = CPER_SEV_FATAL;
+ fru_r->hdr.validation_bits = 0;
+ fru_r->hdr.record_length = p->rec_len;
+ fru_r->hdr.creator_id = CPER_CREATOR_FMP;
+ fru_r->hdr.notification_type = CPER_NOTIFY_MCE;
+ fru_r->hdr.flags = CPER_HW_ERROR_FLAGS_PREVERR;
+
+ fru_r->sec_hdr.section_offset = (void *)&fru_r->fmp - (void *)fru_r;
+ fru_r->sec_hdr.section_length = sizeof(fru_r->fmp);
+ fru_r->sec_hdr.revision = CPER_SEC_REV;
+ fru_r->sec_hdr.validation_bits = 0;
+ fru_r->sec_hdr.flags = CPER_SEC_PRIMARY;
+ fru_r->sec_hdr.section_type = CPER_SECTION_TYPE_FMP;
+ fru_r->sec_hdr.section_severity = CPER_SEV_FATAL;
+
+ /* write to non-volatile storage */
+ return erst_write(&fru_r->hdr);
+}
+
+static int remove_extra_records(u32 idx, u64 *rec_to_remove)
+{
+ int i;
+
+ /* There are more ERST records and less FRUs in this boot */
+ for (i = 0; i < idx; i++) {
+ if (rec_to_remove[i] != 0 && erst_clear(rec_to_remove[i])) {
+ pr_warn("failed to clear erst record at index: %d\n", i);
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+static int init_fru_records(u32 available_extra_recs, u64 *rec_to_remove)
+{
+ int i, j, rc = 0;
+
+ /* Number of FRUs in this boot are more than ERST records */
+ pr_debug("available_extra_recs:%d\n", available_extra_recs);
+
+ for (i = 0; i < max_nr_fru; i++) {
+ if (sys_fmp_info[i]->recordid) {
+ pr_info("Existing FRU_idx[%d] record_id: 0x%llx\n",
+ i, sys_fmp_info[i]->recordid);
+ continue;
+ }
+
+ sys_fmp_info[i]->recordid = cper_next_record_id();
+ pr_info("created new record_id for FRU_idx[%d]: 0x%llx\n",
+ i, sys_fmp_info[i]->recordid);
+
+ if (init_fru_poison_fmp_cache(sys_fmp_info[i]))
+ return -EINVAL;
+
+ for (j = 0; j < available_extra_recs; j++) {
+ if (rec_to_remove[j] != 0) {
+ sys_fmp_info[i]->recordid = rec_to_remove[j];
+
+ rc = erst_read(sys_fmp_info[i]->recordid,
+ &sys_fmp_info[i]->fru_record->hdr,
+ sys_fmp_info[i]->rec_len);
+ if (rc <= 0)
+ return rc;
+
+ rec_to_remove[j] = 0;
+ available_extra_recs--;
+ }
+ }
+ }
+
+ return 0;
+}
+
+static int initialize_records(struct system_fru_poison_info **sys_fmp_info,
+ u32 idx, u64 *rec_to_remove)
+{
+ int fru_idx, rc = 0;
+
+ rc = init_fru_records(idx, rec_to_remove);
+ if (rc != 0)
+ pr_warn("Initializing CPER records failed\n");
+
+ erst_get_record_id_end();
+
+ if (remove_extra_records(idx, rec_to_remove))
+ pr_warn("Clear of extra records failed\n");
+
+ for (fru_idx = 0; fru_idx < max_nr_fru; fru_idx++) {
+ rc = apei_write_fmp(sys_fmp_info[fru_idx]);
+ if (rc) {
+ pr_warn("erst_write[%d] failed\n", fru_idx);
+ return rc;
+ }
+ }
+ return 0;
+}
+
+/* get number of records present in ERST non volatile storage */
+static int find_records(struct system_fru_poison_info **sys_fmp_info)
+{
+ struct cper_fru_poison_record *rcd;
+ int i = 0, rc, pos, count = 0;
+ bool record_matched = false;
+ u64 rec_to_remove[4] = {0};
+ u64 record_id;
+ u32 idx = 0;
+ void *temp;
+
+ temp = kzalloc(sys_fmp_info[0]->rec_len, GFP_KERNEL);
+ if (!temp)
+ return -ENOMEM;
+
+ rcd = (struct cper_fru_poison_record *)temp;
+
+ rc = erst_get_record_id_begin(&pos);
+ if (rc) {
+ kfree(temp);
+ return rc;
+ }
+
+ while (!erst_get_record_id_next(&pos, &record_id)) {
+ if (record_id == APEI_ERST_INVALID_RECORD_ID)
+ goto out;
+
+ memset(temp, 0, sys_fmp_info[0]->rec_len);
+
+ rc = erst_read(record_id, &rcd->hdr, sys_fmp_info[0]->rec_len);
+ if (rc > 0) {
+ pr_debug("Read record:0x%llx, data:%d bytes, fru_id:0x%llx, pcount:%lld\n",
+ record_id, rc, rcd->fmp.fru_id, rcd->fmp.poison_count);
+ } else {
+ pr_warn("Error reading record: 0x%llx\n", record_id);
+ continue;
+ }
+
+ if (!guid_equal(&rcd->sec_hdr.section_type, &CPER_SECTION_TYPE_FMP)) {
+ pr_warn("Record type not matched\n");
+ continue;
+ }
+
+ for (i = 0; i < max_nr_fru; i++) {
+ if (rcd->fmp.fru_id == sys_fmp_info[i]->sys_fru_id) {
+ /* matching FRU is present on the system, store record_id */
+ sys_fmp_info[i]->recordid = record_id;
+ memcpy(sys_fmp_info[i]->fru_record, rcd, sys_fmp_info[i]->rec_len);
+
+ record_matched = true;
+ break;
+ }
+ }
+ if (!record_matched) {
+ rec_to_remove[idx] = record_id;
+ idx++;
+ }
+
+ record_matched = false;
+
+ count++;
+ }
+out:
+ pr_debug("number of records present: %d\n", count);
+ rc = initialize_records(sys_fmp_info, idx, rec_to_remove);
+ kfree(temp);
+ return rc;
+}
+
+static int append_to_fru_poisonlist(struct cper_sec_fru_mem_poisons *fmp_p, struct mce *m)
+{
+ struct cper_fru_poison_data *temp, *base;
+
+ if (!fmp_p || !m)
+ return -EINVAL;
+
+ if (fmp_p->poison_count >= (*sys_fmp_info)->nr_poisons_per_fru) {
+ pr_warn("Maximum number of poison entries reached for FRU, not storing new entry\n");
+ return -EINVAL;
+ }
+
+ fmp_p->poison_count += 1;
+
+ base = (struct cper_fru_poison_data *)(&fmp_p->p_list_off);
+ temp = base + fmp_p->poison_count * sizeof(struct cper_fru_poison_data);
+ if (!temp)
+ return -ENOMEM;
+
+ /* Append poison error data to record_id */
+ temp->hw_id_type = 0;
+ temp->addr_type = 0;
+
+ temp->hw_id = m->ipid;
+ temp->addr = m->addr;
+ temp->timestamp = ktime_get_real_seconds();
+
+ /* Update checksum */
+ fmp_p->checksum -= calc_checksum(fmp_p);
+
+ pr_info("Append poison hw_id: 0x%llx addr: 0x%llx poison_num: %lld checksum: 0x%llx\n",
+ temp->hw_id, temp->addr, fmp_p->poison_count, fmp_p->checksum);
+ return 0;
+}
+
+static int fru_mem_poison_handler(struct notifier_block *nb,
+ unsigned long val, void *data)
+{
+ struct mce *m = (struct mce *)data;
+ int i, ret;
+
+ for (i = 0; i < max_nr_fru; i++) {
+ if (!sys_fmp_info[i])
+ return -ENOMEM;
+
+ if (sys_fmp_info[i]->sys_fru_id == m->ppin) {
+ ret = append_to_fru_poisonlist(&sys_fmp_info[i]->fru_record->fmp, m);
+ if (ret < 0)
+ return ret;
+
+ /* Write to storage*/
+ return erst_write(&sys_fmp_info[i]->fru_record->hdr);
+ }
+ pr_debug("PPIN did not match for FRU_idx[%d]\n", i);
+ }
+
+ pr_debug("No FMP record in NV storage for PPIN %llu\n", m->ppin);
+ return -ENODEV;
+}
+
+static struct notifier_block fru_mem_poison_nb = {
+ .notifier_call = fru_mem_poison_handler,
+ .priority = MCE_PRIO_LOWEST,
+};
+
+static int find_num_poison_entries_per_fru(u64 size)
+{
+ int num = (size - sizeof(struct cper_fru_poison_record)) /
+ sizeof(struct cper_fru_poison_data);
+
+ /* value passed through kernel parameters */
+ if (fru_poison_entries)
+ num = fru_poison_entries;
+
+ if (num >= CPER_MAX_FRU_POISON_ENTRIES)
+ num = CPER_MAX_FRU_POISON_ENTRIES;
+
+ return num;
+}
+
+static int allocate_sys_fru_cache(int nr_fru_entries)
+{
+ int i, ret = -ENOMEM;
+
+ /* On AMD MI300 platforms, max_nr_fru is number of sockets present */
+ max_nr_fru = topology_max_packages();
+ if (!max_nr_fru)
+ return -ENODEV;
+
+ /* Allocate memory for the struct system_fru_poison_info */
+ sys_fmp_info = kcalloc(max_nr_fru, sizeof(struct system_fru_poison_info *), GFP_KERNEL);
+ if (!sys_fmp_info)
+ goto err_out;
+
+ /* per node fru record */
+ for (i = 0; i < max_nr_fru; i++) {
+ sys_fmp_info[i] = kcalloc(1, sizeof(struct system_fru_poison_info), GFP_KERNEL);
+ if (!sys_fmp_info[i])
+ goto err_loop;
+
+ sys_fmp_info[i]->rec_len = sizeof(struct cper_fru_poison_record) +
+ nr_fru_entries * sizeof(struct cper_fru_poison_data);
+
+ sys_fmp_info[i]->fru_record = kzalloc(sys_fmp_info[i]->rec_len, GFP_KERNEL);
+ if (!sys_fmp_info[i]->fru_record)
+ goto err_rec;
+
+ sys_fmp_info[i]->nr_poisons_per_fru = nr_fru_entries;
+
+ /* Get ppins of all sockets present */
+ sys_fmp_info[i]->sys_fru_id = cpu_data(nr_cpus_node(i) / 2 * i).ppin;
+ pr_debug("socket[%d] sys_fru_id:0x%llx, num_of_poisons can hold : %d\n", i,
+ sys_fmp_info[i]->sys_fru_id, sys_fmp_info[i]->nr_poisons_per_fru);
+ }
+
+ pr_debug("Number of records present, count: %ld\n", erst_get_record_count());
+ return 0;
+
+err_rec:
+err_loop:
+ for (i = 0; i < max_nr_fru && sys_fmp_info[i]; i++) {
+ kfree(sys_fmp_info[i]->fru_record);
+ kfree(sys_fmp_info[i]);
+ }
+ kfree(sys_fmp_info);
+err_out:
+ return ret;
+}
+
+static int __init fru_mem_poison_init(void)
+{
+ int ret, nr_fru_entries;
+ u64 erst_size;
+
+ if (!cpu_feature_enabled(X86_FEATURE_AMD_PPIN)) {
+ pr_warn("ppin processor id number not supported\n");
+ return 0;
+ }
+
+ ret = erst_get_size();
+ if (!ret)
+ return ret;
+ erst_size = ret;
+ pr_info("Max erst size per CPER record: %lld\n", erst_size);
+
+ ret = find_num_poison_entries_per_fru(erst_size);
+ if (!ret)
+ return ret;
+ nr_fru_entries = ret;
+ pr_info("Each FMP record can hold %d poison_entries\n", nr_fru_entries);
+
+ /* Allocate CPER record per node based on max number of fru entries */
+ ret = allocate_sys_fru_cache(nr_fru_entries);
+ if (ret)
+ return ret;
+
+ /* Check if there are records present */
+ ret = find_records(sys_fmp_info);
+ if (ret)
+ return ret;
+
+ /* Register notifier for fru_memory poison handling */
+ mce_register_decode_chain(&fru_mem_poison_nb);
+
+ /* Retire FRU memory poison pages during boot */
+ retire_bad_pages_from_erst();
+
+ pr_info("Initialized CPER FRU memory poison module\n");
+ return 0;
+}
+
+static void __exit fru_mem_poison_exit(void)
+{
+ int i;
+
+ /* Stop receiving MCE notifications before freeing the cache */
+ mce_unregister_decode_chain(&fru_mem_poison_nb);
+
+ for (i = 0; i < max_nr_fru; i++) {
+ if (!sys_fmp_info[i])
+ continue;
+ kfree(sys_fmp_info[i]->fru_record);
+ kfree(sys_fmp_info[i]);
+ }
+ kfree(sys_fmp_info);
+ sys_fmp_info = NULL;
+}
+
+late_initcall(fru_mem_poison_init);
+module_exit(fru_mem_poison_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Persist FRU Memory Poison Driver");
diff --git a/include/linux/cper.h b/include/linux/cper.h
index c1a7dc325121..7f7153ba7858 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -550,6 +550,30 @@ struct cper_sec_fw_err_rec_ref {
guid_t record_identifier_guid;
};

+/* FRU Memory Poison Error Section data, UEFI vX.Y sec N.X.Y */
+struct cper_fru_poison_data {
+ u32 hw_id_type;
+ u32 addr_type;
+ u64 hw_id; /* MCA_IPID register in case of AMD processors */
+ u64 addr; /* MCA_ADDR register in case of x86 processors */
+ u64 timestamp; /* Timestamp is UTC format */
+};
+
+/* Memory Poison Error Record Section, UEFI vX.Y sec N.X.Z */
+struct cper_sec_fru_mem_poisons {
+ char signature[4];
+ u64 checksum;
+ u32 model_id_type;
+ u32 model_id;
+ u32 fru_id_type;
+ u64 fru_id;
+ u64 poison_count;
+ u64 p_list_off; /* Allocate contiguous memory for max_number_poison_entries
+ * based on available NV storage at the end of the
+ * struct cper_sec_fru_mem_poisons
+ */
+};
+
/* Reset to default packing */
#pragma pack()

diff --git a/include/linux/fru_mem_poison.h b/include/linux/fru_mem_poison.h
new file mode 100644
index 000000000000..d3e567c990aa
--- /dev/null
+++ b/include/linux/fru_mem_poison.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Persist FRU Memory Poison driver
+ *
+ * Copyright (c) 2023, Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: Naveen Krishna Chatradhi <[email protected]>
+ */
+#ifndef _X86_FMP_H
+#define _X86_FMP_H
+
+struct system_fru_poison_info **sys_fmp_info;
+
+#endif /* _X86_FMP_H */
--
2.25.1

2023-11-29 07:51:46

by M K, Muralidhara

Subject: [PATCH 1/4] ACPI/APEI: Add erst_get_size() API

From: Muralidhara M K <[email protected]>

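Add an erst_get_size() API that returns the size of the ERST error log
address range, i.e. the maximum size of a record that can be stored in the ERST.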

Co-developed-by: Naveen Krishna Chatradhi <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Signed-off-by: Muralidhara M K <[email protected]>
---
drivers/acpi/apei/erst.c | 9 +++++++++
include/acpi/apei.h | 1 +
2 files changed, 10 insertions(+)

diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index bf65e3461531..aae1c133095a 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -420,6 +420,15 @@ static int erst_get_erange(struct erst_erange *range)
return 0;
}

+u64 erst_get_size(void)
+{
+ if (erst_disable)
+ return 0;
+
+ return erst_erange.size;
+}
+EXPORT_SYMBOL_GPL(erst_get_size);
+
static ssize_t __erst_get_record_count(void)
{
struct apei_exec_context ctx;
diff --git a/include/acpi/apei.h b/include/acpi/apei.h
index dc60f7db5524..3b34f463ea44 100644
--- a/include/acpi/apei.h
+++ b/include/acpi/apei.h
@@ -41,6 +41,7 @@ static inline void acpi_hest_init(void) { }

int erst_write(const struct cper_record_header *record);
ssize_t erst_get_record_count(void);
+u64 erst_get_size(void);
int erst_get_record_id_begin(int *pos);
int erst_get_record_id_next(int *pos, u64 *record_id);
void erst_get_record_id_end(void);
--
2.25.1

2023-11-29 07:51:57

by M K, Muralidhara

Subject: [PATCH 4/4] RAS/fmp: Add Documentation on Persistence of FRU memory poisons

From: Muralidhara M K <[email protected]>

On data center servers with on-chip HBM3 memory, FRU identification
needs a mechanism to track bad pages. Persisting the bad page information
in non-volatile storage across reboots and reading it back during boot
makes it possible to check the number of pages poisoned.

Co-developed-by: Naveen Krishna Chatradhi <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Co-developed-by: Sathya Priya Kumar <[email protected]>
Signed-off-by: Sathya Priya Kumar <[email protected]>
Signed-off-by: Muralidhara M K <[email protected]>
---
Documentation/RAS/ras.rst | 122 ++++++++++++++++++++++++++++++++++++++
1 file changed, 122 insertions(+)

diff --git a/Documentation/RAS/ras.rst b/Documentation/RAS/ras.rst
index 2556b397cd27..2f86bf02655a 100644
--- a/Documentation/RAS/ras.rst
+++ b/Documentation/RAS/ras.rst
@@ -24,3 +24,125 @@ Also, the user can pass particular family and model to decode the error
string::

$ rasdaemon -p --status <STATUS> --ipid <IPID> --smca --family <CPU Family> --model <CPU Model> --bank <BANK_NUM>
+
+=============================================================
+Persist FRU (Field Replaceable Unit) Memory Poison
+=============================================================
+
+Large-scale data center processors such as the MI300A have on-chip stacked
+High Bandwidth Memory v3 (HBM3).
+ - Example: MI300A has 8 stacks of HBM/die, for a total of 128 GB per socket.
+The host operating system is responsible for memory management, including allocating HBM3 pages.
+
+Many memory errors tend to be persistent or intermittent and may recur. Upon
+reaching a certain threshold of these errors, the specific memory area is deemed
+faulty and should be replaced. In the case of on-die High Bandwidth Memory (HBM),
+any returns due to these issues will likely be directed to the socket vendor.
+
+Define criteria to identify the Field Replaceable Unit (FRU) by evaluating the
+count of "poisoned" pages within the socket, and log these poisoned pages persistently
+in non-volatile storage. This retains information about the defective memory pages
+within the socket, which supports a later decision to replace it.
+
+Linux supports retiring pages by marking them HW_POISON. However, it does not
+persist these marked pages across reboots.
+To address this, a potential solution is to persist bad page details in non-volatile
+storage (ERST). This prevents compromised memory regions from being reused
+after a reboot.
+
+ERST to persist Bad page information
+====================================
+
+ERST (Error Record Serialization Table), defined by ACPI/APEI, provides a mechanism for
+storing hardware error information in, and retrieving it from, persistent storage.
+
+Platform firmware (BIOS) with ERST support reserves ERST space, usually 64KB, in non-volatile
+storage. Configure Linux to select ERST as the backend for pstore (read/write from NV storage).
+
+On specific MCE errors, Linux calls pstore with a CPER-format record per FRU, and platform
+firmware stores it in NV storage. On the next boot, Linux queries the bad page information
+from ERST and retires the pages again.
+
+FRU memory poison Common Platform Error Record (CPER) definition
+================================================================
+
+There is one CPER per FRU, identified by its Protected Processor Inventory Number (PPIN).
+That is one CPER record per MI300A socket, so a system with four MI300As has four CPERs,
+each containing the poison list offset for the given PPIN.
+
+The FRU poison CPER record size is (BIOS ERST memory) / (Number of FRUs).
+Each erst_write() or erst_read() will write/read this entire structure as one record.
+
+The maximum number of poison entries in a record of the size above is given by
+"(size - sizeof(struct cper_poison_record)) / sizeof(struct cper_fru_poison_data)"
+
+The FRU poison CPER used to store the error record is defined as follows::
+
+ struct cper_poison_record {
+ struct cper_record_header hdr;
+ struct cper_section_descriptor sec_hdr;
+ struct cper_sec_fru_mem_poisons fmpl;
+ } __packed;
+
+Use 'struct cper_record_header' and 'struct cper_section_descriptor' as defined
+in 'include/linux/cper.h'.
+
+ * The section body follows the description of a "non-standard section body" and is defined below.
+ * Per-FRU poison section data::
+
+ struct cper_sec_fru_mem_poisons {
+ char signature[4];
+ u64 checksum;
+ u32 model_id_type;
+ u32 model_id;
+ u32 fru_id_type;
+ u64 fru_id;
+ u32 poison_count;
+ u64 p_list_off; /* offset to the contiguous list of poison data entries */
+ };
+
+ * FRU poison data structure::
+
+ struct cper_fru_poison_data {
+ u32 hw_id_type;
+ u32 addr_type;
+ u64 hw_id;
+ u64 addr;
+ };
+
+Implementation Notes on FRU Identification
+==========================================
+
+ * HBM is expected to have a total of 8 DRAM rows.
+ * When an MCE error occurs, offline all the pages in the affected range of that row (8 columns per row).
+ If all 8 rows become bad, the entire socket has to be replaced.
+ * Persist the error information described in "struct cper_fru_poison_data" to ERST storage.
+
+ * Do not delete FMP records once they are saved in the persistent store. Keep them in ERST
+ until all the poison_data entries are full.
+ * Once the entries are full, do not save further error information in ERST.
+
+At OS boot
+==========
+ * One CPER has been created per FRU (Protected Processor Inventory Number (PPIN)).
+ * The size of each CPER will not exceed one quarter of the available space.
+ * The node controller should make sure there is a CPER for each PPIN in the node. If this is a
+ new processor never seen before, then create a CPER with N=0.
+ * Read the CPERs through the Error Record Serialization Table (ERST).
+ * If the OS matches a PPIN to a socket and identifies an MCE address, it re-creates the system physical
+ addresses (SPA) for all pages on the HBM row of the poisoned DRAM address (DA) and retires all pages mapped to that row.
+ * If a CPER is found for a PPIN that isn't in the node, the OS prints a warning.
+ * If the OS tries to persist more errors than fit in the CPER, it refuses to update the CPER
+ and prints a message.
+ * The OS creates a sysfs file for each FRU_ID with a list of the DRAM addresses and MCA_IPID values that are retired.
+ $ ls /sys/devices/system/edac/mc/mc0/fmpl
+ * Example: mc0 for socket 0 and mc3 for socket 3.
+ * To read the CPER record information at any time while the system is up, run:
+ $ cat /sys/devices/system/edac/mc/mc<socket_index>/fmpl
+ $ dmesg
+
+In mission mode
+===============
+ * A notifier is registered to handle FRU memory poison errors.
+ * When an error is injected on a particular PPIN and the OS matches the MCE PPIN to a socket's
+ system PPIN, it appends the poison data until the maximum number of poison entries is reached.
--
2.25.1
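
The record sizing and the "stop once full" rule described in the documentation above can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the code in fru_mem_poison.c: the two header structs are opaque stand-ins sized at 128 and 72 bytes (assumed to match 'include/linux/cper.h'), all structs are assumed packed, and fmp_record_size(), fmp_max_entries() and fmp_append() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-ins for the CPER headers; the sizes are assumptions. */
struct cper_record_header_model { uint8_t raw[128]; };
struct cper_section_descriptor_model { uint8_t raw[72]; };

/* Per-FRU poison section data, fields as listed in the documentation. */
struct cper_sec_fru_mem_poisons {
	char signature[4];
	uint64_t checksum;
	uint32_t model_id_type;
	uint32_t model_id;
	uint32_t fru_id_type;
	uint64_t fru_id;
	uint32_t poison_count;
	uint64_t p_list_off;
} __attribute__((packed));

struct cper_fru_poison_data {
	uint32_t hw_id_type;
	uint32_t addr_type;
	uint64_t hw_id;
	uint64_t addr;
} __attribute__((packed));

struct cper_poison_record {
	struct cper_record_header_model hdr;
	struct cper_section_descriptor_model sec_hdr;
	struct cper_sec_fru_mem_poisons fmpl;
} __attribute__((packed));

/* Record size per FRU: (BIOS ERST memory) / (Number of FRUs). */
static size_t fmp_record_size(size_t erst_size, unsigned int nr_frus)
{
	return nr_frus ? erst_size / nr_frus : 0;
}

/* (size - sizeof(struct cper_poison_record)) / sizeof(struct cper_fru_poison_data) */
static size_t fmp_max_entries(size_t record_size)
{
	if (record_size < sizeof(struct cper_poison_record))
		return 0;

	return (record_size - sizeof(struct cper_poison_record)) /
	       sizeof(struct cper_fru_poison_data);
}

/* Append one entry; returns 0 on success, -1 once the record is full. */
static int fmp_append(struct cper_sec_fru_mem_poisons *fmpl,
		      struct cper_fru_poison_data *list, size_t max_entries,
		      const struct cper_fru_poison_data *entry)
{
	assert(fmpl && list && entry);

	if (fmpl->poison_count >= max_entries)
		return -1;

	list[fmpl->poison_count++] = *entry;
	return 0;
}
```

With these assumed sizes the fixed part of a record is 244 bytes and each entry is 24 bytes, so a 64KB ERST range shared by 4 FRUs gives 16384 bytes per record and (16384 - 244) / 24 = 672 poison entries per FRU.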