LinuxLists.cc - [PATCH 0/2] FRU Memory Poison Manager

2024-02-14 03:35:50

Subject: [PATCH 0/2] FRU Memory Poison Manager

Hi all,

This set adds a new module to manage error records on persistent
storage.

Patch 1 moves a function from AMD64 EDAC to the AMD Address Translation
Library. This is needed for patch 2.

Patch 2 adds the new module. This is a near total rewrite based on patch
2 from the following set:
https://lore.kernel.org/r/[email protected]

I included questions in code comments where I think more attention is
needed.

I'd like to add Murali and Naveen as Co-developers, since this is based
on their work. Also, I kept Naveen as a maintainer in case he's still
interested.

Regarding the old set:
* Patch 1 exports a new function from the ERST driver. This is not
necessary.

* Patch 3 adds a new sysfs interface. This needs more work.

* Patch 4 adds documentation. This needs updating.

I did some basic testing on a 2P server system without ERST support.
Mostly, I checked the memory layout of the structures. I also did some
memory error injections to check the record updating flow. I did some
fixups after testing, so I apologize if I missed anything.

Thanks,
Yazen

Yazen Ghannam (2):
RAS/AMD/ATL, EDAC/amd64: Move MI300 Row Retirement to ATL
RAS: Introduce the FRU Memory Poison Manager

MAINTAINERS | 7 +
drivers/edac/Kconfig | 1 -
drivers/edac/amd64_edac.c | 48 ---
drivers/ras/Kconfig | 13 +
drivers/ras/Makefile | 1 +
drivers/ras/amd/atl/Kconfig | 1 +
drivers/ras/amd/atl/umc.c | 51 +++
drivers/ras/amd/fmpm.c | 776 ++++++++++++++++++++++++++++++++++++
include/linux/ras.h | 2 +
9 files changed, 851 insertions(+), 49 deletions(-)
create mode 100644 drivers/ras/amd/fmpm.c

base-commit: c2064388aa8765abd7c2c5785e7bfe266a2f6cd3
--
2.34.1

2024-02-14 03:38:23

by Yazen Ghannam


Subject: [PATCH 1/2] RAS/AMD/ATL, EDAC/amd64: Move MI300 Row Retirement to ATL

DRAM row retirement depends on model-specific information, so it is best
done within the AMD Address Translation Library.

Export a generic wrapper function for other modules to use. Add any
model-specific helpers here.

Signed-off-by: Yazen Ghannam <[email protected]>
---
drivers/edac/Kconfig | 1 -
drivers/edac/amd64_edac.c | 48 ----------------------------------
drivers/ras/amd/atl/Kconfig | 1 +
drivers/ras/amd/atl/umc.c | 51 +++++++++++++++++++++++++++++++++++++
include/linux/ras.h | 2 ++
5 files changed, 54 insertions(+), 49 deletions(-)
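
For illustration only, here is a minimal user-space sketch of the column
permutation step used by retire_row_mi300(). It assumes the column field
sits in bits [5:1], matching the GENMASK(5, 1) definition removed from
amd64_edac.c. The names COL_MASK, COL_SHIFT, NUM_COL and row_addresses()
are hypothetical and not part of the patch. In the kernel code, each
permutation is then translated with amd_convert_umc_mca_addr_to_sys_addr()
and handed to memory_failure(); the sketch covers only the bit
manipulation.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: column bits occupy bits [5:1], as in GENMASK(5, 1). */
#define COL_MASK   UINT64_C(0x3e)
#define COL_SHIFT  1
#define NUM_COL    32	/* 2^5 permutations of the five column bits */

/*
 * Fill 'out' with every address that shares all non-column bits with
 * 'addr'. Returns the number of addresses written (at most NUM_COL).
 */
static size_t row_addresses(uint64_t addr, uint64_t *out, size_t max)
{
	size_t n = 0;

	for (uint64_t col = 0; col < NUM_COL && n < max; col++)
		out[n++] = (addr & ~COL_MASK) | ((col << COL_SHIFT) & COL_MASK);

	return n;
}
```

Calling row_addresses() with a buffer of NUM_COL entries yields one
address per column value, and all of them differ only in bits [5:1].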

diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 8b147403c955..16c8de5050e5 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -78,7 +78,6 @@ config EDAC_GHES
config EDAC_AMD64
tristate "AMD64 (Opteron, Athlon64)"
depends on AMD_NB && EDAC_DECODE_MCE
- depends on MEMORY_FAILURE
imply AMD_ATL
help
Support for error detection and correction of DRAM ECC errors on
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index ee2f3ff15ab7..ca9a8641652d 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2795,51 +2795,6 @@ static void umc_get_err_info(struct mce *m, struct err_info *err)
err->csrow = m->synd & 0x7;
}

-/*
- * When a DRAM ECC error occurs on MI300 systems, it is recommended to retire
- * all memory within that DRAM row. This applies to the memory with a DRAM
- * bank.
- *
- * To find the memory addresses, loop through permutations of the DRAM column
- * bits and find the System Physical address of each. The column bits are used
- * to calculate the intermediate Normalized address, so all permutations should
- * be checked.
- *
- * See amd_atl::convert_dram_to_norm_addr_mi300() for MI300 address formats.
- */
-#define MI300_UMC_MCA_COL GENMASK(5, 1)
-#define MI300_NUM_COL BIT(HWEIGHT(MI300_UMC_MCA_COL))
-static void retire_row_mi300(struct atl_err *a_err)
-{
- unsigned long addr;
- struct page *p;
- u8 col;
-
- for (col = 0; col < MI300_NUM_COL; col++) {
- a_err->addr &= ~MI300_UMC_MCA_COL;
- a_err->addr |= FIELD_PREP(MI300_UMC_MCA_COL, col);
-
- addr = amd_convert_umc_mca_addr_to_sys_addr(a_err);
- if (IS_ERR_VALUE(addr))
- continue;
-
- addr = PHYS_PFN(addr);
-
- /*
- * Skip invalid or already poisoned pages to avoid unnecessary
- * error messages from memory_failure().
- */
- p = pfn_to_online_page(addr);
- if (!p)
- continue;
-
- if (PageHWPoison(p))
- continue;
-
- memory_failure(addr, 0);
- }
-}
-
static void decode_umc_error(int node_id, struct mce *m)
{
u8 ecc_type = (m->status >> 45) & 0x3;
@@ -2890,9 +2845,6 @@ static void decode_umc_error(int node_id, struct mce *m)

error_address_to_page_and_offset(sys_addr, &err);

- if (pvt->fam == 0x19 && pvt->dram_type == MEM_HBM3)
- retire_row_mi300(&a_err);
-
log_error:
__log_ecc_error(mci, &err, ecc_type);
}
diff --git a/drivers/ras/amd/atl/Kconfig b/drivers/ras/amd/atl/Kconfig
index a43513a700f1..df49c23e7f62 100644
--- a/drivers/ras/amd/atl/Kconfig
+++ b/drivers/ras/amd/atl/Kconfig
@@ -10,6 +10,7 @@
config AMD_ATL
tristate "AMD Address Translation Library"
depends on AMD_NB && X86_64 && RAS
+ depends on MEMORY_FAILURE
default N
help
This library includes support for implementation-specific
diff --git a/drivers/ras/amd/atl/umc.c b/drivers/ras/amd/atl/umc.c
index 7e310d1dfcfc..08c6dbd44c62 100644
--- a/drivers/ras/amd/atl/umc.c
+++ b/drivers/ras/amd/atl/umc.c
@@ -239,6 +239,57 @@ static unsigned long convert_dram_to_norm_addr_mi300(unsigned long addr)
return addr;
}

+/*
+ * When a DRAM ECC error occurs on MI300 systems, it is recommended to retire
+ * all memory within that DRAM row. This applies to the memory with a DRAM
+ * bank.
+ *
+ * To find the memory addresses, loop through permutations of the DRAM column
+ * bits and find the System Physical address of each. The column bits are used
+ * to calculate the intermediate Normalized address, so all permutations should
+ * be checked.
+ *
+ * See amd_atl::convert_dram_to_norm_addr_mi300() for MI300 address formats.
+ */
+#define MI300_NUM_COL BIT(HWEIGHT(MI300_UMC_MCA_COL))
+static void retire_row_mi300(struct atl_err *a_err)
+{
+ unsigned long addr;
+ struct page *p;
+ u8 col;
+
+ for (col = 0; col < MI300_NUM_COL; col++) {
+ a_err->addr &= ~MI300_UMC_MCA_COL;
+ a_err->addr |= FIELD_PREP(MI300_UMC_MCA_COL, col);
+
+ addr = amd_convert_umc_mca_addr_to_sys_addr(a_err);
+ if (IS_ERR_VALUE(addr))
+ continue;
+
+ addr = PHYS_PFN(addr);
+
+ /*
+ * Skip invalid or already poisoned pages to avoid unnecessary
+ * error messages from memory_failure().
+ */
+ p = pfn_to_online_page(addr);
+ if (!p)
+ continue;
+
+ if (PageHWPoison(p))
+ continue;
+
+ memory_failure(addr, 0);
+ }
+}
+
+void amd_retire_dram_row(struct atl_err *a_err)
+{
+ if (df_cfg.rev == DF4p5 && df_cfg.flags.heterogeneous)
+ return retire_row_mi300(a_err);
+}
+EXPORT_SYMBOL_GPL(amd_retire_dram_row);
+
static unsigned long get_addr(unsigned long addr)
{
if (df_cfg.rev == DF4p5 && df_cfg.flags.heterogeneous)
diff --git a/include/linux/ras.h b/include/linux/ras.h
index 09c632832bf1..a64182bc72ad 100644
--- a/include/linux/ras.h
+++ b/include/linux/ras.h
@@ -45,8 +45,10 @@ struct atl_err {
#if IS_ENABLED(CONFIG_AMD_ATL)
void amd_atl_register_decoder(unsigned long (*f)(struct atl_err *));
void amd_atl_unregister_decoder(void);
+void amd_retire_dram_row(struct atl_err *err);
unsigned long amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err);
#else
+static inline void amd_retire_dram_row(struct atl_err *err) { }
static inline unsigned long
amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err) { return -EINVAL; }
#endif /* CONFIG_AMD_ATL */
--
2.34.1

2024-02-14 03:38:42

by Yazen Ghannam


Subject: [PATCH 2/2] RAS: Introduce the FRU Memory Poison Manager

Memory errors are an expected occurrence on systems with high memory
density. Generally, errors within a small number of unique physical
locations are acceptable, based on manufacturer and/or admin policy.
During run time, memory with errors may be retired so it is no longer
used by the system. This is done in the kernel memory manager, and the
effect will remain until the system is restarted.

If a memory location is consistently faulty, then the same run time
error handling may occur in the next reboot cycle. Running jobs may be
terminated due to previously known bad memory. This could be prevented
if information from the previous boot was not lost.

Some add-in cards with driver-managed memory have on-board persistent
storage. Their driver may save memory error information to the
persistent storage during run time. The information may then be restored
after reset, and known bad memory may be retired before use. A running
log of bad memory locations is kept across multiple resets.

A similar solution is desirable for CPUs. However, this solution should
leverage industry-standard components, as much as possible, rather than
a bespoke platform driver.

Two components are needed: a record format and a persistent storage
interface.

A UEFI CPER "FRU Memory Poison Section" is being proposed, along with a
"Memory Poison Descriptor", to use for this purpose. These new structures
are minimal, saving space on limited non-volatile memory, and extensible.

CPER-aware persistent storage interfaces, like ACPI ERST and EFI Runtime
Variables, can be used. A new interface is not required.

Implement a new module to manage the record formats on persistent
storage. Use the requirements for an AMD MI300-based system to start.
Vendor- and platform-specific details can be abstracted later as needed.

Signed-off-by: Yazen Ghannam <[email protected]>
---
MAINTAINERS | 7 +
drivers/ras/Kconfig | 13 +
drivers/ras/Makefile | 1 +
drivers/ras/amd/fmpm.c | 776 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 797 insertions(+)
create mode 100644 drivers/ras/amd/fmpm.c
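
As a rough aid for checking the record layout, the following user-space
sketch mirrors the two packed structures defined in fmpm.c and computes
the space needed for a section with a given number of descriptor
entries. The names fmp_section, fmp_desc and fmp_payload_len() are
hypothetical. The CPER record header and section descriptor are left
out, so the result is smaller than the module's max_rec_len. It assumes
a compiler that honors #pragma pack(1), such as GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(1)
struct fmp_section {	/* mirrors struct cper_sec_fru_mem_poison */
	uint32_t checksum;
	uint64_t validation_bits;
	uint32_t fru_arch_type;
	uint64_t fru_arch;
	uint32_t fru_id_type;
	uint64_t fru_id;
	uint32_t nr_entries;
};

struct fmp_desc {	/* mirrors struct cper_fru_poison_desc */
	uint64_t timestamp;
	uint32_t hw_id_type;
	uint64_t hw_id;
	uint32_t addr_type;
	uint64_t addr;
};
#pragma pack()

/* Bytes needed for one section plus room for 'nr' descriptor entries. */
static size_t fmp_payload_len(unsigned int nr)
{
	return sizeof(struct fmp_section) + (size_t)nr * sizeof(struct fmp_desc);
}
```

With packing in effect, the section is 40 bytes and each descriptor is
32 bytes, so the default of 8 entries needs 296 bytes before the CPER
headers are added.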

diff --git a/MAINTAINERS b/MAINTAINERS
index fc5996feba70..8541cc69c43b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18363,6 +18363,13 @@ F: drivers/ras/
F: include/linux/ras.h
F: include/ras/ras_event.h

+RAS FRU MEMORY POISON MANAGER (FMPM)
+M: Naveen Krishna Chatradhi <[email protected]>
+M: Yazen Ghannam <[email protected]>
+L: [email protected]
+S: Maintained
+F: drivers/ras/amd/fmpm.c
+
RC-CORE / LIRC FRAMEWORK
M: Sean Young <[email protected]>
L: [email protected]
diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
index 2e969f59c0ca..782951aa302f 100644
--- a/drivers/ras/Kconfig
+++ b/drivers/ras/Kconfig
@@ -34,4 +34,17 @@ if RAS
source "arch/x86/ras/Kconfig"
source "drivers/ras/amd/atl/Kconfig"

+config RAS_FMPM
+ tristate "FRU Memory Poison Manager"
+ default m
+ depends on X86_MCE
+ imply AMD_ATL
+ help
+ Support saving and restoring memory error information across reboot
+ cycles using ACPI ERST as persistent storage. Error information is
+ saved with the UEFI CPER "FRU Memory Poison" section format.
+
+ Memory may be retired during boot time and run time depending on
+ platform-specific policies.
+
endif
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index 3fac80f58005..11f95d59d397 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -3,4 +3,5 @@ obj-$(CONFIG_RAS) += ras.o
obj-$(CONFIG_DEBUG_FS) += debugfs.o
obj-$(CONFIG_RAS_CEC) += cec.o

+obj-$(CONFIG_RAS_FMPM) += amd/fmpm.o
obj-y += amd/atl/
diff --git a/drivers/ras/amd/fmpm.c b/drivers/ras/amd/fmpm.c
new file mode 100644
index 000000000000..077d9f35cc7d
--- /dev/null
+++ b/drivers/ras/amd/fmpm.c
@@ -0,0 +1,776 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * FRU (Field-Replaceable Unit) Memory Poison Manager
+ *
+ * Copyright (c) 2024, Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Authors:
+ * Naveen Krishna Chatradhi <[email protected]>
+ * Muralidhara M K <[email protected]>
+ * Yazen Ghannam <[email protected]>
+ *
+ * Implementation notes, assumptions, and limitations:
+ *
+ * - FRU Memory Poison Section and Memory Poison Descriptor definitions are not yet
+ * included in the UEFI specification. So they are defined here. Afterwards, they
+ * may be moved to linux/cper.h, if appropriate.
+ *
+ * - Platforms based on AMD MI300 systems will be the first to use these structures.
+ * There are a number of assumptions made here that will need to be generalized
+ * to support other platforms.
+ *
+ * AMD MI300-based platform(s) assumptions:
+ * - Memory errors are reported through x86 MCA.
+ * - The entire DRAM row containing a memory error should be retired.
+ * - There will be (1) FRU Memory Poison Section per CPER.
+ * - The FRU will be the CPU Package (Processor Socket).
+ * - The default number of Memory Poison Descriptor entries should be (8).
+ * - The Platform will use ACPI ERST for persistent storage.
+ * - All FRU records should be saved to persistent storage. Module init will
+ * fail if any FRU record is not successfully written.
+ *
+ * - Source code will be under 'drivers/ras/amd/' unless and until there is interest
+ * to use this module for other vendors.
+ *
+ * - Boot time memory retirement may occur later than ideal due to dependencies
+ * on other libraries and drivers. This leaves a gap where bad memory may be
+ * accessed during early boot stages.
+ *
+ * - Enough memory should be pre-allocated for each FRU record to be able to hold
+ * the expected number of descriptor entries. This, mostly empty, record is
+ * written to storage during init time. Subsequent writes to the same record
+ * should allow the Platform to update the stored record in-place. Otherwise,
+ * if the record is extended, then the Platform may need to perform costly memory
+ * management operations on the storage. For example, the Platform may spend time
+ * in Firmware copying and invalidating memory on a relatively slow SPI ROM.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/cper.h>
+#include <linux/ras.h>
+
+#include <acpi/apei.h>
+
+#include <asm/cpu_device_id.h>
+#include <asm/mce.h>
+
+#pragma pack(1)
+
+/* Validation Bits */
+#define FMP_VALID_ARCH_TYPE BIT_ULL(0)
+#define FMP_VALID_ARCH BIT_ULL(1)
+#define FMP_VALID_ID_TYPE BIT_ULL(2)
+#define FMP_VALID_ID BIT_ULL(3)
+#define FMP_VALID_LIST_ENTRIES BIT_ULL(4)
+#define FMP_VALID_LIST BIT_ULL(5)
+
+/* FRU Architecture Types */
+#define FMP_ARCH_TYPE_X86_CPUID_1_EAX 0
+
+/* FRU ID Types */
+#define FMP_ID_TYPE_X86_PPIN 0
+
+/* FRU Memory Poison Section, UEFI vX.Y sec N.X.Z */
+struct cper_sec_fru_mem_poison {
+ u32 checksum;
+ u64 validation_bits;
+ u32 fru_arch_type;
+ u64 fru_arch;
+ u32 fru_id_type;
+ u64 fru_id;
+ u32 nr_entries;
+};
+
+/* FRU Descriptor ID Types */
+#define FPD_HW_ID_TYPE_MCA_IPID 0
+
+/* FRU Descriptor Address Types */
+#define FPD_ADDR_TYPE_MCA_ADDR 0
+
+/* Memory Poison Descriptor, UEFI vX.Y sec N.X.Y */
+struct cper_fru_poison_desc {
+ u64 timestamp;
+ u32 hw_id_type;
+ u64 hw_id;
+ u32 addr_type;
+ u64 addr;
+};
+
+/* Collection of headers and sections for easy pointer use. */
+struct fru_rec {
+ struct cper_record_header hdr;
+ struct cper_section_descriptor sec_desc;
+ struct cper_sec_fru_mem_poison fmp;
+ struct cper_fru_poison_desc entries[];
+};
+
+/* Reset to default packing */
+#pragma pack()
+
+/*
+ * Pointers to the complete CPER record of each FRU.
+ *
+ * Memory allocation will include padded space for descriptor entries.
+ */
+static struct fru_rec **fru_records;
+
+#define CPER_CREATOR_FMP \
+ GUID_INIT(0xcd5c2993, 0xf4b2, 0x41b2, 0xb5, 0xd4, 0xf9, 0xc3, \
+ 0xa0, 0x33, 0x08, 0x75)
+
+#define CPER_SECTION_TYPE_FMP \
+ GUID_INIT(0x5e4706c1, 0x5356, 0x48c6, 0x93, 0x0b, 0x52, 0xf2, \
+ 0x12, 0x0a, 0x44, 0x58)
+
+/**
+ * DOC: max_nr_entries (byte)
+ * Maximum number of descriptor entries possible for each FRU.
+ *
+ * Values between '1' and '255' are valid.
+ * No input or '0' will default to FMPM_DEFAULT_MAX_NR_ENTRIES.
+ */
+static u8 max_nr_entries;
+module_param(max_nr_entries, byte, 0644);
+MODULE_PARM_DESC(max_nr_entries,
+ "Maximum number of memory poison descriptor entries per FRU");
+
+#define FMPM_DEFAULT_MAX_NR_ENTRIES 8
+
+/* Maximum number of FRUs in the system. */
+static unsigned int max_nr_fru;
+
+/* Total length of record including headers and list of descriptor entries. */
+static size_t max_rec_len;
+
+/*
+ * Protect the local cache and prevent concurrent writes to storage.
+ * This is only needed after init once notifier block registration is done.
+ */
+static DEFINE_MUTEX(fmpm_update_mutex);
+
+#define for_each_fru(i, rec) \
+ for (i = 0; rec = fru_records[i], i < max_nr_fru; i++)
+
+static inline struct cper_sec_fru_mem_poison *get_fmp(struct fru_rec *rec)
+{
+ return &rec->fmp;
+}
+
+static inline struct cper_fru_poison_desc *get_fpd(struct fru_rec *rec, u32 entry)
+{
+ return &rec->entries[entry];
+}
+
+static inline u32 get_fmp_len(struct fru_rec *rec)
+{
+ return rec->sec_desc.section_length - sizeof(struct cper_section_descriptor);
+}
+
+static struct fru_rec *get_fru_record(u64 fru_id)
+{
+ struct fru_rec *rec;
+ unsigned int i;
+
+ for_each_fru(i, rec) {
+ if (get_fmp(rec)->fru_id == fru_id)
+ return rec;
+ }
+
+ pr_debug("Record not found for FRU 0x%016llx", fru_id);
+ return NULL;
+}
+
+/*
+ * Sum up all bytes within the FRU Memory Poison Section including the Memory
+ * Poison Descriptor entries.
+ */
+static u32 do_fmp_checksum(struct cper_sec_fru_mem_poison *fmp, u32 len)
+{
+ u32 checksum = 0;
+ u8 *buf, *end;
+
+ buf = (u8 *)fmp;
+ end = buf + len;
+
+ while (buf < end)
+ checksum += (u8)(*(buf++));
+
+ return checksum;
+}
+
+/* Calculate a new checksum. */
+static u32 get_fmp_checksum(struct fru_rec *rec)
+{
+ struct cper_sec_fru_mem_poison *fmp = get_fmp(rec);
+ u32 len, checksum;
+
+ len = get_fmp_len(rec);
+
+ /* Get the current total. */
+ checksum = do_fmp_checksum(fmp, len);
+
+ /* Subtract the current checksum from total. */
+ checksum -= fmp->checksum;
+
+ /* Return the complement value. */
+ return 0 - checksum;
+}
+
+static int update_record_on_storage(struct fru_rec *rec)
+{
+ int ret;
+
+ rec->fmp.checksum = get_fmp_checksum(rec);
+
+ pr_debug("Writing to storage");
+
+ ret = erst_write(&rec->hdr);
+ if (ret)
+ pr_warn("Storage update failed for FRU 0x%016llx", rec->fmp.fru_id);
+
+ return ret;
+}
+
+static void init_fpd(struct cper_fru_poison_desc *fpd, struct mce *m)
+{
+ memset(fpd, 0, sizeof(struct cper_fru_poison_desc));
+
+ fpd->timestamp = m->time;
+ fpd->hw_id_type = FPD_HW_ID_TYPE_MCA_IPID;
+ fpd->hw_id = m->ipid;
+ fpd->addr_type = FPD_ADDR_TYPE_MCA_ADDR;
+ fpd->addr = m->addr;
+}
+
+static bool has_valid_entries(u64 valid_bits)
+{
+ if (!(valid_bits & FMP_VALID_LIST_ENTRIES))
+ return false;
+
+ if (!(valid_bits & FMP_VALID_LIST))
+ return false;
+
+ return true;
+}
+
+static bool same_fpd(struct cper_fru_poison_desc *old, struct cper_fru_poison_desc *new)
+{
+ /*
+ * Ignore timestamp field.
+ * The same physical error may be reported multiple times due to stuck bits, etc.
+ *
+ * Also, order the checks from most->least likely to fail to shortcut the code.
+ */
+ if (old->addr != new->addr)
+ return false;
+
+ if (old->hw_id != new->hw_id)
+ return false;
+
+ if (old->addr_type != new->addr_type)
+ return false;
+
+ if (old->hw_id_type != new->hw_id_type)
+ return false;
+
+ return true;
+}
+
+static bool is_dup_fpd(struct fru_rec *rec, struct cper_fru_poison_desc *new)
+{
+ unsigned int i;
+
+ for (i = 0; i < get_fmp(rec)->nr_entries; i++) {
+ if (same_fpd(get_fpd(rec, i), new)) {
+ pr_debug("Found duplicate record");
+ return true;
+ }
+ }
+
+ return false;
+}
+
+static void update_fru_record(struct fru_rec *rec, struct mce *m)
+{
+ struct cper_sec_fru_mem_poison *fmp = get_fmp(rec);
+ struct cper_fru_poison_desc fpd;
+ u32 entry = 0;
+
+ mutex_lock(&fmpm_update_mutex);
+
+ init_fpd(&fpd, m);
+
+ /* This is the first entry, so just save it. */
+ if (!has_valid_entries(fmp->validation_bits))
+ goto save_fpd;
+
+ /* Ignore already recorded errors. */
+ if (is_dup_fpd(rec, &fpd))
+ goto out_unlock;
+
+ if (fmp->nr_entries >= max_nr_entries) {
+ pr_warn("Exceeded number of entries for FRU 0x%016llx", fmp->fru_id);
+ goto out_unlock;
+ }
+
+ entry = fmp->nr_entries;
+
+save_fpd:
+ memcpy(get_fpd(rec, entry), &fpd, sizeof(struct cper_fru_poison_desc));
+
+ fmp->nr_entries = entry + 1;
+ fmp->validation_bits |= FMP_VALID_LIST_ENTRIES;
+ fmp->validation_bits |= FMP_VALID_LIST;
+
+ pr_debug("Updated FRU 0x%016llx Entry #%u", fmp->fru_id, entry);
+
+ update_record_on_storage(rec);
+
+out_unlock:
+ mutex_unlock(&fmpm_update_mutex);
+}
+
+static void retire_dram_row(u64 addr, u64 id, u32 cpu)
+{
+ struct atl_err a_err;
+
+ memset(&a_err, 0, sizeof(struct atl_err));
+
+ a_err.addr = addr;
+ a_err.ipid = id;
+ a_err.cpu = cpu;
+
+ amd_retire_dram_row(&a_err);
+}
+
+static int fru_mem_poison_handler(struct notifier_block *nb, unsigned long val, void *data)
+{
+ struct mce *m = (struct mce *)data;
+ struct fru_rec *rec;
+
+ if (!mce_is_memory_error(m))
+ return NOTIFY_DONE;
+
+ retire_dram_row(m->addr, m->ipid, m->extcpu);
+
+ /*
+ * This should not happen on real errors. But it could happen from
+ * software error injection, etc.
+ */
+ rec = get_fru_record(m->ppin);
+ if (!rec)
+ return NOTIFY_DONE;
+
+ update_fru_record(rec, m);
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block fru_mem_poison_nb = {
+ .notifier_call = fru_mem_poison_handler,
+ .priority = MCE_PRIO_LOWEST,
+};
+
+static u32 get_cpu_from_fru_id(u64 fru_id)
+{
+ unsigned int cpu = 0;
+
+ /* Should there be more robust error handling if none found? */
+ for_each_online_cpu(cpu) {
+ if (topology_ppin(cpu) == fru_id)
+ break;
+ }
+
+ return cpu;
+}
+
+static void retire_mem_fmp(struct fru_rec *rec, u32 nr_entries, u32 cpu)
+{
+ struct cper_fru_poison_desc *fpd;
+ unsigned int i;
+
+ for (i = 0; i < nr_entries; i++) {
+ fpd = get_fpd(rec, i);
+
+ if (fpd->hw_id_type != FPD_HW_ID_TYPE_MCA_IPID)
+ continue;
+
+ if (fpd->addr_type != FPD_ADDR_TYPE_MCA_ADDR)
+ continue;
+
+ retire_dram_row(fpd->addr, fpd->hw_id, cpu);
+ }
+}
+
+static void retire_mem_records(void)
+{
+ struct cper_sec_fru_mem_poison *fmp;
+ struct fru_rec *rec;
+ unsigned int i;
+ u32 cpu;
+
+ for_each_fru(i, rec) {
+ fmp = get_fmp(rec);
+
+ if (!has_valid_entries(fmp->validation_bits))
+ continue;
+
+ cpu = get_cpu_from_fru_id(fmp->fru_id);
+
+ retire_mem_fmp(rec, fmp->nr_entries, cpu);
+ }
+}
+
+/* Set the CPER Record Header and CPER Section Descriptor fields. */
+static void set_rec_fields(struct fru_rec *rec)
+{
+ struct cper_section_descriptor *sec_desc = &rec->sec_desc;
+ struct cper_record_header *hdr = &rec->hdr;
+
+ memcpy(hdr->signature, CPER_SIG_RECORD, CPER_SIG_SIZE);
+ hdr->revision = CPER_RECORD_REV;
+ hdr->signature_end = CPER_SIG_END;
+
+ /*
+ * Currently, it is assumed that there is one FRU Memory Poison
+ * section per CPER. But this may change for other implementations.
+ */
+ hdr->section_count = 1;
+
+ /* The logged errors are recoverable. Otherwise, they'd never make it here. */
+ hdr->error_severity = CPER_SEV_RECOVERABLE;
+
+ hdr->validation_bits = 0;
+ hdr->record_length = max_rec_len;
+ hdr->creator_id = CPER_CREATOR_FMP;
+ hdr->notification_type = CPER_NOTIFY_MCE;
+ hdr->record_id = cper_next_record_id();
+ hdr->flags = CPER_HW_ERROR_FLAGS_PREVERR;
+
+ sec_desc->section_offset = sizeof(struct cper_record_header);
+ sec_desc->section_length = max_rec_len - sizeof(struct cper_record_header);
+ sec_desc->revision = CPER_SEC_REV;
+ sec_desc->validation_bits = 0;
+ sec_desc->flags = CPER_SEC_PRIMARY;
+ sec_desc->section_type = CPER_SECTION_TYPE_FMP;
+ sec_desc->section_severity = CPER_SEV_RECOVERABLE;
+}
+
+static int save_new_records(void)
+{
+ struct fru_rec *rec;
+ unsigned int i;
+ int ret = 0;
+
+ for_each_fru(i, rec) {
+ /* Skip restored records. Should these be fixed up? */
+ if (rec->hdr.record_length)
+ continue;
+
+ set_rec_fields(rec);
+
+ ret = update_record_on_storage(rec);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+
+static bool is_valid_fmp(struct fru_rec *rec)
+{
+ struct cper_sec_fru_mem_poison *fmp = get_fmp(rec);
+ u32 len = get_fmp_len(rec);
+
+ if (!fmp)
+ return false;
+
+ if (!len)
+ return false;
+
+ /* Checksum must sum to zero for the entire section. */
+ if (do_fmp_checksum(fmp, len))
+ return false;
+
+ if (!(fmp->validation_bits & FMP_VALID_ARCH_TYPE))
+ return false;
+
+ if (fmp->fru_arch_type != FMP_ARCH_TYPE_X86_CPUID_1_EAX)
+ return false;
+
+ if (!(fmp->validation_bits & FMP_VALID_ARCH))
+ return false;
+
+ if (fmp->fru_arch != cpuid_eax(1))
+ return false;
+
+ if (!(fmp->validation_bits & FMP_VALID_ID_TYPE))
+ return false;
+
+ if (fmp->fru_id_type != FMP_ID_TYPE_X86_PPIN)
+ return false;
+
+ if (!(fmp->validation_bits & FMP_VALID_ID))
+ return false;
+
+ return true;
+}
+
+static void restore_record(struct fru_rec *new, struct fru_rec *old)
+{
+ /* Records larger than max_rec_len were skipped earlier. */
+ size_t len = min(max_rec_len, old->hdr.record_length);
+
+ memcpy(new, old, len);
+}
+
+static bool valid_record(struct fru_rec *old)
+{
+ struct fru_rec *new;
+
+ if (!is_valid_fmp(old)) {
+ pr_debug("Ignoring invalid record");
+ return false;
+ }
+
+ new = get_fru_record(old->fmp.fru_id);
+ if (!new) {
+ pr_debug("Ignoring record for absent FRU");
+ return false;
+ }
+
+ /* What if ERST has duplicate FRU entries? */
+ restore_record(new, old);
+
+ return true;
+}
+
+/*
+ * Fetch saved records from persistent storage.
+ *
+ * For each found record:
+ * - If it was not created by this module, then ignore it.
+ * - If it is valid, then copy its data to the local cache.
+ * - If it is not valid, then erase it.
+ */
+static int get_saved_records(void)
+{
+ struct fru_rec *old;
+ u64 record_id;
+ int ret, pos;
+ ssize_t len;
+
+ /*
+ * Assume saved records match current max size.
+ *
+ * However, this may not be true depending on module parameters.
+ */
+ old = kmalloc(max_rec_len, GFP_KERNEL);
+ if (!old) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = erst_get_record_id_begin(&pos);
+ if (ret < 0)
+ goto out_end;
+
+ while (!erst_get_record_id_next(&pos, &record_id)) {
+ /*
+ * Make sure to clear temporary buffer between reads to avoid
+ * leftover data from records of various sizes.
+ */
+ memset(old, 0, max_rec_len);
+
+ len = erst_read_record(record_id, &old->hdr, max_rec_len,
+ sizeof(struct fru_rec), &CPER_CREATOR_FMP);
+
+ /* Should this be retried if the temporary buffer is too small? */
+ if (len < 0)
+ continue;
+
+ if (!valid_record(old))
+ erst_clear(record_id);
+ }
+
+out_end:
+ erst_get_record_id_end();
+ kfree(old);
+out:
+ return ret;
+}
+
+static void set_fmp_fields(struct cper_sec_fru_mem_poison *fmp, unsigned int cpu)
+{
+ fmp->fru_arch_type = FMP_ARCH_TYPE_X86_CPUID_1_EAX;
+ fmp->validation_bits |= FMP_VALID_ARCH_TYPE;
+
+ /* Assume all CPUs in the system have the same value for now. */
+ fmp->fru_arch = cpuid_eax(1);
+ fmp->validation_bits |= FMP_VALID_ARCH;
+
+ fmp->fru_id_type = FMP_ID_TYPE_X86_PPIN;
+ fmp->validation_bits |= FMP_VALID_ID_TYPE;
+
+ fmp->fru_id = topology_ppin(cpu);
+ fmp->validation_bits |= FMP_VALID_ID;
+}
+
+static unsigned int get_cpu_for_fru_num(unsigned int i)
+{
+ unsigned int cpu = 0;
+
+ /* Should there be more robust error handling if none found? */
+ for_each_online_cpu(cpu) {
+ if (topology_physical_package_id(cpu) == i)
+ return cpu;
+ }
+
+ return cpu;
+}
+
+static void init_fmps(void)
+{
+ struct fru_rec *rec;
+ unsigned int i, cpu;
+
+ for_each_fru(i, rec) {
+ cpu = get_cpu_for_fru_num(i);
+ set_fmp_fields(get_fmp(rec), cpu);
+ }
+}
+
+static int get_system_info(void)
+{
+ u8 model = boot_cpu_data.x86_model;
+
+ /* Only load on MI300A systems for now. */
+ if (!(model >= 0x90 && model <= 0x9f))
+ return -ENODEV;
+
+ if (!cpu_feature_enabled(X86_FEATURE_AMD_PPIN)) {
+ pr_debug("PPIN feature not available");
+ return -ENODEV;
+ }
+
+ /* Use CPU Package (Socket) as FRU for MI300 systems. */
+ max_nr_fru = topology_max_packages();
+ if (!max_nr_fru)
+ return -ENODEV;
+
+ if (!max_nr_entries)
+ max_nr_entries = FMPM_DEFAULT_MAX_NR_ENTRIES;
+
+ max_rec_len = sizeof(struct fru_rec);
+ max_rec_len += sizeof(struct cper_fru_poison_desc) * max_nr_entries;
+
+ pr_debug("max_nr_fru=%u max_nr_entries=%u, max_rec_len=%lu",
+ max_nr_fru, max_nr_entries, max_rec_len);
+ return 0;
+}
+
+static void deallocate_records(void)
+{
+ struct fru_rec *rec;
+ int i;
+
+ for_each_fru(i, rec)
+ kfree(rec);
+
+ kfree(fru_records);
+}
+
+static int allocate_records(void)
+{
+ int i, ret = 0;
+
+ fru_records = kcalloc(max_nr_fru, sizeof(struct fru_rec *), GFP_KERNEL);
+ if (!fru_records) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ for (i = 0; i < max_nr_fru; i++) {
+ fru_records[i] = kzalloc(max_rec_len, GFP_KERNEL);
+ if (!fru_records[i]) {
+ ret = -ENOMEM;
+ goto out_free;
+ }
+ }
+
+ return ret;
+
+out_free:
+ for (; i >= 0; i--)
+ kfree(fru_records[i]);
+
+ kfree(fru_records);
+out:
+ return ret;
+}
+
+static const struct x86_cpu_id fmpm_cpuids[] = {
+ X86_MATCH_VENDOR_FAM(AMD, 0x19, NULL),
+ { }
+};
+MODULE_DEVICE_TABLE(x86cpu, fmpm_cpuids);
+
+static int __init fru_mem_poison_init(void)
+{
+ int ret;
+
+ if (!x86_match_cpu(fmpm_cpuids)) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ if (erst_disable) {
+ pr_debug("ERST not available");
+ ret = -ENODEV;
+ goto out;
+ }
+
+ ret = get_system_info();
+ if (ret)
+ goto out;
+
+ ret = allocate_records();
+ if (ret)
+ goto out;
+
+ init_fmps();
+
+ ret = get_saved_records();
+ if (ret)
+ goto out_free;
+
+ ret = save_new_records();
+ if (ret)
+ goto out_free;
+
+ retire_mem_records();
+
+ mce_register_decode_chain(&fru_mem_poison_nb);
+
+ pr_info("FRU Memory Poison Manager initialized");
+ return 0;
+
+out_free:
+ deallocate_records();
+out:
+ return ret;
+}
+
+static void __exit fru_mem_poison_exit(void)
+{
+ mce_unregister_decode_chain(&fru_mem_poison_nb);
+ deallocate_records();
+}
+
+module_init(fru_mem_poison_init);
+module_exit(fru_mem_poison_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("FRU Memory Poison Manager");
--
2.34.1
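
The "sums to zero" checksum rule can be illustrated with a small
user-space sketch. This is a variant, not the code in the patch: the
patch sums every byte of the section, including the bytes of the
checksum field, and then subtracts the field's 32-bit value. Summing the
individual bytes of a u32 is not the same as adding its value, so the
sketch below skips the checksum field while summing and adds the stored
value back when validating. The names sum_skip_checksum(), seal() and
is_sealed() are hypothetical, and the buffer is assumed to be at least 4
bytes long with the checksum stored in its first 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum the bytes of 'buf', skipping the leading 32-bit checksum field. */
static uint32_t sum_skip_checksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = sizeof(uint32_t); i < len; i++)
		sum += buf[i];

	return sum;
}

/* Store the two's complement of the byte sum in the checksum field. */
static void seal(uint8_t *buf, size_t len)
{
	uint32_t csum = 0 - sum_skip_checksum(buf, len);

	memcpy(buf, &csum, sizeof(csum));
}

/* Valid when the byte sum plus the stored checksum wraps to zero. */
static int is_sealed(const uint8_t *buf, size_t len)
{
	uint32_t csum;

	memcpy(&csum, buf, sizeof(csum));
	return (uint32_t)(sum_skip_checksum(buf, len) + csum) == 0;
}
```

After seal() runs on a buffer, is_sealed() returns nonzero until a byte
outside the checksum field is changed in a way that alters the byte sum.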
