Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in
apci_init()") introduced a bug that ghes_edac_register() would be invoked
before edac_init(). Because at that time, the bus "edac" hasn't been even
registered, this created sysfs /devices/mc0 instead of
/sys/devices/system/edac/mc/mc0 on an Ampere eMag server.
Changelog:
v3:
- refine the commit logs
- introduce ghes preferred and present flag (by Toshi)
- move force_load to setup parameter
- add the ghes_edac_preferred() check for x86/Arm edac drivers (incompleted list)
v2:
- add acked-by tag of Patch 1 from Ard
- split the notifier patch
- add 2 patch to get regular drivers selected when ghes edac is not loaded
- fix an errno in igen6 driver
- add a patch to fix the sparse warning of ghes
- refine the commit logs
Jia He (9):
efi/cper: export several helpers for ghes_edac to use
EDAC/ghes: Add a notifier for reporting memory errors
EDAC/ghes: Make ghes_edac a proper module to remove the dependency on
ghes
EDAC/ghes: Move ghes_edac.force_load to setup parameter
EDAC: Don't load chipset-specific edac drivers when ghes_edac is
preferred
ghes: Introduce a flag ghes_present
apei/ghes: Use unrcu_pointer for cmpxchg
EDAC/igen6: Keep returned errno consistent when edac mc has been
enabled
edac: Don't load Arm specific edac drivers when ghes_edac is preferred
.../admin-guide/kernel-parameters.txt | 5 +
drivers/acpi/apei/ghes.c | 94 ++++++++++++++--
drivers/edac/Kconfig | 4 +-
drivers/edac/amd64_edac.c | 3 +
drivers/edac/armada_xp_edac.c | 3 +
drivers/edac/edac_module.h | 1 +
drivers/edac/ghes_edac.c | 102 +++++++++++-------
drivers/edac/i10nm_base.c | 3 +
drivers/edac/igen6_edac.c | 5 +-
drivers/edac/layerscape_edac.c | 3 +
drivers/edac/pnd2_edac.c | 3 +
drivers/edac/sb_edac.c | 3 +
drivers/edac/skx_base.c | 3 +
drivers/edac/thunderx_edac.c | 3 +
drivers/edac/xgene_edac.c | 3 +
drivers/firmware/efi/cper.c | 3 +
include/acpi/ghes.h | 38 +++----
17 files changed, 204 insertions(+), 75 deletions(-)
--
2.25.1
edac_mc_add_mc* is too late in the init path and that check should happen
as the very first thing in the driver init function.
Signed-off-by: Jia He <[email protected]>
---
drivers/edac/armada_xp_edac.c | 3 +++
drivers/edac/layerscape_edac.c | 3 +++
drivers/edac/thunderx_edac.c | 3 +++
drivers/edac/xgene_edac.c | 3 +++
4 files changed, 12 insertions(+)
diff --git a/drivers/edac/armada_xp_edac.c b/drivers/edac/armada_xp_edac.c
index 038abbb83f4b..486532b92ce0 100644
--- a/drivers/edac/armada_xp_edac.c
+++ b/drivers/edac/armada_xp_edac.c
@@ -599,6 +599,9 @@ static int __init armada_xp_edac_init(void)
{
int res;
+ if (ghes_edac_preferred())
+ return -EBUSY;
+
/* only polling is supported */
edac_op_state = EDAC_OPSTATE_POLL;
diff --git a/drivers/edac/layerscape_edac.c b/drivers/edac/layerscape_edac.c
index 94cac7686a56..60ff4c6674cd 100644
--- a/drivers/edac/layerscape_edac.c
+++ b/drivers/edac/layerscape_edac.c
@@ -38,6 +38,9 @@ static int __init fsl_ddr_mc_init(void)
{
int res;
+ if (ghes_edac_preferred())
+ return -EBUSY;
+
/* make sure error reporting method is sane */
switch (edac_op_state) {
case EDAC_OPSTATE_POLL:
diff --git a/drivers/edac/thunderx_edac.c b/drivers/edac/thunderx_edac.c
index f13674081cb6..2c4baa6817a9 100644
--- a/drivers/edac/thunderx_edac.c
+++ b/drivers/edac/thunderx_edac.c
@@ -2114,6 +2114,9 @@ static int __init thunderx_edac_init(void)
{
int rc = 0;
+ if (ghes_edac_preferred())
+ return -EBUSY;
+
rc = pci_register_driver(&thunderx_lmc_driver);
if (rc)
return rc;
diff --git a/drivers/edac/xgene_edac.c b/drivers/edac/xgene_edac.c
index 54081403db4f..9aa68220b625 100644
--- a/drivers/edac/xgene_edac.c
+++ b/drivers/edac/xgene_edac.c
@@ -2004,6 +2004,9 @@ static int __init xgene_edac_init(void)
{
int rc;
+ if (ghes_edac_preferred())
+ return -EBUSY;
+
/* Make sure error reporting method is sane */
switch (edac_op_state) {
case EDAC_OPSTATE_POLL:
--
2.25.1
ghes_estatus_caches should be add rcu annotation to avoid sparse warnings.
drivers/acpi/apei/ghes.c:733:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
drivers/acpi/apei/ghes.c:733:25: sparse: struct ghes_estatus_cache [noderef] __rcu *
drivers/acpi/apei/ghes.c:733:25: sparse: struct ghes_estatus_cache *
drivers/acpi/apei/ghes.c:813:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
drivers/acpi/apei/ghes.c:813:25: sparse: struct ghes_estatus_cache [noderef] __rcu *
drivers/acpi/apei/ghes.c:813:25: sparse: struct ghes_estatus_cache *
unrcu_pointer is to strip the __rcu in cmpxchg.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Jia He <[email protected]>
---
drivers/acpi/apei/ghes.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 31c674639e86..94e3a15c9b06 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -165,7 +165,7 @@ struct ghes_vendor_record_entry {
static struct gen_pool *ghes_estatus_pool;
static unsigned long ghes_estatus_pool_size_request;
-static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
+static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
static atomic_t ghes_estatus_cache_alloced;
static int ghes_panic_timeout __read_mostly = 30;
@@ -855,8 +855,9 @@ static void ghes_estatus_cache_add(
}
/* new_cache must be put into array after its contents are written */
smp_wmb();
- if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
- slot_cache, new_cache) == slot_cache) {
+ if (slot != -1 && unrcu_pointer(cmpxchg(ghes_estatus_caches + slot,
+ RCU_INITIALIZER(slot_cache),
+ RCU_INITIALIZER(new_cache))) == slot_cache) {
if (slot_cache)
call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);
} else
--
2.25.1
Only a single edac driver can be enabled for EDAC MC. The igen6_init()
should be returned with EBUSY instead of ENODEV, which is consistent with
other edac drivers.
Signed-off-by: Jia He <[email protected]>
---
drivers/edac/igen6_edac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 4ac6d0c533ec..4646cb72b9f8 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1276,7 +1276,7 @@ static int __init igen6_init(void)
owner = edac_get_owner();
if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
- return -ENODEV;
+ return -EBUSY;
edac_op_state = EDAC_OPSTATE_NMI;
--
2.25.1
Before ghes_edac can be turned back into a proper module again, export
several helpers which are going to be used by it.
Signed-off-by: Jia He <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/cper.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index e4e5ea7ce910..053eae13f409 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -290,6 +290,7 @@ int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
return n;
}
+EXPORT_SYMBOL_GPL(cper_mem_err_location);
int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
{
@@ -310,6 +311,7 @@ int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
return n;
}
+EXPORT_SYMBOL_GPL(cper_dimm_err_location);
void cper_mem_err_pack(const struct cper_sec_mem_err *mem,
struct cper_mem_err_compact *cmem)
@@ -331,6 +333,7 @@ void cper_mem_err_pack(const struct cper_sec_mem_err *mem,
cmem->mem_array_handle = mem->mem_array_handle;
cmem->mem_dev_handle = mem->mem_dev_handle;
}
+EXPORT_SYMBOL_GPL(cper_mem_err_pack);
const char *cper_mem_err_unpack(struct trace_seq *p,
struct cper_mem_err_compact *cmem)
--
2.25.1
edac_mc_add_mc* is too late in the init path and that check should happen
as the very first thing in the driver init function.
Suggested-by: Toshi Kani <[email protected]>
Signed-off-by: Jia He <[email protected]>
---
drivers/acpi/apei/ghes.c | 13 +++++++++++--
drivers/edac/amd64_edac.c | 3 +++
drivers/edac/edac_module.h | 1 +
drivers/edac/i10nm_base.c | 3 +++
drivers/edac/igen6_edac.c | 3 +++
drivers/edac/pnd2_edac.c | 3 +++
drivers/edac/sb_edac.c | 3 +++
drivers/edac/skx_base.c | 3 +++
include/acpi/ghes.h | 2 ++
9 files changed, 32 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e17e0ee8f842..327386f3cf33 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1537,16 +1537,25 @@ static struct acpi_platform_list plat_list[] = {
{ } /* End */
};
-struct list_head *ghes_get_devices(void)
+bool ghes_edac_preferred(void)
{
int idx = -1;
if (IS_ENABLED(CONFIG_X86)) {
idx = acpi_match_platform_list(plat_list);
if (idx < 0 && !ghes_edac_force)
- return NULL;
+ return false;
}
+ return true;
+}
+EXPORT_SYMBOL_GPL(ghes_edac_preferred);
+
+struct list_head *ghes_get_devices(void)
+{
+ if (!ghes_edac_preferred())
+ return NULL;
+
return &ghes_devs;
}
EXPORT_SYMBOL_GPL(ghes_get_devices);
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 2f854feeeb23..5314a934c2bf 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -4329,6 +4329,9 @@ static int __init amd64_edac_init(void)
int err = -ENODEV;
int i;
+ if (ghes_edac_preferred())
+ return -EBUSY;
+
owner = edac_get_owner();
if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
return -EBUSY;
diff --git a/drivers/edac/edac_module.h b/drivers/edac/edac_module.h
index 96f6de0c8ff6..3826f82de487 100644
--- a/drivers/edac/edac_module.h
+++ b/drivers/edac/edac_module.h
@@ -11,6 +11,7 @@
#ifndef __EDAC_MODULE_H__
#define __EDAC_MODULE_H__
+#include <acpi/ghes.h>
#include "edac_mc.h"
#include "edac_pci.h"
#include "edac_device.h"
diff --git a/drivers/edac/i10nm_base.c b/drivers/edac/i10nm_base.c
index 6cf50ee0b77c..691df9b51622 100644
--- a/drivers/edac/i10nm_base.c
+++ b/drivers/edac/i10nm_base.c
@@ -548,6 +548,9 @@ static int __init i10nm_init(void)
edac_dbg(2, "\n");
+ if (ghes_edac_preferred())
+ return -EBUSY;
+
owner = edac_get_owner();
if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
return -EBUSY;
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index a07bbfd075d0..4ac6d0c533ec 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1271,6 +1271,9 @@ static int __init igen6_init(void)
edac_dbg(2, "\n");
+ if (ghes_edac_preferred())
+ return -EBUSY;
+
owner = edac_get_owner();
if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
return -ENODEV;
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index a20b299f1202..4ddc43e6c420 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -1528,6 +1528,9 @@ static int __init pnd2_init(void)
edac_dbg(2, "\n");
+ if (ghes_edac_preferred())
+ return -EBUSY;
+
owner = edac_get_owner();
if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
return -EBUSY;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 9678ab97c7ac..3ff604a5e95a 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -3506,6 +3506,9 @@ static int __init sbridge_init(void)
edac_dbg(2, "\n");
+ if (ghes_edac_preferred())
+ return -EBUSY;
+
owner = edac_get_owner();
if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
return -EBUSY;
diff --git a/drivers/edac/skx_base.c b/drivers/edac/skx_base.c
index 1abc020d49ab..286370728552 100644
--- a/drivers/edac/skx_base.c
+++ b/drivers/edac/skx_base.c
@@ -653,6 +653,9 @@ static int __init skx_init(void)
edac_dbg(2, "\n");
+ if (ghes_edac_preferred())
+ return -EBUSY;
+
owner = edac_get_owner();
if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
return -EBUSY;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index e29327ee0b83..1c47018a1e43 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -73,9 +73,11 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb);
void ghes_unregister_vendor_record_notifier(struct notifier_block *nb);
struct list_head *ghes_get_devices(void);
+bool ghes_edac_preferred(void);
extern bool ghes_edac_force;
#else
static inline struct list_head *ghes_get_devices(void) { return NULL; }
+static bool ghes_edac_preferred(void) { return false; }
static bool ghes_edac_force;
#endif
--
2.25.1
ghes_edac_init() is too late to set this module flag ghes_edac.force_load.
Also, other edac drivers should not be able to control this flag.
Move this flag to setup parameter in ghes.
Suggested-by: Toshi Kani <[email protected]>
Signed-off-by: Jia He <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 5 +++
drivers/acpi/apei/ghes.c | 24 +++++++++++-
drivers/edac/ghes_edac.c | 38 +++++++------------
include/acpi/ghes.h | 7 +++-
4 files changed, 46 insertions(+), 28 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d7f30902fda0..a5f0ee0d7727 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1593,6 +1593,11 @@
When zero, profiling data is discarded and associated
debugfs files are removed at module unload time.
+ ghes_edac_force= [X86] Skip the platform check and forcibly load the
+ ghes_edac modules.
+ Format: <bool>
+ default: false (0)
+
goldfish [X86] Enable the goldfish android emulator platform.
Don't use this when you are not running on the
android emulator
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 9c52183e3ad9..e17e0ee8f842 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -94,6 +94,26 @@
#define FIX_APEI_GHES_SDEI_CRITICAL __end_of_fixed_addresses
#endif
+/*
+ * "ghes_edac_force=1" forcibly loads ghes_edac and skips the platform
+ * check.
+ */
+bool __read_mostly ghes_edac_force;
+EXPORT_SYMBOL(ghes_edac_force);
+
+static int __init setup_ghes_edac_load(char *str)
+{
+ if (str)
+ if (!strcmp("true", str) || !strcmp("1", str))
+ ghes_edac_force = true;
+
+ if (!IS_ENABLED(CONFIG_X86))
+ ghes_edac_force = true;
+
+ return 1;
+}
+__setup("ghes_edac_force=", setup_ghes_edac_load);
+
static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
static inline bool is_hest_type_generic_v2(struct ghes *ghes)
@@ -1517,13 +1537,13 @@ static struct acpi_platform_list plat_list[] = {
{ } /* End */
};
-struct list_head *ghes_get_devices(bool force)
+struct list_head *ghes_get_devices(void)
{
int idx = -1;
if (IS_ENABLED(CONFIG_X86)) {
idx = acpi_match_platform_list(plat_list);
- if (idx < 0 && !force)
+ if (idx < 0 && !ghes_edac_force)
return NULL;
}
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index bb3ea42ba70b..6a2b54cc7240 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -54,10 +54,6 @@ static DEFINE_MUTEX(ghes_reg_mutex);
*/
static DEFINE_SPINLOCK(ghes_lock);
-/* "ghes_edac.force_load=1" skips the platform check */
-static bool __read_mostly force_load;
-module_param(force_load, bool, 0);
-
static bool system_scanned;
static struct list_head *ghes_devs;
@@ -437,23 +433,12 @@ static int ghes_edac_register(struct device *dev)
mci->ctl_name = "ghes_edac";
mci->dev_name = "ghes";
- if (fake) {
- pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
- pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
- pr_info("work on such system. Use this driver with caution\n");
- } else if (force_load) {
- pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
- pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
- pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
- pr_info("If you find incorrect reports, please contact your hardware vendor\n");
- pr_info("to correct its BIOS.\n");
- pr_info("This system has %d DIMM sockets.\n", ghes_hw.num_dimms);
- }
-
if (!fake) {
struct dimm_info *src, *dst;
int i = 0;
+ pr_info("This system has %d DIMM sockets.\n", ghes_hw.num_dimms);
+
mci_for_each_dimm(mci, dst) {
src = &ghes_hw.dimms[i];
@@ -478,6 +463,17 @@ static int ghes_edac_register(struct device *dev)
} else {
struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
+ pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+ pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+ pr_info("work on such system. Use this driver with caution\n");
+
+ if (ghes_edac_force) {
+ pr_info("This EDAC driver relies on BIOS to enumerate memory and get\n");
+ pr_info("error reports. Unfortunately, not all BIOSes reflect the\n");
+ pr_info("memory layout correctly. If you find incorrect reports, please\n");
+ pr_info("contact your hardware vendor for its in correct BIOS.\n");
+ }
+
dimm->nr_pages = 1;
dimm->grain = 128;
dimm->mtype = MEM_UNKNOWN;
@@ -518,9 +514,6 @@ static void ghes_edac_unregister(struct ghes *ghes)
struct mem_ctl_info *mci;
unsigned long flags;
- if (!force_load)
- return;
-
mutex_lock(&ghes_reg_mutex);
system_scanned = false;
@@ -554,10 +547,7 @@ static int __init ghes_edac_init(void)
{
struct ghes *g, *g_tmp;
- if (!IS_ENABLED(CONFIG_X86))
- force_load = true;
-
- ghes_devs = ghes_get_devices(force_load);
+ ghes_devs = ghes_get_devices();
if (!ghes_devs)
return -ENODEV;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 150c0b9500d6..e29327ee0b83 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -71,9 +71,12 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb);
* @nb: pointer to the notifier_block structure of the vendor record handler.
*/
void ghes_unregister_vendor_record_notifier(struct notifier_block *nb);
-struct list_head *ghes_get_devices(bool force);
+
+struct list_head *ghes_get_devices(void);
+extern bool ghes_edac_force;
#else
-static inline struct list_head *ghes_get_devices(bool force) { return NULL; }
+static inline struct list_head *ghes_get_devices(void) { return NULL; }
+static bool ghes_edac_force;
#endif
int ghes_estatus_pool_init(int num_ghes);
--
2.25.1
Hi,
Sorry for resending the v3.
There is an exceptional interrupt when I tried to post v3 at the first time.
Maybe it is caused by a comma "," inside the mail name.
E.g.
Signed-off-by: Some, one <[email protected]>
Looks like a git sendemail issue?
Anyway, sorry for the inconvenience.
--
Cheers,
Justin (Jia He)
On Tue, Aug 23, 2022 at 3:50 AM Justin He <[email protected]> wrote:
>
> Hi,
> Sorry for resending the v3.
> There is an exceptional interrupt when I tried to post v3 at the first time.
> Maybe it is caused by a comma "," inside the mail name.
> E.g.
> Signed-off-by: Some, one <[email protected]>
> Looks like a git sendemail issue?
>
> Anyway, sorry for the inconvenience.
I've received it twice, but no problem.
I need Boris to tell me what to do with this series.
On Mon, Aug 22, 2022 at 03:40:43PM +0000, Jia He wrote:
> ghes_edac_init() is too late to set this module flag ghes_edac.force_load.
> Also, other edac drivers should not be able to control this flag.
>
> Move this flag to setup parameter in ghes.
>
> Suggested-by: Toshi Kani <[email protected]>
> Signed-off-by: Jia He <[email protected]>
> ---
> .../admin-guide/kernel-parameters.txt | 5 +++
> drivers/acpi/apei/ghes.c | 24 +++++++++++-
> drivers/edac/ghes_edac.c | 38 +++++++------------
> include/acpi/ghes.h | 7 +++-
> 4 files changed, 46 insertions(+), 28 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index d7f30902fda0..a5f0ee0d7727 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1593,6 +1593,11 @@
> When zero, profiling data is discarded and associated
> debugfs files are removed at module unload time.
>
> + ghes_edac_force= [X86] Skip the platform check and forcibly load the
So there already is ghes.disable which is using the module param thing.
Why don't you do that too?
> + ghes_edac modules.
"module" - singular.
> + Format: <bool>
> + default: false (0)
> +
> goldfish [X86] Enable the goldfish android emulator platform.
> Don't use this when you are not running on the
> android emulator
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 9c52183e3ad9..e17e0ee8f842 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -94,6 +94,26 @@
> #define FIX_APEI_GHES_SDEI_CRITICAL __end_of_fixed_addresses
> #endif
>
> +/*
> + * "ghes_edac_force=1" forcibly loads ghes_edac and skips the platform
> + * check.
> + */
> +bool __read_mostly ghes_edac_force;
> +EXPORT_SYMBOL(ghes_edac_force);
> +
> +static int __init setup_ghes_edac_load(char *str)
> +{
> + if (str)
> + if (!strcmp("true", str) || !strcmp("1", str))
> + ghes_edac_force = true;
> +
> + if (!IS_ENABLED(CONFIG_X86))
> + ghes_edac_force = true;
> +
> + return 1;
> +}
> +__setup("ghes_edac_force=", setup_ghes_edac_load);
Why all that?
Isn't specifying
ghes.edac_force_load
on the kernel command line enough? I.e., you don't need to parse the
passed in option - just the presence of the parameter is enough.
> +
> static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
>
> static inline bool is_hest_type_generic_v2(struct ghes *ghes)
> @@ -1517,13 +1537,13 @@ static struct acpi_platform_list plat_list[] = {
> { } /* End */
> };
>
> -struct list_head *ghes_get_devices(bool force)
> +struct list_head *ghes_get_devices(void)
> {
> int idx = -1;
>
> if (IS_ENABLED(CONFIG_X86)) {
> idx = acpi_match_platform_list(plat_list);
> - if (idx < 0 && !force)
> + if (idx < 0 && !ghes_edac_force)
> return NULL;
> }
>
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index bb3ea42ba70b..6a2b54cc7240 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -54,10 +54,6 @@ static DEFINE_MUTEX(ghes_reg_mutex);
> */
> static DEFINE_SPINLOCK(ghes_lock);
>
> -/* "ghes_edac.force_load=1" skips the platform check */
> -static bool __read_mostly force_load;
> -module_param(force_load, bool, 0);
> -
> static bool system_scanned;
>
> static struct list_head *ghes_devs;
> @@ -437,23 +433,12 @@ static int ghes_edac_register(struct device *dev)
> mci->ctl_name = "ghes_edac";
> mci->dev_name = "ghes";
>
> - if (fake) {
> - pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
> - pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
> - pr_info("work on such system. Use this driver with caution\n");
> - } else if (force_load) {
> - pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
> - pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
> - pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
> - pr_info("If you find incorrect reports, please contact your hardware vendor\n");
> - pr_info("to correct its BIOS.\n");
> - pr_info("This system has %d DIMM sockets.\n", ghes_hw.num_dimms);
> - }
> -
> if (!fake) {
> struct dimm_info *src, *dst;
> int i = 0;
>
> + pr_info("This system has %d DIMM sockets.\n", ghes_hw.num_dimms);
> +
> mci_for_each_dimm(mci, dst) {
> src = &ghes_hw.dimms[i];
>
This hunk...
> @@ -478,6 +463,17 @@ static int ghes_edac_register(struct device *dev)
> } else {
> struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
>
> + pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
> + pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
> + pr_info("work on such system. Use this driver with caution\n");
> +
> + if (ghes_edac_force) {
> + pr_info("This EDAC driver relies on BIOS to enumerate memory and get\n");
> + pr_info("error reports. Unfortunately, not all BIOSes reflect the\n");
> + pr_info("memory layout correctly. If you find incorrect reports, please\n");
> + pr_info("contact your hardware vendor for its in correct BIOS.\n");
> + }
> +
> dimm->nr_pages = 1;
> dimm->grain = 128;
> dimm->mtype = MEM_UNKNOWN;
... and this hunk look unrelated to what this patch is doing. What are
they for?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Monday, August 22, 2022 9:41 AM, Jia He wrote:
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index e17e0ee8f842..327386f3cf33 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -1537,16 +1537,25 @@ static struct acpi_platform_list plat_list[] = {
> { } /* End */
> };
>
> -struct list_head *ghes_get_devices(void)
> +bool ghes_edac_preferred(void)
> {
> int idx = -1;
>
> if (IS_ENABLED(CONFIG_X86)) {
> idx = acpi_match_platform_list(plat_list);
> if (idx < 0 && !ghes_edac_force)
> - return NULL;
> + return false;
> }
>
> + return true;
> +}
> +EXPORT_SYMBOL_GPL(ghes_edac_preferred);
> +
> +struct list_head *ghes_get_devices(void)
> +{
> + if (!ghes_edac_preferred())
> + return NULL;
> +
> return &ghes_devs;
> }
ghes_get_devices() changing multiple times in the series is
confusing to me. Can you simply introduce ghes_get_devices()
and ghes_preferred() in the right state in a patch? Perhaps,
patch #2, #5, #6 can collapse to introduce the two funcs?
The rest of patch #5 adding the call to ghes_edac_preferred()
into other edac drivers can remain as a separate patch.
Toshi
> -----Original Message-----
[...]
> > +static int __init setup_ghes_edac_load(char *str) {
> > + if (str)
> > + if (!strcmp("true", str) || !strcmp("1", str))
> > + ghes_edac_force = true;
> > +
> > + if (!IS_ENABLED(CONFIG_X86))
> > + ghes_edac_force = true;
> > +
> > + return 1;
> > +}
> > +__setup("ghes_edac_force=", setup_ghes_edac_load);
>
> Why all that?
>
> Isn't specifying
>
> ghes.edac_force_load
Ok, I will use module parameter ghes.edac_force_load.
>
> on the kernel command line enough? I.e., you don't need to parse the passed
> in option - just the presence of the parameter is enough.
>
> > +
> > static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
> >
> > static inline bool is_hest_type_generic_v2(struct ghes *ghes) @@
> > -1517,13 +1537,13 @@ static struct acpi_platform_list plat_list[] = {
> > { } /* End */
> > };
> >
> > -struct list_head *ghes_get_devices(bool force)
> > +struct list_head *ghes_get_devices(void)
> > {
> > int idx = -1;
> >
> > if (IS_ENABLED(CONFIG_X86)) {
> > idx = acpi_match_platform_list(plat_list);
> > - if (idx < 0 && !force)
> > + if (idx < 0 && !ghes_edac_force)
> > return NULL;
> > }
> >
> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index
> > bb3ea42ba70b..6a2b54cc7240 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -54,10 +54,6 @@ static DEFINE_MUTEX(ghes_reg_mutex);
> > */
> > static DEFINE_SPINLOCK(ghes_lock);
> >
> > -/* "ghes_edac.force_load=1" skips the platform check */ -static bool
> > __read_mostly force_load; -module_param(force_load, bool, 0);
> > -
> > static bool system_scanned;
> >
> > static struct list_head *ghes_devs;
> > @@ -437,23 +433,12 @@ static int ghes_edac_register(struct device *dev)
> > mci->ctl_name = "ghes_edac";
> > mci->dev_name = "ghes";
> >
> > - if (fake) {
> > - pr_info("This system has a very crappy BIOS: It doesn't even list the
> DIMMS.\n");
> > - pr_info("Its SMBIOS info is wrong. It is doubtful that the error report
> would\n");
> > - pr_info("work on such system. Use this driver with caution\n");
> > - } else if (force_load) {
> > - pr_info("This EDAC driver relies on BIOS to enumerate memory and
> get error reports.\n");
> > - pr_info("Unfortunately, not all BIOSes reflect the memory layout
> correctly.\n");
> > - pr_info("So, the end result of using this driver varies from vendor to
> vendor.\n");
> > - pr_info("If you find incorrect reports, please contact your hardware
> vendor\n");
> > - pr_info("to correct its BIOS.\n");
> > - pr_info("This system has %d DIMM sockets.\n",
> ghes_hw.num_dimms);
> > - }
> > -
> > if (!fake) {
> > struct dimm_info *src, *dst;
> > int i = 0;
> >
> > + pr_info("This system has %d DIMM sockets.\n",
> ghes_hw.num_dimms);
> > +
> > mci_for_each_dimm(mci, dst) {
> > src = &ghes_hw.dimms[i];
> >
>
> This hunk...
Sorry, should move pr_info after this line
>
> > @@ -478,6 +463,17 @@ static int ghes_edac_register(struct device *dev)
> > } else {
> > struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
> >
> > + pr_info("This system has a very crappy BIOS: It doesn't even list the
> DIMMS.\n");
> > + pr_info("Its SMBIOS info is wrong. It is doubtful that the error report
> would\n");
> > + pr_info("work on such system. Use this driver with caution\n");
> > +
> > + if (ghes_edac_force) {
> > + pr_info("This EDAC driver relies on BIOS to enumerate memory
> and get\n");
> > + pr_info("error reports. Unfortunately, not all BIOSes reflect
> the\n");
> > + pr_info("memory layout correctly. If you find incorrect reports,
> please\n");
> > + pr_info("contact your hardware vendor for its in correct
> BIOS.\n");
> > + }
> > +
> > dimm->nr_pages = 1;
> > dimm->grain = 128;
> > dimm->mtype = MEM_UNKNOWN;
>
> ... and this hunk look unrelated to what this patch is doing. What are they for?
>
I tried to keep the previous same logic.
if (fake)
print_something
- !fake && ghes not preferred.
print_otherthings
--
Cheers,
Justin (Jia He)
> -----Original Message-----
> From: Kani, Toshi <[email protected]>
> Sent: Thursday, August 25, 2022 7:04 AM
> To: Justin He <[email protected]>; Len Brown <[email protected]>; James
> Morse <[email protected]>; Tony Luck <[email protected]>; Borislav
> Petkov <[email protected]>; Mauro Carvalho Chehab <[email protected]>;
> Robert Richter <[email protected]>; Robert Moore <[email protected]>;
> Qiuxu Zhuo <[email protected]>; Yazen Ghannam
> <[email protected]>; Jonathan Corbet <[email protected]>; Jan
> Luebbe <[email protected]>; Khuong Dinh
> <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> Rafael J . Wysocki <[email protected]>; Shuai Xue
> <[email protected]>; Jarkko Sakkinen <[email protected]>;
> [email protected]; nd <[email protected]>; Paul E. McKenney
> <[email protected]>; Andrew Morton <[email protected]>;
> Neeraj Upadhyay <[email protected]>; Randy Dunlap
> <[email protected]>; Damien Le Moal
> <[email protected]>; Muchun Song
> <[email protected]>; [email protected]
> Subject: RE: [RESEND PATCH v3 5/9] EDAC: Don't load chipset-specific edac
> drivers when ghes_edac is preferred
>
> On Monday, August 22, 2022 9:41 AM, Jia He wrote:
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index
> > e17e0ee8f842..327386f3cf33 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -1537,16 +1537,25 @@ static struct acpi_platform_list plat_list[] = {
> > { } /* End */
> > };
> >
> > -struct list_head *ghes_get_devices(void)
> > +bool ghes_edac_preferred(void)
> > {
> > int idx = -1;
> >
> > if (IS_ENABLED(CONFIG_X86)) {
> > idx = acpi_match_platform_list(plat_list);
> > if (idx < 0 && !ghes_edac_force)
> > - return NULL;
> > + return false;
> > }
> >
> > + return true;
> > +}
> > +EXPORT_SYMBOL_GPL(ghes_edac_preferred);
> > +
> > +struct list_head *ghes_get_devices(void) {
> > + if (!ghes_edac_preferred())
> > + return NULL;
> > +
> > return &ghes_devs;
> > }
>
> ghes_get_devices() changing multiple times in the series is
> confusing to me. Can you simply introduce ghes_get_devices()
> and ghes_preferred() in the right state in a patch? Perhaps, patch #2, #5, #6
> can collapse to introduce the two funcs?
>
My purpose was to make it easy for review. I am ok to merge these patches into one.
> The rest of patch #5 adding the call to ghes_edac_preferred() into other edac
> drivers can remain as a separate patch.
Okay, I assume you mean to merge all of the ghes_edac_preferred() checking for intel and
Arm edac drivers into 1 separate patch, am I understanding well?
--
Cheers,
Justin (Jia He)
On Thursday, August 25, 2022 3:46 AM, Justin He wrote:
> > ghes_get_devices() changing multiple times in the series is
> > confusing to me. Can you simply introduce ghes_get_devices()
> > and ghes_preferred() in the right state in a patch? Perhaps, patch #2, #5,
> > #6 can collapse to introduce the two funcs?
>
> My purpose was to make it easy for review. I am ok to merge these patches
> into one.
This series starts with your original patchset and then has additional
patches to address the issues with the original patchset. The number
of the patches continues to increase in this way... You do not need to
record the history of discussions and design changes in the series.
> > The rest of patch #5 adding the call to ghes_edac_preferred() into other edac
> > drivers can remain as a separate patch.
>
> Okay, I assume you mean to merge all of the ghes_edac_preferred()
> checking for intel and
> Arm edac drivers into 1 separate patch, am I understanding well?
No, that's not what I meant.
ghes_get_devices() and ghes_edac_preferred() look good to me after
all patches are applied. So, instead of introducing the original design of
ghes_get_devices() and then changing its design multiple times in the
series, please simply add the final version of ghes_get_devices() and
ghes_edac_preferred() in a single patch. They are small functions anyway.
That is, your series includes something like:
- PATCH: Add ghes_get_devices() and ghes_edac_preferred() interfaces
- PATCH: Add ghes_edac_preferred check to chipset-specific edac drivers
Toshi
Hi Borislav
> -----Original Message-----
[...]
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt
> > b/Documentation/admin-guide/kernel-parameters.txt
> > index d7f30902fda0..a5f0ee0d7727 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1593,6 +1593,11 @@
> > When zero, profiling data is discarded and associated
> > debugfs files are removed at module unload time.
> >
> > + ghes_edac_force= [X86] Skip the platform check and forcibly load the
>
> So there already is ghes.disable which is using the module param thing.
> Why don't you do that too?
>
> > + ghes_edac modules.
>
> "module" - singular.
>
> > + Format: <bool>
> > + default: false (0)
> > +
> > goldfish [X86] Enable the goldfish android emulator platform.
> > Don't use this when you are not running on the
> > android emulator
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index
> > 9c52183e3ad9..e17e0ee8f842 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -94,6 +94,26 @@
> > #define FIX_APEI_GHES_SDEI_CRITICAL __end_of_fixed_addresses
> > #endif
> >
> > +/*
> > + * "ghes_edac_force=1" forcibly loads ghes_edac and skips the
> > +platform
> > + * check.
> > + */
> > +bool __read_mostly ghes_edac_force;
> > +EXPORT_SYMBOL(ghes_edac_force);
> > +
> > +static int __init setup_ghes_edac_load(char *str) {
> > + if (str)
> > + if (!strcmp("true", str) || !strcmp("1", str))
> > + ghes_edac_force = true;
> > +
> > + if (!IS_ENABLED(CONFIG_X86))
> > + ghes_edac_force = true;
> > +
> > + return 1;
> > +}
> > +__setup("ghes_edac_force=", setup_ghes_edac_load);
>
> Why all that?
>
> Isn't specifying
>
> ghes.edac_force_load
>
> on the kernel command line enough? I.e., you don't need to parse the passed
> in option - just the presence of the parameter is enough.
>
> > +
> > static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
> >
> > static inline bool is_hest_type_generic_v2(struct ghes *ghes) @@
> > -1517,13 +1537,13 @@ static struct acpi_platform_list plat_list[] = {
> > { } /* End */
> > };
> >
> > -struct list_head *ghes_get_devices(bool force)
> > +struct list_head *ghes_get_devices(void)
> > {
> > int idx = -1;
> >
> > if (IS_ENABLED(CONFIG_X86)) {
> > idx = acpi_match_platform_list(plat_list);
> > - if (idx < 0 && !force)
> > + if (idx < 0 && !ghes_edac_force)
> > return NULL;
> > }
> >
> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index
> > bb3ea42ba70b..6a2b54cc7240 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -54,10 +54,6 @@ static DEFINE_MUTEX(ghes_reg_mutex);
> > */
> > static DEFINE_SPINLOCK(ghes_lock);
> >
> > -/* "ghes_edac.force_load=1" skips the platform check */ -static bool
> > __read_mostly force_load; -module_param(force_load, bool, 0);
> > -
> > static bool system_scanned;
> >
> > static struct list_head *ghes_devs;
> > @@ -437,23 +433,12 @@ static int ghes_edac_register(struct device *dev)
> > mci->ctl_name = "ghes_edac";
> > mci->dev_name = "ghes";
> >
> > - if (fake) {
> > - pr_info("This system has a very crappy BIOS: It doesn't even list the
> DIMMS.\n");
> > - pr_info("Its SMBIOS info is wrong. It is doubtful that the error report
> would\n");
> > - pr_info("work on such system. Use this driver with caution\n");
> > - } else if (force_load) {
> > - pr_info("This EDAC driver relies on BIOS to enumerate memory and
> get error reports.\n");
> > - pr_info("Unfortunately, not all BIOSes reflect the memory layout
> correctly.\n");
> > - pr_info("So, the end result of using this driver varies from vendor to
> vendor.\n");
> > - pr_info("If you find incorrect reports, please contact your hardware
> vendor\n");
> > - pr_info("to correct its BIOS.\n");
> > - pr_info("This system has %d DIMM sockets.\n",
> ghes_hw.num_dimms);
> > - }
> > -
> > if (!fake) {
> > struct dimm_info *src, *dst;
> > int i = 0;
> >
> > + pr_info("This system has %d DIMM sockets.\n",
> ghes_hw.num_dimms);
> > +
> > mci_for_each_dimm(mci, dst) {
> > src = &ghes_hw.dimms[i];
> >
>
> This hunk...
>
> > @@ -478,6 +463,17 @@ static int ghes_edac_register(struct device *dev)
> > } else {
> > struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
> >
> > + pr_info("This system has a very crappy BIOS: It doesn't even list the
> DIMMS.\n");
> > + pr_info("Its SMBIOS info is wrong. It is doubtful that the error report
> would\n");
> > + pr_info("work on such system. Use this driver with caution\n");
> > +
> > + if (ghes_edac_force) {
> > + pr_info("This EDAC driver relies on BIOS to enumerate memory
> and get\n");
> > + pr_info("error reports. Unfortunately, not all BIOSes reflect
> the\n");
> > + pr_info("memory layout correctly. If you find incorrect reports,
> please\n");
> > + pr_info("contact your hardware vendor for its in correct
> BIOS.\n");
> > + }
> > +
> > dimm->nr_pages = 1;
> > dimm->grain = 128;
> > dimm->mtype = MEM_UNKNOWN;
>
> ... and this hunk look unrelated to what this patch is doing. What are they for?
>
Another purpose for adjusting the pr_info hunk is to suppress the warning dmesg log
on Arm server.
On Arm, the ghes_edac_force_load (module parameter in ghes) should be true unconditionally.
So I moved the following block in fake==true check, is this reasonable?
+If (!fake)
+...
+else
+ if (ghes_edac_force) {
+ pr_info("This EDAC driver relies on BIOS to enumerate memory and get\n");
+ pr_info("error reports. Unfortunately, not all BIOSes reflect the\n");
+ pr_info("memory layout correctly. If you find incorrect reports, please\n");
+ pr_info("contact your hardware vendor for its in correct BIOS.\n");
+ }
What do you think of it?
--
Cheers,
Justin (Jia He)